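An obvious method of retrieving relevant pages is to match the terms of a search query against the text of every web page. Latent semantic indexing (LSI), also called latent semantic analysis (LSA), goes beyond literal term matching by comparing the concepts behind the words. To do so, it makes some simplifying assumptions: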
- Documents are represented as "bags of words", where the order of the words in a document is not important, only how many times each word appears in a document.
- Concepts are represented as patterns of words that usually appear together in documents. For example, "leash", "treat", and "obey" might usually appear together in documents about dog training.
- Words are assumed to have only one meaning. This is clearly not the case ("bank" could mean a river bank or a financial bank), but it makes the problem tractable.
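
The assumptions above can be illustrated with a minimal sketch. It is not a full LSA implementation: it only builds the word-count representation and counts which words share documents, using the Python standard library. The toy documents and the helper names `bag_of_words`, `docs_containing`, and `cooccurrence` are hypothetical, invented here for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for a document, ignoring word order."""
    return Counter(text.lower().split())


# Hypothetical toy documents, invented for this illustration.
docs = [
    "leash treat obey leash",
    "obey treat leash",
    "river bank water",
    "bank loan money bank",
]

bags = [bag_of_words(d) for d in docs]
vocab = sorted({word for bag in bags for word in bag})

# Term-document count matrix: one row per word, one column per document.
# A Counter returns 0 for words it has not seen.
matrix = [[bag[word] for bag in bags] for word in vocab]

# Word order is discarded: a reordered document yields the same bag.
same_bag = bag_of_words("obey leash treat leash") == bags[0]


def docs_containing(word):
    """Return the indices of the documents in which the word appears."""
    return {i for i, bag in enumerate(bags) if word in bag}


def cooccurrence(a, b):
    """Count the documents in which both words appear."""
    return len(docs_containing(a) & docs_containing(b))
```

In this toy data, `cooccurrence("leash", "treat")` is 2 while `cooccurrence("leash", "bank")` is 0, which is the kind of pattern the second assumption treats as a concept. The single row for "bank" in `matrix` covers both the river document and the finance document, which shows the one-meaning assumption in action.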