Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

Patent Number: 8,290,961
Issued: 10/16/2012
Official Filing: View the Complete Patent
Abstract: A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
Filed: 1/13/2009
Application Number: 12/352,621
Government Interests: STATEMENT OF GOVERNMENT INTEREST This invention was made with Government support under Contract No. DE-NA0003525 awarded by the United States Department of Energy/National Nuclear Security Administration. The Government has certain rights in the invention.