Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

DWPI Title: Computer implemented information retrieving method, involves generating lower rank approximation matrix by factorizing weighted matrix, and retrieving information with reference to lower rank approximation matrix

Abstract: A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
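
The matrix construction described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the suffix list and the `split_morphemes` rule are hypothetical stand-ins for a real morphological analyzer, and log-entropy weighting is assumed only because the abstract does not name a specific weighting function.

```python
import math
from collections import Counter

# Hypothetical suffix list (assumption); a real system would use a morphological analyzer.
SUFFIXES = ("ing", "ed", "s")

def split_morphemes(wordform):
    """Split a wordform into [stem] or [stem, affix] using a toy suffix rule."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) - len(suffix) >= 3:
            return [wordform[:-len(suffix)], "-" + suffix]
    return [wordform]

def morpheme_by_document(corpus):
    """Count stems and affixes separately for each document."""
    counts = []
    for doc in corpus:
        c = Counter()
        for wordform in doc.lower().split():
            c.update(split_morphemes(wordform))
        counts.append(c)
    morphemes = sorted(set().union(*counts))
    return morphemes, counts

def log_entropy_weight(morphemes, counts):
    """Apply log-entropy weighting (an assumed choice of weighting function)."""
    n_docs = len(counts)
    matrix = []
    for m in morphemes:
        row = [c[m] for c in counts]
        total = sum(row)
        # Sum of p * log2(p) over documents containing the morpheme (always <= 0).
        entropy = sum((f / total) * math.log2(f / total) for f in row if f > 0)
        global_w = 1.0 + entropy / math.log2(n_docs) if n_docs > 1 else 1.0
        matrix.append([math.log2(1 + f) * global_w for f in row])
    return matrix

corpus = ["walking dogs walked", "dogs barked", "cats walk"]
morphemes, counts = morpheme_by_document(corpus)
weighted = log_entropy_weight(morphemes, counts)
for morpheme, row in zip(morphemes, weighted):
    print(morpheme, [round(w, 3) for w in row])
```

Under this assumed weighting, an affix such as "-s" that occurs evenly in every document receives a weight near zero, while a stem confined to one document keeps its full local weight.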

Use: Computer implemented method for retrieving information.

Advantage: The method improves information retrieval performance efficiently.

Novelty: The method involves parsing a corpus to identify a number of wordform instances within documents of the corpus. A morpheme-by-document matrix is generated based on the instances, where the matrix enumerates instances of stems and affixes. A weighting function is applied to attribute-values within the matrix to generate a weighted morpheme-by-document matrix. A lower rank approximation matrix is generated by factorizing the weighted matrix. Information is retrieved with reference to the lower rank approximation matrix.
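
The factorization and retrieval steps can be illustrated with a short sketch. It assumes a truncated singular value decomposition (the factorization conventionally used in latent semantic analysis; the text above says only "factorizing") and cosine similarity for ranking. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def low_rank_factors(weighted, k):
    """Truncated SVD: keep the k largest singular triplets of the matrix."""
    U, s, Vt = np.linalg.svd(np.asarray(weighted, dtype=float), full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def rank_documents(query_vec, U_k, s_k, Vt_k):
    """Project a query into the k-dimensional space and rank documents by cosine similarity."""
    q = np.asarray(query_vec, dtype=float) @ U_k   # query coordinates, shape (k,)
    docs = (s_k[:, None] * Vt_k).T                 # document coordinates, shape (n_docs, k)
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    sims = docs @ q / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Toy weighted matrix (rows: morphemes, columns: documents); values are illustrative only.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
U_k, s_k, Vt_k = low_rank_factors(A, k=2)
A_k = (U_k * s_k) @ Vt_k                           # the lower rank approximation of A
order, sims = rank_documents([1.0, 0.0, 0.0], U_k, s_k, Vt_k)
print(order, np.round(sims, 3))
```

The query here contains only the first morpheme, so the two documents that share it rank ahead of the third. Projecting both query and documents onto the same k left singular vectors keeps the comparison within the reduced space.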

Filed: 1/13/2009

Application Number: US2009352621A

Tech ID: SD 11033.0

This invention was made with Government support under Contract No. DE-NA0003525 awarded by the United States Department of Energy/National Nuclear Security Administration. The Government has certain rights in the invention.

Data from Derwent World Patents Index, provided by Clarivate