Abstract: A technique for information retrieval includes parsing a corpus to
identify a number of wordform instances within each document of the
corpus. A weighted morpheme-by-document matrix is generated based at
least in part on the number of wordform instances within each document of
the corpus and based at least in part on a weighting function. The
weighted morpheme-by-document matrix separately enumerates instances of
stems and affixes. Additionally or alternatively, a term-by-term
alignment matrix may be generated based at least in part on the number of
wordform instances within each document of the corpus. At least one lower
rank approximation matrix is generated by factorizing the weighted
morpheme-by-document matrix and/or the term-by-term alignment matrix. |
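The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented method itself. It assumes a naive suffix-stripping rule in place of a real morphological analyzer, a log-frequency times inverse-document-frequency weighting (the abstract does not name its weighting function), and a rank-1 approximation obtained by power iteration as the simplest instance of a lower rank factorization. The names `SUFFIXES`, `split_wordform`, `morpheme_counts`, `weighted_matrix`, and `rank_one_approximation` are hypothetical, and the term-by-term alignment matrix is not covered.

```python
import math
from collections import Counter

# Hypothetical suffix list; a real system would use a morphological analyzer.
SUFFIXES = ("ing", "ed", "s")

def split_wordform(word):
    """Split a wordform into (stem, affix or None) by naive suffix stripping."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)], "-" + suffix
    return word, None

def morpheme_counts(document):
    """Count stems and affixes as separate entries for one document."""
    counts = Counter()
    for word in document.lower().split():
        stem, affix = split_wordform(word)
        counts[stem] += 1
        if affix is not None:
            counts[affix] += 1
    return counts

def weighted_matrix(corpus):
    """Return (morphemes, matrix); rows are morphemes, columns are documents."""
    per_doc = [morpheme_counts(doc) for doc in corpus]
    morphemes = sorted(set().union(*per_doc))
    n_docs = len(corpus)
    matrix = []
    for m in morphemes:
        doc_freq = sum(1 for c in per_doc if m in c)
        idf = math.log(n_docs / doc_freq) + 1.0  # assumed weighting function
        matrix.append([math.log(1 + c[m]) * idf for c in per_doc])
    return morphemes, matrix

def rank_one_approximation(matrix, iterations=100):
    """Approximate the matrix by (A v) v^T, with v found by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows))        # v = A^T u
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return [[ui * vj for vj in v] for ui in u]

corpus = ["walking dogs walked", "dogs barked", "cats walking"]
morphemes, matrix = weighted_matrix(corpus)
approx = rank_one_approximation(matrix)
```

Here `morphemes` lists affixes such as `-ing` alongside stems such as `walk`, so each receives its own row, mirroring the separate enumeration of stems and affixes. A higher-rank approximation would normally come from a truncated singular value decomposition in a numerical library; the rank-1 case is used only to keep the sketch dependency-free.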
Filed: 1/13/2009 |
Application Number: 12/352621 |
This invention was made with Government support under Contract No. DE-NA0003525 awarded by the United States Department of Energy/National Nuclear Security Administration. The Government has certain rights in the invention. |