Computation of term dominance in text documents
| DWPI Title: Computer implemented method for characterizing corpus of text document, involves storing information characterizing corpus to data storage unit coupled to processor based on generated set of dominance metrics |
| Abstract: An improved entropy-based term dominance metric is useful for characterizing a corpus of text documents and for comparing the term dominance metrics of a first corpus of documents with those of a second corpus having a different number of documents. |
| Use: Computer-implemented method for characterizing a corpus of text documents. |
| Advantage: The term dominance metrics of one corpus of documents can be compared with those of a corpus having a different number of documents, so that the entropy-based term dominance metric is improved. |
| Novelty: The method involves generating (10) a set of dominance metrics for a corpus of documents. The entropy value for each respective term is computed from a respective sum of product values over the documents of the corpus. Based on the generated set of dominance metrics, information characterizing the corpus is stored to a data storage unit communicatively coupled to the processor. |
| Filed: 2/3/2009 |
| Application Number: US2009364753A |
| Tech ID: SD 10392.0 |
| This invention was made with Government support under Contract No. DE-NA0003525 awarded by the United States Department of Energy/National Nuclear Security Administration. The Government has certain rights in the invention. |
| Data from Derwent World Patents Index, provided by Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license. |
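
The Novelty entry describes a per-term entropy value built from a sum of product values over the documents of a corpus. The record does not give the formula, so the following is only an illustrative sketch under stated assumptions, not the patented method: it assumes the product for each document is p * log(p), where p is the share of a term's occurrences that fall in that document, and it assumes division by log(N) as one plausible way to make values comparable across corpora with different document counts N. The function name `normalized_term_entropy` and the tokenized input format are hypothetical.

```python
import math
from collections import Counter


def normalized_term_entropy(docs):
    """Return {term: entropy / log(N)} for a corpus of tokenized documents.

    Assumptions (not taken from the patent record):
      * p(d | t) = count of t in document d / count of t in the whole corpus
      * entropy(t) = -sum over documents of p * log(p)
      * dividing by log(N) scales the value to [0, 1] for any corpus size N
    """
    n_docs = len(docs)
    per_doc_counts = [Counter(doc) for doc in docs]

    # Total occurrences of each term across the corpus.
    corpus_counts = Counter()
    for counts in per_doc_counts:
        corpus_counts.update(counts)

    entropies = {}
    for term, total in corpus_counts.items():
        entropy = 0.0
        for counts in per_doc_counts:
            tf = counts.get(term, 0)
            if tf:  # documents without the term contribute nothing
                p = tf / total
                entropy -= p * math.log(p)
        # A single-document corpus has no spread to measure.
        entropies[term] = entropy / math.log(n_docs) if n_docs > 1 else 0.0
    return entropies
```

Under these assumptions, a term spread evenly over every document scores close to 1, and a term confined to a single document scores 0. How such an entropy value is turned into the dominance metric itself is not stated in this record.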