A text analysis service that finds out what any text is about by extracting the most relevant Wikipedia categories with a patented NLP technology.
With this ontology-based topic detection, you can instantly discover what millions of texts are about by extracting the most relevant Wikipedia categories.
This advanced NLP API is particularly effective for analyzing and classifying specialized articles in a given field of knowledge: scientific research, historical documents, geography, the economic press, and so on.
This Topic Detection API is based on a patented algorithm developed by Proxem and presented in 2013 at TALN, the main annual French conference on Natural Language Processing. The published paper's title is "Cross-lingual and generic text categorization".
About the paper
Text categorization usually requires a significant investment, which must often be combined with adaptation to a specific domain. The approach we propose here makes it possible to finely associate a graph of Wikipedia categories with any text written in a given language. Moreover, the interlingual index of the online encyclopedia makes it possible to obtain a subset of this graph in most other languages.
Keywords: categorization, machine learning, information retrieval, Wikipedia, graph
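To illustrate the cross-lingual idea in the abstract, here is a minimal sketch in Python. It is not Proxem's algorithm or API: the function name `project_graph`, the toy category graph, and the small French-to-English index are all hypothetical, chosen only to show why the result in another language is a subset of the original graph. Categories without a counterpart in the target language are simply dropped.

```python
from typing import Dict, List, Tuple

# An edge links a subcategory to its parent category.
Edge = Tuple[str, str]


def project_graph(edges: List[Edge], interlingual: Dict[str, str]) -> List[Edge]:
    """Translate a category graph, keeping only edges whose two
    endpoints both have a counterpart in the target language."""
    return [
        (interlingual[child], interlingual[parent])
        for child, parent in edges
        if child in interlingual and parent in interlingual
    ]


# Hypothetical category graph associated with a French text (toy data).
french_edges = [
    ("Apprentissage automatique", "Intelligence artificielle"),
    ("Intelligence artificielle", "Informatique"),
    ("Catégorie sans équivalent", "Informatique"),
]

# Hypothetical excerpt of an interlingual index (French to English).
fr_to_en = {
    "Apprentissage automatique": "Machine learning",
    "Intelligence artificielle": "Artificial intelligence",
    "Informatique": "Computer science",
}

english_edges = project_graph(french_edges, fr_to_en)
for child, parent in english_edges:
    print(f"{child} -> {parent}")
```

In this toy example the third French edge has no English counterpart, so the projected graph contains two edges instead of three.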
François-Régis Chaumartin. Apprentissage d'une classification thématique générique et cross-langue à partir des catégories de la Wikipédia. TALN - Traitement Automatique des Langues Naturelles - 2013, Jun 2013, Les Sables d'Olonne, France, pp.659-666.
See more or download the paper (in French):