Turing Center KnowItAll Project: TextRunner

Information Extraction systems that apply methods from Natural Language Processing (NLP) are close rivals of Semantic Web search applications. One research institute developing such systems is the Turing Center in the Department of Computer Science and Engineering at the University of Washington. The Center’s research combines several disciplines, including the Semantic Web, Data Mining, and Natural Language Processing.

One of the Center’s research projects is KnowItAll, which uses NLP to extract information from the Web in order to provide more efficient search results for the user. The applications produced by the project include KnowItAll itself, which carries out domain-independent, large-scale information extraction from Web content; Opine, a sentiment analysis system; and, more recently, TextRunner, for which a demo is available at the following link: http://www.cs.washington.edu/research/textrunner/.

TextRunner is a search engine that ranks the results of a query by the estimated probability that each extracted result is correct. Searches can be restricted to one of three headings: Nutrition, History of Science and General Knowledge. It applies a novel information extraction approach called Open Information Extraction (OIE), which uses a linguistic parser to label extracted data as trustworthy or untrustworthy; the labelled examples are then used as input to train a Naïve Bayes classifier.
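
To make the classification step more concrete, here is a minimal sketch of a Naïve Bayes classifier trained on extractions labelled trustworthy or untrustworthy, then used to rank new candidates by probability. It is not TextRunner's actual code. The feature names (has_verb, short_span and so on), the hand-written labels standing in for the parser's output, and the example tuples are all hypothetical, chosen only to illustrate the idea.

```python
import math
from collections import defaultdict


def train_naive_bayes(examples):
    """Count feature occurrences per label.

    examples: list of (set of feature names, bool label), where the label
    says whether the extraction was marked trustworthy.
    """
    label_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocabulary.add(name)
    return label_counts, feature_counts, vocabulary


def trust_probability(model, features):
    """Return P(trustworthy | features) under a Bernoulli Naive Bayes model."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in (True, False):
        # Class prior with add-one smoothing.
        score = math.log((label_counts[label] + 1) / (total + 2))
        for name in vocabulary:
            # Smoothed probability that this feature is present for the label.
            p = (feature_counts[label][name] + 1) / (label_counts[label] + 2)
            score += math.log(p if name in features else 1 - p)
        log_scores[label] = score
    # Normalise in log space to avoid underflow.
    top = max(log_scores.values())
    exp_true = math.exp(log_scores[True] - top)
    exp_false = math.exp(log_scores[False] - top)
    return exp_true / (exp_true + exp_false)


# Hypothetical training data: in the real system a parser would supply the
# labels; here they are written by hand.
labeled = [
    ({"has_verb", "short_span"}, True),
    ({"has_verb", "short_span", "proper_noun_args"}, True),
    ({"has_verb"}, True),
    ({"crosses_clause"}, False),
    ({"crosses_clause", "pronoun_arg"}, False),
    ({"short_span", "pronoun_arg"}, False),
]
model = train_naive_bayes(labeled)

# Hypothetical candidate extractions with their (made-up) features.
candidates = {
    ("Edison", "invented", "the phonograph"):
        {"has_verb", "short_span", "proper_noun_args"},
    ("it", "which was", "later"):
        {"crosses_clause", "pronoun_arg"},
}

# Rank candidates by estimated probability of being trustworthy.
ranked = sorted(
    candidates,
    key=lambda triple: trust_probability(model, candidates[triple]),
    reverse=True,
)
for triple in ranked:
    print(triple, round(trust_probability(model, candidates[triple]), 3))
```

With this toy data the well-formed tuple ranks above the malformed one. A real system would derive far richer features from the text and train on many more examples.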
