Computational Linguistics Laboratory

Research

GETARUNS: GEneral Text And Reference UNderstanding System

For those interested in using the system, we make available the executable, which runs under SWI Prolog on Ubuntu Linux, together with some compressed test texts. Other systems may be supported: send an email to the author and owner of the system at delmont@unive.it.

If you would like to receive the updates, which will be produced every six months, please send an email as well.

To run the system, please install SWI Prolog on your computer. It is available at http://www.swi-prolog.org/

Deep GETARUNS is a complete system for text processing, understanding, summarization and question answering. It is aimed at short texts (1000 words) for second language learning. It answers questions on the basis of the Discourse Model produced at the end of the analysis, so a text must be analysed before questions can be asked. The system works in two modalities, top-down and bottom-up. The top-down modality is used for closed-domain short texts with short sentences, typically below 25 words. The bottom-up modality can be used with any other text. In case of failure, the system falls back to the default "partial" modality, which is less constrained and has a simpler semantics.
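The choice between modalities described above can be pictured as a simple dispatch. The Python sketch below is illustrative only and is not taken from GETARUNS, which runs under SWI Prolog: the function names, the whitespace-based word count, the `parsers` dictionary and the use of an exception to signal failure are all assumptions. Only the 25-word threshold, the closed-domain condition and the fallback to "partial" come from the description.

```python
from typing import Callable, Dict, List, Tuple

TOP_DOWN_MAX_WORDS = 25  # "typically below 25 words" in the description


def choose_modality(sentences: List[str], closed_domain: bool) -> str:
    """Pick "top-down" for closed-domain texts with short sentences, else "bottom-up"."""
    # Counting words by whitespace splitting is an assumption of this sketch.
    all_short = all(len(s.split()) < TOP_DOWN_MAX_WORDS for s in sentences)
    return "top-down" if closed_domain and all_short else "bottom-up"


def analyse(
    sentences: List[str],
    closed_domain: bool,
    parsers: Dict[str, Callable[[List[str]], object]],
) -> Tuple[str, object]:
    """Run the chosen parser; fall back to the "partial" one if it fails.

    `parsers` is a hypothetical mapping with the keys "top-down",
    "bottom-up" and "partial"; failure is modelled as a raised exception.
    """
    modality = choose_modality(sentences, closed_domain)
    try:
        return modality, parsers[modality](sentences)
    except Exception:
        return "partial", parsers["partial"](sentences)
```

In this sketch the caller always gets back the name of the modality that actually produced the result, which makes it easy to see when the less constrained "partial" analysis was used.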

The menu of "deep_getaruns" has the usual File and Help submenus. In addition there are four scrollable menus: Parse, Question, Focalized_QA, Analyze_Summarize.

Parse can only be used with previously tokenized sentences. It calls the top-down version.
Question can be used to ask natural language questions, mainly of the factoid type. Why and how questions can also be asked, but answers will not always be found. The menu also offers a subcall, "Questions from database", which is currently restricted to two demo texts included in the "coretexts" folder, i.e. Restaurant and Maple Syrup.
Focalized_QA is a modality that allows generic Predicate-Argument search on the Discourse Model to find all events and properties related to a given entity that is the object of the question. Questions may only be of the kind "what has happened to X?", where X is the name or the linguistic description of one of the entities of the text. The system differentiates factive from nonfactive events, if there are any. The "coretexts" folder also contains three small texts where the system differentiates what happened from what the entity thought.
The main call is under the menu Analyze_Summarize, which parses the text sentence by sentence and first produces a Discourse Model at sentence level. At the end of the computation, all entities of the world are presented, ordered according to the ontology: inds, sets, classes, locs. Within each type, the system lists the most relevant entities first, with the score computed on the basis of the output of the (fully revised) Centering algorithm.
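To make the last two menu items more concrete, here is a minimal Python sketch of a toy Discourse Model. It is not the representation used by GETARUNS, which is not described here: the `Entity` and `Event` classes, their fields and both function names are hypothetical, and the relevance scores are simply taken as given rather than computed by a Centering algorithm. The sketch only shows the two behaviours described above: collecting what happened to an entity while keeping factive and nonfactive events apart, and listing entities by ontology type with the most relevant first.

```python
from dataclasses import dataclass
from typing import Dict, List

# Ontology order as listed in the description: inds, sets, classes, locs.
TYPE_ORDER = {"inds": 0, "sets": 1, "classes": 2, "locs": 3}


@dataclass
class Entity:
    name: str
    onto_type: str  # one of the keys of TYPE_ORDER
    score: float    # hypothetical relevance score, higher means more relevant


@dataclass
class Event:
    predicate: str
    arguments: List[str]
    factive: bool   # False for events that are only thought, hoped, etc.


def what_happened_to(name: str, events: List[Event]) -> Dict[str, List[str]]:
    """Collect the predicates of all events that mention `name`, split by factivity."""
    related = [ev for ev in events if name in ev.arguments]
    return {
        "factive": [ev.predicate for ev in related if ev.factive],
        "nonfactive": [ev.predicate for ev in related if not ev.factive],
    }


def rank_entities(entities: List[Entity]) -> List[Entity]:
    """Order by ontology type, then by descending score within each type."""
    return sorted(entities, key=lambda e: (TYPE_ORDER[e.onto_type], -e.score))


if __name__ == "__main__":
    events = [
        Event("order", ["john", "soup"], factive=True),
        Event("win", ["john", "lottery"], factive=False),
    ]
    print(what_happened_to("john", events))

    entities = [
        Entity("restaurant", "locs", 4.0),
        Entity("john", "inds", 7.5),
        Entity("waiters", "sets", 2.0),
        Entity("mary", "inds", 9.0),
    ]
    print([e.name for e in rank_entities(entities)])
```

With the toy data above, the first call separates "order" (factive) from "win" (nonfactive), and the ranking lists mary, john, waiters, restaurant.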

You can also compare our semantic representation with that of other systems which participated in the STEP 2008 shared task.

More on the system can be found in the books:

Delmonte, R. (2007). Computational Linguistic Text Processing - Logical Form, Semantic Interpretation, Discourse Relations and Question Answering. New York: Nova Science Publishers. ISBN: 1-60021-700-1.
Delmonte, R. (2008). Computational Linguistic Text Processing - Lexicon, Grammar, Parsing and Anaphora Resolution. New York: Nova Science Publishers. ISBN: 978-1-60456-749-6.