information retrieval - Question Answering with Lucene


For a toy project, I want to implement an automated question answering system with Lucene, and I'm trying to figure out a reasonable way to implement it. The basic operation is as follows:

1) The user enters a question.

2) The system identifies the keywords in the question.

3) The keywords are searched for in a large knowledge base, and the matching sentences are shown as answers.
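The three steps above can be sketched without Lucene, just to make the data flow concrete. This is a minimal illustration and not Lucene's API: the stop-word list, the function names, and the tiny corpus are all made up for the example, and the ranking simply counts distinct matching keywords.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does",
              "how", "what", "i", "to", "of", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keywords(question):
    """Step 2: keep the question tokens that are not stop words."""
    return [t for t in tokenize(question) if t not in STOP_WORDS]

def build_index(sentences):
    """Map each token to the set of sentence numbers that contain it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index[token].add(i)
    return index

def answer(question, sentences, index):
    """Step 3: rank sentences by how many distinct keywords they contain."""
    scores = defaultdict(int)
    for kw in set(keywords(question)):
        for i in index.get(kw, ()):
            scores[i] += 1
    # Highest score first; ties are broken by position in the corpus.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [sentences[i] for i in ranked]

# Made-up corpus where each sentence is treated as one document.
corpus = [
    "Press the power button to turn on the device.",
    "The battery charges in two hours.",
    "Hold the power button for five seconds to reset the device.",
]
index = build_index(corpus)
results = answer("How do I reset the device?", corpus, index)
```

In a Lucene-based version, the analyzer would take over tokenization and stop-word removal, and the index and scoring would come from the library rather than from the dictionaries used here.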

My knowledge base (i.e., the corpus) is not structured. It is just a large, continuous text (say, a user manual without any chapters). By that I mean the only structure is that sentences and paragraphs are identified.

I plan to treat each sentence or paragraph as a separate document. To present the answer in context, I may consider keeping one sentence/paragraph before and after the indexed one as a payload. I would like to know if that makes sense. Also, I'm wondering if there are other tried and well-known approaches to this kind of system. For example, another approach that comes to mind is to index large chunks of the corpus as documents with token positions, and then process the vicinity of the found keywords to construct answers.
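The second approach (large chunks plus token positions) can be illustrated roughly as follows. This is only a sketch under assumptions: the helper names and the fixed window width are hypothetical, and the "answer" is simply the window of tokens that covers the most distinct keywords.

```python
import re

def tokenize(text):
    """Return lowercase tokens in order; the list index is the token position."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_positions(tokens, keywords):
    """Positions in the chunk at which any of the keywords occur."""
    return [i for i, t in enumerate(tokens) if t in keywords]

def best_window(tokens, keywords, width):
    """Return (start, end) of the first width-token window that covers
    the largest number of distinct keywords."""
    wanted = set(keywords)
    best, best_count = (0, min(width, len(tokens))), -1
    for start in range(max(1, len(tokens) - width + 1)):
        window = tokens[start:start + width]
        count = len(wanted.intersection(window))
        if count > best_count:
            best, best_count = (start, start + len(window)), count
    return best

# Made-up chunk of an unstructured manual.
chunk = ("The device has a power button. To reset the device hold the "
         "power button for five seconds. The battery charges in two hours.")
tokens = tokenize(chunk)
start, end = best_window(tokens, {"reset", "device"}, width=6)
snippet = " ".join(tokens[start:end])
```

A real implementation would map the window back to character offsets in the original text so that the answer can be shown with its punctuation and casing intact.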

I would appreciate direct recommendations based on experience or intuition, as well as tutorials or introductory materials on question-answering systems with Lucene in mind.

Thanks.

Indexing each sentence as a document will give you some problems. You've pointed out one: you would need to store the surrounding text as payloads. This means that you'll need to store each sentence three times (before, during and after), and you'll have to retrieve the payload manually.

If you want to go the route of each sentence being a document, I would recommend coming up with an ID for each sentence and storing it as a separate field. Then you can display [id-1, id, id+1] in each result.
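The ID idea can be sketched like this. It is an illustration only: the dictionaries stand in for indexed documents, the field names `id` and `text` are made up, and the IDs are assumed to be consecutive integers assigned in corpus order.

```python
def make_documents(sentences):
    """Give each sentence a consecutive integer ID stored as its own field."""
    return [{"id": i, "text": s} for i, s in enumerate(sentences)]

def with_context(docs, hit_id):
    """Return the texts for [id-1, id, id+1], clipped at the corpus edges.

    Assumes docs[i]["id"] == i, i.e. IDs match list positions.
    """
    lo = max(0, hit_id - 1)
    hi = min(len(docs), hit_id + 2)
    return [d["text"] for d in docs[lo:hi]]

# Made-up corpus; each sentence is stored exactly once.
docs = make_documents([
    "Unplug the charger first.",
    "Hold the power button for five seconds.",
    "The device restarts automatically.",
])
context = with_context(docs, 1)
```

Each sentence is stored once, and the context is assembled at display time with two extra lookups by ID, which avoids the triple storage that payloads would require.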

The bigger question, though, is: how should you break the text up into documents? Identifying semantically related areas seems difficult, so splitting by sentence/paragraph might be the only way to go. A better way would be to find which text is the header of a section, and then put everything in that section into one document.
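If the text does contain recognizable section headers, the split could look something like the sketch below. The header test is a purely hypothetical heuristic (a short line with no closing punctuation) and would need tuning against the actual corpus; the function names are made up as well.

```python
def is_header(line):
    """Hypothetical heuristic: a short, non-empty line that does not
    end in sentence punctuation is treated as a section header."""
    stripped = line.strip()
    return (bool(stripped)
            and len(stripped.split()) <= 6
            and stripped[-1] not in ".!?:;,")

def split_into_sections(lines):
    """Group lines into sections; each section would become one document."""
    sections = []
    current = {"header": None, "body": []}
    for line in lines:
        if is_header(line):
            # Close the previous section before starting a new one.
            if current["header"] is not None or current["body"]:
                sections.append(current)
            current = {"header": line.strip(), "body": []}
        elif line.strip():
            current["body"].append(line.strip())
    if current["header"] is not None or current["body"]:
        sections.append(current)
    return sections

# Made-up manual excerpt.
manual = [
    "Resetting the device",
    "Hold the power button for five seconds.",
    "",
    "Charging",
    "The battery charges in two hours.",
]
sections = split_into_sections(manual)
```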

You might also want to use the book's index (if your corpus has one). The terms listed there could be boosted, since they are presumably more important.
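As a rough illustration of boosting, the sketch below gives extra weight to query keywords that also appear in the book's index. The boost factor of 2.0 and the function name are assumptions chosen for the example, not values taken from Lucene.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boosted_score(sentence, query_keywords, index_terms, boost=2.0):
    """Sum 1.0 per matched keyword, or `boost` if the keyword is an index term."""
    tokens = set(tokenize(sentence))
    total = 0.0
    for kw in set(query_keywords):
        if kw in tokens:
            total += boost if kw in index_terms else 1.0
    return total

# Hypothetical back-of-book index terms.
index_terms = {"reset", "battery"}
sentence = "Hold the power button for five seconds to reset the device."
score = boosted_score(sentence, ["reset", "device"], index_terms)
```

Here "reset" counts double because it is an index term, while "device" counts once, so the sentence scores 3.0 instead of 2.0.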

