java - instant searching in a petabyte of data -


I need to search a petabyte of data stored in CSV-format files. After indexing with Lucene, the index is about double the size of the original files. Is it possible to reduce the size of the index? How do I distribute the Lucene index files across Hadoop, and how do I then use them in a search environment? Or is that unnecessary; should I use Solr to distribute the Lucene index instead? My requirement is instant search over a petabyte of files.
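For reference, a minimal sketch of the kind of indexing job being described, using Lucene's core API. The file paths, the single content field, and the naive comma handling (no quoted-field parsing) are illustrative assumptions, not details from the question:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class CsvIndexer {
        public static void main(String[] args) throws IOException {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(
                     FSDirectory.open(Paths.get("/data/index")), cfg);   // hypothetical paths
                 BufferedReader in = Files.newBufferedReader(Paths.get("/data/rows.csv"))) {
                String line;
                long rowId = 0;
                while ((line = in.readLine()) != null) {
                    Document doc = new Document();
                    // Keep a stored row id so a hit can be traced back to the CSV.
                    doc.add(new StoredField("rowId", rowId++));
                    // Index the whole row as one searchable text field, without
                    // storing the text itself (storing it would roughly double
                    // the index's footprint).
                    doc.add(new TextField("content", line.replace(',', ' '), Store.NO));
                    writer.addDocument(doc);
                }
            }
        }
    }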

Any decent off-the-shelf search engine (like Lucene) should be able to provide search functionality over the amount of data you have. You may have a bit of work to do up front to design the indexes and configure how the search works, but that is just configuration.
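As an illustration of what that up-front work buys you, a query against the sketch above might look like this (same assumed index path and field name):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    import java.nio.file.Paths;

    public class CsvSearcher {
        public static void main(String[] args) throws Exception {
            try (DirectoryReader reader = DirectoryReader.open(
                     FSDirectory.open(Paths.get("/data/index")))) {   // hypothetical path
                IndexSearcher searcher = new IndexSearcher(reader);
                Query q = new QueryParser("content", new StandardAnalyzer())
                              .parse(args.length > 0 ? args[0] : "example");
                TopDocs top = searcher.search(q, 10);
                for (ScoreDoc hit : top.scoreDocs) {
                    // Only stored fields are retrievable; here that is the row id.
                    System.out.println(searcher.doc(hit.doc).get("rowId"));
                }
            }
        }
    }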

You won't get instant results, but you might be able to get very quick results. The speed will depend on how you set it up and what kind of hardware you run it on.

You mention that the indexes are larger than the original data. That is to be expected: indexing usually involves some form of denormalisation. The size of the indexes is a trade-off against speed; the more ways you slice and dice the data in advance, the quicker it is to find a reference.
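A sketch of that trade-off in Lucene terms, under the same assumptions as above: each indexing option you switch off shrinks the index but removes a way to query the data. For example, dropping term positions disables phrase queries on that field:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.IndexOptions;

    public class LeanFieldExample {
        // A trimmed-down field type: smaller index, fewer query features.
        static final FieldType LEAN = new FieldType();
        static {
            LEAN.setTokenized(true);
            LEAN.setStored(false);                             // no stored copy of the text
            LEAN.setOmitNorms(true);                           // no length normalisation for scoring
            LEAN.setIndexOptions(IndexOptions.DOCS_AND_FREQS); // no positions: phrase queries
                                                               // on this field won't work
            LEAN.freeze();
        }

        static Document build(String row) {
            Document doc = new Document();
            doc.add(new Field("content", row, LEAN));
            return doc;
        }
    }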

Lastly, you mention distributing the indexes. That is almost certainly not something you want to do. The practicalities of distributing many petabytes of data are pretty daunting. What you probably want instead is to have the indexes sitting on one big, fat computer somewhere and to provide search services over the data (bring the query to the data; don't take the data to the query).
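Since the question mentions Solr: a sketch of "bring the query to the data" using the SolrJ client, assuming a Solr server already hosts the index next to the data (the host name and the csvdata core are made up). Only the small query and the small result set cross the network:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class RemoteSearch {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://search-host:8983/solr/csvdata").build()) {  // hypothetical host/core
                SolrQuery query = new SolrQuery("content:error");
                query.setRows(10);
                QueryResponse response = client.query(query);
                for (SolrDocument doc : response.getResults()) {
                    System.out.println(doc.getFieldValue("rowId"));
                }
            }
        }
    }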

