text parsing - Where should I store a list of stop words?


My function parses texts and removes short words, such as "a", "the", "in", "on", "at", etc.

The list of these words might be modified in the future. Also, switching between different lists (e.g., for different languages) might be an option.

So, where should I store such a list?

  • about 50-200 words
  • many reads every minute
  • almost no writes (modifications), for example once every few months

I have these options in mind:

  1. a list inside the code (fastest, but doesn't sound like good practice)
  2. a separate file "stop_words.txt" (how fast is reading from a file? Should I read the same data from the same file every few seconds when I call the same function?)
  3. a database table. Is that efficient when the list of words is supposed to be almost static?

I am using Ruby on Rails (if that makes any difference).

If it's only 50-200 words, I'd store the list in memory in a data structure that supports fast lookup, such as a hash map (I don't know what such a structure is called in Ruby).
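In Ruby, the standard library's `Set` (or a plain `Hash`) gives that kind of constant-time membership test. The following is only a minimal sketch: the `STOP_WORDS` constant, its five sample words, and the `remove_stop_words` helper are hypothetical names chosen for illustration, not part of the question's code.

```ruby
require "set"

# Hypothetical stop-word list; a real one would hold the 50-200 words
# mentioned in the question.
STOP_WORDS = Set.new(%w[a the in on at]).freeze

# Returns the words of `text` that are not in `stop_words`.
# Words are split on whitespace and compared case-insensitively.
def remove_stop_words(text, stop_words = STOP_WORDS)
  text.split.reject { |word| stop_words.include?(word.downcase) }
end

remove_stop_words("The cat sat on a mat") # => ["cat", "sat", "mat"]
```

Passing the set as a parameter keeps the door open for switching lists per language without touching the parsing code.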

You could use option 2 or 3 (persist the data in a file or a database table, depending on what's easier for you) and read the data into memory at the start of your application. Store the time at which the data was read, and re-read it from persistent storage if a request comes in and the data hasn't been updated for X minutes.

That's basically a cache. It might be possible that Ruby on Rails already provides such a mechanism, but I know too little about it to answer that.
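The read-once, re-read-after-X-minutes idea could look roughly like this for the file option. It is a sketch under assumptions: the `StopWordCache` class name, the one-word-per-line file layout, and the 300-second default are all made up for illustration; a database-backed variant would only need a different `reload`.

```ruby
require "set"

# Sketch of an in-memory stop-word cache backed by a text file.
# Class name, file layout (one word per line), and default TTL are assumptions.
class StopWordCache
  def initialize(path, ttl_seconds: 300)
    @path = path
    @ttl = ttl_seconds
    @words = nil
    @loaded_at = nil
    @mutex = Mutex.new
  end

  # True if `word` is in the list. The file is re-read first when the
  # cached copy is missing or older than the TTL.
  def include?(word)
    words.include?(word.downcase)
  end

  private

  def words
    @mutex.synchronize do
      reload if stale?
      @words
    end
  end

  def stale?
    @loaded_at.nil? || monotonic_now - @loaded_at >= @ttl
  end

  # Reads the file, normalises each line, and drops blank lines.
  def reload
    @words = File.readlines(@path, chomp: true)
                 .map { |line| line.strip.downcase }
                 .reject(&:empty?)
                 .to_set
    @loaded_at = monotonic_now
  end

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

With one instance per language (for example one per file such as "stop_words.txt"), most lookups hit only the in-memory set, and the file is touched at most once per TTL window.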

