Server-side set intersection in MongoDB


In the application I'm working on, there is a requirement to do massive set intersection, to the tune of 10-1,000,000 items or so. The items being intersected are ObjectIds.

So, for instance, there is a boxes document, and inside the boxes document there is an item_ids array. The item_ids array for each box holds 10-1,000,000 ObjectIds.

The end goal here is to say: given box A with ObjectId 4d3dc3898951498107000005 and box B with ObjectId 4d3dc3898951498107000002, which item_ids do they have in common?

Here is how I'm doing it:

db.boxes.distinct("item_ids", {_id: {$in: [ObjectId("4d3dc3898951498107000005"), ObjectId("4d3dc3898951498107000002")]}})
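
As a sanity check on the semantics, here is a minimal sketch in plain JavaScript (no MongoDB connection) that models the two boxes as in-memory objects. The helper names distinctAcross and intersect and the shortened sample ids are made up for illustration. It assumes distinct treats each array element as a separate value and returns the unique values across all matched documents. Under that assumption the query yields the union of the two arrays rather than the intersection, so it may be worth verifying against a small test collection.

```javascript
// Hypothetical stand-ins for two boxes documents; ids are shortened strings.
// Real ObjectIds would need to be compared by a string form, not by object identity.
const boxA = { _id: "A", item_ids: ["i1", "i2", "i3", "i4"] };
const boxB = { _id: "B", item_ids: ["i3", "i4", "i5"] };

// Models distinct("item_ids", ...) under the assumption stated above:
// every unique array element across all matched documents.
function distinctAcross(docs) {
  const seen = new Set();
  for (const doc of docs) {
    for (const id of doc.item_ids) seen.add(id);
  }
  return [...seen];
}

// Set intersection: ids present in both arrays, in the order they appear in a.
function intersect(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((id) => inB.has(id));
}

const unionResult = distinctAcross([boxA, boxB]);
const commonResult = intersect(boxA.item_ids, boxB.item_ids);

console.log(unionResult);  // [ 'i1', 'i2', 'i3', 'i4', 'i5' ]
console.log(commonResult); // [ 'i3', 'i4' ]
```

Building a Set from one array keeps the intersection linear in the combined array lengths, which matters at the 1,000,000-item end of the range.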

Firstly, I'm curious whether this seems like a sane approach. In my research so far, map-reduce seems to be the common suggestion for large intersections, but it is not recommended for real-time queries.

Secondly, I'm curious how this would behave in a sharded environment. Would mongos run a chunk of the query on each of the mongods it needs to, and then aggregate the result magically?

Lastly, if the above is sane, is it also sane to do:

db.items.find({_id: {$in: db.eval(function() { return db.boxes.distinct("item_ids", {_id: {$in: [ObjectId("4d3dc3898951498107000005"), ObjectId("4d3dc3898951498107000002")]}}); })}})

This finds the items that box A and box B have in common and materializes them as objects in one server-side query. It appears to work with .limit and .skip to implement paging of the data set.

Anyhow, any feedback would be valuable, thanks!

I think you may want to reconsider your schema. If you have 1,000,000 ObjectIds in an array at 12 bytes each, that is 12MB, not counting the BSON overhead, which can be significant for large arrays* (probably 8MB or so). In 1.8 the max document size is being raised from 4MB to 16MB, but even that won't be enough for the objects you are looking to store.

*For historical reasons, a stringified index is stored for each element in the array, which is fine when you have <100 elements, but adds up when the indexes need 6 or 7 digits.
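
The arithmetic behind that estimate can be sketched as follows. This assumes the array layout described in the BSON specification: each element is one type byte, the index as a null-terminated decimal string key, and the 12-byte ObjectId value, and the array as a whole adds a 4-byte length prefix and a 1-byte terminator. The function name objectIdArrayBsonSize is made up for illustration.

```javascript
// Estimates the encoded size of a BSON array holding `count` ObjectIds,
// under the layout assumptions stated above.
function objectIdArrayBsonSize(count) {
  // Total decimal digits needed to write the indexes 0..count-1,
  // summed one digit-length band at a time (0-9, 10-99, 100-999, ...).
  let keyDigits = 0;
  for (let digits = 1, lo = 0, hi = 10; lo < count; digits++, lo = hi, hi *= 10) {
    keyDigits += (Math.min(hi, count) - lo) * digits;
  }

  const payload = count * 12;              // the ObjectId values themselves
  const overhead = keyDigits + count * 2;  // key digits + type byte + key null terminator
  const total = payload + overhead + 5;    // plus 4-byte length and 1-byte terminator

  return { payload, overhead, total };
}

const size = objectIdArrayBsonSize(1000000);
console.log(size.payload);  // 12000000
console.log(size.overhead); // 7888890
console.log(size.total);    // 19888895
```

Under these assumptions the per-element overhead comes to roughly 7.9MB for 1,000,000 entries, in line with the "8MB or so" figure, and the array alone lands near 19.9MB, above a 16MB document limit.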

