database - Find duplicates in App Engine datastore


I have duplicated elements in the datastore in App Engine (not whole rows, but some of the fields on them).

What's the best way to find them?

Both integer and string fields are duplicated (in case comparing one is faster than the other).

Thanks!

A stupid but quick approach would be to take the fields you care about, concatenate them into one long string, and store that string as the key of a db_unique entity that references the original entity. Each time you call db_unique.get_or_insert(), you should verify that the reference points to the correct original entity; otherwise, you have a duplicate. This should probably be done in a map reduce.

Something like:

    from google.appengine.ext import db

    class db_unique(db.Model):
        # reference back to the first entity seen with this key
        r = db.ReferenceProperty()

    class db_obj(db.Model):
        a = db.IntegerProperty()
        b = db.StringProperty()
        c = db.StringProperty()

    # executed for each db_obj...
    def mapreduce(entity):
        key = '%s_%s_%s' % (entity.a, entity.b, entity.c)
        res = db_unique.get_or_insert(key, r=entity)
        if db_unique.r.get_value_for_datastore(res) != entity.key():
            # we have a possible collision: verify, and delete
            # one of the 2 entities (res and entity)?
            pass
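To try the idea without a datastore, here is a minimal in-memory sketch of the same pass. It assumes a hypothetical `Record` tuple in place of `db_obj`, with its `id` standing in for `entity.key()`, and a plain dict with `setdefault` standing in for `db_unique.get_or_insert()`. It is an illustration of the technique, not App Engine code.

```python
from collections import namedtuple

# Hypothetical stand-in for a datastore entity; `id` plays the role of entity.key().
Record = namedtuple("Record", ["id", "a", "b", "c"])


def find_duplicates(records):
    """Return (first_seen_id, duplicate_id) pairs for records sharing a, b and c."""
    seen = {}        # concatenated key -> id of the first record seen with that key
    duplicates = []
    for rec in records:
        key = "%s_%s_%s" % (rec.a, rec.b, rec.c)
        # setdefault mimics get_or_insert: store on first sight,
        # otherwise return the value that is already there.
        owner = seen.setdefault(key, rec.id)
        if owner != rec.id:
            duplicates.append((owner, rec.id))
    return duplicates


records = [
    Record(1, 7, "x", "y"),
    Record(2, 7, "x", "y"),   # same a, b, c as record 1
    Record(3, 8, "x", "y"),
]
print(find_duplicates(records))  # [(1, 2)]
```

Each record costs one dictionary lookup, which mirrors the one get_or_insert() call per entity in the datastore version.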

There are a couple of edge cases that might creep up. For example, if you have 2 entities whose b and c equal ('a_b', '') and ('a', 'b_') respectively, the b and c parts concatenate to 'a_b_' for both. To avoid this, use a separator character that you know does not appear in your strings instead of '_', or make db_unique.r a list of references and compare all of them.
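If no separator character can be guaranteed absent from the strings, one alternative (not part of the original answer) is to encode the fields so that their boundaries stay unambiguous. The sketch below assumes a hypothetical helper `make_key` that serializes the fields with `json.dumps`; each string is quoted and escaped, so the two problem cases no longer produce the same key.

```python
import json


def make_key(*fields):
    """Build a key in which field boundaries cannot be confused (hypothetical helper)."""
    # json.dumps quotes and escapes every string, so 'a_b' + '' and 'a' + 'b_'
    # serialize differently, and the integer 1 differs from the string "1".
    return json.dumps(fields)


# The naive '_' join collides for these two entities...
naive_1 = "%s_%s_%s" % (1, "a_b", "")
naive_2 = "%s_%s_%s" % (1, "a", "b_")
print(naive_1 == naive_2)  # True

# ...while the JSON-encoded keys stay distinct.
print(make_key(1, "a_b", ""))  # [1, "a_b", ""]
print(make_key(1, "a", "b_"))  # [1, "a", "b_"]
```

The resulting string can be passed to get_or_insert() in place of the '_'-joined key.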

