database - Find duplicates in App Engine datastore
I have duplicated elements in the App Engine datastore. The whole row is not duplicated, only some fields on it.
What's the best way to find them?
Both integer and string fields are duplicated, in case comparing one type is faster than the other.
Thanks!
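A stupid but quick approach: take the fields you care about, concatenate them into one long string, and use that string as the key name of a db_unique
entity that references the original entity. Each time you call db_unique.get_or_insert(), check that the returned entity's reference points
to the original entity you are processing. If it points somewhere else, you have a duplicate. This should be done as a MapReduce job over all entities.
Something like: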
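Outside App Engine, the same "unique index" idea can be sketched in plain Python. This is a minimal in-memory sketch, not datastore code. A dict stands in for the db_unique kind, and `dict.setdefault` mimics get_or_insert: it stores the value only if the key is new, and it returns the stored value either way. `Record` and `find_duplicates` are hypothetical names used for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a datastore entity: an id plus the fields we compare.
Record = namedtuple("Record", ["id", "a", "b", "c"])


def find_duplicates(records):
    """Return (original_id, duplicate_id) pairs using a get_or_insert-style index."""
    unique_index = {}  # plays the role of the db_unique kind: key string -> owner id
    duplicates = []
    for rec in records:
        key = "%s_%s_%s" % (rec.a, rec.b, rec.c)
        # Like get_or_insert: only the first record with this key becomes the owner.
        owner = unique_index.setdefault(key, rec.id)
        # Comparing identities, not just "key already existed", means reprocessing
        # the owner itself (e.g. a retried mapper) is not flagged as a duplicate.
        if owner != rec.id:
            duplicates.append((owner, rec.id))
    return duplicates


records = [
    Record(1, 7, "x", "y"),
    Record(2, 8, "x", "y"),
    Record(3, 7, "x", "y"),
]
print(find_duplicates(records))  # [(1, 3)]
```

The identity check matters because MapReduce mappers can run more than once on the same entity. A plain "does this key already exist?" test would then report false duplicates.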
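from google.appengine.ext import db

class db_unique(db.Model):
    # Key name is the concatenated field values; r points at the first entity seen.
    r = db.ReferenceProperty()

class db_obj(db.Model):
    a = db.IntegerProperty()
    b = db.StringProperty()
    c = db.StringProperty()

# executed for each db_obj...
def mapreduce(entity):
    key = '%s_%s_%s' % (entity.a, entity.b, entity.c)
    # Creates the index entry if it is missing; otherwise returns the existing one.
    res = db_unique.get_or_insert(key, r=entity)
    # get_value_for_datastore returns the stored Key without fetching the referenced entity.
    if db_unique.r.get_value_for_datastore(res) != entity.key():
        # We have a possible collision: verify, then delete one of
        # the 2 entities, res.r (already indexed) or entity?
        pass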
There are a couple of edge cases that might creep up. For example, if you have 2 entities whose b and c are ('a_b', '') and ('a', 'b_') respectively, the concatenation ends in 'a_b_' for both. Either use a character you know is not in your strings instead of '_', or make db_unique.r
a list of references and compare all of the referenced entities.
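The collision is easy to reproduce. The sketch below uses a different technique from either fix the answer suggests: it builds the key by JSON-encoding the field list. Strings are quoted and escaped, so the field boundaries stay unambiguous whatever characters the values contain, and no "safe" separator has to be chosen. `naive_key` and `safe_key` are hypothetical helper names. Whether JSON-encoded key names suit your datastore setup, for example with respect to length limits, is left as an assumption to check.

```python
import json


def naive_key(a, b, c):
    # The joining scheme from the answer: '_' may also appear inside b or c.
    return "%s_%s_%s" % (a, b, c)


def safe_key(a, b, c):
    # JSON-encode the fields: strings are quoted and escaped, so
    # ('a_b', '') and ('a', 'b_') can no longer produce the same key.
    return json.dumps([a, b, c])


print(naive_key(1, "a_b", "") == naive_key(1, "a", "b_"))  # True: collision
print(safe_key(1, "a_b", "") == safe_key(1, "a", "b_"))    # False
print(safe_key(1, "a_b", ""))  # [1, "a_b", ""]
```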