Python dictionaries, find similarities
I have a Python dictionary with a thousand items. Each item is, itself, a dictionary. I'm looking for a clean and elegant way to parse through each item, and find and create templates.
Here's a simplified example of the individual dictionaries' structure:
    {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100}
From this, I want to pass through once and, if 500 of the 1000 share the same height and width, determine that, so I can build a template off that data and assign the template id to 'template'. I can build a gigantic reference hash, but I'm hoping there's a cleaner, more elegant way to accomplish this.
The actual data includes closer to 30 keys, of which a small subset needs to be excluded from the template checking.
@eumiro had an excellent core idea, namely that of using itertools.groupby() to arrange the items with common values in batches. However, besides neglecting to sort things first using the same key function, as @Jochen Ritzel pointed out (and as is mentioned in the documentation), that answer also didn't address several of the other things you mentioned wanting to do.
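The sorting point is easy to see on a toy input. The following is a minimal sketch, not taken from either answer; the `sizes` list is hypothetical sample data standing in for (height, width) pairs.

```python
from itertools import groupby

# Hypothetical sample data: (height, width) pairs, deliberately unsorted.
sizes = [(80, 100), (30, 40), (80, 100), (30, 40)]

# groupby() only merges *adjacent* equal keys, so without sorting
# each repeated pair ends up in its own one-element group.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(sizes)]

# Sorting first makes equal pairs adjacent, giving one group per distinct pair.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(sizes))]

print(unsorted_groups)
print(sorted_groups)
```

When a key function is passed to groupby(), the same function should be passed to sorted(), which is what the longer answer below does with its `sortby` function.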
Below is a more complete and longer answer. It determines the templates and assigns them in one pass through the dict-of-dicts. To do this, after first creating a sorted list of the items, it uses groupby() to batch them and, if there are enough in a group, creates a template and assigns its id to each member.
    # Python 2 code (print statement, iterkeys()/itervalues())
    import itertools as itools

    inventory = {
        'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
        'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
        'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
        'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
        'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
    }

    def print_inventory():
        print 'inventory:'
        for key in sorted(inventory.iterkeys()):
            print '  {}: {}'.format(key, inventory[key])

    print "-- before --"
    print_inventory()

    threshold = 2  # minimum number of matching items needed to form a template
    allkeys = ['template', 'height', 'width', 'length', 'weight']
    excludedkeys = ['template', 'length', 'weight']
    includedkeys = [key for key in allkeys if key not in excludedkeys]

    # determines which keys make up a template
    sortby = lambda item, keys=includedkeys: tuple(item[key] for key in keys)

    templates = {}
    templateid = 0
    # sort with the same key function that groupby() uses
    sortedinventory = sorted(inventory.itervalues(), key=sortby)
    for templatetuple, similariter in itools.groupby(sortedinventory, sortby):
        similaritems = list(similariter)
        if len(similaritems) >= threshold:
            # create and assign a template
            templateid += 1
            templates[templateid] = templatetuple  # tuple of values of includedkeys
            for item in similaritems:
                item['template'] = templateid

    print
    print "-- after --"
    print_inventory()
    print
    print 'templates:', templates
    print
When I run it, I get the following output:
    -- before --
    inventory:
      item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
      item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': None, 'id': 2}
      item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': None, 'id': 3}
      item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': None, 'id': 4}
      item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': None, 'id': 5}

    -- after --
    inventory:
      item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
      item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': 1, 'id': 2}
      item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': 2, 'id': 3}
      item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': 1, 'id': 4}
      item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': 2, 'id': 5}

    templates: {1: (30, 40), 2: (80, 100)}
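For comparison, the "reference hash" idea from the question can stay fairly small if the grouping is done with collections.defaultdict instead of sorting. This is only a sketch in Python 3 and not part of the original answer; the function name `assign_templates` and its parameters are hypothetical, chosen here for illustration.

```python
from collections import defaultdict

def assign_templates(inventory, included_keys, threshold):
    """Tag items that share values for included_keys; return {template_id: values}."""
    # Bucket items by the tuple of values that would define a template.
    groups = defaultdict(list)
    for item in inventory.values():
        groups[tuple(item[key] for key in included_keys)].append(item)

    templates = {}
    for values, items in groups.items():
        if len(items) >= threshold:
            template_id = len(templates) + 1  # ids start at 1
            templates[template_id] = values
            for item in items:
                item['template'] = template_id
    return templates

# Same sample data as in the answer above.
inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
}

templates = assign_templates(inventory, ['height', 'width'], threshold=2)
print(templates)
```

This avoids the sort, at the cost of holding every bucket in memory. Template ids here follow the order in which each combination is first seen, not the sorted order used by the groupby() version, so the numbering can differ between the two approaches.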