How to bin many category combinations (72 variables) in Python lists

March 15, 2012

I have data stored in a list of lists, organized like so:

    lst = [
        ['fhcontrol', 'g', 'a'],
        ['mnhdosed', 'g', 'c'],
    ]

For each row in lst, row[0] is one of 12 categories in total (I've listed 2 in the sample code above). row[1] and row[2] are concerned with 6 combinations of these letters. Therefore, there are 72 possible combinations of data per row in lst, and I need to count the instances of each combination without having to write dozens of nested if statements.

I am attempting to do this by creating 2 functions that parse through these lists and bin the incidences of these 72 combinations. How can I use the 2 functions I am beginning to write below to update these variables? Do I need to construct the dictionaries as class variables so that I can continue to update them as I iterate through both functions? Any guidance would be great!

Here is the code I have so far, which initializes all 72 variables as 6 dictionaries (for the 6 combinations of letters in row[1] and row[2]):

    def baseparser(lst):
        temp = dict.fromkeys('fhdosed fhcontrol fnhdosed fnhcontrol '
                             'ftdosed ftcontrol mhdosed mhcontrol '
                             'mnhdosed mnhcontrol mtdosed mtcontrol'.split(), 0)
        tri_1, tri_2, trv_1, trv_2, trv_3, trv_4 = ([dict(temp) for i in range(6)])

        for row in lst:
            if row[0] == 'fhdosed':
                binner(row[0], row[1], row[2])
            if row[0] == 'fhcontrol':
                binner(row[0], row[1], row[2])
            # etc.

    def binner(key, q, s):
        if (q == 'g' and s == 'a') or (q == 'c' and s == 't'):
            tri_1[key] += 1
        elif (q == 'a' and s == 'g') or (q == 't' and s == 'c'):
            tri_2[key] += 1
        elif (q == 'g' and s == 't') or (q == 'c' and s == 'a'):
            trv_1[key] += 1
        elif (q == 'g' and s == 'c') or (q == 'c' and s == 'g'):
            trv_1[key] += 1
        elif (q == 'a' and s == 't') or (q == 't' and s == 'a'):
            trv_1[key] += 1
        elif (q == 'a' and s == 'c') or (q == 't' and s == 'g'):
            trv_1[key] += 1

Your code could be simplified to:

    temp = dict.fromkeys('''fhdosed fhcontrol fnhdosed fnhcontrol ftdosed ftcontrol mhdosed
                           mhcontrol mnhdosed mnhcontrol mtdosed mtcontrol'''.split(), 0)
    # Six independent copies, one dict per letter-pair class
    tri_1, tri_2, trv_1, trv_2, trv_3, trv_4 = [temp.copy() for i in range(6)]

    # Map each (q, s) pair to the dict that should count it
    dmap = {
        ('g', 'a'): tri_1,
        ('c', 't'): tri_1,
        ('a', 'g'): tri_2,
        ('t', 'c'): tri_2,

        ('g', 'c'): trv_1,
        ('c', 'g'): trv_1,

        ('a', 't'): trv_1,
        ('t', 'a'): trv_1,
    }

    for row in lst:
        key, q, s = row
        dmap[q, s][key] += 1
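On the question of whether the dictionaries need to be class variables: one alternative is to pass the mapping into binner as an argument, so both functions work on the same dicts without globals. The following is only a minimal sketch under assumptions. make_dmap is a hypothetical helper, binner gains an extra dmap parameter that the original does not have, and only the eight letter pairs listed above are mapped (so trv_2, trv_3 and trv_4 stay at zero).

```python
CATEGORIES = ('fhdosed fhcontrol fnhdosed fnhcontrol ftdosed ftcontrol '
              'mhdosed mhcontrol mnhdosed mnhcontrol mtdosed mtcontrol').split()


def make_dmap():
    """Hypothetical helper: build six fresh count dicts and map the listed pairs to them."""
    tri_1, tri_2, trv_1, trv_2, trv_3, trv_4 = [
        dict.fromkeys(CATEGORIES, 0) for _ in range(6)
    ]
    # Only the pairs given in the answer are mapped; trv_2..trv_4 are unused here.
    return {
        ('g', 'a'): tri_1, ('c', 't'): tri_1,
        ('a', 'g'): tri_2, ('t', 'c'): tri_2,
        ('g', 'c'): trv_1, ('c', 'g'): trv_1,
        ('a', 't'): trv_1, ('t', 'a'): trv_1,
    }


def binner(dmap, key, q, s):
    """Increment the count for `key` in whichever dict the pair (q, s) maps to."""
    dmap[q, s][key] += 1


def baseparser(lst):
    """Count every row of `lst`; the dicts live inside `dmap`, not in globals."""
    dmap = make_dmap()
    for key, q, s in lst:
        binner(dmap, key, q, s)
    return dmap


lst = [['fhcontrol', 'g', 'a'], ['mnhdosed', 'g', 'c']]
counts = baseparser(lst)
print(counts['g', 'a']['fhcontrol'])  # 1
print(counts['g', 'c']['mnhdosed'])   # 1
```

Because ('g', 'a') and ('c', 't') point at the same dict object, a count added through either pair is visible through both.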

Another possibility is to use one dict of dicts instead of 6 dicts:

    temp = dict.fromkeys('''fhdosed fhcontrol fnhdosed fnhcontrol ftdosed ftcontrol mhdosed
                           mhcontrol mnhdosed mnhcontrol mtdosed mtcontrol'''.split(), 0)
    # One outer dict holding an independent copy of the counts per class name
    tr = {key: temp.copy() for key in ('tri_1', 'tri_2', 'trv_1', 'trv_2', 'trv_3', 'trv_4')}

    # Map each (q, s) pair to the name of its class
    dmap = {
        ('g', 'a'): 'tri_1',
        ('c', 't'): 'tri_1',
        ('a', 'g'): 'tri_2',
        ('t', 'c'): 'tri_2',

        ('g', 'c'): 'trv_1',
        ('c', 'g'): 'trv_1',

        ('a', 't'): 'trv_1',
        ('t', 'a'): 'trv_1',
    }

    lst = [
        ['fhcontrol', 'g', 'a'],
        ['mnhdosed', 'g', 'c'],
    ]

    for row in lst:
        key, q, s = row
        tr[dmap[q, s]][key] += 1

    print(tr)

The advantage of doing it this way is that you have fewer dicts in your namespace, and it may be easier to refactor the code later using a dict of dicts instead of hard-coding 6 dicts.
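Taking the same idea one step further, a single collections.Counter keyed by (class name, category) would avoid pre-filling any dicts at all. This is a sketch rather than part of the original answer: bin_rows and the unmapped list are hypothetical names, the last two sample rows are made up for illustration, and dmap contains only the eight pairs listed above. Since a pair such as ('g', 't') is not in that mapping, the sketch sets such rows aside instead of raising a KeyError.

```python
from collections import Counter

# Same eight pairs as in the mapping above; other pairs are deliberately absent.
dmap = {
    ('g', 'a'): 'tri_1', ('c', 't'): 'tri_1',
    ('a', 'g'): 'tri_2', ('t', 'c'): 'tri_2',
    ('g', 'c'): 'trv_1', ('c', 'g'): 'trv_1',
    ('a', 't'): 'trv_1', ('t', 'a'): 'trv_1',
}


def bin_rows(rows, dmap):
    """Return (counts, unmapped); counts is keyed by (class_name, category)."""
    counts = Counter()
    unmapped = []
    for key, q, s in rows:
        class_name = dmap.get((q, s))
        if class_name is None:
            # Pair not listed in dmap: keep the row aside instead of failing.
            unmapped.append((key, q, s))
        else:
            counts[class_name, key] += 1
    return counts, unmapped


rows = [
    ['fhcontrol', 'g', 'a'],
    ['mnhdosed', 'g', 'c'],
    ['fhcontrol', 'c', 't'],  # illustrative extra row
    ['fhdosed', 'g', 't'],    # illustrative row whose pair is not in dmap
]
counts, unmapped = bin_rows(rows, dmap)
print(counts['tri_1', 'fhcontrol'])  # 2
print(counts['trv_1', 'mnhdosed'])   # 1
print(unmapped)                      # [('fhdosed', 'g', 't')]
```

A Counter returns 0 for combinations it has never seen, so any of the 72 combinations can be looked up without initializing them first.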

Following on Midnighter's suggestion, if you have pandas, you could replace the dict of dicts with a DataFrame. The frequency of pairs can then be computed using pd.crosstab like this:

    import pandas as pd

    # Map each concatenated letter pair to the name of its class
    dmap = {
        'ga': 'tri_1',
        'ct': 'tri_1',
        'ag': 'tri_2',
        'tc': 'tri_2',

        'gc': 'trv_1',
        'cg': 'trv_1',

        'at': 'trv_1',
        'ta': 'trv_1',
    }

    lst = [
        ['fhcontrol', 'g', 'a'],
        ['mnhdosed', 'g', 'c'],
    ]

    df = pd.DataFrame(lst, columns=['key', 'q', 's'])
    df['tr'] = (df['q'] + df['s']).map(dmap)

    print(df)
    #          key  q  s     tr
    # 0  fhcontrol  g  a  tri_1
    # 1   mnhdosed  g  c  trv_1

    print(pd.crosstab(index=df['key'], columns=df['tr']))

yields

    tr         tri_1  trv_1
    key
    fhcontrol      1      0
    mnhdosed       0      1
