How to bin many category combinations (72 variables) in Python lists
I have data stored in a list of lists, organized like so:

    lst = [['fhcontrol', 'g', 'a'], ['mnhdosed', 'g', 'c']]
For each row in lst, row[0] is one of 12 categories in total (I've listed 2 in the sample code above). row[1] and row[2] together form one of 6 combinations of these letters. I therefore have 72 possible combinations of data per row in lst, and I need to count the instances of each combination without having to write dozens of nested if statements.

I am attempting to create 2 functions that parse through these lists and bin the incidences of these 72 combinations. How can I use the 2 functions I am beginning to write below to update these variables? Do I need to construct the dictionaries as class variables so that I can continue to update them as I iterate through both functions? Any guidance would be great!

Here is the code I have so far, which initializes all 72 variables as 6 dictionaries (one for each of the 6 combinations of letters in row[1] and row[2]):
    def baseparser(lst):
        temp = dict.fromkeys('fhdosed fhcontrol fnhdosed fnhcontrol '
                             'ftdosed ftcontrol mhdosed mhcontrol '
                             'mnhdosed mnhcontrol mtdosed mtcontrol'.split(), 0)
        tri_1, tri_2, trv_1, trv_2, trv_3, trv_4 = ([dict(temp) for i in range(6)])
        for row in lst:
            if row[0] == 'fhdosed':
                binner(row[0], row[1], row[2])
            if row[0] == 'fhcontrol':
                binner(row[0], row[1], row[2])
            # etc.

    def binner(key, q, s):
        if (q == 'g' and s == 'a') or (q == 'c' and s == 't'):
            tri_1[key] += 1
        elif (q == 'a' and s == 'g') or (q == 't' and s == 'c'):
            tri_2[key] += 1
        elif (q == 'g' and s == 't') or (q == 'c' and s == 'a'):
            trv_1[key] += 1
        elif (q == 'g' and s == 'c') or (q == 'c' and s == 'g'):
            trv_1[key] += 1
        elif (q == 'a' and s == 't') or (q == 't' and s == 'a'):
            trv_1[key] += 1
        elif (q == 'a' and s == 'c') or (q == 't' and s == 'g'):
            trv_1[key] += 1
Your code could be simplified to:

    temp = dict.fromkeys('''fhdosed fhcontrol fnhdosed fnhcontrol
                            ftdosed ftcontrol mhdosed mhcontrol
                            mnhdosed mnhcontrol mtdosed mtcontrol'''.split(), 0)
    tri_1, tri_2, trv_1, trv_2, trv_3, trv_4 = [temp.copy() for i in range(6)]

    # Map each (q, s) pair to the dict that should count it.
    # A pair that is not listed here raises KeyError in the loop below.
    dmap = {
        ('g', 'a'): tri_1,
        ('c', 't'): tri_1,
        ('a', 'g'): tri_2,
        ('t', 'c'): tri_2,
        ('g', 'c'): trv_1,
        ('c', 'g'): trv_1,
        ('a', 't'): trv_1,
        ('t', 'a'): trv_1,
    }

    for row in lst:
        key, q, s = row
        dmap[q, s][key] += 1
Another possibility is to use one dict of dicts instead of 6 dicts:

    temp = dict.fromkeys('''fhdosed fhcontrol fnhdosed fnhcontrol
                            ftdosed ftcontrol mhdosed mhcontrol
                            mnhdosed mnhcontrol mtdosed mtcontrol'''.split(), 0)
    tr = {key: temp.copy()
          for key in ('tri_1', 'tri_2', 'trv_1', 'trv_2', 'trv_3', 'trv_4')}

    # Map each (q, s) pair to the name of the inner dict that counts it.
    dmap = {
        ('g', 'a'): 'tri_1',
        ('c', 't'): 'tri_1',
        ('a', 'g'): 'tri_2',
        ('t', 'c'): 'tri_2',
        ('g', 'c'): 'trv_1',
        ('c', 'g'): 'trv_1',
        ('a', 't'): 'trv_1',
        ('t', 'a'): 'trv_1',
    }

    lst = [['fhcontrol', 'g', 'a'],
           ['mnhdosed', 'g', 'c']]

    for row in lst:
        key, q, s = row
        tr[dmap[q, s]][key] += 1

    print(tr)
The advantage of doing it this way is that you have fewer dicts in your namespace, and it may be easier to refactor the code later when it uses a dict of dicts instead of 6 hard-coded dicts.
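On the question of whether the dictionaries need to be class variables: one option is to have a single function build the dict of dicts and return it, so nothing has to live at module or class level. The following is only a minimal sketch of that idea. The names `bin_rows`, `CATEGORIES`, `BINS`, `DMAP` and the `skipped` list are hypothetical, introduced here for illustration, and the mapping is copied from the answer above without filling in the pairs it leaves out.

```python
CATEGORIES = ('fhdosed fhcontrol fnhdosed fnhcontrol '
              'ftdosed ftcontrol mhdosed mhcontrol '
              'mnhdosed mnhcontrol mtdosed mtcontrol').split()
BINS = ('tri_1', 'tri_2', 'trv_1', 'trv_2', 'trv_3', 'trv_4')

# Same pair-to-bin mapping as in the answer above. Pairs it does not list
# are deliberately left unassigned here rather than guessed.
DMAP = {
    ('g', 'a'): 'tri_1', ('c', 't'): 'tri_1',
    ('a', 'g'): 'tri_2', ('t', 'c'): 'tri_2',
    ('g', 'c'): 'trv_1', ('c', 'g'): 'trv_1',
    ('a', 't'): 'trv_1', ('t', 'a'): 'trv_1',
}


def bin_rows(rows, dmap=DMAP):
    """Return (counts, skipped).

    counts[bin_name][category] holds the tally for each combination;
    skipped collects rows whose letter pair or category is not recognised.
    """
    counts = {name: dict.fromkeys(CATEGORIES, 0) for name in BINS}
    skipped = []
    for key, q, s in rows:
        name = dmap.get((q, s))  # None when the pair is not in the mapping
        if name is None or key not in counts[name]:
            skipped.append((key, q, s))
            continue
        counts[name][key] += 1
    return counts, skipped


lst = [['fhcontrol', 'g', 'a'],
       ['mnhdosed', 'g', 'c'],
       ['fhdosed', 'g', 't']]   # ('g', 't') is not in DMAP, so it is skipped
counts, skipped = bin_rows(lst)
print(counts['tri_1']['fhcontrol'])  # 1
print(counts['trv_1']['mnhdosed'])   # 1
print(skipped)                       # [('fhdosed', 'g', 't')]
```

Because the counts are created inside the function and returned, each call starts from zero, and the caller decides where to keep the result. Collecting unmapped rows instead of raising KeyError is a design choice of this sketch; raising may be preferable if every pair is expected to be mapped.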
Following on Midnighter's suggestion, if you have pandas, you could replace the dict of dicts with a DataFrame. The frequency of pairs could then be computed using pd.crosstab like this:

    import pandas as pd

    # Keys are the two letters joined into one string, e.g. 'g' + 'a' -> 'ga'.
    dmap = {
        'ga': 'tri_1',
        'ct': 'tri_1',
        'ag': 'tri_2',
        'tc': 'tri_2',
        'gc': 'trv_1',
        'cg': 'trv_1',
        'at': 'trv_1',
        'ta': 'trv_1',
    }

    lst = [['fhcontrol', 'g', 'a'],
           ['mnhdosed', 'g', 'c']]

    df = pd.DataFrame(lst, columns=['key', 'q', 's'])
    df['tr'] = (df['q'] + df['s']).map(dmap)
    print(df)
    #          key  q  s     tr
    # 0  fhcontrol  g  a  tri_1
    # 1   mnhdosed  g  c  trv_1

    # Current pandas names these parameters index/columns (older releases used rows/cols).
    print(pd.crosstab(index=[df['key']], columns=[df['tr']]))
yields

    tr         tri_1  trv_1
    key
    fhcontrol      1      0
    mnhdosed       0      1