pandas - numpy: aggregate 4D array by groups


I have a numpy array of shape [t, z, x, y] representing an hourly time series of three-D data. The axes of the array are time, vertical coordinate, horizontal coordinate 1, and horizontal coordinate 2. There is also a t-element list of hourly datetime.datetime timestamps.

I want to calculate the daily mid-day means for each day. The result will be an [nday, z, x, y] array.

I'm trying to find a pythonic way to do this. I've written something with a bunch of loops that works but seems slow, inflexible, and verbose.

It appears to me that pandas is not the solution for me because my time series data are three-dimensional. I'd be happy to be proven wrong.

I've come up with this, using itertools, to find the mid-day timestamps and group them by date, and now I'm coming up short trying to apply imap to find the means.

    from datetime import datetime
    import itertools

    import numpy as np
    import pandas as pd

    # Create 72 hours of pseudo-data with 3 vertical levels and a 4 by 4
    # horizontal grid.
    data = np.zeros((72, 3, 4, 4))
    t = pd.date_range(datetime(2008, 7, 1), freq='1h', periods=72)
    for i in range(data.shape[0]):
        data[i, ...] = i

    # Find the timestamps that are "midday" in North America. We'll
    # define midday as between 15:00 and 23:00 UTC, which is 10:00 EST to
    # 15:00 PST.
    def is_midday(this_t):
        return (this_t.hour >= 15) and (this_t.hour <= 23)

    # Group the midday timestamps by date.
    # (filter replaces Python 2's itertools.ifilter.)
    for dt, grp in itertools.groupby(filter(is_midday, t),
                                     key=lambda x: x.date()):
        print('date ' + str(dt))
        for g in grp:
            print(g)

    # Find means of mid-day data by date
    data_list = np.split(data, data.shape[0])
    grps = itertools.groupby(filter(is_midday, t),
                             key=lambda x: x.date())
    # How do I apply map (itertools.imap in Python 2), or something else,
    # to data_list and grps? Or somehow split data along axis 0 according
    # to grps?

You can shove pretty much any object into a pandas structure. This is generally not recommended, but in your case it might work for you.

Create a Series indexed by time, with each element a 3-D numpy array

    In [117]: s = pd.Series([data[i] for i in range(data.shape[0])], index=t)

    In [118]: s
    Out[118]:
    2008-07-01 00:00:00    [[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], ...
    2008-07-01 01:00:00    [[[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], ...
    2008-07-01 02:00:00    [[[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0], ...
    2008-07-01 03:00:00    [[[3.0, 3.0, 3.0, 3.0], [3.0, 3.0, 3.0, 3.0], ...
    2008-07-01 04:00:00    [[[4.0, 4.0, 4.0, 4.0], [4.0, 4.0, 4.0, 4.0], ...
    2008-07-01 05:00:00    [[[5.0, 5.0, 5.0, 5.0], [5.0, 5.0, 5.0, 5.0], ...
    2008-07-01 06:00:00    [[[6.0, 6.0, 6.0, 6.0], [6.0, 6.0, 6.0, 6.0], ...
    2008-07-01 07:00:00    [[[7.0, 7.0, 7.0, 7.0], [7.0, 7.0, 7.0, 7.0], ...
    2008-07-01 08:00:00    [[[8.0, 8.0, 8.0, 8.0], [8.0, 8.0, 8.0, 8.0], ...
    2008-07-01 09:00:00    [[[9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0], ...
    2008-07-01 10:00:00    [[[10.0, 10.0, 10.0, 10.0], [10.0, 10.0, 10.0,...
    2008-07-01 11:00:00    [[[11.0, 11.0, 11.0, 11.0], [11.0, 11.0, 11.0,...
    2008-07-01 12:00:00    [[[12.0, 12.0, 12.0, 12.0], [12.0, 12.0, 12.0,...
    2008-07-01 13:00:00    [[[13.0, 13.0, 13.0, 13.0], [13.0, 13.0, 13.0,...
    2008-07-01 14:00:00    [[[14.0, 14.0, 14.0, 14.0], [14.0, 14.0, 14.0,...
    ...
    2008-07-03 09:00:00    [[[57.0, 57.0, 57.0, 57.0], [57.0, 57.0, 57.0,...
    2008-07-03 10:00:00    [[[58.0, 58.0, 58.0, 58.0], [58.0, 58.0, 58.0,...
    2008-07-03 11:00:00    [[[59.0, 59.0, 59.0, 59.0], [59.0, 59.0, 59.0,...
    2008-07-03 12:00:00    [[[60.0, 60.0, 60.0, 60.0], [60.0, 60.0, 60.0,...
    2008-07-03 13:00:00    [[[61.0, 61.0, 61.0, 61.0], [61.0, 61.0, 61.0,...
    2008-07-03 14:00:00    [[[62.0, 62.0, 62.0, 62.0], [62.0, 62.0, 62.0,...
    2008-07-03 15:00:00    [[[63.0, 63.0, 63.0, 63.0], [63.0, 63.0, 63.0,...
    2008-07-03 16:00:00    [[[64.0, 64.0, 64.0, 64.0], [64.0, 64.0, 64.0,...
    2008-07-03 17:00:00    [[[65.0, 65.0, 65.0, 65.0], [65.0, 65.0, 65.0,...
    2008-07-03 18:00:00    [[[66.0, 66.0, 66.0, 66.0], [66.0, 66.0, 66.0,...
    2008-07-03 19:00:00    [[[67.0, 67.0, 67.0, 67.0], [67.0, 67.0, 67.0,...
    2008-07-03 20:00:00    [[[68.0, 68.0, 68.0, 68.0], [68.0, 68.0, 68.0,...
    2008-07-03 21:00:00    [[[69.0, 69.0, 69.0, 69.0], [69.0, 69.0, 69.0,...
    2008-07-03 22:00:00    [[[70.0, 70.0, 70.0, 70.0], [70.0, 70.0, 70.0,...
    2008-07-03 23:00:00    [[[71.0, 71.0, 71.0, 71.0], [71.0, 71.0, 71.0,...
    Freq: H, Length: 72

Define an aggregating function. You need to access .values, which returns the objects held inside the Series; concatenating them coerces the result to an actual numpy array, which you can then aggregate (mean in this case)

    In [119]: def f(g, grp):
       .....:     return np.concatenate(grp.values).mean()
       .....:
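To see what that aggregation does to the array shapes, here is a small sketch on made-up data (the `blocks` list is only for illustration, it is not part of the original answer). Concatenating joins the 3-D arrays along their existing first axis and a bare `.mean()` then reduces everything to one number. If the goal is a per-day (z, x, y) grid instead, stacking along a new axis and averaging over it is one option.

```python
import numpy as np

# Three hypothetical hourly snapshots, each of shape (z, x, y) = (3, 4, 4),
# filled with 0.0, 1.0 and 2.0 respectively.
blocks = [np.full((3, 4, 4), float(i)) for i in range(3)]

# np.concatenate joins along the existing axis 0, giving shape (9, 4, 4);
# .mean() with no axis argument then reduces everything to a single scalar.
joined = np.concatenate(blocks)
scalar_mean = joined.mean()

# np.stack adds a new leading axis, giving shape (3, 3, 4, 4);
# .mean(axis=0) averages over the snapshots and keeps the (z, x, y) grid.
stacked = np.stack(blocks)
grid_mean = stacked.mean(axis=0)

print(joined.shape, scalar_mean)
print(stacked.shape, grid_mean.shape)
```

Swapping `np.concatenate(...).mean()` for `np.stack(...).mean(axis=0)` in the aggregating function would therefore return one 3-D array per group rather than a scalar.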

Since I am not sure what the end output should look like, just create a time-based grouper manually (this is essentially a resample). This doesn't do anything with the final results (it's just a list of the aggregated values)

    In [121]: [f(g, grp) for g, grp in s.groupby(pd.Grouper(freq='D'))]
    Out[121]: [11.5, 35.5, 59.5]
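The list above averages all 24 hours of each day down to a scalar. A sketch closer to what the question asks for (mid-day hours only, result of shape [nday, z, x, y]) could group integer row positions instead of the arrays themselves. The names `positions`, `midday` and `daily_means` are made up for this example, and the 15:00 to 23:00 UTC window is the one defined in the question.

```python
import numpy as np
import pandas as pd

# Same pseudo-data as in the question: hour i is filled with the value i.
data = np.zeros((72, 3, 4, 4))
t = pd.date_range("2008-07-01", freq="1h", periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# Series of row positions into `data`, indexed by time (illustrative helper).
positions = pd.Series(np.arange(len(t)), index=t)

# Keep only the mid-day hours from the question: 15:00 through 23:00 UTC.
midday = positions[(t.hour >= 15) & (t.hour <= 23)]

# Group the surviving positions by calendar day, then average those rows of
# `data` over the time axis so each day keeps its (z, x, y) grid.
daily_means = np.stack([
    data[grp.to_numpy()].mean(axis=0)
    for _, grp in midday.groupby(pd.Grouper(freq="D"))
    if len(grp) > 0  # skip days with no mid-day samples
])

print(daily_means.shape)
```

Grouping positions keeps the 4-D array as a plain numeric array, so the Series only carries small integers and the heavy lifting stays in numpy.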

You can get reasonably fancy here and return a pandas object (and potentially concat them).
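As one possible way to get a pandas object back, the sketch below first collapses each hourly snapshot to its own mean and then resamples by day. This is an assumption-laden shortcut rather than the approach above: it reproduces the per-day scalar means only because every snapshot has the same number of elements, so a mean of means equals the overall mean. The names `hourly` and `daily` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same pseudo-data as in the question: hour i is filled with the value i.
data = np.zeros((72, 3, 4, 4))
t = pd.date_range("2008-07-01", freq="1h", periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# Flatten each (z, x, y) snapshot and take its mean, giving one float per hour.
hourly = pd.Series(data.reshape(len(t), -1).mean(axis=1), index=t)

# Resample to calendar days; the result is a Series indexed by date.
daily = hourly.resample("D").mean()

print(daily)
```

The result is an ordinary float Series, so the usual pandas tools (plotting, joins, further resampling) apply directly.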

