pandas / numpy: aggregate 4D array by groups
I have a numpy array of shape [t, z, x, y] representing an hourly time series of three-dimensional data. The axes of the array are time, vertical coordinate, horizontal coordinate 1, and horizontal coordinate 2. There is also a t-element list of hourly datetime.datetime timestamps.
I want to calculate daily mid-day means for each day, i.e. an [nday, z, x, y] array.
I'm trying to find a pythonic way to do this. I've written something with a bunch of loops that works, but it seems slow, inflexible, and verbose.
It appears to me that pandas is not a solution for me because my time series data is three-dimensional, but I'd be happy to be proven wrong.
I've come up with this, using itertools to find the mid-day timestamps and group them by date, but I'm coming up short trying to apply imap to find the means.
```python
import numpy as np
import pandas as pd
import itertools
from datetime import datetime

# create 72 hours of pseudo-data on 3 vertical levels and a 4 x 4
# horizontal grid
data = np.zeros((72, 3, 4, 4))
t = pd.date_range(datetime(2008, 7, 1), freq='1h', periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# find the timestamps that are "midday" in North America.  We'll
# define midday as between 15:00 and 23:00 UTC, which is 10:00 EST
# to 15:00 PST
def is_midday(this_t):
    return ((this_t.hour >= 15) and (this_t.hour <= 23))

# group the midday timestamps by date
for dt, grp in itertools.groupby(itertools.ifilter(is_midday, t),
                                 key=lambda x: x.date()):
    print 'date ' + str(dt)
    for g in grp:
        print g

# find means of the mid-day data by date
data_list = np.split(data, data.shape[0])
grps = itertools.groupby(itertools.ifilter(is_midday, t),
                         key=lambda x: x.date())

# How do I apply itertools.imap (or something else) to data_list and
# grps?  Or somehow split data along axis 0 according to grps?
```
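For comparison, the filtering and grouping above can be done without itertools at all: a boolean mask built from the DatetimeIndex selects the mid-day slices directly, and a short per-date loop averages them over the time axis. This is a sketch of an alternative approach, not part of the original question:

```python
import numpy as np
import pandas as pd

# Same pseudo-data as above: 72 hours, 3 vertical levels, 4 x 4 grid,
# where every cell of hour i holds the value i.
data = np.zeros((72, 3, 4, 4))
t = pd.date_range('2008-07-01', freq='1h', periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# Boolean mask selecting the mid-day hours (15:00-23:00 UTC).
midday = (t.hour >= 15) & (t.hour <= 23)

# Average each date's mid-day slices over the time axis, then
# stack the per-date means into an [nday, z, x, y] array.
days = t.date[midday]
daily_means = np.stack([data[midday][days == d].mean(axis=0)
                        for d in np.unique(days)])
# daily_means.shape == (3, 3, 4, 4)
```

This stays entirely in numpy for the aggregation step, which avoids the object-dtype Series discussed in the answer below at the cost of an explicit Python loop over days.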
You can shove pretty much any object into a pandas structure. It's not generally recommended, but in this case it might work for you.
create series indexed time, each element 3-d numpy array
```python
In [117]: s = Series([data[i] for i in range(data.shape[0])], index=t)

In [118]: s
Out[118]:
2008-07-01 00:00:00    [[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], ...
2008-07-01 01:00:00    [[[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], ...
2008-07-01 02:00:00    [[[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0], ...
...
2008-07-03 21:00:00    [[[69.0, 69.0, 69.0, 69.0], [69.0, 69.0, 69.0,...
2008-07-03 22:00:00    [[[70.0, 70.0, 70.0, 70.0], [70.0, 70.0, 70.0,...
2008-07-03 23:00:00    [[[71.0, 71.0, 71.0, 71.0], [71.0, 71.0, 71.0,...
Freq: H, Length: 72
```
Define an aggregating function. You need to access the values, which returns the objects inside; concatenating them coerces back to an actual numpy array, which you can then aggregate (mean in this case):
```python
In [119]: def f(g, grp):
   .....:     return np.concatenate(grp.values).mean()
   .....:
```
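One caveat worth noting about this aggregating function (my observation, not part of the original answer): np.concatenate joins the (z, x, y) blocks along their existing first axis, so the time and z axes merge and .mean() collapses everything to a single scalar per group. To keep the per-cell [z, x, y] structure the question asks for, np.stack adds a new leading time axis to average over instead:

```python
import numpy as np

# Three pseudo-hourly slices of shape (3, 4, 4), holding 0.0, 1.0, 2.0.
slices = [np.full((3, 4, 4), float(i)) for i in range(3)]

# concatenate joins along the existing first (z) axis ...
joined = np.concatenate(slices)
joined.shape                 # (9, 4, 4)
joined.mean()                # a single scalar: 1.0

# ... while stack adds a new leading time axis, so a per-cell mean
# over time preserves the (z, x, y) structure.
stacked = np.stack(slices)
stacked.shape                # (3, 3, 4, 4)
stacked.mean(axis=0).shape   # (3, 4, 4)
```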
Since I'm not sure what your end output should look like, create the time-based grouper manually (this is essentially a resample). It doesn't combine the final results; it's just a list of the aggregated values:
```python
In [121]: [f(g, grp) for g, grp in s.groupby(pd.Grouper(freq='D'))]
Out[121]: [11.5, 35.5, 59.5]
```
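To tie this back to the original question, here is a sketch (under the same 15:00-23:00 UTC mid-day definition, and using np.stack rather than np.concatenate so each day keeps its [z, x, y] structure): filter the Series to mid-day hours, group by day, and stack the per-day means into the desired [nday, z, x, y] array:

```python
import numpy as np
import pandas as pd

data = np.zeros((72, 3, 4, 4))
t = pd.date_range('2008-07-01', freq='1h', periods=72)
for i in range(data.shape[0]):
    data[i, ...] = i

# Series of 3-D arrays indexed by time, restricted to mid-day hours.
s = pd.Series(list(data), index=t)
s = s[(s.index.hour >= 15) & (s.index.hour <= 23)]

# For each day, stack that day's (z, x, y) slices along a new time
# axis and average over it, then stack the daily means together.
daily = [np.stack(list(grp.values)).mean(axis=0)
         for _, grp in s.groupby(pd.Grouper(freq='D'))]
result = np.stack(daily)   # shape (nday, z, x, y) == (3, 3, 4, 4)
```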
You can get reasonably fancy here and return a pandas object (and potentially concat them).