python - How can I divide up a pandas dataframe?
I have an enormous time series of functions stored in a pandas DataFrame in an HDF5 store, and I want to make plots of a transform of every function in the time series. Since the number of plots is large, and plotting them takes a long time, I've used fork() and numpy.array_split() to break up the indices and run several plots in parallel.
Doing things this way means that every process has a copy of the whole time series. Since what limits how many processes I can run is the total amount of memory I use, I would like each process to store only its own chunk of the DataFrame.
How can I split up a pandas DataFrame?
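To make the goal concrete, here is a minimal sketch of how the row range for each process could be computed up front, so that a worker only ever needs its own slice. The helper name `chunk_bounds` is hypothetical and not part of pandas or NumPy. If the store was written in a format that supports partial reads, each worker could then pass its bounds to `pd.read_hdf(..., start=start, stop=stop)` instead of loading the whole frame; that part is an assumption about the setup and is left out of the code.

```python
import numpy as np


def chunk_bounds(n_rows, n_chunks):
    """Return (start, stop) row bounds for up to n_chunks nearly equal pieces.

    Hypothetical helper: it partitions row positions with np.array_split,
    the same way np.array_split would partition the data itself.
    """
    pieces = np.array_split(np.arange(n_rows), n_chunks)
    # Skip empty pieces, which appear when n_chunks > n_rows.
    return [(int(p[0]), int(p[-1]) + 1) for p in pieces if len(p)]


if __name__ == "__main__":
    # Each (start, stop) pair would be handed to one worker process.
    for start, stop in chunk_bounds(10, 3):
        print(start, stop)
```

Because only the integer bounds are shared, the parent process never has to hold the full time series before forking.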
np.array_split works pretty well for this use case.
    In [40]: df = DataFrame(np.random.randn(5,10))

    In [41]: df
    Out[41]:
              0         1         2         3         4         5         6         7         8         9
    0 -1.998163 -1.973708  0.461369 -0.575661  0.862534 -1.326168  1.164199 -1.004121  1.236323 -0.339586
    1 -0.591188 -0.162782  0.043923  0.101241  0.120330 -1.201497 -0.108959 -0.033221  0.145400 -0.324831
    2  0.114842  0.200597  2.792904  0.769636 -0.698700 -0.544161  0.838117 -0.013527 -0.623317 -1.461193
    3  1.309628 -0.444961  0.323008 -1.409978 -0.697961  0.132321 -2.851494  1.233421 -1.540319  1.107052
    4  0.436368  0.627954 -0.942830  0.448113 -0.030464  0.764961 -0.241905 -0.620992  1.238171 -0.127617

Just pretty-printing the list of 3 elements here.
    In [43]: for dfs in np.array_split(df,3,axis=1):
       ....:     print(dfs, "\n")
       ....:
              0         1         2         3
    0 -1.998163 -1.973708  0.461369 -0.575661
    1 -0.591188 -0.162782  0.043923  0.101241
    2  0.114842  0.200597  2.792904  0.769636
    3  1.309628 -0.444961  0.323008 -1.409978
    4  0.436368  0.627954 -0.942830  0.448113

              4         5         6
    0  0.862534 -1.326168  1.164199
    1  0.120330 -1.201497 -0.108959
    2 -0.698700 -0.544161  0.838117
    3 -0.697961  0.132321 -2.851494
    4 -0.030464  0.764961 -0.241905

              7         8         9
    0 -1.004121  1.236323 -0.339586
    1 -0.033221  0.145400 -0.324831
    2 -0.013527 -0.623317 -1.461193
    3  1.233421 -1.540319  1.107052
    4 -0.620992  1.238171 -0.127617
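The session above splits along columns (axis=1), while the question is about chunks of a time series, which usually means splitting along rows. A rough sketch of the row-wise version follows. It splits the row positions with np.array_split and selects them with `iloc`, rather than passing the DataFrame to NumPy directly, on the assumption that this behaves the same across pandas versions. `split_rows` is a made-up name for illustration.

```python
import numpy as np
import pandas as pd


def split_rows(df, n_chunks):
    """Split df into n_chunks row-wise pieces of nearly equal length.

    Hypothetical helper: np.array_split partitions the integer row
    positions, and iloc pulls out the matching rows of the frame.
    """
    positions = np.array_split(np.arange(len(df)), n_chunks)
    return [df.iloc[p] for p in positions]


if __name__ == "__main__":
    df = pd.DataFrame(np.random.randn(5, 10))
    for chunk in split_rows(df, 3):
        # Each chunk keeps all columns and its original row labels.
        print(chunk, "\n")
```

Concatenating the pieces with pd.concat gives back the original frame, so nothing is dropped or duplicated at the chunk boundaries.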