data type conversion - Pandas not detecting the datatype of a Series properly
I'm running into something a bit frustrating with a pandas Series. I have a DataFrame with several columns, containing numeric and non-numeric data. For some reason, however, pandas thinks some of the numeric columns are non-numeric, and ignores them when I try to run aggregating functions like .describe(). This is a problem, since pandas raises errors when I try to run analyses on these columns.
I've copied some commands from the terminal as an example. When I slice the 'nd_offset' column (the problematic column in question), pandas tags it with a dtype of object. Yet, when I call .describe(), pandas tags the dtype as float64 (which is what it should be). The 'dwell' column, on the other hand, works as it should, with pandas giving float64 both times.
Does anyone know why I'm getting this behavior?
    In [83]: subject.phrases['nd_offset'][:3]
    Out[83]:
    submittime
    2014-06-02 22:44:44    0.3607049
    2014-06-02 22:44:44    0.2145484
    2014-06-02 22:44:44    0.4031347
    Name: nd_offset, dtype: object

    In [84]: subject.phrases['nd_offset'].describe()
    Out[84]:
    count     1255.000000
    unique     432.000000
    top          0.242308
    freq        21.000000
    dtype: float64

    In [85]: subject.phrases['dwell'][:3]
    Out[85]:
    submittime
    2014-06-02 22:44:44    111
    2014-06-02 22:44:44     81
    2014-06-02 22:44:44    101
    Name: dwell, dtype: float64

    In [86]: subject.phrases['dwell'].describe()
    Out[86]:
    count    1255.000000
    mean       99.013546
    std        30.109327
    min        21.000000
    25%        81.000000
    50%        94.000000
    75%       111.000000
    max       291.000000
    dtype: float64

And when I use the .groupby function to group the data by another attribute (when these Series are part of a DataFrame), I get a "DataError: No numeric types to aggregate" error when I try to call .agg(np.mean) on the group. When I try to call .agg(np.sum) on the same data, on the other hand, things work fine.
It's a bit bizarre -- can anyone explain what's going on?
Thank you!
It might be because the nd_offset column (what I call a below) contains a non-numeric value such as an empty string. For example,
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [0.36, ''], 'b': [111, 81]})

    # describe() on an object column reports count/unique/top/freq
    print(df['a'].describe())
    # count     2.00
    # unique    2.00
    # top       0.36
    # freq      1.00
    # dtype: float64

    try:
        print(df.groupby(['b']).agg(np.mean))
    except Exception as err:
        print(err)
    # No numeric types to aggregate

    print(df.groupby(['b']).agg(np.sum))
    #         a
    # b
    # 81
    # 111  0.36

Aggregation using np.sum works because
    In [103]: np.sum(pd.Series(['']))
    Out[103]: ''

whereas np.mean(pd.Series([''])) raises
    TypeError: Could not convert  to numeric

To debug the problem, you could try to find the non-numeric value(s) using:
    for val in df['a']:
        if not isinstance(val, float):
            print('error: val = {!r}'.format(val))
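As an alternative to looping, here is a minimal sketch of locating and then converting the stray values with pd.to_numeric. It assumes the offending entries are strings, such as '', that cannot be parsed as numbers; the small DataFrame is made up for illustration and only mirrors the example above, it is not the asker's data. With errors='coerce', anything that cannot be parsed becomes NaN, so the column ends up as float64 and the usual aggregations work again.

```python
import pandas as pd

# Hypothetical data mirroring the example above: column 'a' holds an empty string.
df = pd.DataFrame({'a': [0.36, '', 0.21], 'b': [111, 81, 111]})

# Coerce to numeric; entries that cannot be parsed become NaN.
converted = pd.to_numeric(df['a'], errors='coerce')

# Entries that were present originally but are NaN after coercion are the culprits.
bad = df.loc[converted.isna() & df['a'].notna(), 'a']
print(bad)

# Replace the object column with its numeric version, then aggregate as usual.
df['a'] = converted
print(df['a'].dtype)
means = df.groupby('b')['a'].mean()
print(means)
```

Whether turning the bad entries into NaN is appropriate depends on the data; it may be worth inspecting bad first and deciding whether those rows should be fixed at the source or dropped.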