Data type conversion - Pandas not detecting the datatype of a Series properly
I'm running into something a bit frustrating with pandas Series. I have a DataFrame with several columns containing both numeric and non-numeric data. For some reason, however, pandas treats some of the numeric columns as non-numeric and ignores them when I try to run aggregating functions like .describe(). This is a problem, since pandas raises errors when I try to run analyses on these columns.
I've copied some commands from my terminal as an example. When I slice the 'nd_offset' column (the problematic column in question), pandas tags it with dtype object. Yet when I call .describe() on it, the result has dtype float64 (which is what it should be). The 'dwell' column, on the other hand, works as it should: pandas gives float64 both times.
Does anyone know why I'm getting this behavior?
In [83]: subject.phrases['nd_offset'][:3]
Out[83]:
submittime
2014-06-02 22:44:44    0.3607049
2014-06-02 22:44:44    0.2145484
2014-06-02 22:44:44    0.4031347
Name: nd_offset, dtype: object

In [84]: subject.phrases['nd_offset'].describe()
Out[84]:
count     1255.000000
unique     432.000000
top          0.242308
freq        21.000000
dtype: float64

In [85]: subject.phrases['dwell'][:3]
Out[85]:
submittime
2014-06-02 22:44:44    111
2014-06-02 22:44:44     81
2014-06-02 22:44:44    101
Name: dwell, dtype: float64

In [86]: subject.phrases['dwell'].describe()
Out[86]:
count    1255.000000
mean       99.013546
std        30.109327
min        21.000000
25%        81.000000
50%        94.000000
75%       111.000000
max       291.000000
dtype: float64
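A quick way to see which columns pandas treats as numeric is to inspect DataFrame.dtypes; DataFrame.describe() with its default arguments summarizes only numeric columns, which would explain why a column like nd_offset gets skipped. The frame below is a hypothetical stand-in for subject.phrases, assuming nd_offset contains a single stray empty string:

```python
import pandas as pd

# Hypothetical stand-in for subject.phrases: one stray '' in nd_offset
# is enough to make the whole column dtype object.
df = pd.DataFrame({
    'nd_offset': [0.3607049, 0.2145484, ''],
    'dwell': [111.0, 81.0, 101.0],
})

# nd_offset reports object, dwell reports float64.
print(df.dtypes)

# With default arguments, describe() summarizes only numeric columns,
# so nd_offset is silently left out.
print(df.describe().columns.tolist())

# select_dtypes makes the same numeric/non-numeric split explicit.
print(df.select_dtypes(include='number').columns.tolist())
```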
And when I use the .groupby function to group the data by some attribute (when these Series are part of a DataFrame), I get a "DataError: No numeric types to aggregate" error when I try to call .agg(np.mean) on the group. When I call .agg(np.sum) on the same data, on the other hand, things work fine.
It's a bit bizarre. Can anyone explain what's going on?
Thank you!
It might be because the nd_offset column (called 'a' in the example below) contains a non-numeric value such as an empty string. For example,
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.36, ''], 'b': [111, 81]})
print(df['a'].describe())
# count     2.00
# unique    2.00
# top       0.36
# freq      1.00
# dtype: float64

try:
    print(df.groupby(['b']).agg(np.mean))
except Exception as err:
    print(err)
    # No numeric types to aggregate

print(df.groupby(['b']).agg(np.sum))
#         a
# b
# 81
# 111  0.36
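Once the culprit is known, one common remedy (not part of the original answer) is to convert the column with pd.to_numeric(..., errors='coerce'), which turns unparseable entries such as '' into NaN and yields a float64 column. A minimal sketch on the same toy frame, assuming that treating the bad entries as missing values is acceptable for your analysis:

```python
import pandas as pd

df = pd.DataFrame({'a': [0.36, ''], 'b': [111, 81]})

# errors='coerce' replaces anything that cannot be parsed as a number with NaN,
# so the column becomes float64 instead of object.
df['a'] = pd.to_numeric(df['a'], errors='coerce')
print(df['a'].dtype)

# mean now works; the group whose only value was '' comes out as NaN.
means = df.groupby('b')['a'].mean()
print(means)
```

If silently dropping the bad entries is not acceptable, it is safer to locate and fix them first rather than coercing.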
Aggregation using np.sum works because

In [103]: np.sum(pd.Series(['']))
Out[103]: ''

whereas np.mean(pd.Series([''])) raises

TypeError: Could not convert to numeric
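The asymmetry can be reproduced on a one-element object Series. As a sketch of the mechanism (assuming a reasonably recent pandas; the exact error message differs between versions): summing object values just combines them with +, so a lone '' comes back unchanged, while the mean has to turn that sum into a number and fails:

```python
import pandas as pd

s = pd.Series([''])  # dtype object

# sum() on object dtype reduces with +, which is defined for strings,
# so a single '' is returned unchanged.
total = s.sum()
print(repr(total))

# mean() must coerce the summed value to a number, which '' cannot be.
try:
    s.mean()
    mean_failed = False
except TypeError as err:
    mean_failed = True
    print(type(err).__name__, err)
```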
To debug the problem, try to find the non-numeric value(s) using:

for val in df['a']:
    if not isinstance(val, float):
        print('Error: val = {!r}'.format(val))
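A vectorized variant of the same check, offered as an alternative sketch rather than the answer's method: pd.to_numeric(..., errors='coerce') marks every entry it cannot parse as NaN, so combining that with the original not-missing mask isolates the offending entries. Unlike the isinstance(val, float) loop, this accepts ints and numeric strings such as '0.5', which may or may not be what you want. The 'n/a' and None values below are hypothetical test data:

```python
import pandas as pd

df = pd.DataFrame({'a': [0.36, '', 'n/a', 1.5, None],
                   'b': [111, 81, 94, 101, 21]})

# NaN wherever the value could not be parsed as a number (or was already missing).
parsed = pd.to_numeric(df['a'], errors='coerce')

# Keep only entries that were present originally but failed to parse.
bad_mask = parsed.isna() & df['a'].notna()

bad = df.loc[bad_mask, 'a']
print(bad)
```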