data type conversion - Pandas not detecting the datatype of a Series properly


I'm running into something a bit frustrating with a pandas Series. I have a DataFrame with several columns, holding both numeric and non-numeric data. For some reason, however, pandas thinks some of the numeric columns are non-numeric, and ignores them when I try to run aggregating functions like .describe(). This is a problem, since pandas raises errors when I try to run analyses on these columns.

I've copied some commands from the terminal as an example. When I slice the 'nd_offset' column (the problematic column in question), pandas tags it with a dtype of object. Yet, when I call .describe(), pandas tags the result with dtype float64 (which is what it should be). The 'dwell' column, on the other hand, works as it should, with pandas giving float64 both times.

Does anyone know why I'm getting this behavior?

    In [83]: subject.phrases['nd_offset'][:3]
    Out[83]:
    submittime
    2014-06-02 22:44:44    0.3607049
    2014-06-02 22:44:44    0.2145484
    2014-06-02 22:44:44    0.4031347
    Name: nd_offset, dtype: object

    In [84]: subject.phrases['nd_offset'].describe()
    Out[84]:
    count     1255.000000
    unique     432.000000
    top          0.242308
    freq        21.000000
    dtype: float64

    In [85]: subject.phrases['dwell'][:3]
    Out[85]:
    submittime
    2014-06-02 22:44:44    111
    2014-06-02 22:44:44     81
    2014-06-02 22:44:44    101
    Name: dwell, dtype: float64

    In [86]: subject.phrases['dwell'].describe()
    Out[86]:
    count    1255.000000
    mean       99.013546
    std        30.109327
    min        21.000000
    25%        81.000000
    50%        94.000000
    75%       111.000000
    max       291.000000
    dtype: float64

And when I use the .groupby function to group the data by another attribute (when these Series are part of a DataFrame), I get a "DataError: No numeric types to aggregate" error when I try to call .agg(np.mean) on the group. When I try to call .agg(np.sum) on the same data, on the other hand, things work fine.

It's a bit bizarre. Can anyone explain what's going on?

Thank you!

It might be because the nd_offset column (what I call a below) contains a non-numeric value such as an empty string. For example,

    import numpy as np
    import pandas as pd

    # Column 'a' mixes a float with an empty string, so its dtype is object.
    df = pd.DataFrame({'a': [0.36, ''], 'b': [111, 81]})
    print(df['a'].describe())
    # count     2.00
    # unique    2.00
    # top       0.36
    # freq      1.00
    # dtype: float64

    try:
        print(df.groupby(['b']).agg(np.mean))
    except Exception as err:
        print(err)
        # No numeric types to aggregate

    print(df.groupby(['b']).agg(np.sum))
    #         a
    # b
    # 81
    # 111  0.36

Aggregation using np.sum works because summing a Series that holds only the empty string returns that string unchanged:

    In [103]: np.sum(pd.Series(['']))
    Out[103]: ''

whereas np.mean(pd.Series([''])) raises

    TypeError: Could not convert  to numeric
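The asymmetry between the two reductions can be sketched in plain Python. This is only an illustration of the idea, not how NumPy or pandas implement it, and the helper names object_sum and object_mean are made up for this example. A sum over a single object needs no arithmetic at all, while a mean always ends in a division, which is undefined for strings.

```python
from functools import reduce
import operator


def object_sum(values):
    """Sum by repeated +, as a reduction over generic Python objects would."""
    return reduce(operator.add, values)


def object_mean(values):
    """Mean as sum / count; the division is the step that fails for strings."""
    return object_sum(values) / len(values)


# One group that holds only the empty string, as in the example above.
single_group = ['']

# With a single element there is nothing to add, so the string comes back as-is.
sum_result = object_sum(single_group)

try:
    object_mean(single_group)
    mean_failed = False
except TypeError:
    # A str cannot be divided by an int.
    mean_failed = True

print(repr(sum_result), mean_failed)
```

Note that a sum can still fail once a group mixes floats and strings, since adding a float to a str is a TypeError as well.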

To debug the problem, you could try to find the non-numeric value(s) using:

    for val in df['a']:
        if not isinstance(val, float):
            print('error: val = {!r}'.format(val))
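Once the offending values are known, one possible follow-up is to let pandas do the conversion with pd.to_numeric and errors='coerce', which turns anything unparseable into NaN. The sketch below is not part of the original answer. It assumes the column holds a mix of floats, numeric strings, and empty strings, and the sample values and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical object-dtype column: floats, a numeric string, and an empty string.
raw = pd.Series([0.36, '0.21', '', 0.40], dtype=object)

# errors='coerce' converts what it can and replaces the rest with NaN.
converted = pd.to_numeric(raw, errors='coerce')

# Entries that became NaN but were not missing to begin with are the culprits.
bad_values = raw[converted.isna() & raw.notna()]

print(converted.dtype)      # the converted column is now numeric
print(bad_values.tolist())  # the values that could not be parsed
print(converted.mean())     # NaN entries are skipped by default
```

Assigning the converted Series back to the DataFrame column should then let .describe() and .agg(np.mean) treat it as numeric, at the cost of silently turning the bad entries into missing values.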
