python - pandas: groupby and unstack to create feature vector for classification

April 15, 2015

I have a pandas DataFrame showing users' performance on test questions. It looks like this:

userid  questionid  correct
---------------------------
1       1           1
1       5           1
1       6           0
1       8           0
1       10          1
2       3           1
2       5           1
2       6           0
.       .           .
.       .           .
.       .           .
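To make the examples reproducible, here is a sketch that builds only the rows shown above as a DataFrame. The elided "." rows are not included, so this is just a sample of the real data:

```python
import pandas as pd

# Only the rows visible in the question; the real data continues past the dots.
df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})
print(df)
```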

I want to make a feature vector for each user saying whether or not they got each question right, which looks like this:

questionid    1    2    3    4    5    6  ...
userid
---------------------------------------------
1             1  NaN  NaN  NaN    1    0  ...
2           NaN  NaN    1  NaN    1    0  ...
.           ...
.           ...
.

Each user is shown only a subset of the questions, so it's a sparse matrix.

How can I make the above table in pandas?

I wanted to do something like the following: group by userid and questionid, then unstack, but I'm not sure how that should work.

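grouped = df.groupby(['userid', 'questionid'])['correct'].first()  # a GroupBy has no unstack(); reduce each group to one value first
df = grouped.unstack()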
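A GroupBy object has no unstack() method, so the grouping needs a reduction before unstacking. The sketch below, run on the sample rows from the question, compares that with set_index() plus unstack(), which skips the grouping entirely. The set_index() version assumes each (userid, questionid) pair appears at most once:

```python
import pandas as pd

# Sample rows from the question (the elided rows are left out).
df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})

# Reduce each (userid, questionid) group to one value, giving a Series with
# a two-level index; unstack() then moves questionid into the columns.
via_groupby = df.groupby(["userid", "questionid"])["correct"].first().unstack()

# Without grouping: index by both keys and unstack directly. This raises
# ValueError if any (userid, questionid) pair is duplicated.
via_set_index = df.set_index(["userid", "questionid"])["correct"].unstack()

print(via_groupby)
```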

Thanks for the help.

You're looking for pivot:

In [11]: df.pivot(values='correct', index='userid', columns='questionid')
Out[11]:
questionid    1    3    5    6    8   10
userid
1             1  NaN    1    0    0    1
2           NaN    1    1    0  NaN  NaN
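Here is a self-contained sketch of the same pivot call on the sample rows from the question. Only the visible rows are used, so only the six questions seen there become columns:

```python
import pandas as pd

df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})

# One row per user, one column per question that appears in the data.
# (user, question) pairs that never occur become NaN, so values are floats.
wide = df.pivot(index="userid", columns="questionid", values="correct")
print(wide)
```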

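You may also want to reindex the columns to the full set of questions, in case some questions don't appear in the data at all (i.e. the answers aren't surjective onto the questions):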

In [12]: _.reindex(columns=np.arange(1, 11))
Out[12]:
questionid    1    2    3    4    5    6    7    8    9   10
userid
1             1  NaN  NaN  NaN    1    0  NaN    0  NaN    1
2           NaN  NaN    1  NaN    1    0  NaN  NaN  NaN  NaN
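The reindex step as a standalone sketch. The full list of question IDs is an assumption here (1 to 10, the range seen in the sample), and `all_questions` is a hypothetical name standing in for wherever the real question bank comes from:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})
wide = df.pivot(index="userid", columns="questionid", values="correct")

# Hypothetical: assume question IDs run from 1 to 10.
all_questions = np.arange(1, 11)

# Adds a NaN-filled column for every question missing from the data
# and keeps the existing columns in the order given by all_questions.
full = wide.reindex(columns=all_questions)
print(full)
```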

Note: another answer suggested pivot_table, which applies an aggfunc to repeated values (mean by default, which isn't what you want here, as @u2ef1 points out). It offers some additional features over pivot, but is a little slower:

df.pivot_table(values='correct', index='userid', columns='questionid')
# (older pandas versions spelled index/columns as rows/cols)
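A small sketch of how the two behave on repeated values, using hypothetical data in which user 1 answered question 5 twice. pivot raises a ValueError, while pivot_table averages the repeats unless told otherwise. Passing aggfunc="max" is just one possible choice; it records whether the user ever got the question right:

```python
import pandas as pd

# Hypothetical data: user 1 answered question 5 twice (wrong, then right).
dup = pd.DataFrame({
    "userid":     [1, 1, 1, 2],
    "questionid": [5, 5, 6, 5],
    "correct":    [0, 1, 1, 1],
})

# pivot refuses duplicate (index, column) pairs...
pivot_failed = False
try:
    dup.pivot(index="userid", columns="questionid", values="correct")
except ValueError:
    pivot_failed = True

# ...while pivot_table aggregates them, with aggfunc="mean" by default.
averaged = dup.pivot_table(values="correct", index="userid", columns="questionid")

# An explicit aggfunc changes how repeats are combined.
ever_right = dup.pivot_table(values="correct", index="userid",
                             columns="questionid", aggfunc="max")
print(averaged)
print(ever_right)
```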

I have a feeling that in older versions of pandas, pivot was sensitive to NaN and you had to use pivot_table...
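Because each user sees only a subset of the questions (the sparse matrix mentioned in the question), most cells of the wide table are NaN. If memory becomes an issue, one option is pandas' sparse extension dtype with NaN as the fill value. This is a sketch, not something the answer above proposes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})
wide = df.pivot(index="userid", columns="questionid", values="correct")

# Only non-NaN cells are stored explicitly; NaN is the implicit fill value,
# so the 0 answers (wrong) are still kept.
sparse_wide = wide.astype(pd.SparseDtype("float", np.nan))

# Fraction of cells that are actually stored.
print(sparse_wide.sparse.density)
```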
