python - pandas: groupby and unstack to create feature vector for classification


I have a pandas DataFrame showing users' performance on test questions. It looks like this:

userid   questionid   correct
-----------------------------
1        1            1
1        5            1
1        6            0
1        8            0
1        10           1
2        3            1
2        5            1
2        6            0
.        .            .
.        .            .
.        .            .

I want to make a feature vector for each user saying whether or not they got each question right, which looks like this:

questionid     1     2     3     4     5     6   ...
userid
----------------------------------------------------
1              1   NaN   NaN   NaN     1     0   ...
2            NaN   NaN     1   NaN     1     0   ...
.            ...
.            ...
.

Each user is only shown a subset of the questions, so it's a sparse matrix.

How can I make the above table in pandas?

I wanted to do something like the below, grouping by userid and questionid and then unstacking, but I'm not sure how it should work.

df = df.groupby(['user_id','question_id'])
df.unstack()

Thanks for the help.

You're looking for pivot:

In [11]: df.pivot(values='correct', index='userid', columns='questionid')
Out[11]:
questionid   1    3    5    6    8   10
userid
1            1  NaN    1    0    0    1
2          NaN    1    1    0  NaN  NaN
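For reference, here is a minimal self-contained sketch of the same call. The DataFrame is rebuilt from only the rows visible in the question's table, and the variable names `features` and `unstacked` are illustrative, not from the original post. It also shows that the unstack idea from the question can work if you use set_index rather than groupby, since there is nothing to aggregate here.

```python
import pandas as pd

# Sample rows copied from the question's table (only the rows shown there).
df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})

# One row per user, one column per question; questions a user never saw become NaN.
features = df.pivot(index="userid", columns="questionid", values="correct")

# The unstack route: index by both keys, then move questionid into the columns.
unstacked = df.set_index(["userid", "questionid"])["correct"].unstack()

print(features)
```

Both routes require each (userid, questionid) pair to appear at most once.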

You might want to reindex the columns (based on the full list of questions) if the data isn't surjective, i.e. some questions were never answered by anyone.

In [12]: _.reindex(columns=np.arange(1, 10))
Out[12]:
          1    2    3    4  5  6    7    8    9
userid
1         1  NaN  NaN  NaN  1  0  NaN    0  NaN
2       NaN  NaN    1  NaN  1  0  NaN  NaN  NaN
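A self-contained sketch of the reindexing step, written with `reindex(columns=...)`, which is the spelling current pandas accepts (older releases also offered `reindex_axis(..., 1)`, which was removed in pandas 1.0). The range 1 to 10 is an assumption about the full question set; substitute the real list of question IDs.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "userid":     [1, 1, 1, 1, 1, 2, 2, 2],
    "questionid": [1, 5, 6, 8, 10, 3, 5, 6],
    "correct":    [1, 1, 0, 0, 1, 1, 1, 0],
})
features = df.pivot(index="userid", columns="questionid", values="correct")

# Assumed full set of question IDs (1..10); questions nobody answered
# show up as all-NaN columns instead of being missing from the table.
all_questions = np.arange(1, 11)
full = features.reindex(columns=all_questions)

print(full)
```

Any question ID left out of the list passed to reindex is dropped from the result, so the list should cover every question you want as a feature.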

Note: this answer previously suggested pivot_table (which applies an aggfunc to repeated values, by default mean, and that's not what you want here, as @u2ef1 points out). It offers some additional features over pivot but is a little slower:

# older pandas versions spelled these keywords rows= and cols=
df.pivot_table(values='correct', index='userid', columns='questionid')
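To make the difference concrete, here is a small sketch with made-up data in which one user answers the same question twice. The duplicated row is hypothetical and not part of the original question. With such data pivot refuses to reshape, while pivot_table silently averages.

```python
import pandas as pd

# Hypothetical data: user 1 has two records for question 5.
dup = pd.DataFrame({
    "userid":     [1, 1, 2],
    "questionid": [5, 5, 5],
    "correct":    [1, 0, 1],
})

# pivot cannot place two values in one cell, so it raises ValueError.
try:
    dup.pivot(index="userid", columns="questionid", values="correct")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates with the default aggfunc (mean).
averaged = dup.pivot_table(values="correct", index="userid", columns="questionid")

print(pivot_failed)
print(averaged)
```

For a right/wrong feature an averaged value such as 0.5 is rarely meaningful, which is why pivot's error is arguably the safer behaviour here.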

I have a feeling that in older versions of pandas, pivot was sensitive to NaN and you had to use pivot_table...

