python - Label encoding across multiple columns in scikit-learn


I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. Because the DataFrame has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather have one big LabelEncoder object that works across all columns of data.

Throwing the entire DataFrame into LabelEncoder creates the error below. Please bear in mind that I'm using dummy data here; in actuality I'm dealing with about 50 columns of string-labeled data, so I need a solution that doesn't reference any columns by name.

import pandas
from sklearn import preprocessing

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['champ', 'ron', 'brick', 'champ', 'veronica', 'ron'],
    'location': ['san_diego', 'new_york', 'new_york', 'san_diego', 'san_diego', 'new_york']
})

le = preprocessing.LabelEncoder()

le.fit(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 103, in fit
    y = column_or_1d(y, warn=True)
  File "/users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 306, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (6, 3)

Any thoughts on how to get around this problem?

You can do this, though:

df.apply(LabelEncoder().fit_transform)
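To make the one-liner above concrete, here is a minimal self-contained sketch using the dummy data from the question. It assumes `LabelEncoder` is imported from `sklearn.preprocessing`; the variable name `encoded` is only illustrative. Note that the single encoder is refit on every column, so each column gets its own independent integer codes, and the fitted mapping is not kept afterwards.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Dummy data from the question
df = pd.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['champ', 'ron', 'brick', 'champ', 'veronica', 'ron'],
    'location': ['san_diego', 'new_york', 'new_york',
                 'san_diego', 'san_diego', 'new_york'],
})

# DataFrame.apply passes each column (a 1-D Series) to fit_transform,
# which avoids the "bad input shape" error raised for 2-D input.
encoded = df.apply(LabelEncoder().fit_transform)

print(encoded)
# LabelEncoder sorts the classes, so for 'pets':
# cat -> 0, dog -> 1, monkey -> 2
```

Since no column is referenced by name, the same call should work unchanged on a frame with 50+ string columns.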

EDIT:

Since this answer is over a year old and has generated many upvotes (including a bounty), I should probably extend it further.

For inverse_transform and transform, you have to do a little bit of a hack.

from collections import defaultdict
d = defaultdict(LabelEncoder)

With this, you now retain each column's LabelEncoder in a dictionary keyed by column name.

# Encode each column, fitting one LabelEncoder per column
fit = df.apply(lambda x: d[x.name].fit_transform(x))

# Invert the encoding to recover the original labels
fit.apply(lambda x: d[x.name].inverse_transform(x))

# Use the stored encoders to label future data
df.apply(lambda x: d[x.name].transform(x))
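The three steps above can be put together as a round trip. This is a sketch under a few assumptions: the names `encoders`, `decoded`, `new_rows`, and `new_encoded` are made up for illustration, and the "future data" is a small hypothetical frame whose labels were all seen during fitting.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['champ', 'ron', 'brick', 'champ', 'veronica', 'ron'],
    'location': ['san_diego', 'new_york', 'new_york',
                 'san_diego', 'san_diego', 'new_york'],
})

# One LabelEncoder is created lazily per column name on first access
encoders = defaultdict(LabelEncoder)

# Fit and encode every column, keeping each fitted encoder in the dict
encoded = df.apply(lambda col: encoders[col.name].fit_transform(col))

# Map the integer codes back to the original string labels
decoded = encoded.apply(lambda col: encoders[col.name].inverse_transform(col))

# Hypothetical future data that reuses labels seen during fitting
new_rows = pd.DataFrame({
    'pets': ['dog', 'monkey'],
    'owner': ['ron', 'brick'],
    'location': ['new_york', 'san_diego'],
})
new_encoded = new_rows.apply(lambda col: encoders[col.name].transform(col))

print(decoded)
print(new_encoded)
```

One caveat with this approach: `transform` raises a `ValueError` for any label the column's encoder did not see during fitting, so future data with new categories needs separate handling.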
