python - Label encoding across multiple columns in scikit-learn
I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the DataFrame has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather have one big LabelEncoder object that works across all of my columns of data.

Throwing the entire DataFrame into LabelEncoder creates the error below. Please bear in mind that I'm using dummy data here; in actuality I'm dealing with about 50 columns of string-labeled data, so I need a solution that doesn't reference any columns by name.
    import pandas
    from sklearn import preprocessing

    df = pandas.DataFrame({
        'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
        'owner': ['champ', 'ron', 'brick', 'champ', 'veronica', 'ron'],
        'location': ['san_diego', 'new_york', 'new_york', 'san_diego', 'san_diego', 'new_york']
    })

    le = preprocessing.LabelEncoder()

    le.fit(df)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 103, in fit
        y = column_or_1d(y, warn=True)
      File "/users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 306, in column_or_1d
        raise ValueError("bad input shape {0}".format(shape))
    ValueError: bad input shape (6, 3)

Any thoughts on how to get around this problem?
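You can do this easily, though:

    from sklearn.preprocessing import LabelEncoder

    # Fit and apply a LabelEncoder to each column in turn
    df.apply(LabelEncoder().fit_transform)

EDIT: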
Since this answer was posted over a year ago and has generated many upvotes (including a bounty), I should probably extend it further.

For inverse_transform and transform, you have to do a little bit of a hack.
    from collections import defaultdict
    d = defaultdict(LabelEncoder)

With this, you retain each column's LabelEncoder in a dictionary, keyed by column name.
    # Encode each column, fitting that column's own LabelEncoder
    fit = df.apply(lambda x: d[x.name].fit_transform(x))

    # Invert the encoding to recover the original labels
    fit.apply(lambda x: d[x.name].inverse_transform(x))

    # Use the dictionary of fitted encoders to label future data
    df.apply(lambda x: d[x.name].transform(x))
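Putting the pieces above together, here is a minimal end-to-end sketch using the dummy data from the question. The names `encoded`, `decoded`, `new_df`, and `new_encoded` are made up for illustration, and `new_df` is a hypothetical batch of future data. Note that `transform` only works on labels the encoder saw during fitting; an unseen label raises a `ValueError`.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['champ', 'ron', 'brick', 'champ', 'veronica', 'ron'],
    'location': ['san_diego', 'new_york', 'new_york',
                 'san_diego', 'san_diego', 'new_york'],
})

# One LabelEncoder per column, created lazily the first time a column
# name is looked up in the dictionary.
d = defaultdict(LabelEncoder)

# Fit each column's encoder and replace the strings with integer codes.
encoded = df.apply(lambda col: d[col.name].fit_transform(col))

# Map the integer codes back to the original strings.
decoded = encoded.apply(lambda col: d[col.name].inverse_transform(col))

# Hypothetical future data (illustrative only): every label here was
# already seen when the encoders were fitted above.
new_df = pd.DataFrame({
    'pets': ['dog', 'monkey'],
    'owner': ['ron', 'brick'],
    'location': ['new_york', 'san_diego'],
})
new_encoded = new_df.apply(lambda col: d[col.name].transform(col))

print(encoded)
print(decoded)
print(new_encoded)
```

Because LabelEncoder assigns codes in sorted label order, the codes are per column: `0` means `cat` in `pets` but `brick` in `owner`. Keeping the dictionary `d` around is what makes the codes reversible later.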