Python Pandas: Reading a CSV File with Specific Line Terminators
I am trying to create a DataFrame from the sample CSV below that I've been given, but I am getting the error "Error tokenizing data. C error: EOF inside string starting at line 0". I haven't had much practice treating bad lines, so I would like to learn the best way to handle this. I have attempted many different options in read_csv, such as error_bad_lines=False, but that has not worked either.
CParserError: Error tokenizing data. C error: EOF inside string starting at line 0
I am guessing that the line terminators of ," are causing the issue, and that the best way is to loop through each line and process it. I came up with the generator below, which may be a different approach, but I am hoping it is close. I would also like to learn how to use a generator and yield.
Sample data:
"usnc3255","27","us","nc","lands end","72305006","knjm","knca","knkt","t72305006","","","ncc031","ncz095","","545","28594","america/new_york","34.65266","-77.07661","7","rdu","893727","
"usnc3256","27","us","nc","landsdown","72314058","keho","kakh","kipj","t72314058","","","ncc045","ncz068","sc007","517","28150","america/new_york","35.29374","-81.46537","797","clt","317845","
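I have crafted the code below, which removes the last 2 characters of each line (the slice drops 3 because of the newline), but I am not sure how to produce a DataFrame from the lines:
    import pandas as pd

    def big_table_generator(filename):
        # Yield each line without its trailing ," and newline
        with open(filename, 'rt') as f:
            for line in f:
                yield line[:-3]

    gen = big_table_generator('../data/test_sun_file.csv')
    pd.DataFrame(gen)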
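One possible way to get from the generator to a DataFrame: passing the generator straight to pd.DataFrame gives a single column of raw strings, so the lines still need to be split into fields. A minimal sketch, assuming every record ends with the stray ," terminator and that the fields themselves are well-formed quoted CSV. The helper name stripped_lines is made up for illustration, and the two records are shortened versions of the sample data:

```python
import csv
import io

import pandas as pd


def stripped_lines(lines):
    """Yield each line without the stray trailing ," terminator."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith(',"'):
            line = line[:-2]  # drop the dangling comma and quote
        if line:  # skip blank lines
            yield line


# Shortened records in the same shape as the sample data (assumption:
# the real file has one record per line, each ending in ,").
sample = io.StringIO(
    '"usnc3255","27","us","nc","lands end","893727","\n'
    '"usnc3256","27","us","nc","landsdown","317845","\n'
)

# csv.reader accepts any iterable of strings, so it can consume the
# generator directly and handle the quoting of each field.
rows = list(csv.reader(stripped_lines(sample)))
df = pd.DataFrame(rows)
print(df)
```

For a real file, the same generator could be fed an open file handle instead of the StringIO object. Checking for the terminator with endswith, rather than slicing a fixed number of characters, avoids cutting into the data if the last line has no newline.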
I had a similar error. I fixed it using the option quoting=csv.QUOTE_NONE in read_csv.
For example:
    import csv
    df = pd.read_csv(csvfile, header=None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')
Some info on why is in the second comment here: https://github.com/pydata/pandas/issues/5500
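Applied to the comma-separated sample in the question, the same option might look like the sketch below. This is an assumption about how it would be adapted, not something tested against the real file. With quoting disabled, the quote characters stay in the values and the dangling " at the end of each line becomes an extra last column, so both have to be cleaned up afterwards. The records are shortened versions of the sample data:

```python
import csv
import io

import pandas as pd

# Shortened records in the same shape as the sample data.
sample = io.StringIO(
    '"usnc3255","27","us","nc","lands end","893727","\n'
    '"usnc3256","27","us","nc","landsdown","317845","\n'
)

# QUOTE_NONE makes the parser treat " as an ordinary character, so the
# unbalanced quote at the end of each line no longer starts a string.
raw = pd.read_csv(
    sample,
    header=None,
    sep=",",
    quoting=csv.QUOTE_NONE,
    dtype=str,
)

# The last column holds only the stray " from the terminator: drop it,
# then strip the literal quotes left around every remaining value.
df = raw.iloc[:, :-1].apply(lambda col: col.str.strip('"'))
print(df)
```

One caveat with this approach: because quoting is ignored, a field that contains a comma inside its quotes would be split into two columns.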