python - How to support multiple file formats and field delimiters?
The contents of a file in table form can be exported in at least 3 formats (UTF-8, UTF-16LE, ASCII), have columns that are tab-separated, pilcrow-separated, or other, and have quotes/thorns/etc. around each item. The following function reads in a table that is UTF-8, separated by pilcrows, and with each item surrounded by thorns.
    import codecs
    import re

    def read_app_dat(app_export):
        """
        Reads and parses the DAT exported from the app.
        Assumes the delimiters are Concordance.
        Args:
            app_export: str, file path to the DAT exported from the app
        Returns:
            dictionary of id mapped to uri
        """
        app_dict = {}
        # The with block closes the file even if parsing fails.
        with codecs.open(app_export, encoding='utf-8') as f:
            for line in f:
                # Strip the thorns (\xfe), then split on the field delimiter (\x14).
                each_row = re.sub(r'\xfe', "", line).split("\x14")
                # Skip the header row.
                if "id" in each_row[0] or "uri" in each_row[1]:
                    pass
                else:
                    app_dict[each_row[0]] = each_row[1]
        return app_dict
As it's written, I need to define each_row differently for each scenario:

    each_row = re.sub(r'\xfe', "", line).split("\x14")

That's not a very Pythonic thing to do. How can I better deal with the separators, in this case pilcrows and thorns, and pass them in as parameters? The codecs module has been helpful so far.
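To make the goal concrete, here is a minimal sketch of the kind of parameterized reader I have in mind. It assumes Python 3 and swaps the re.sub/split approach for the standard csv module, which accepts any single-character delimiter and quote character. The function name read_delimited, its parameters, and the has_header flag are hypothetical, and unlike the function above it skips the first row unconditionally rather than looking for "id" or "uri".

```python
import csv


def read_delimited(path, encoding="utf-8", delimiter="\x14",
                   quotechar="\xfe", has_header=True):
    """Map the first column to the second column of a delimited text file.

    All names and defaults here are illustrative assumptions; the defaults
    mirror the \\x14 / thorn case from the original function.
    """
    result = {}
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
        if has_header:
            next(reader, None)  # discard the header row if there is one
        for row in reader:
            if len(row) >= 2:  # ignore blank or short lines
                result[row[0]] = row[1]
    return result
```

With this shape, the scenario above would be read_delimited("export.dat"), and a tab-separated, double-quoted UTF-16 export would be read_delimited("export.txt", encoding="utf-16", delimiter="\t", quotechar='"'). Both file names are placeholders.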
Thank you for your time.