c# - Single RegEx expression to decode CSV with embedded double quotes and commas
I have lots of CSV data that I am trying to decode using a regex. I have tried to build on an existing code base that other people/projects hit, and I don't want to risk breaking their data flows by refactoring the class too much. So I am wondering whether it is possible to decode this text with a single regex (which is how the class works currently):
f1,f2,f3,f4,f5,f6,f7
,"clean text","with,embedded,commas.","with""embedded""double""quotes",,"6.1",
The first row is the header. If you save this as xxx.csv and open it in Excel, it reads as follows (note that a space between fields marks a cell break):
f1 f2 f3 f4 f5 f6 f7
clean text with,embedded,commas. with"embedded"double"quotes 6.1
But when I try this in .NET, I get stuck on the regex. I have this:
string regexp = "(((?<x>(?=[,\\r\\n]+))|\"(?<x>([^\"]|\"\")+)\"|(?<x>[^,\\r\\n]+)),?)";
You can see it in action here:
which results in this:
<start> clean text with,embedded,commas. with""embedded""double""quotes 6.1 <end>
This is close, but it does not replace the escaped double-double quotes with a single double quote as Excel does. I could not come up with a regex that worked better. Can this be done?
Maybe you can somehow manage to match the string using regular expression conditionals, with the following constructs:
- if-then sentence
(?(?=regex)then|else)
- multiple if-then sentences
(?(?=condition)(then1|then2|then3)|(else1|else2|else3))
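For illustration, here is a minimal sketch of the first construct in .NET. The pattern is my own example, not the one from the question: if the next character is a double quote it takes the "then" branch (a quoted field), otherwise the "else" branch (an unquoted field). The names `ConditionalFieldSketch`, `FieldPattern` and `FirstField` are hypothetical, and the sketch assumes well-formed input with no line breaks inside quoted fields.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalFieldSketch
{
    // (?(?=")then|else): the lookahead is the condition.
    // then: a quoted field, where "" inside the quotes counts as content.
    // else: an unquoted field that runs up to the next comma or line break.
    private const string FieldPattern =
        "(?(?=\")\"(?<x>(?:[^\"]|\"\")*)\"|(?<x>[^,\\r\\n]*))";

    // Returns the raw text of the first field; doubled quotes are NOT unescaped here.
    public static string FirstField(string line)
    {
        Match m = Regex.Match(line, FieldPattern);
        return m.Groups["x"].Value;
    }
}
```

Both branches write to the same named group `x`, which .NET allows, so the caller does not need to know which branch matched.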
I came up with the following pattern to match the body of the text: ([^\,]+(?(?=[^\,])([^\"]+")|([^\,]+,)))
However, you would need to put in more effort to create a completely matching expression for your text, or you may end up using a file parser. If so, you can take a look at FileHelpers, a pretty neat library for parsing text files.
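If the existing class has to stay regex-based, one pragmatic option is to leave the matching to the regex and collapse the doubled quotes afterwards. A match can only return text exactly as it appears in the input, so the `""` to `"` conversion has to be a separate step. The sketch below reuses the pattern from the question unchanged; `CsvRegexSketch` and `DecodeLine` are hypothetical names, and it assumes one record per call with no line breaks inside quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // The pattern from the question, unchanged.
    private const string Pattern =
        "(((?<x>(?=[,\\r\\n]+))|\"(?<x>([^\"]|\"\")+)\"|(?<x>[^,\\r\\n]+)),?)";

    // Hypothetical helper: decodes one CSV record into its field values.
    public static List<string> DecodeLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, Pattern))
        {
            string value = m.Groups["x"].Value;

            // Only fields that were written in quotes can contain escaped quotes.
            if (m.Value.StartsWith("\"", StringComparison.Ordinal))
            {
                value = value.Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Called on the data row from the question, this yields with"embedded"double"quotes for the fourth field. Note that the pattern does not produce a match for the empty field after the final comma when the input ends there, so the row comes back with six values rather than seven.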