python - Stumped with a regular expression
I have lines in a file like so:
l_12_interval      j_10_int
length:100         min. :-2120803808
class :character   1st qu.: -992076064
mode :character    median : 263935522
                   mean : -33801580
                   3rd qu.: 896644601
                   max. : 1890084945
                   na's :53
I want to parse out what I'll call the last "major column":
j_10_int
min. :-2120803808
1st qu.: -992076064
median : 263935522
mean : -33801580
3rd qu.: 896644601
max. : 1890084945
na's :53
The columns are aligned, but I can't depend on where the last major column starts. The heading is not a problem, but I'm trying to compose a regular expression for Python's re.sub() function to strip off what precedes the label. I thought of including the label and colon in the regular expression as a subexpression, and replacing the matching expression with the subexpression. Easier said than done! The closest I've gotten:
>>> line
' length:100 min. :-2120803808'
>>> re.sub(r"^.*([a-z1-9][a-z1-9.' ]*:)", r"\1", line, re.IGNORECASE)
'n. :-2120803808'
>>>
I thought I could toss in whitespace before the beginning of the subexpression, but that's not working:
>>> re.sub(r"^.*\s([a-z1-9][a-z1-9.' ]*:)", r"\1", line, re.IGNORECASE)
' length:100 min. :-2120803808'
>>> re.sub(r"^.* ([a-z1-9][a-z1-9.' ]*:)", r"\1", line, re.IGNORECASE)
' length:100 min. :-2120803808'
>>> re.sub(r"^.*( [a-z1-9][a-z1-9.' ]*:)", r"\1", line, re.IGNORECASE)
' length:100 min. :-2120803808'
>>> re.sub(r"^.*(\w[a-z1-9][a-z1-9.' ]*:)", r"\1", line, re.IGNORECASE)
'in. :-2120803808'
As you can see, I tried pulling the whitespace inside the subexpression, which would be acceptable. But I'm still no closer to a complete solution.
Does anyone have suggestions?
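One thing that may explain the unchanged results above: re.sub() takes count as its fourth positional parameter, so re.IGNORECASE passed in that position is read as a count of 2 rather than as a flag. The sketch below illustrates this with a hypothetical sample line. The capitalized labels are an assumption (R's summary() usually prints them that way) and are not copied from the lines above.

```python
import re
import warnings

# Hypothetical sample line; capitalized labels and spacing are assumptions.
line = " Length:100         Min.   :-2120803808"

pattern = r"^.*\s([a-z1-9][a-z1-9.' ]*:)"

# Passed positionally, re.IGNORECASE (which equals 2) lands in `count`,
# so matching stays case-sensitive and the line comes back unchanged.
# Recent Python versions may warn about a positional count, so the
# warning is silenced for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    positional = re.sub(pattern, r"\1", line, re.IGNORECASE)

# Passed by keyword, the flag really enables case-insensitive matching.
keyword = re.sub(pattern, r"\1", line, flags=re.IGNORECASE)

print(repr(positional))
print(repr(keyword))
```

With flags= the pattern matches, and because .* is greedy the group keeps only the last label that sits before a colon, which here is the "Min." label.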
This one is based on quite a few assumptions about the format of the names and of the values of the first column, but it works for your example:
^(?:[a-z][a-z]+\s*:[a-z0-9]*|)\s*([a-z0-9].*)$
It probably needs a bit more work, based on what you know about the formats of the different names and values.
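As a rough sketch of how the pattern above might be applied line by line: the helper name last_major_column, the sample rows, their spacing, and the choice to strip leading whitespace before matching are all assumptions made for illustration, not part of the answer.

```python
import re

# Pattern from the answer, as written above. The optional non-capturing
# group consumes a first-column entry such as "length:100"; group 1 is
# whatever follows it.
PATTERN = re.compile(r"^(?:[a-z][a-z]+\s*:[a-z0-9]*|)\s*([a-z0-9].*)$")


def last_major_column(line):
    """Return the last major column of `line`, or None if nothing matches.

    Hypothetical helper; assumes leading whitespace is not significant.
    """
    match = PATTERN.match(line.strip())
    return match.group(1) if match else None


# Sample rows adapted from the question; the spacing is assumed.
rows = [
    "length:100        min.   :-2120803808",
    "class :character  1st qu.: -992076064",
    "mode  :character  median :  263935522",
    "                  mean   :  -33801580",
]

for row in rows:
    print(last_major_column(row))
```

One of the assumptions shows up quickly: a row that holds only a second-column entry with a plain label and a non-negative value, such as "mean : 33801580", would match the optional group and lose its label, so the pattern would need tightening for data like that.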