Python - Group m groups with regex
I have a regex that replaces the letter n with (\w{1,}), meaning a word can stand in for the letter n. I want to make a group out of m instances of (\w{1,}), i.e. add parens around m instances of (\w{1,}), like this:
"(" + "(\w{1,}), (\w{1,}), (\w{1,}) .... (\w{1,})" + ")", where (\w{1,}) occurs m times
How can I do that? I know it's something like
re.sub((\w{1,}){2,}, inputstring, "(" + as many instances of the (\w{1,}) pattern as it is able to match + ")")
How do I express, in regex, a pattern matched m times? (so I can replace it, surrounded by parentheses)
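In regex syntax, "matched m times" is the {m} quantifier. A minimal sketch of building one capturing group around exactly m words, assuming comma-separated words as in the example above; group_of_m and its sep parameter are hypothetical names, not from the original thread:

```python
import re

def group_of_m(m, sep=r",\s*"):
    """Hypothetical helper: one capturing group around exactly m words."""
    if m < 1:
        raise ValueError("m must be at least 1")
    # Write \w+ once, then repeat the non-capturing "separator + word" unit
    # exactly m - 1 times with the {n} quantifier; the outer parentheses make
    # the whole run a single capturing group. {{ and }} emit literal braces.
    return r"(\w+(?:{0}\w+){{{1}}})".format(sep, m - 1)

pattern = group_of_m(3)
print(pattern)  # (\w+(?:,\s*\w+){2})

match = re.search(pattern, "x: spam, eggs, ham")
print(match.group(1))  # spam, eggs, ham

# By contrast, quantifying a capturing group keeps only its last repetition:
print(re.match(r"(\w+,?\s*){3}", "spam, eggs, ham").group(1))  # ham
```

This is why the repeated unit is non-capturing and the parentheses you want go around the whole run.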
If I understand your question correctly, you're writing one regex to produce another regex. That is, you're using a regex replacement to build the pattern for a regex search. Your input includes some kind of wildcard value (e.g. "n") that you need to replace to create the search pattern. In the search pattern, adjacent wildcard values should be combined into a single capturing group (so "n n bacon n" would give 2 capturing groups: one for the first 2 words, and one more for the last). I think you can do this if you first capture the adjacent wildcards as a run, then substitute the individual instances within that larger group.
Here's code that does that:
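import re

def make_pattern(template, wildcard="n"):
    # a run of whole-word wildcards separated by whitespace, e.g. "n n"
    replacement_pattern = r"\b{0}\b(?:\s+{0}\b)*".format(wildcard)

    def replacement_func(match):
        # swap each wildcard for \w+ (spacing kept), wrap the run in one group;
        # a lambda avoids Python 3.7+'s "bad escape \w" error in repl strings
        return "(" + re.sub(wildcard, lambda m: r"\w+", match.group()) + ")"

    return re.sub(replacement_pattern, replacement_func, template)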
The \b escape sequences in replacement_pattern are necessary to prevent occurrences of the wildcard from being treated as such if they are part of a larger word (like the "n" at the end of "bacon"). The closure replacement_func uses an additional regex replacement to swap out the wildcards while preserving the spacing between them (so two templates that differ only in the whitespace between wildcards will match differently). I suppose you could do a regular string replacement (with str.replace) instead, if you wanted to. I just couldn't resist 3 levels of regexing in 1 solution.
Here's an example run:
>>> make_pattern("n n bacon n")
'(\\w+ \\w+) bacon (\\w+)'
>>> re.findall(make_pattern("n n bacon n"), "spam spam eggs bacon and spam")
[('spam eggs', 'and')]
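make_pattern keeps the template's exact whitespace, so a single space between wildcards only matches a single space in the input. If you'd rather let that whitespace match any amount of whitespace, one possible variant (hypothetical name make_flexible_pattern; it assumes whitespace-free, word-character wildcards) builds each group from the number of wildcards in the run:

```python
import re

def make_flexible_pattern(template, wildcard="n"):
    # Hypothetical variant: a run of adjacent wildcards becomes one group in
    # which the words are joined by \s+, so "n n" and "n  n" give the same
    # pattern and the search tolerates any whitespace between those words.
    run = r"\b{0}\b(?:\s+{0}\b)*".format(wildcard)

    def wrap(match):
        count = len(match.group().split())  # number of wildcards in the run
        return "(" + r"\s+".join([r"\w+"] * count) + ")"

    return re.sub(run, wrap, template)

pattern = make_flexible_pattern("n n bacon n")
print(pattern)  # (\w+\s+\w+) bacon (\w+)
print(re.findall(pattern, "spam spam eggs bacon and spam"))
# [('spam eggs', 'and')]
```

The trade-off is that the generated pattern no longer mirrors the template's spacing; literal text outside the wildcard runs (such as " bacon ") is still matched exactly.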