regex - Using multiple regexes to capture matching nested XML tags


Suppose I have an XML file that contains tags nested inside themselves, e.g.

<tag>one<tag>two</tag>one</tag> 

From this page, I have two examples of regular expressions that don't match this string correctly, e.g. I get

<tag>one<tag>two</tag> 

which is not balanced. According to Google, it's not possible to find a regex that parses HTML correctly, e.g. here or here.

Parsing HTML in its entirety is not possible with regular expressions, since it depends on matching the opening and the closing tag, which is not possible with regexps.

Regular expressions can only match regular languages, but HTML is a context-free language. The only thing you can do with regexps on HTML is heuristics, and those will not work on every condition. It should be possible to present an HTML file that will be matched wrongly by any regular expression.
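To make the "heuristics" point concrete, here is a rough sketch in Ruby. The two patterns and the helper names non_greedy_match and greedy_match are assumptions made up for illustration; they are not the expressions from the linked page. Each pattern gets one input right and the other wrong:

```ruby
# Hypothetical helpers for illustration; these patterns are assumptions,
# not the ones from the linked page.

# Non-greedy: stops at the first closing tag it meets.
def non_greedy_match(xml)
  xml[/<tag>.*?<\/tag>/]
end

# Greedy: runs on to the last closing tag in the string.
def greedy_match(xml)
  xml[/<tag>.*<\/tag>/]
end

nested   = "<tag>one<tag>two</tag>one</tag>"
siblings = "<tag>a</tag><tag>b</tag>"

puts non_greedy_match(nested)    # cut short, so the result is unbalanced
puts greedy_match(nested)        # whole string, balanced
puts non_greedy_match(siblings)  # first element only, which is correct
puts greedy_match(siblings)      # both siblings swallowed as one element
```

Neither choice of quantifier handles both inputs, because neither pattern keeps track of how deep the nesting is.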

That's a nice, clear-cut theoretical answer, but it got me thinking: would it be possible to do this programmatically, using multiple regexes and/or loops?
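As a rough sketch of what "regexes plus loops" could look like, the Ruby snippet below uses one regex only to find individual tags and leaves the nesting to an explicit stack. The name balanced_tags? is hypothetical, and the sketch assumes simple tags with no attributes, comments, CDATA sections or self-closing forms:

```ruby
# Hypothetical helper: checks tag balance with one token regex plus a stack.
# Assumes simple tags: no attributes, comments, CDATA or self-closing forms.
def balanced_tags?(xml)
  stack = []
  # Each hit yields [slash, name]; slash is "/" for a closing tag, "" otherwise.
  xml.scan(/<(\/?)([^<>\/\s]+)>/) do |slash, name|
    if slash.empty?
      stack.push(name)        # opening tag: remember it
    elsif stack.pop != name   # closing tag: must match the most recent opener
      return false
    end
  end
  stack.empty?                # anything still open means unbalanced
end

puts balanced_tags?("<tag>one<tag>two</tag>one</tag>")  # true
puts balanced_tags?("<tag>one<tag>two</tag>")           # false
```

The regex here only recognises a single tag, which is a regular language; the stack is what supplies the memory that a regular expression on its own lacks.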

Here's a simple recursive descent XML parser. I'm making it right now, so it's rough and ready, and I'm writing it in Ruby since you didn't specify a language. Do not use this in production (or anywhere really, it's just for curiosity's sake):

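    string = "<tag>one<other_tag>two</other_tag>one</tag>"

    # Greedy on purpose, so the outermost pair wins. Side effect: same-named
    # siblings such as <a>1</a><a>2</a> are merged into one element.
    element = /<([^>]+)>(.*)<\/\1>/

    # Forward declaration: the lambda below must see this local when it is parsed.
    make_matches = nil

    regex_xml_parser = -> input {
      string = input.dup            # sub! mutates in place, so work on a copy
      stuff_before = []
      matches = []
      stuff_after = []
      while string =~ />/
        # Peel off any text in front of the first tag.
        stuff_before << string[/^[^<]*/]
        string.sub!(/^[^<]*/, '')
        # Grab the outermost element; its body is parsed recursively later.
        match = string.match(element)
        break unless match          # unbalanced input: stop rather than loop forever
        matches << match
        string.sub!(element, '')
        # Peel off any text after the last tag.
        stuff_after << string[/[^>]*$/]
        string.sub!(/[^>]*$/, '')
      end
      values = stuff_before + stuff_after + [string]
      matching_nodes = matches.map { |match| make_matches[match] }
      {values: values.reject(&:empty?), nodes: matching_nodes}
    }

    # Maps a tag name to the parsed content between its opening and closing tag.
    make_matches = -> match_item {
      {match_item[1] => regex_xml_parser[match_item[2]]}
    }

    p regex_xml_parser[string]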

Remember, we're building a parser here, and I think it goes without saying that using a parser that already exists is easier.
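For comparison, a minimal sketch using REXML, the XML library that ships with Ruby (a bundled gem since Ruby 3.0). The helper name element_depths is hypothetical; it simply walks the parsed tree and records how deeply each element is nested:

```ruby
require "rexml/document"

# Hypothetical helper: returns [name, depth] pairs for an element and all of
# its descendants, in document order.
def element_depths(element, depth = 0)
  [[element.name, depth]] +
    element.elements.to_a.flat_map { |child| element_depths(child, depth + 1) }
end

doc = REXML::Document.new("<tag>one<other_tag>two</other_tag>one</tag>")
p element_depths(doc.root)
```

The nesting comes straight from the parsed tree, with no regexes involved.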

