java - How to implement a selector for HTML DOM elements by class name using regexp
I have a question. Suppose I have the following HTML file:
<!doctype html public "-//w3c//dtd html 4.01 transitional//en" "http://www.w3.org/tr/html4/loose.dtd">
<html>
<head>
  <title> new document </title>
  <meta name="generator" content="editplus">
  <meta name="author" content="">
  <meta name="keywords" content="">
  <meta name="description" content="">
</head>
<body>
  <h1>welcome homepage</h1>
  <p class="intro">my name donald.</p>
  <h1 class="intro"><p class="important">note important paragraph.</p> </h1>
  <div class="intro important"><p class="apple">i live in apple.</p></div>
  <div class="intro important">i apple.</p></div>
  <p>i live in duckburg.</p>
</body>
</html>
Right now I want to get the HTML elements by class name. If the class name is ".intro", it should return:
my name donald. <p class="important">note important paragraph.</p>
If the class name is ".intro.important", it should return:
note important paragraph.
If the class name is ".intro.important>.apple", it should return:
i live in apple.
I know jQuery has a class selector function, but I want to implement this function myself. Can I use Java regexp for this? It seems that when the class name is a single string, it is OK. But if the class name has a child class name, that makes it hard. One more question: can Java get the DOM structure of the HTML?
Can I use Java regexp for this?
You can create a regex that selects the nested content within a tag with a specific class name. I can give you a regex that finds the content within a tag and doesn't care about the class name:
<([a-z][a-z0-9]*+)[^>]*>.*?</\\1>
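As a rough sketch of how the pattern above might be used from Java: the first method applies the regex exactly as given, and the second uses a class-aware variant. The variant pattern, the class name `TagRegexDemo`, and both method names are assumptions made for illustration, not part of the original answer. The variant also assumes the class attribute is double-quoted. Note that `find()` resumes after each match, so elements nested inside an already matched element are not visited, and same-name nesting (a `<div>` inside a `<div>`) is cut off at the first closing tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRegexDemo {
    // The regex from the answer: group 1 captures the tag name and
    // \1 requires the same name in the closing tag.
    static final Pattern ANY_TAG =
            Pattern.compile("<([a-z][a-z0-9]*+)[^>]*>.*?</\\1>");

    // Hypothetical class-aware variant (an assumption, not from the answer):
    // group 2 is the raw class attribute value, group 3 is the inner content.
    static final Pattern TAG_WITH_CLASS = Pattern.compile(
            "<([a-z][a-z0-9]*+)\\b[^>]*?\\bclass=\"([^\"]*)\"[^>]*>(.*?)</\\1>",
            Pattern.DOTALL);

    // Returns every top-level match of the original regex, tags included.
    static List<String> matchElements(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = ANY_TAG.matcher(html);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    // Returns the inner content of matched elements whose class list
    // contains className (given without the leading dot).
    static List<String> innerHtmlByClass(String html, String className) {
        List<String> results = new ArrayList<>();
        Matcher m = TAG_WITH_CLASS.matcher(html);
        while (m.find()) {
            // A class attribute may hold several space-separated names.
            List<String> classes = Arrays.asList(m.group(2).trim().split("\\s+"));
            if (classes.contains(className)) {
                results.add(m.group(3));
            }
        }
        return results;
    }
}
```

With the page source held in a string `html`, a call such as `TagRegexDemo.innerHtmlByClass(html, "intro")` returns the inner content of each matched element that carries that class.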
But if the class name has a child class name, that makes it hard.
In such a case it is easier to work with Java String.
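One way to read this suggestion, sketched under my own assumptions: break the selector into steps with plain `String` methods before doing any matching. Here `>` is taken to separate a parent step from a child step and `.a.b` is taken to mean "has both classes", following the usual CSS/jQuery convention; the question's expected output may intend something different. The class name `SelectorParser` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SelectorParser {
    // Splits a selector such as ".intro.important>.apple" into one set of
    // class names per step: [[intro, important], [apple]].
    static List<Set<String>> parse(String selector) {
        List<Set<String>> steps = new ArrayList<>();
        for (String step : selector.split(">")) {
            Set<String> classes = new LinkedHashSet<>();
            // Splitting on "." leaves an empty first token, which is skipped.
            for (String name : step.trim().split("\\.")) {
                if (!name.isEmpty()) {
                    classes.add(name);
                }
            }
            steps.add(classes);
        }
        return steps;
    }

    // True if the element's class attribute value contains every class
    // required by one selector step.
    static boolean matches(String classAttribute, Set<String> required) {
        Set<String> present = new LinkedHashSet<>(
                Arrays.asList(classAttribute.trim().split("\\s+")));
        return present.containsAll(required);
    }
}
```

Each step can then be checked against an element's class attribute with `matches`, working from the outermost step inward.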
Can Java get the DOM structure of the HTML?
Yes, it can be done with jsoup, available at jsoup.org.