java - How to remove < and > in XML that is part of the XML message


I have XML as follows:

<starttag>
    <myvaluetag>and value contains < bracket makes xml invalid</myvaluetag>
</starttag>

The XML contains a '<' character inside the element text, which makes the XML invalid.

The easiest fix would be at the source of the XML, but unfortunately I don't have control over how the XML is created. It contains messages such as "value < 10", where the < is supposed to mean "less than".

Is there any way I can check the XML for such things and escape those characters in it?

I tried looking at a post in which someone suggested I use JTidy. When I tried it, it didn't remove the <:

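import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.w3c.tidy.Tidy;

// data is the String holding the broken XML message
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setWraplen(Integer.MAX_VALUE);   // do not wrap long lines
tidy.setPrintBodyOnly(true);
tidy.setXmlOut(true);                 // emit XML rather than HTML
tidy.setSmartIndent(true);
ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
tidy.parseDOM(inputStream, outputStream);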

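The fact that the XML is invalid (more precisely, not well-formed) means that you aren't going to be able to use a conforming XML parser to read it and fix it. If you can't get the authors of the software that writes the file to fix the bug, then you will have to come up with an application-specific solution.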

For example, if you knew that the stray < char only ever occurs in the text of a <myvalue> element, and if you knew that no other elements can occur as children of <myvalue>, then it would be pretty easy to write a program that recognizes the start and end tags, and replaces any < characters that occur between them with &#60;
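
A minimal sketch of that idea in Java, using the <myvaluetag> element from the question's sample. The class and method names (MyValueEscaper, escapeInsideValue) are made up for illustration, and the sketch assumes the element has no attributes, no child elements, and is never nested. It also escapes > for symmetry, which XML does not strictly require, and it does not touch stray & characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: escapes angle brackets found between
// <myvaluetag> and </myvaluetag>, and leaves everything else untouched.
class MyValueEscaper {
    // Group 1 = start tag, group 2 = element text, group 3 = end tag.
    // DOTALL lets the text span several lines; the reluctant .*? stops
    // at the first end tag that follows.
    private static final Pattern VALUE = Pattern.compile(
            "(<myvaluetag>)(.*?)(</myvaluetag>)", Pattern.DOTALL);

    static String escapeInsideValue(String xml) {
        Matcher m = VALUE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumption: no child elements, so every < or > here is plain text.
            String text = m.group(2).replace("<", "&#60;").replace(">", "&#62;");
            // quoteReplacement keeps any $ or \ in the text from being
            // interpreted as a group reference.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + text + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<starttag> <myvaluetag>value < 10</myvaluetag> </starttag>";
        // prints: <starttag> <myvaluetag>value &#60; 10</myvaluetag> </starttag>
        System.out.println(escapeInsideValue(broken));
    }
}
```

Once the text has been escaped this way, the result can be handed to an ordinary XML parser.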

Of course, if the problem isn't that simple, then the solution won't be that simple either; but hopefully, you can make it simpler than solving the general problem for arbitrary XML.

After you've fixed a few files "by hand," stop and ask yourself, "How did I know that that < char needed to be escaped?" Then write a program that operates on the same knowledge.
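
As one illustration of turning that kind of knowledge into code: in messages like "value < 10", the < is followed by a space or a digit, whereas a < that really opens markup is followed by a name character, /, ! or ?. The sketch below encodes only that rule. It is an assumed heuristic, not a general repair: the class name StrayBracketEscaper is made up, only ASCII name-start characters are considered, and text such as "a<b" would still be mistaken for a tag.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic: a '<' that cannot begin a tag, comment,
// CDATA section or processing instruction is treated as literal text.
class StrayBracketEscaper {
    // Negative lookahead: '<' NOT followed by an ASCII name-start
    // character, '/', '!' or '?'. A '<' at the very end of the input
    // also matches, because nothing follows it.
    private static final Pattern STRAY_LT = Pattern.compile("<(?![A-Za-z_:/!?])");

    static String escapeStrayLessThan(String xml) {
        return STRAY_LT.matcher(xml).replaceAll("&#60;");
    }

    public static void main(String[] args) {
        String broken = "<starttag> <myvaluetag>value < 10</myvaluetag> </starttag>";
        // prints: <starttag> <myvaluetag>value &#60; 10</myvaluetag> </starttag>
        System.out.println(escapeStrayLessThan(broken));
    }
}
```

Whether a rule this loose is safe depends entirely on what the producing application can emit, which is why it is worth checking against a few hand-fixed files first.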

