C# - How to get text from HTML nodes and solve a character encoding issue?


I'm trying to get the inner text from the page http://www.hurriyet.com.tr/yazarlar/22933964.asp with HtmlAgilityPack. The HTML structure is:

<div class="detailtext"> <span class="yzrarticledate">30 mart 2014</span> <h1 class="yazararticletitle">31 mart sabahı için acil ihtiyaç listesi</h1> <p></p><p><p  >akıl.<br  />sağduyu.<br  />barış.<br  /> Özgürlük.<br  />kardeşlik.<br  />vicdan.<br  />huzur............. 

and my current code is:

    // GetSource is my own helper that returns the page source as a string
    string htmlContent = GetSource(s);
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(htmlContent);
    var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailtext']").InnerText;

The problem is that this also returns the heading and the date, that is, "30 mart 2014" and "31 mart sabahı için acil ihtiyaç listesi".

I only want the part that begins with:

    <p></p><p><p  >akıl.<br  />

I tried different variations:

    var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailtext']").InnerHtml;
    var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailtext']").NextSibling.NextSibling.InnerText;
    var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailtext']").LastSibling.InnerText;

My second question: if I manage to get the text, I'll be faced with a character encoding problem. How can I fix this?

The easiest solution is to remove the nodes you don't want and then read InnerHtml/InnerText. Removing nodes is covered in "Remove html node from htmldocument: HtmlAgilityPack".

    var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailtext']");
    noa.RemoveChild(noa.SelectSingleNode("span"));
    // remove the rest too...
    var result = noa.InnerText;
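As a slightly fuller sketch of the same idea, the snippet below removes both the date `span` and the `h1` title before reading the text. The class name `ArticleText` and the method `ExtractBody` are made up for illustration, and the XPath assumes the `span` and `h1` are direct children of the `detailtext` div, as in the HTML shown in the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ArticleText
{
    // Hypothetical helper: returns the text of div.detailtext
    // without the date <span> and the <h1> title.
    public static string ExtractBody(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var container = document.DocumentNode
            .SelectSingleNode("//div[@class='detailtext']");
        if (container == null)
            return string.Empty; // div not found (layout or class name differs)

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var unwanted = container.SelectNodes("./span | ./h1");
        if (unwanted != null)
        {
            // Copy to a list first so removal cannot disturb the iteration.
            foreach (var node in unwanted.ToList())
                node.Remove();
        }

        // InnerText leaves entities such as &nbsp; encoded, so decode them.
        return HtmlEntity.DeEntitize(container.InnerText).Trim();
    }
}
```

Note that `InnerText` simply concatenates the text nodes, so the `<br />` tags do not become line breaks in the result. If the line structure matters, one option is to walk the child nodes and insert a newline for each `br` yourself.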

There should be no encoding problem unless the site reports an invalid encoding, since C# strings are Unicode (UTF-16).
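If the text does come out garbled, the usual cause is that the raw bytes were decoded with the wrong charset before they ever became a C# string. The sketch below illustrates this with the BCL only. `PageDecoder`, `Decode`, and `Demo` are hypothetical names, and the sample bytes are produced locally instead of being downloaded, so the charset the real site uses is not assumed here.

```csharp
using System;
using System.Text;

public static class PageDecoder
{
    // Hypothetical helper: decodes raw response bytes with an explicitly
    // chosen charset instead of relying on whatever was auto-detected.
    public static string Decode(byte[] raw, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(raw);
    }

    public static void Demo()
    {
        // "Özgürlük", written with escapes so the source file encoding is irrelevant.
        const string original = "\u00D6zg\u00FCrl\u00FCk";
        byte[] raw = Encoding.UTF8.GetBytes(original);

        // Decoding with the charset the bytes were written in restores the text.
        Console.WriteLine(Decode(raw, "utf-8") == original);      // True

        // Decoding the same bytes as ISO-8859-1 yields mojibake instead.
        Console.WriteLine(Decode(raw, "iso-8859-1") == original); // False
    }
}
```

In practice you would fetch the page as bytes (for example with `WebClient.DownloadData`), decode them with the charset the page actually uses, and pass the resulting string to `LoadHtml`.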

