Spark Streaming not distributing tasks to nodes on the cluster
I have a 2-node standalone cluster for Spark stream processing. The sample code below demonstrates the process I am executing.
    val sparkConf = new SparkConf().setMaster("spark://rsplws224:7077")
    val ssc = new StreamingContext(sparkConf, Milliseconds(500))
    println(ssc.sparkContext.master)
    val inDStream = ssc.receiverStream(...)             // batches of 500 ms have a 1 sec latency
    val filteredDStream = inDStream.filter(...)         // filtering unwanted tuples
    val keyDStream = filteredDStream.map(...)           // converting to a pair DStream
    val stateStream = keyDStream.updateStateByKey(...)  // updating state history
    stateStream.checkpoint(Milliseconds(2500))          // removes the long lineage, materializing the state stream
    stateStream.count()
    val withHistory = keyDStream.join(stateStream)      // joining state with the input stream for further processing
    val alertStream = withHistory.filter(...)           // decision taken by comparing history state and current tuple data
    alertStream.foreach(...)                            // notification to other systems
My problem is that Spark is not distributing the state RDD to multiple nodes and is not distributing tasks to the other node, which causes high response latency. The input load is around 100,000 tuples per second.
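For context, with a single receiver all ingested blocks live on the node running that receiver, so downstream tasks prefer that node. A common remedy (a sketch under that assumption, not something tried in the question) is to repartition the input DStream, or to run several receivers and union them; `ssc` and `inDStream` are from the code above, and `MyReceiver` is a hypothetical receiver class:

    // Option A: redistribute the single receiver's blocks before the stateful stages.
    val spreadDStream = inDStream.repartition(8)  // partition count is illustrative

    // Option B: run multiple receivers (ideally one per node) and union their streams.
    val receivers = (1 to 2).map(_ => ssc.receiverStream(new MyReceiver()))
    val unionedDStream = ssc.union(receivers)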
I have tried the things below, but nothing has worked:
1) Set spark.locality.wait to 1 second (see the snippet after this list for how points 1-3 are applied).
2) Reduced the memory allocated to the executor process, to check whether Spark distributes the RDD or tasks once it goes beyond the memory limit of the first node (m1), where the driver is running.
3) Increased spark.streaming.concurrentJobs from 1 (the default) to 3.
4) Checked the Storage tab of the Streaming UI: there are around 20 partitions of the state DStream RDD, all located on the local node m1.
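For concreteness, points 1-3 map onto the SparkConf roughly like this (a sketch; the app name and executor memory value are illustrative, not from the question):

    import org.apache.spark.SparkConf

    val sparkConf = new SparkConf()
      .setMaster("spark://rsplws224:7077")
      .setAppName("StreamApp")                     // app name is illustrative
      .set("spark.locality.wait", "1000")          // point 1: 1 second (milliseconds in Spark 1.x)
      .set("spark.executor.memory", "512m")        // point 2: reduced executor memory, value illustrative
      .set("spark.streaming.concurrentJobs", "3")  // point 3: raised from the default of 1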
If I run SparkPi 100000, Spark is able to utilize both nodes after a few seconds (30-40), so I am sure the cluster configuration is fine.
Edit:
One thing I have noticed: even if I set the RDD storage level to MEMORY_AND_DISK_SER_2, the app UI's Storage tab still shows it as "Memory Serialized 1x Replicated".
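For reference, the storage level is being set like this (`stateStream` is the state DStream from the code above):

    import org.apache.spark.storage.StorageLevel

    // Requesting serialized storage with 2x replication; the UI nevertheless
    // reports the blocks as "Memory Serialized 1x Replicated".
    stateStream.persist(StorageLevel.MEMORY_AND_DISK_SER_2)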
Answer: If the app is not hitting the cluster and the jobs run locally, it means the Spark master in the SparkConf is set to a local URI, not the master URI.
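As a quick check, compare the two forms of the master URL (a minimal sketch; the app name is illustrative, and note that a master set in code takes precedence over spark-submit's --master flag):

    import org.apache.spark.SparkConf

    // Runs everything inside a single local JVM; no cluster nodes are used.
    val localConf = new SparkConf().setMaster("local[*]").setAppName("StreamApp")

    // Submits to the standalone master; tasks can be scheduled on both nodes.
    val clusterConf = new SparkConf().setMaster("spark://rsplws224:7077").setAppName("StreamApp")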