Spark Streaming not distributing task to nodes on cluster -


i have 2 node standalone cluster spark stream processing. below sample code demonstrate process executing.

sparkconf.setmaster("spark://rsplws224:7077")  val ssc=new streamingcontext() println(ssc.sparkcontext.master) val indstream = ssc.receiverstream  //batch of 500 ms have 1 sec latency  val filtereddstream = indstream.filter  // filtering unwanted tuples  val keydstream = filtereddstream.map    // converting pair dstream  val statestream = keydstream .updatestatebykey //updating state history   statestream.checkpoint(milliseconds(2500))  // remove long lineage , meterilizing state stream  statestream.count()  val withhistory = keydstream.join(statestream) //joining state wit input stream further processing  val alertstream = withhistory.filter // decision taken comparing history state , current tuple data alertstream.foreach // notification other system  

my problem spark not distributing state rdd multiple nodes or not distributing task other node , causing high latency in response, input load around 100,000 tuples per seconds.

i have tried below things nothing working

1) spark.locality.wait 1 sec

2) reduce memory allocated executer process check weather spark distribute rdd or task if goes beyond memory limit of first node (m1) drive running.

3) increased spark.streaming.concurrentjobs 1 (default) 3

4) have checked in streaming ui storage there around 20 partitions state dstream rdd located on local node m1.

if run sparkpi 100000 spark able utilize node after few seconds (30-40) sure cluster configuration fine.

edit

one thing have noticed rdd if set storage level memory_and_disk_ser_2 in app ui storage shows memory serialized 1x replicated

if not hitting cluster , jobs run locally means spark master in sparkconf set local uri not master uri.


Comments

Popular posts from this blog

javascript - RequestAnimationFrame not working when exiting fullscreen switching space on Safari -

linux - phpmyadmin, neginx error.log - Check group www-data has read access and open_basedir -