javascript - Node.js process.exit() does not exit cleanly, and the dangers of async fs.writeFile -


tl;dr:

Calling the asynchronous fs.writeFile from asynchronous events (and perhaps even from just a plain old loop), then calling process.exit(), leaves the files open and fails to flush the data to them. The callbacks given to writeFile do not get a chance to run before the process exits. Is this expected behavior?

Regardless of whether process.exit() is failing to perform this cleanup, I call into question whether it should be Node's duty to at least attempt to work in the file writes it has scheduled, because it may well be the case that the deallocation of huge buffers depends on writing them out to disk.

Details

I have a conceptually basic piece of Node.js code that performs a transformation on a large data file. It happens to be a LIDAR sensor's data file, which should not be relevant. The dataset is quite large owing to the nature of its existence, but it is structurally simple: the sensor sends its data over the network, and the task of the script is to produce a separate file for each rotating scan. The details of that logic are irrelevant as well.

The basic idea is to use node_pcap to read the huge .pcap file using the method node_pcap provides for this task, "offline mode".

What this means is that instead of asynchronously catching network packets as they appear, what appears instead is a rather dense stream of asynchronous events representing the packets as they are "generated".

So, the main structure of the program consists of a few global state variables and a single callback for the pcap session. I initialize the globals, then assign the callback function to the pcap session. This callback does all of the work on each packet event, roughly as sketched below.
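For concreteness, here is a minimal sketch of that structure, assuming node_pcap's createOfflineSession API as I understand it (names and paths are illustrative, not my actual code):

```javascript
var pcap = require('pcap');

// a few global state variables
var scanIndex = 0;     // index of the data file currently being assembled
var scanBuffer = [];   // accumulated contents of the current scan

// "offline mode": replay a huge .pcap file as a dense stream of packet events
var session = pcap.createOfflineSession('./lidar-capture.pcap', '');

// the single callback that does all of the work
session.on('packet', function (rawPacket) {
    var packet = pcap.decode.packet(rawPacket);
    // ...transformation and file-writing logic goes here...
});
```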

Part of the work is writing out a large array of data files. Once in a while a packet will indicate a condition that means I should move on to writing the next data file. At that point I increment the data filename index and call fs.writeFile() again to begin writing the new file. Since I am only ever writing, it seemed natural to let Node decide when a good time to begin writing is.
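Continuing the sketch above, the write-out step inside the callback looks roughly like this. The helpers extractPoints() and endOfScan() are placeholders for my actual per-packet logic, not real functions:

```javascript
var fs = require('fs');

var scanIndex = 0;
var scanBuffer = [];

function onPacket(packet) {
    scanBuffer.push(extractPoints(packet));   // hypothetical helper: accumulate the current scan

    if (endOfScan(packet)) {                  // hypothetical helper: packet signals "move on"
        var filename = 'scan-' + scanIndex + '.bin';
        var data = Buffer.concat(scanBuffer);

        // fire-and-forget: let Node pick a good time to actually write
        fs.writeFile(filename, data, function (err) {
            if (err) throw err;
        });

        scanIndex += 1;    // move on to the next data file
        scanBuffer = [];   // drop our reference; the pending write still holds the data
    }
}
```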

Basically, both fs.writeFileSync and fs.writeFile should end up calling the OS's write() system call on their respective files in an asynchronous fashion. That does not bother me, because I am only writing; the asynchronous nature of the write can affect access patterns, but that does not matter to me since I never access the files. The only difference is that writeFileSync forces the Node event loop to block until the write() syscall completes.

As the program progresses, when I use writeFile (the JS-asynchronous version), hundreds of output files are created, but no data is written to them. Not one. The first data file is still open when the hundredth data file is created.

This is conceptually fine. The reason is that Node is busy crunching the new data and happily holding on to an increasing number of file descriptors that it will eventually write the files' data into. Meanwhile it has to keep the eventual contents of all those files in memory. That will eventually run out, but let's ignore the RAM size limitation for a moment. Obviously the bad thing that would happen here is running out of RAM and crashing the program. Hopefully Node is smart enough to realize that it just needs to schedule some file writes so that it can free up a bunch of buffers...

If I stick a statement in the middle of all this that calls process.exit(), I would expect Node to clean up and flush the pending writeFile writes before exiting.

But Node does not do this.
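To illustrate, here is a deliberately minimal repro of my own (not the LIDAR code). Depending on timing, the output file ends up empty or is never even created, and the callback never runs:

```javascript
var fs = require('fs');

// queue a large asynchronous write...
fs.writeFile('out.bin', new Buffer(10 * 1024 * 1024), function (err) {
    // never reached: the process is gone before the write is serviced
    console.log('write finished', err);
});

// ...and exit before the event loop gets a chance to perform it
process.exit(0);
```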

Changing to writeFileSync fixes the problem, obviously. Alternatively, truncating the input data so that process.exit() is never explicitly called also results in the files getting written (and the completion callbacks given to writeFile being run) at the very end, once the input events are done pumping.

This seems to indicate to me that the cleanup is being improperly performed by process.exit().

Question: Is there some alternative to process.exit() for exiting the event loop cleanly from the middle of it? Note that I had to manually truncate my large input file, because terminating with process.exit() caused the file writes not to complete.
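The only alternative I can think of (a sketch, untested against this exact workload) is to not call process.exit() directly at all, and instead count outstanding writes and exit only after the last callback has fired:

```javascript
var fs = require('fs');

var pendingWrites = 0;
var doneReading = false;

function writeScan(filename, data) {
    pendingWrites += 1;
    fs.writeFile(filename, data, function (err) {
        if (err) throw err;
        pendingWrites -= 1;
        maybeExit();
    });
}

function maybeExit() {
    // only exit once the input is exhausted AND every write has flushed
    if (doneReading && pendingWrites === 0) {
        process.exit(0);
    }
}

// wherever process.exit() used to be called:
// doneReading = true; maybeExit();
```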

This is Node v0.10.26, installed a while ago on OS X via Homebrew.

Continuing the thought process, the behavior I am seeing here calls into question the fundamental purpose of using writeFile. It is supposed to improve things by being free to flexibly write the file whenever Node deems it fit. However, apparently if Node's event loop is pumped hard enough, it will "get behind" on its workload.

It is as though the event loop has an inbox and an outbox. In this analogy, the outbox represents the temp variables containing the data to be written to the files. The assumption that a lazy but productive programmer like me wants to make is that the inbox and outbox are interfaces I can use flexibly and that the system will manage for me. However, if I feed the inbox at too high a rate, Node actually cannot keep up, and will just start piling data into the outbox without ever having time to flush it, because for one reason or another the scheduling is such that the incoming events all have to be processed first. This in turn defers all garbage collection of the outbox's contents, and quite quickly depletes the system's RAM. This can be quite a hard-to-find bug when the pattern is used in a complex system. I am glad I took a modular approach to this project.

I mean, yes, clearly, obviously, beyond a doubt the answer is to use writeFileSync every single time I write files with Node.

What, then, is the value in having writeFile? At this point I am trading a potential small increase in parallel processing for an increased possibility that if (for whatever reason) the machine's processing capability drops (whether through thermal throttling, OS-level scheduling, not paying my IaaS bills on time, or any other reason), it can potentially lead to a snowballing memory explosion.

Perhaps this is getting at the core of solving the rather complex problems inherent in streaming data processing systems, and I cannot realistically expect this event-based processing model to step up and elegantly solve these problems automatically. Maybe I should be satisfied that it gets me halfway to something robust. Maybe I am just projecting my wishes onto it, and it is unreasonable for me to assume that Node needs to less deterministically "improve" the scheduling of its event loop.

I'm not a Node expert, but it seems your problem can be simplified using streams. Streams let you pause and resume, and also provide other neat functionality. I suggest you take a look at Chapter 9 of Professional Node.js by Pedro Teixeira; you can find an online copy for reading purposes. It provides detailed, well-explained examples of how to use streams to read and write data and to prevent potential memory leaks and loss of data.
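As a rough illustration (my own, not from the book), a writable stream signals backpressure through the return value of write(). When it returns false you pause whatever is producing the data until 'drain' fires, assuming your source exposes pause()/resume() as Node readable streams do (the pcap session may not):

```javascript
var fs = require('fs');

var out = fs.createWriteStream('scan-0.bin');

function writeChunk(chunk, source) {
    // write() returns false when the stream's internal buffer is full
    var ok = out.write(chunk);
    if (!ok) {
        source.pause();                    // stop producing data for a while
        out.once('drain', function () {
            source.resume();               // resume once the buffer has flushed
        });
    }
}
```

This keeps the amount of unflushed data bounded instead of letting it pile up in memory the way fire-and-forget writeFile calls do.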

