Generating very large 2D-array in Python?
I'd like to generate a large 2D array (or, in other terms, a matrix) using a list of lists. Each element should be a float.
So, to give an example, let's assume I have the following code:
import numpy as np

n = 32000

def largemat():
    m = []
    for i in range(n):
        l = list(np.ones(n))
        m.append(l)
        if i % 1000 == 0:
            print(i)   # progress indicator
    return m

m = largemat()
I have 12 GB of RAM, but when the code reaches the 10000-th row of the matrix, my RAM is already full. Now, if I'm not wrong, each float is 64 bits large (i.e. 8 bytes), so the total occupied RAM should be:
32000 * 32000 * 8 B / 1 MB = 8192 MB
Why does Python fill my whole RAM and even start allocating to swap?
Python does not store list items in a compact form, as lists require pointers to the next item, etc. This is a side effect of having a data type that allows deletes, inserts, etc. For a simple two-way linked list the usage would be two pointers plus the value, which on a 64-bit machine would be 24 octets per float item in a list. In practice the implementation is not that stupid, but there is still some overhead.
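One rough way to see the per-element overhead on your own interpreter is sys.getsizeof; as a sketch (the exact byte counts are an assumption and depend on the CPython version and build), a plain Python float is its own 24-byte object, and the list itself only stores an 8-byte pointer per slot:

>>> import sys
>>> sys.getsizeof(1.0)        # one boxed float object on 64-bit CPython
24
>>> sys.getsizeof([1.0] * 4)  # 72 bytes of list overhead + 4 pointers x 8 bytes
104

So each float reached through a plain list costs roughly 32 octets rather than 8.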
If you want to have a concise format, I'd suggest using a numpy.array, which will take exactly as many bytes as you think it should (plus a small overhead).
Edit: Oops. Not necessarily. The explanation was wrong, but the suggestion is still valid: numpy is the right tool, since numpy.array exists for exactly this reason. However, the problem is probably something else. My computer does run the procedure, even though it takes a lot of time (approx. 2 minutes). Quitting Python afterwards also takes a long time (actually, it hung). The memory use of the Python process (as reported by top) peaks at 10 000 MB and then falls down to slightly below 9 000 MB. Probably the allocated numpy arrays are not garbage collected very fast.
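If the memory needs to come back before the process exits, one thing that may help (an assumption on my part, not something I have profiled here) is dropping the references explicitly and forcing a collection:

import gc

m = largemat()   # the big structure from the question
# ... use m ...
del m            # drop the last reference to it
gc.collect()     # run the collector now instead of waiting

For a single numpy array this is essentially instant, because one big buffer is freed as soon as its last reference goes away; for a list of lists of floats, millions of small objects have to be deallocated one by one, which is also where the slow shutdown comes from.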
But regarding the raw data size on my machine:
>>> import sys
>>> l = [0.0] * 1000000
>>> sys.getsizeof(l)
8000072
So there seems to be a fixed overhead of 72 octets per list.
>>> listoflists = [[1.0 * i] * 1000000 for i in range(1000)]
>>> sys.getsizeof(listoflists)
9032
>>> sum([sys.getsizeof(l) for l in listoflists])
8000072000
So, this is as expected.
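To spell out the arithmetic, that sum is exactly 1000 copies of the per-list size measured above, i.e. a million 8-byte pointer slots plus the 72-octet overhead each:

>>> 1000 * (1000000 * 8 + 72)
8000072000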
On the other hand, reserving and filling this long list of lists takes a while (about 10 s). Quitting Python also takes a while. The same with numpy:
>>> import numpy
>>> a = numpy.empty((1000, 1000000))
>>> a[:] = 1.0
>>> a.nbytes
8000000000
(The byte count is not entirely reliable, as the object itself takes some space for its metadata, etc. There has to be a pointer to the start of the memory block, the data type, the array shape, etc.)
This takes much less time. The creation of the array is practically instantaneous, and inserting the numbers takes maybe a second or two. Allocating and freeing a lot of small memory chunks is time consuming, and while it does not cause fragmentation problems on a 64-bit machine, it is still much easier to allocate one big chunk of data.
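As a rough illustration of the difference, here is a sketch that builds the same matrix both ways at a scaled-down size (1000 x 10000) so it finishes quickly; the absolute timings will of course vary by machine:

import timeit

setup = "import numpy as np"

# the question's approach: a list of lists of boxed floats
list_version = "m = [list(np.ones(10000)) for _ in range(1000)]"

# the numpy approach: one contiguous block, filled in place
numpy_version = "a = np.empty((1000, 10000)); a[:] = 1.0"

print("list of lists:", timeit.timeit(list_version, setup, number=1), "s")
print("numpy array:  ", timeit.timeit(numpy_version, setup, number=1), "s")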
If you have a lot of data that can be put into an array, you need a good reason for not using numpy.
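For the matrix in the question, that simply means replacing largemat() with a single allocation; a minimal sketch, keeping the 32000 x 32000 size from the question (so it still needs about 8 GB of free RAM):

import numpy as np

n = 32000

# one contiguous float64 block instead of 32000 separate Python lists
m = np.ones((n, n))

print(m.nbytes / 1e6, "MB")   # 8192.0 MB, matching the back-of-envelope estimate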