Split data based on cumulative value of a column in R
I have the following type of data:
    myd <- data.frame(group = c(rep(1, 15), rep(2, 15)),
                      distance = c(0, 4, 8, 9, 11, 14, 18, 19, 23, 24, 29, 30, 35, 40, 43,
                                   0, 8, 9, 9, 12, 13, 14, 15, 16, 18, 23, 24, 28, 29, 30),
                      var1 = c(1:15, 11:25), var2 = 1:30, var3 = 1:30)
    myd

       group distance var1 var2 var3
    1      1        0    1    1    1
    2      1        4    2    2    2
    3      1        8    3    3    3
    4      1        9    4    4    4
    5      1       11    5    5    5
    6      1       14    6    6    6
    7      1       18    7    7    7
    8      1       19    8    8    8
    9      1       23    9    9    9
    10     1       24   10   10   10
    11     1       29   11   11   11
    12     1       30   12   12   12
    13     1       35   13   13   13
    14     1       40   14   14   14
    15     1       43   15   15   15
    16     2        0   11   16   16
    17     2        8   12   17   17
    18     2        9   13   18   18
    19     2        9   14   19   19
    20     2       12   15   20   20
    21     2       13   16   21   21
    22     2       14   17   22   22
    23     2       15   18   23   23
    24     2       16   19   24   24
    25     2       18   20   25   25
    26     2       23   21   26   26
    27     2       24   22   27   27
    28     2       28   23   28   28
    29     2       29   24   29   29
    30     2       30   25   30   30
I have multiple group levels (more than the 2 shown above). Within each group, the distance (say, mileposts on a highway) starts at 0 and is cumulative to the end of the group. I want to split the data (make bins) in such a way that each bin covers a distance of approximately 10. The resulting split data would look like:
    data group1subset1
       group distance var1 var2 var3
    1      1        0    1    1    1
    2      1        4    2    2    2
    3      1        8    3    3    3
    4      1        9    4    4    4

    data group1subset2
    5      1       11    5    5    5
    6      1       14    6    6    6
    7      1       18    7    7    7
    8      1       19    8    8    8

    data group1subset3
    9      1       23    9    9    9
    10     1       24   10   10   10
    11     1       29   11   11   11
    12     1       30   12   12   12

    data group1subset4
    13     1       35   13   13   13
    14     1       40   14   14   14

    data group1subset5
    15     1       43   15   15   15

    =====

    data group2subset1
    16     2        0   11   16   16
    17     2        8   12   17   17
    18     2        9   13   18   18
    19     2        9   14   19   19

    data group2subset2
    20     2       12   15   20   20
    21     2       13   16   21   21
    22     2       14   17   22   22
    23     2       15   18   23   23
    24     2       16   19   24   24
    25     2       18   20   25   25

    data group2subset3
    26     2       23   21   26   26
    27     2       24   22   27   27
    28     2       28   23   28   28
    29     2       29   24   29   29
    30     2       30   25   30   30
I need to automate this process because my real data is big. Can you please suggest how I can do it?
I'd use cut() to accomplish this:
    # Upper break: the next multiple of 10 above the largest distance
    maxd <- (max(myd$distance) %/% 10 * 10) + 10
    # Label each row with the 10-unit bin its distance falls into
    transform(myd, cutdist = cut(distance, breaks = seq(0, maxd, by = 10), include.lowest = TRUE))

       group distance var1 var2 var3 cumdist cutdist
    1      1        0    1    1    1       0  [0,10]
    2      1        4    2    2    2       4  [0,10]
    3      1        8    3    3    3      12  [0,10]
    4      1        9    4    4    4      21  [0,10]
    5      1       11    5    5    5      32 (10,20]
    6      1       14    6    6    6      46 (10,20]
    7      1       18    7    7    7      64 (10,20]
    8      1       19    8    8    8      83 (10,20]
    9      1       23    9    9    9     106 (20,30]
    10     1       24   10   10   10     130 (20,30]
    11     1       29   11   11   11     159 (20,30]
    12     1       30   12   12   12     189 (20,30]
    13     1       35   13   13   13     224 (30,40]
    14     1       40   14   14   14     264 (30,40]
    15     1       43   15   15   15     307 (40,50]
    16     2        0   11   16   16     307  [0,10]
    17     2        8   12   17   17     315  [0,10]
    18     2        9   13   18   18     324  [0,10]
    19     2        9   14   19   19     333  [0,10]
    20     2       12   15   20   20     345 (10,20]
    21     2       13   16   21   21     358 (10,20]
    22     2       14   17   22   22     372 (10,20]
    23     2       15   18   23   23     387 (10,20]
    24     2       16   19   24   24     403 (10,20]
    25     2       18   20   25   25     421 (10,20]
    26     2       23   21   26   26     444 (20,30]
    27     2       24   22   27   27     468 (20,30]
    28     2       28   23   28   28     496 (20,30]
    29     2       29   24   29   29     525 (20,30]
    30     2       30   25   30   30     555 (20,30]
There's no need to calculate a cumulative distance (the cumdist column in the output above is not used by cut()), since you want to keep the rows in bins that are multiples of 10.
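
If you also want the separate per-group subsets the question asks for, one possible way is to pass the bin labels to split() together with the group. This is only a sketch under the assumption that fixed-width bins of 10 within each group are what is wanted; the name subsets is made up for illustration and does not come from the original answer.

```r
# Rebuild the example data from the question.
myd <- data.frame(
  group    = c(rep(1, 15), rep(2, 15)),
  distance = c(0, 4, 8, 9, 11, 14, 18, 19, 23, 24, 29, 30, 35, 40, 43,
               0, 8, 9, 9, 12, 13, 14, 15, 16, 18, 23, 24, 28, 29, 30),
  var1 = c(1:15, 11:25), var2 = 1:30, var3 = 1:30
)

# Upper break: the next multiple of 10 above the largest distance.
maxd <- (max(myd$distance) %/% 10 * 10) + 10

# Label every row with its 10-unit distance bin.
myd$cutdist <- cut(myd$distance, breaks = seq(0, maxd, by = 10),
                   include.lowest = TRUE)

# One data frame per (group, bin) combination.
# drop = TRUE discards combinations that contain no rows.
# `subsets` is a hypothetical name chosen for this sketch.
subsets <- split(myd, list(myd$group, myd$cutdist), drop = TRUE)

# List elements are named "<group>.<bin>", for example "1.(10,20]".
names(subsets)
subsets[["1.(10,20]"]]
```

For the example data this gives eight subsets: five for group 1 and three for group 2, matching the layout shown in the question. Because the result is an ordinary named list, lapply() can then run the same processing on every subset.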