How to check a condition for observations that belong to a specific category in R?
I'm sorry if the question is confusing; I couldn't find a better way to express it.
I have a dataset with tweets, user ids, and the dates when the tweets were created.
    userid      tweet    date
    1132622143  bla bla  2014-04-23
    1132622143  bla bla  2014-05-23
    1132622143  ...      ...
    1132622143
    1132622145
I want to make a subset of this dataset that contains data only for users who posted on Twitter recently, in May or later, i.e. users who have at least one tweet with date > 2014-05-01. (I want to have both the recent and the old tweets of these active users in the new dataset.)
I think I need to create a function like this:

    for each distinct userid
        find the rows with the same userid value and put them in a list
        for each row in the list
            if there exists a row with date > 2014-05-01
                select all rows for that userid

I'm not sure if this is the correct logic and, if yes, how to program it. I would be grateful for any help.
ave would be of use here.
Here's a generalization of the problem, which you can extend to your actual data.
First, some sample data. I'm assuming you know how to convert your "date" variable to an actual time/date variable.
    mydf <- data.frame(
      id   = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4),
      text = c("a", "b", "c", "a", "b", "c", "d", "e", "a", "b", "c", "a"),
      time = c(1, 1, 2, 2, 3, 4, 4, 5, 3, 5, 5, 1)
    )
With ave, you can create a logical vector by group. Here, we check whether any "time" value is greater than 4 within each set of "id"s. If there is one, it returns TRUE for all rows of that "id".
That information can then be used directly to extract the relevant rows.
    # Flag every row whose "id" group has at least one time > 4
    as.logical(with(mydf, ave(time, id, FUN = function(x) any(x > 4))))
    #  [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

    # Use that flag to keep all rows of the matching groups
    mydf[as.logical(with(mydf, ave(time, id, FUN = function(x) any(x > 4)))), ]
    #    id text time
    # 4   2    a    2
    # 5   2    b    3
    # 6   2    c    4
    # 7   2    d    4
    # 8   2    e    5
    # 9   3    a    3
    # 10  3    b    5
    # 11  3    c    5
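To extend this to the tweet data from the question, the same idea can be applied to a date column. The following is only a sketch: the data frame name tweets, the extra user id 1132622150, and all the row values are made up for illustration, and it assumes the dates parse with as.Date. Because ave writes the function result back into a vector of the same type as its input, the dates are converted to numbers first so the flag comes back as 0/1.

```r
# Hypothetical sample data using the question's column names (values invented)
tweets <- data.frame(
  userid = c(1132622143, 1132622143, 1132622145, 1132622145, 1132622150),
  tweet  = c("bla bla", "bla bla", "bla bla", "bla bla", "bla bla"),
  date   = as.Date(c("2014-04-23", "2014-05-23", "2014-03-02",
                     "2014-04-11", "2014-05-02")),
  stringsAsFactors = FALSE
)

cutoff <- as.Date("2014-05-01")

# For each userid, flag all of that user's rows if any tweet is after the cutoff.
# Dates are passed as numbers (days since 1970-01-01) so ave() returns 0/1.
active <- as.logical(
  ave(as.numeric(tweets$date), tweets$userid,
      FUN = function(d) any(d > as.numeric(cutoff)))
)

# Keep both old and recent tweets of the active users
recent_users <- tweets[active, ]

# An equivalent approach without ave(): look up the ids that have a recent tweet
alt <- tweets[tweets$userid %in% tweets$userid[tweets$date > cutoff], ]
```

With this made-up data, both recent_users and alt keep the rows of the two users who have a tweet after the cutoff, including their older tweets. The %in% version is arguably easier to read when the condition is a simple "any row matches" test, while ave generalizes to other per-group summaries.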