Keep the most common factor levels in an R data frame
I used the "dummies" package to create 42 dummy variables for the 42 levels of a factor variable in a data frame. I want to keep only the 5 dummies that represent the 5 most common factor levels. I used:

    counts <- colSums(dummy_variables)
    rank <- sort(counts)

to figure out which levels those are, but I want to be able to reference the most common ones and keep only them in the data frame. I'm new to R and can't figure out the syntax for this.
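A small self-contained sketch of what these two calls return. It assumes a hypothetical toy factor `x` and builds the 0/1 dummy columns with base R's `model.matrix()` instead of the `dummies` package, so it runs without extra packages. The point it shows: `sort()` orders ascending by default, so the most common levels end up at the end of the sorted vector.

```r
# Hypothetical toy factor: level counts are a = 4, b = 1, c = 3, d = 2
x <- factor(c("a", "a", "a", "a", "b", "c", "c", "c", "d", "d"))

# One 0/1 column per level (no intercept), standing in for the dummies output
dummy_variables <- as.data.frame(model.matrix(~ x - 1))
names(dummy_variables) <- levels(x)

# colSums() counts the 1s in each dummy column, i.e. how often each level occurs
counts <- colSums(dummy_variables)
print(counts)

# sort() is ascending by default, so the most common levels come last
rank <- sort(counts)
print(rank)
```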
Find the top 5 variables, then subset the columns by name. Because `sort()` orders ascending, the 5 most common levels are the last 5 elements of the sorted counts:

    rank <- sort(counts)[(length(counts) - 4):length(counts)]
    dummy_variables <- dummy_variables[names(dummy_variables) %in% names(rank)]

or, in one line, as a commenter suggested:

    dummy_variables[names(dummy_variables) %in% names(tail(sort(colSums(dummy_variables)), 5))]
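A runnable sketch of the one-line approach on toy data, assuming a hypothetical factor `x` with 7 levels whose counts are all different (so there are no ties at the top-5 cutoff). As above, the dummies are built with base R's `model.matrix()` rather than the `dummies` package.

```r
# Hypothetical factor: level "a" appears once, "b" twice, ..., "g" seven times
x <- factor(rep(c("a", "b", "c", "d", "e", "f", "g"), times = 1:7))

# One 0/1 column per level, named after the level
dummy_variables <- as.data.frame(model.matrix(~ x - 1))
names(dummy_variables) <- levels(x)

# Names of the 5 largest column sums: tail() takes the end of the ascending sort
top5 <- names(tail(sort(colSums(dummy_variables)), 5))

# Keep only those columns; %in% preserves the original column order
dummy_variables <- dummy_variables[names(dummy_variables) %in% top5]
print(names(dummy_variables))
# [1] "c" "d" "e" "f" "g"
```

If `dummy_variables` is a matrix rather than a data frame, `names()` returns `NULL`, so the `%in%` test matches nothing; in that case use `colnames(dummy_variables)` and index with `dummy_variables[, keep, drop = FALSE]`.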