r - How to subset a dataset such that the test set contains -


I built a linear regression model (lm.full), and I'm trying to test the model on a test data set. When I try to predict on the test data, I run into an issue caused by a feature/predictor with many unique values. The troublesome feature is CBSA (Core Based Statistical Area).

The train and test sets have the same unique values, so I'm not sure what the issue is. If each of the levels of the factor variable is fit in the training model, I think I should be able to predict a value for the test set.

I divided the data into test and training sets here:

    sample.size <- floor(0.95 * nrow(tvwm))
    # make sure seeds are different
    set.seed(15)
    tvwm_train_ind <- sample(seq_len(nrow(tvwm)), size = sample.size)
    tvwm_train <- tvwm[tvwm_train_ind, ]
    tvwm_test <- tvwm[-tvwm_train_ind, ]
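The title suggests the goal is a split in which the test set only contains CBSAs that also occur in the training set. Below is a minimal sketch of one way to get that, assuming it is acceptable for the test set to end up slightly smaller than 5%: do the random split as above, then move any test row whose `cbsa_name` value never occurs among the training rows into the training set. The helper `split_keep_levels` and the toy data frame are hypothetical; only the 0.95 fraction, the seed 15, and the column name `cbsa_name` come from the question.

```r
# Hypothetical helper: random split that guarantees every value of `col`
# found in the test set also occurs in the training set.
split_keep_levels <- function(df, col, frac = 0.95, seed = 15) {
  set.seed(seed)
  n_train <- floor(frac * nrow(df))
  train_ind <- sample(seq_len(nrow(df)), size = n_train)
  test_ind <- setdiff(seq_len(nrow(df)), train_ind)

  # Values seen in training (as.character works for factor or character columns)
  seen <- unique(as.character(df[[col]][train_ind]))
  # Test rows whose value never appears among the training rows
  unseen <- test_ind[!(as.character(df[[col]][test_ind]) %in% seen)]

  # Move those rows into training so predict() never meets a new level
  train_ind <- c(train_ind, unseen)
  test_ind <- setdiff(test_ind, unseen)

  list(train = df[train_ind, , drop = FALSE],
       test  = df[test_ind, , drop = FALSE])
}

# Hypothetical toy data: CBSA "C" occurs in a single row
toy <- data.frame(
  cbsa_name = c(rep("A", 10), rep("B", 10), "C"),
  y = seq_len(21)
)
parts <- split_keep_levels(toy, "cbsa_name", frac = 0.8)

# Every CBSA left in the test set is also present in the training set
all(unique(parts$test$cbsa_name) %in% unique(parts$train$cbsa_name))
```

Dropping the unseen rows from the test set, instead of moving them, would also avoid the error; moving them keeps every observation in use at the cost of a slightly smaller test set.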

And here is the prediction:

    > predict(object = lm.full, newdata = tvwm_test, type = "response")
    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
      factor factor(cbsa_name) has new levels Boston-Cambridge-Newton, MA-NH, Detroit-Warren-Livonia, MI, Virginia Beach-Norfolk-Newport News, VA-NC
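To see which rows trigger this error, it can help to compare the values actually present in each set rather than their levels. Here is a minimal reproduction on hypothetical toy data, where the data frames `train` and `test` stand in for `tvwm_train` and `tvwm_test`. The model is fit with `factor(cbsa_name)`, as the error message implies, `setdiff()` lists the CBSA values that appear only in the test set, and restricting `newdata` to values seen in training lets `predict()` run.

```r
# Hypothetical toy data standing in for tvwm_train / tvwm_test
train <- data.frame(cbsa_name = c("A", "A", "B", "B"), y = c(1, 2, 3, 4))
test  <- data.frame(cbsa_name = c("A", "C"), y = c(1.5, 5))

# factor() inside the formula builds levels from the training values only
fit <- lm(y ~ factor(cbsa_name), data = train)

# Values present in the test set but absent from the training set;
# these are the "new levels" that predict() complains about
new_levels <- setdiff(unique(test$cbsa_name), unique(train$cbsa_name))
print(new_levels)

# Predicting on the full test set fails; capture the error message
res <- tryCatch(predict(fit, newdata = test),
                error = function(e) conditionMessage(e))
print(res)

# Predicting only on rows whose CBSA was seen in training works
ok <- test[!(test$cbsa_name %in% new_levels), , drop = FALSE]
pred_ok <- predict(fit, newdata = ok)
print(pred_ok)
```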

Try

    all(levels(tvwm_test$cbsa_name) %in% levels(tvwm_train$cbsa_name))
    all(levels(tvwm_train$cbsa_name) %in% levels(tvwm_test$cbsa_name))

and make sure both are TRUE. Or, as Gregor suggested below in a comment, you can do it in one statement:

    identical(levels(tvwm_test$cbsa_name), levels(tvwm_train$cbsa_name))
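One likely reason these level checks can pass even though `predict()` fails: subsetting the rows of a data frame keeps every level of the parent factor, used or not. If `cbsa_name` is already a factor in `tvwm` (an assumption; the question does not say), then `tvwm_train` and `tvwm_test` inherit identical level sets regardless of which CBSAs each one actually contains. A toy illustration with hypothetical data:

```r
# Hypothetical factor column standing in for tvwm$cbsa_name
full  <- data.frame(cbsa_name = factor(c("A", "A", "B", "C")))
train <- full[1:3, , drop = FALSE]  # rows contain only A and B
test  <- full[4, , drop = FALSE]    # row contains only C

# Row subsetting keeps all parent levels, so this is TRUE
# even though the two sets share no values
same_levels <- identical(levels(test$cbsa_name), levels(train$cbsa_name))
print(same_levels)

# Comparing the values actually present reveals the mismatch
values_covered <- all(unique(as.character(test$cbsa_name)) %in%
                        as.character(train$cbsa_name))
print(values_covered)
```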

If they are not both TRUE, and both the training set and the test set contain the same factor levels in the data, run the following to reset the levels:

    tvwm_train$cbsa_name <- factor(tvwm_train$cbsa_name)
    tvwm_test$cbsa_name <- factor(tvwm_test$cbsa_name)
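Rebuilding each factor from its own values, as in the two lines above, drops the unused levels, so the checks then reflect what each set really contains. A sketch on hypothetical toy data; `droplevels()` would have the same effect on a factor column.

```r
# Hypothetical data: the subsets inherit levels A, B, C from the parent
full  <- data.frame(cbsa_name = factor(c("A", "A", "B", "C")))
train <- full[1:3, , drop = FALSE]
test  <- full[4, , drop = FALSE]

# Re-create each factor from its own values; unused levels are dropped
train$cbsa_name <- factor(train$cbsa_name)
test$cbsa_name  <- factor(test$cbsa_name)

levels(train$cbsa_name)  # "A" "B"
levels(test$cbsa_name)   # "C"

# The level check from the answer now reports the mismatch
check <- all(levels(test$cbsa_name) %in% levels(train$cbsa_name))
print(check)
```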
