python - Fitting a Poisson distribution to data in statsmodels
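I am trying to fit a Poisson distribution to my data using statsmodels, but I am confused by the results that I am getting and by how to use the library.
My real data will be a series of numbers that I think I should be able to describe as having a Poisson distribution plus some outliers, so eventually I would like to do a robust fit to the data.
However, for testing purposes, I just create a dataset using scipy.stats.poisson: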
samp = scipy.stats.poisson.rvs(4,size=200)
So to fit this using statsmodels, I think that I just need to have a constant 'endog':
res = sm.Poisson(samp, np.ones_like(samp)).fit()
print(res.summary())
                          Poisson Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  200
Model:                        Poisson   Df Residuals:                      199
Method:                           MLE   Df Model:                            0
Date:                Fri, 27 Jun 2014   Pseudo R-squ.:                   0.000
Time:                        14:28:29   Log-Likelihood:                -404.37
converged:                       True   LL-Null:                       -404.37
                                        LLR p-value:                       nan
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.3938      0.035     39.569      0.000         1.325     1.463
==============================================================================
OK, so that doesn't look right, but if I do
res.predict()
I get an array of 4.03 (which is the mean of the test sample). So, firstly, I am confused about how to interpret this result from statsmodels, and secondly, I should probably be doing something different if I'm interested in robust parameter estimation of a distribution rather than fitting trends, but how should I go about doing that?
Edit: I should have given more detail in order to answer the second part of my question.
I have an event that occurs a random time after a starting time. When I plot a histogram of the delay times for many events, I see that the distribution looks like a scaled Poisson distribution plus several outlier points, which are caused by issues in my underlying system. I wanted to find the expected time delay for the dataset, excluding the outliers. If not for the outliers, I could simply find the mean time. I suppose that I could exclude them manually, but I thought that I could find something more exacting.
Edit: On further reflection, I will be considering other distributions instead of sticking with a Poisson, and the details of my issue are probably a distraction from the original question, but I've left them here anyway.
The Poisson model, like most other models in the generalized linear model families or for other discrete data, assumes that we have a transformation that bounds the prediction in the appropriate range.
Poisson works for nonnegative numbers and the transformation is exp, so the model that is estimated assumes that the expected value of an observation, conditional on the explanatory variables, is
E(y | x) = exp(X dot params)
To get the lambda parameter of the Poisson distribution, we need to use exp, i.e.
>>> np.exp(1.3938)
4.0301355071650118
predict does this by default, but you can request just the linear part (X dot params) with a keyword argument.
BTW: in statsmodels' controversial terminology, endog is y and exog is x (it has an x in it) (http://statsmodels.sourceforge.net/devel/endog_exog.html )
Outlier Robust Estimation
The answer to the last part of the question is that there is no outlier-robust estimation in Python for Poisson or other count models, as far as I know.
For overdispersed data, where the variance is larger than the mean, we can use NegativeBinomial regression. For outliers in Poisson we would have to use R/Rpy or do manual trimming of outliers. Outlier identification could be based on one of the standardized residuals.
It will not be available in statsmodels for some time, unless someone contributes this.
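The manual trimming mentioned above could look roughly like the following sketch for the constant-only case. It is not a statsmodels feature and not a formal robust estimator: the function name trimmed_poisson_mean, the cutoff of 3 on the Pearson residuals, the median starting value, and the three injected outlier values are all assumptions chosen for illustration. Note also that trimming the upper tail biases the estimate slightly downward.

```python
import numpy as np

def trimmed_poisson_mean(y, z_cut=3.0, max_iter=50):
    """Iteratively drop points whose Pearson residual exceeds z_cut.

    Hypothetical helper for a constant-only Poisson model, where the
    fitted mean is simply the average of the retained observations.
    """
    y = np.asarray(y, dtype=float)
    # Start from the median so gross outliers do not inflate the first cutoff.
    mu = max(np.median(y), 0.5)
    keep = np.ones(y.shape, dtype=bool)
    for _ in range(max_iter):
        # Pearson residual for a Poisson model: (y - mu) / sqrt(mu).
        pearson = (y - mu) / np.sqrt(mu)
        new_keep = np.abs(pearson) <= z_cut
        if not new_keep.any():
            break  # nothing left to average; stop with the current estimate
        new_mu = max(y[new_keep].mean(), 1e-8)
        converged = np.array_equal(new_keep, keep) and np.isclose(new_mu, mu)
        keep, mu = new_keep, new_mu
        if converged:
            break
    return mu, keep

# Usage: Poisson(4) data plus three artificial outliers.
rng = np.random.default_rng(0)
clean = rng.poisson(4, size=200)
y = np.concatenate([clean, [40, 55, 60]])

mu_trimmed, keep = trimmed_poisson_mean(y)
print("plain mean:  ", y.mean())
print("trimmed mean:", mu_trimmed)
print("dropped:     ", y[~keep])
```

With explanatory variables, the same idea would apply to the residuals of a fitted model, refitting after each trimming pass.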