python - Fitting a Poisson distribution to data in statsmodels


I am trying to fit a Poisson distribution to my data using statsmodels, but I am confused by the results I am getting and by how to use the library.

My real data will be a series of numbers that I think I should be able to describe as having a Poisson distribution plus some outliers, so eventually I would like to do a robust fit to the data.

However, for testing purposes, I just create a dataset using scipy.stats.poisson:

samp = scipy.stats.poisson.rvs(4,size=200) 

So to fit this using statsmodels, I think that I just need to have a constant 'endog':

res = sm.Poisson(samp, np.ones_like(samp)).fit()

print(res.summary())

                          Poisson Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  200
Model:                        Poisson   Df Residuals:                      199
Method:                           MLE   Df Model:                            0
Date:                Fri, 27 Jun 2014   Pseudo R-squ.:                   0.000
Time:                        14:28:29   Log-Likelihood:                -404.37
converged:                       True   LL-Null:                       -404.37
                                        LLR p-value:                       nan
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.3938      0.035     39.569      0.000         1.325     1.463
==============================================================================

OK, so that doesn't look right, but if I do

res.predict() 

I get an array of 4.03 (which is the mean of this test sample). So basically, firstly, I am confused about how to interpret this result from statsmodels, and secondly, should I be doing something different if I'm interested in robust parameter estimation of a distribution rather than fitting trends, and how should I go about doing that?

Edit: I should have given more detail in order to answer the second part of my question.

I have an event that occurs a random time after a starting time. When I plot a histogram of the delay times for many events, I see that the distribution looks like a scaled Poisson distribution plus several outlier points, which are caused by issues in my underlying system. So I wanted to find the expected time delay for the dataset, excluding the outliers. If it were not for the outliers, I could simply find the mean time. I suppose I could exclude them manually, but I thought I could find something more exacting.

Edit: On further reflection, I will be considering other distributions instead of sticking with a Poisson, and the details of my issue are probably a distraction from the original question, but I've left them here anyway.

The Poisson model, like most other models in the generalized linear model families or for other discrete data, assumes that we have a transformation that bounds the prediction in the appropriate range.

Poisson works for nonnegative numbers and the transformation is exp, so the model that is estimated assumes that the expected value of an observation, conditional on the explanatory variables, is

 E(y | x) = exp(x dot params)

To get the lambda parameter of the Poisson distribution, we need to use exp, i.e.

>>> np.exp(1.3938)
4.0301355071650118

predict does this by default, but you can request just the linear part (x dot params) with a keyword argument.

BTW: statsmodels' controversial terminology: endog is y, exog is x (it has an x in it) (http://statsmodels.sourceforge.net/devel/endog_exog.html)

Outlier robust estimation

The answer to the last part of the question is that there is no outlier robust estimation in Python for Poisson or other count models, as far as I know.

For overdispersed data, where the variance is larger than the mean, we can use NegativeBinomial regression. For outliers in Poisson we would have to use R/rpy or do manual trimming of outliers. Outlier identification could be based on one of the standardized residuals.

It will not be available in statsmodels for some time, unless someone contributes it.
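Until then, the manual trimming mentioned above could be sketched roughly as follows for the constant-only case. This is not a statsmodels feature: trimmed_poisson_rate is a hypothetical helper, the threshold of 3 and the injected outliers are assumptions for illustration, and the Pearson residual (y - mu) / sqrt(mu) is just one choice of standardized residual.

```python
import numpy as np

def trimmed_poisson_rate(y, threshold=3.0, max_iter=20):
    """Hypothetical helper: Poisson rate after dropping large Pearson residuals."""
    y = np.asarray(y, dtype=float)
    keep = np.ones(y.size, dtype=bool)
    mu = np.median(y)                        # starting value that outliers barely move
    for _ in range(max_iter):
        scale = np.sqrt(max(mu, 0.5))        # floor avoids dividing by zero
        resid = (y - mu) / scale             # Pearson residual for a Poisson mean
        new_keep = np.abs(resid) <= threshold
        if not new_keep.any():               # nothing left to average; give up
            break
        new_mu = y[new_keep].mean()          # Poisson MLE on the kept points
        converged = np.array_equal(new_keep, keep) and np.isclose(new_mu, mu)
        keep, mu = new_keep, new_mu
        if converged:
            break
    return mu, keep

rng = np.random.default_rng(1)
clean = rng.poisson(4, size=200)             # simulated "good" delays
outliers = np.array([40, 55, 60, 75, 90])    # made-up contamination
y = np.concatenate([clean, outliers])

mu_hat, keep = trimmed_poisson_rate(y)
print("naive mean:", y.mean())
print("trimmed estimate:", mu_hat, "dropped points:", int((~keep).sum()))
```

Trimming the tail also removes a few legitimate large counts now and then, so the estimate is biased slightly downward. The sketch also assumes the rate is well above zero.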

