Re: mixed model with proportion data

Dixon, Philip M [STAT]
Mariano,

There is a huge and important difference between the two approaches suggested for your data.  A model for the log ratio of proportions (i.e. the empirical logit of the Yes proportion) estimates the residual variance from the data.  The binomial model instead assumes the residual variance is determined by the arbitrary (and made-up) sample size of 20 "tries" per response, in combination with the estimated mean proportions.  To see the arbitrariness, if you don't already, re-express your proportions out of 200 instead of 20, because 0/200, 10/200, ..., 200/200 also give your observed responses.  The coefficient estimates will be approximately the same but their variances will not.  (If you didn't have additional random effects in the model, the coefficient estimates would be exactly the same but the variances would be one-tenth of those from N = 20.)
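
A minimal sketch of the arithmetic behind that parenthetical, assuming a hypothetical two-group design with no random effects and made-up proportions (not Mariano's data).  For such a model the binomial-logit fit has a closed form: each group's fitted proportion is its pooled proportion, and the model-based variance of a group's logit is 1 / (total tries * p * (1 - p)).  The function name group_effect and the data are illustrative only.

```python
import math

def group_effect(props_a, props_b, n_tries):
    """Binomial-logit fit of a two-group model with no random effects.

    Returns (log odds ratio of group B vs. group A, its model-based variance).
    """
    def logit_and_var(props):
        # Turn each observed proportion into successes out of n_tries.
        successes = sum(round(p * n_tries) for p in props)
        total = n_tries * len(props)
        p_hat = successes / total  # pooled proportion = MLE for the group
        logit = math.log(p_hat / (1 - p_hat))
        var = 1.0 / (total * p_hat * (1 - p_hat))  # inverse Fisher information
        return logit, var

    logit_a, var_a = logit_and_var(props_a)
    logit_b, var_b = logit_and_var(props_b)
    return logit_b - logit_a, var_a + var_b

# Hypothetical observed proportions, three responses per group.
props_a = [0.05, 0.25, 0.45]
props_b = [0.50, 0.75, 1.00]

est20, var20 = group_effect(props_a, props_b, 20)     # "out of 20"
est200, var200 = group_effect(props_a, props_b, 200)  # same data "out of 200"

print("estimates:", est20, est200)          # identical coefficient
print("variance ratio:", var200 / var20)    # one-tenth, up to rounding error
```

The same proportions give the same coefficient under either denominator, while the reported variance shrinks in proportion to the made-up number of tries.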

If you are going to use the binomial GLM, I believe you must add overdispersion to the model, either as an individual-level random effect or by using a quasibinomial response distribution.  An explicit overdispersion term is not necessary for the log-ratio response because the residual error variance conceptually estimates that overdispersion.
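
A hedged sketch of how the quasibinomial correction works, again assuming a hypothetical two-group design with no random effects and made-up proportions.  The dispersion is estimated as the Pearson chi-square divided by the residual degrees of freedom, and the binomial variance is multiplied by it.  The function name and data are illustrative, not part of any package.

```python
def quasibinomial_group_effect(props_a, props_b, n_tries):
    """Return (dispersion, binomial variance, quasibinomial variance)
    for the log odds ratio between two groups, with no random effects."""
    pearson = 0.0
    binom_var = 0.0
    for props in (props_a, props_b):
        # Equal tries per response, so the pooled proportion is the mean.
        p_hat = sum(props) / len(props)
        unit_var = n_tries * p_hat * (1 - p_hat)  # binomial variance of one count
        # Squared Pearson residuals of the counts around the fitted mean.
        pearson += sum((n_tries * (p - p_hat)) ** 2 / unit_var for p in props)
        # Model-based variance of this group's fitted logit.
        binom_var += 1.0 / (len(props) * unit_var)
    resid_df = len(props_a) + len(props_b) - 2  # observations minus coefficients
    dispersion = pearson / resid_df
    return dispersion, binom_var, dispersion * binom_var

# Hypothetical observed proportions, three responses per group.
props_a = [0.05, 0.25, 0.45]
props_b = [0.50, 0.75, 1.00]

disp20, bvar20, qvar20 = quasibinomial_group_effect(props_a, props_b, 20)
disp200, bvar200, qvar200 = quasibinomial_group_effect(props_a, props_b, 200)

print("dispersion:", disp20, disp200)             # grows with the made-up N
print("quasibinomial variance:", qvar20, qvar200)  # does not depend on N
```

In this simple setting the estimated dispersion grows with the made-up number of tries exactly as fast as the binomial variance shrinks, so the corrected variance no longer depends on the arbitrary denominator.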

Philip

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Cade, Brian
Mariano:  Just as a follow-up to Phil Dixon's comment, which I think is
spot on, you probably are better off modeling the response as the logit of
the proportions.  But to deal more easily with true zeros or ones, and to
avoid the back-transformation bias associated with means of nonlinear
transformations like the logit, you might want to consider estimating your
models with logistic quantile regression (see Bottai et al. 2010.
Statistics in Medicine 29: 309-317) rather than some mean regression model.
For a fixed-effects model this is easily done with the quantreg package.
There also are mixed-effects variants of quantile regression, but I've not
tried to use them in the logistic quantile framework.  Another poster
suggested beta regression, which also might be reasonable.  In my
experience, the logistic quantile regression model has greater flexibility
than beta regression to handle true zeros and ones and odd dispersion
patterns.  And of course, you can back-transform the quantile estimates
on the logit scale to the proportion scale without bias.
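
A small illustration of that last point, using made-up proportions and only
the Python standard library.  It shows the equivariance of quantiles under
the logit transformation, not a logistic quantile regression fit, and it
avoids exact zeros and ones, where the plain logit is undefined.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical proportions, strictly between 0 and 1.
props = [0.05, 0.10, 0.20, 0.60, 0.90]
logits = [logit(p) for p in props]

# Median: back-transforming the median logit recovers the median proportion,
# because the logit is monotone and so preserves the ordering of the data.
median_prop = statistics.median(props)
median_back = inv_logit(statistics.median(logits))

# Mean: back-transforming the mean logit does not recover the mean proportion.
mean_prop = statistics.mean(props)
mean_back = inv_logit(statistics.mean(logits))

print("median:", median_prop, "back-transformed:", median_back)
print("mean:  ", mean_prop, "back-transformed:", mean_back)
```

The two medians agree up to rounding error, while the back-transformed mean
logit falls noticeably below the mean proportion for these data.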

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  [hidden email] <[hidden email]>
tel:  970 226-9326


On Wed, Mar 8, 2017 at 6:20 AM, Dixon, Philip M [STAT] <[hidden email]>
wrote:

> Mariano,
>
> There is a huge and important difference between the two approaches
> suggested for your data.  A model for the log ratio of proportions (i.e.
> the empirical logit of the Yes proportion) estimates the residual variance
> from the data.  The binomial model instead assumes the residual variance
> is determined by the arbitrary (and made-up) sample size of 20 "tries" per
> response, in combination with the estimated mean proportions.  To see the
> arbitrariness, if you don't already, re-express your proportions out of
> 200 instead of 20, because 0/200, 10/200, ..., 200/200 also give your
> observed responses.  The coefficient estimates will be approximately the
> same but their variances will not.  (If you didn't have additional random
> effects in the model, the coefficient estimates would be exactly the same
> but the variances would be one-tenth of those from N = 20.)
>
> If you are going to use the binomial GLM, I believe you must add
> overdispersion to the model, either as an individual-level random effect
> or by using a quasibinomial response distribution.  An explicit
> overdispersion term is not necessary for the log-ratio response because
> the residual error variance conceptually estimates that overdispersion.
>
> Philip
