mixed model on proportion data

classic Classic list List threaded Threaded
4 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

mixed model on proportion data

Mariano Devoto
Hello again. This is a query which follows up on the very useful answer
provided by Drew Tyre who suggested creating new variables and adding the
random term you'll see in the model below.
Quick reminder: the analysis aims at understanding how the presence of
"bodyguard" ants regulates leaf damage caused by lepidopteran herbivores on
a focal plant species.
For this, I set up fourteen pairs of plants in a nature reserve and then
assigned each plant in each pair to one of two treatments, either "with
ants" or "without ants", by applying a physical barrier at the base of the
plant.
In the following nine weeks I measured four response variables once a week
on a subsample of ten randomly chosen leaves of each plant. I thus have
nine repeated measures on each plant.

>From analyses done over the weekend I know excluding ants from the plants
(variable "treat" in the dataset) has a positive effect on the number of
butterfly eggs and larvae found on plants.
Now I want to know if this results in a reduced damage to the leafs
(variable "dam" in the data set), which is measured as the percentage of
the area eaten by herbivores.

My response variables are:
ants: number of ants (this was measured just to check the physical barrier
had worked OK)
eggs: number of butterfly eggs
larve: number of butterfly larvae
dam: percent leaf damage (percentage eaten by larvae)

My explanatory variables are:
treat: treatment (two levels: "con" and "sin" mean with and without ants,
respectively)
pair: plant pair
date

#I provide a workable example below
require(lme4); require(tidyverse); require(lubridate)

#read data from Google drive
id <- "0Bzd8I1jr8z_iU0h6R0hxaElseDA" # google file ID
gaston <- read.table(sprintf("
<a href="https://docs.google.com/uc?id=%s&export=download">https://docs.google.com/uc?id=%s&export=download", id), head=T)

#create variables following suggestion by Drew Tyre
gaston <- mutate(gaston,
                 plant = paste0(pair,treat),
                 date = dmy(date),
                 day = as.numeric((date - min(date))))

#After a few hours of scavenging through books, articles, blogs and R-lists
I ended up with the following model which turns the % damage into a
proportion and then logit transforms it so I can use lmer.

M1 <- lmer(logit(dam/100) ~ treat + day + (1|pair/plant), data=gaston)
summary(M1)

#Am I headed in the right direction? Any advice would be greatly
appreciated.

Best,

Mariano

*Dr. Mariano Devoto*

Profesor Adjunto - Cátedra de Botánica General, Facultad de Agronomía de la
UBA
Investigador Adjunto del CONICET

Av. San Martín 4453 - C1417DSE - C. A. de Buenos Aires - Argentina
+5411 4524-8069
http://www.agro.uba.ar/users/mdevoto/

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: mixed model on proportion data

Thierry Onkelinx
Dear Mariano,

The logit transformation will fail in case the damage is 0% or 100%. The
correct distribution for ratios between 0 and 1 is a beta distribution.
When 0 or 1 is present you'll need a zero and/or one-inflated beta
distribution. This is currently non available in lme4.
It looks like you measure the damage in steps of 5%. So you use the
binomial distribution as a workaround. That is: assume that each leave has
20 'trials' (20 * 5% = 100%), each 5% damage is one successful trial.

glmer(cbind(dam / 5, 20 - dam / 5) ~  treat + day + (1|pair/plant),
data=gaston, family = binomial)

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-03-06 21:17 GMT+01:00 Mariano Devoto <[hidden email]>:

> Hello again. This is a query which follows up on the very useful answer
> provided by Drew Tyre who suggested creating new variables and adding the
> random term you'll see in the model below.
> Quick reminder: the analysis aims at understanding how the presence of
> "bodyguard" ants regulates leaf damage caused by lepidopteran herbivores on
> a focal plant species.
> For this, I set up fourteen pairs of plants in a nature reserve and then
> assigned each plant in each pair to one of two treatments, either "with
> ants" or "without ants", by applying a physical barrier at the base of the
> plant.
> In the following nine weeks I measured four response variables once a week
> on a subsample of ten randomly chosen leaves of each plant. I thus have
> nine repeated measures on each plant.
>
> >From analyses done over the weekend I know excluding ants from the plants
> (variable "treat" in the dataset) has a positive effect on the number of
> butterfly eggs and larvae found on plants.
> Now I want to know if this results in a reduced damage to the leafs
> (variable "dam" in the data set), which is measured as the percentage of
> the area eaten by herbivores.
>
> My response variables are:
> ants: number of ants (this was measured just to check the physical barrier
> had worked OK)
> eggs: number of butterfly eggs
> larve: number of butterfly larvae
> dam: percent leaf damage (percentage eaten by larvae)
>
> My explanatory variables are:
> treat: treatment (two levels: "con" and "sin" mean with and without ants,
> respectively)
> pair: plant pair
> date
>
> #I provide a workable example below
> require(lme4); require(tidyverse); require(lubridate)
>
> #read data from Google drive
> id <- "0Bzd8I1jr8z_iU0h6R0hxaElseDA" # google file ID
> gaston <- read.table(sprintf("
> <a href="https://docs.google.com/uc?id=%s&export=download">https://docs.google.com/uc?id=%s&export=download", id), head=T)
>
> #create variables following suggestion by Drew Tyre
> gaston <- mutate(gaston,
>                  plant = paste0(pair,treat),
>                  date = dmy(date),
>                  day = as.numeric((date - min(date))))
>
> #After a few hours of scavenging through books, articles, blogs and R-lists
> I ended up with the following model which turns the % damage into a
> proportion and then logit transforms it so I can use lmer.
>
> M1 <- lmer(logit(dam/100) ~ treat + day + (1|pair/plant), data=gaston)
> summary(M1)
>
> #Am I headed in the right direction? Any advice would be greatly
> appreciated.
>
> Best,
>
> Mariano
>
> *Dr. Mariano Devoto*
>
> Profesor Adjunto - Cátedra de Botánica General, Facultad de Agronomía de la
> UBA
> Investigador Adjunto del CONICET
>
> Av. San Martín 4453 - C1417DSE - C. A. de Buenos Aires - Argentina
> +5411 4524-8069
> http://www.agro.uba.ar/users/mdevoto/
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: mixed model on proportion data

essi
Hi Mariano,

You could use a log ratio between undamaged and damage proportions [1]
and obtain a response variable not constrained between 0 and 1. If so
you will have to replace zeros by a detection limit. [2] used 65% of
the detection limit as a basis for replacement, but there are many
options available for imputing zeros [3]. To make things simpler here,
I replace zeros by 1% and 100% by 99% (no 100% damaged though). Then I
compute a log ratio between undamaged and damaged. A positive log ratio
means the damage is less than 50%, and conversely with 0 meaning 50%
damaged.

dam_imp = gaston$dam
dam_imp[gaston$dam == 0] = 1
dam_imp[gaston$dam == 100] = 99
gaston$dam_lr = log((100-dam_imp) / dam_imp)
hist(gaston$dam)
hist(gaston$dam_lr)

There are many 0% and many 95%.

M1 <- lmer(dam_lr ~ treat + day + (1|pair/plant), data=gaston)
summary(M1)
hist(residuals(M1))

The distribution of residuals seems not so bad.

[1]
http://www.sediment.uni-goettingen.de/staff/tolosana/extra/CoDaNutshell.pdf
[2] http://dx.doi.org/10.1016/j.gexplo.2013.09.003
[3] https://cran.r-project.org/web/packages/zCompositions/index.html

--
Serge-Étienne Parent, ing., Ph.D.
Professeur adjoint
Département des sols et de génie agroalimentaire
Faculté des sciences de l'agriculture et de l'alimentation
Pavillon Paul-Comtois
2425, rue de l'Agriculture
Local 2301
Québec (Québec) G1V 0A6
Canada
+1 418-656-2131 poste 3798



Le Tue 7 Mar 2017 à 4:56, Thierry Onkelinx <[hidden email]>
a écrit :

> Dear Mariano,
>
> The logit transformation will fail in case the damage is 0% or 100%.
> The
> correct distribution for ratios between 0 and 1 is a beta
> distribution.
> When 0 or 1 is present you'll need a zero and/or one-inflated beta
> distribution. This is currently non available in lme4.
> It looks like you measure the damage in steps of 5%. So you use the
> binomial distribution as a workaround. That is: assume that each
> leave has
> 20 'trials' (20 * 5% = 100%), each 5% damage is one successful trial.
>
> glmer(cbind(dam / 5, 20 - dam / 5) ~  treat + day + (1|pair/plant),
> data=gaston, family = binomial)
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for
> Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more
> than asking him to perform a post-mortem examination: he may be able
> to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not
> ensure that a reasonable answer can be extracted from a given body of
> data.
> ~ John Tukey
>
> 2017-03-06 21:17 GMT+01:00 Mariano Devoto <[hidden email]>:
>
>>  Hello again. This is a query which follows up on the very useful
>> answer
>>  provided by Drew Tyre who suggested creating new variables and
>> adding the
>>  random term you'll see in the model below.
>>  Quick reminder: the analysis aims at understanding how the presence
>> of
>>  "bodyguard" ants regulates leaf damage caused by lepidopteran
>> herbivores on
>>  a focal plant species.
>>  For this, I set up fourteen pairs of plants in a nature reserve and
>> then
>>  assigned each plant in each pair to one of two treatments, either
>> "with
>>  ants" or "without ants", by applying a physical barrier at the base
>> of the
>>  plant.
>>  In the following nine weeks I measured four response variables once
>> a week
>>  on a subsample of ten randomly chosen leaves of each plant. I thus
>> have
>>  nine repeated measures on each plant.
>>
>>  >From analyses done over the weekend I know excluding ants from the
>> plants
>>  (variable "treat" in the dataset) has a positive effect on the
>> number of
>>  butterfly eggs and larvae found on plants.
>>  Now I want to know if this results in a reduced damage to the leafs
>>  (variable "dam" in the data set), which is measured as the
>> percentage of
>>  the area eaten by herbivores.
>>
>>  My response variables are:
>>  ants: number of ants (this was measured just to check the physical
>> barrier
>>  had worked OK)
>>  eggs: number of butterfly eggs
>>  larve: number of butterfly larvae
>>  dam: percent leaf damage (percentage eaten by larvae)
>>
>>  My explanatory variables are:
>>  treat: treatment (two levels: "con" and "sin" mean with and without
>> ants,
>>  respectively)
>>  pair: plant pair
>>  date
>>
>>  #I provide a workable example below
>>  require(lme4); require(tidyverse); require(lubridate)
>>
>>  #read data from Google drive
>>  id <- "0Bzd8I1jr8z_iU0h6R0hxaElseDA" # google file ID
>>  gaston <- read.table(sprintf("
>>  <a href="https://docs.google.com/uc?id=%s&export=download">https://docs.google.com/uc?id=%s&export=download", id), head=T)
>>
>>  #create variables following suggestion by Drew Tyre
>>  gaston <- mutate(gaston,
>>                   plant = paste0(pair,treat),
>>                   date = dmy(date),
>>                   day = as.numeric((date - min(date))))
>>
>>  #After a few hours of scavenging through books, articles, blogs and
>> R-lists
>>  I ended up with the following model which turns the % damage into a
>>  proportion and then logit transforms it so I can use lmer.
>>
>>  M1 <- lmer(logit(dam/100) ~ treat + day + (1|pair/plant),
>> data=gaston)
>>  summary(M1)
>>
>>  #Am I headed in the right direction? Any advice would be greatly
>>  appreciated.
>>
>>  Best,
>>
>>  Mariano
>>
>>  *Dr. Mariano Devoto*
>>
>>  Profesor Adjunto - Cátedra de Botánica General, Facultad de
>> Agronomía de la
>>  UBA
>>  Investigador Adjunto del CONICET
>>
>>  Av. San Martín 4453 - C1417DSE - C. A. de Buenos Aires - Argentina
>>  +5411 4524-8069
>>  http://www.agro.uba.ar/users/mdevoto/
>>
>>          [[alternative HTML version deleted]]
>>
>>  _______________________________________________
>>  R-sig-ecology mailing list
>>  [hidden email]
>>  https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: mixed model on proportion data

Mariano Devoto
Thank you, Thierry and Serge-Étienne, for your kind replies. They are two
clever ways of dealing with the analysis. I tried both alternatives in a
model that includes an interaction term day*treatment and the solution
suggested by Thierry pics up a significant interaction, whereas
Serge-Étienne's doesn't. An interaction makes sense considering that damage
by herbivores other than larvae seemed to increase towards the end of the
experiment, probably blurring differences between treatments. Thanks again
for your help!

Best,

Mariano

*Dr. Mariano Devoto*

Profesor Adjunto - Cátedra de Botánica General, Facultad de Agronomía de la
UBA
Investigador Adjunto del CONICET

Av. San Martín 4453 - C1417DSE - C. A. de Buenos Aires - Argentina
+5411 4524-8069
http://www.agro.uba.ar/users/mdevoto/

On 7 March 2017 at 12:14, Serge-Étienne Parent <[hidden email]> wrote:

> Hi Mariano,
>
> You could use a log ratio between undamaged and damage proportions [1] and
> obtain a response variable not constrained between 0 and 1. If so you will
> have to replace zeros by a detection limit. [2] used 65% of the detection
> limit as a basis for replacement, but there are many options available for
> imputing zeros [3]. To make things simpler here, I replace zeros by 1% and
> 100% by 99% (no 100% damaged though). Then I compute a log ratio between
> undamaged and damaged. A positive log ratio means the damage is less than
> 50%, and conversely with 0 meaning 50% damaged.
>
> dam_imp = gaston$dam
> dam_imp[gaston$dam == 0] = 1
> dam_imp[gaston$dam == 100] = 99
> gaston$dam_lr = log((100-dam_imp) / dam_imp)
> hist(gaston$dam)
> hist(gaston$dam_lr)
>
> There are many 0% and many 95%.
>
> M1 <- lmer(dam_lr ~ treat + day + (1|pair/plant), data=gaston)
> summary(M1)
> hist(residuals(M1))
>
> The distribution of residuals seems not so bad.
>
> [1] http://www.sediment.uni-goettingen.de/staff/tolosana/
> extra/CoDaNutshell.pdf
> [2] http://dx.doi.org/10.1016/j.gexplo.2013.09.003
> [3] https://cran.r-project.org/web/packages/zCompositions/index.html
>
> --
> Serge-Étienne Parent, ing., Ph.D.
> Professeur adjoint
> Département des sols et de génie agroalimentaire
> Faculté des sciences de l'agriculture et de l'alimentation
> Pavillon Paul-Comtois
> 2425, rue de l'Agriculture
> Local 2301
> Québec (Québec) G1V 0A6
> Canada
> +1 418-656-2131 poste 3798
>
>
>
> Le Tue 7 Mar 2017 à 4:56, Thierry Onkelinx <[hidden email]> a
> écrit :
>
> Dear Mariano, The logit transformation will fail in case the damage is 0%
> or 100%. The correct distribution for ratios between 0 and 1 is a beta
> distribution. When 0 or 1 is present you'll need a zero and/or one-inflated
> beta distribution. This is currently non available in lme4. It looks like
> you measure the damage in steps of 5%. So you use the binomial distribution
> as a workaround. That is: assume that each leave has 20 'trials' (20 * 5% =
> 100%), each 5% damage is one successful trial. glmer(cbind(dam / 5, 20 -
> dam / 5) ~ treat + day + (1|pair/plant), data=gaston, family = binomial)
> Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg /
> team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht
> Belgium To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be able
> to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural
> of anecdote is not data. ~ Roger Brinner The combination of some data and
> an aching desire for an answer does not ensure that a reasonable answer can
> be extracted from a given body of data. ~ John Tukey 2017-03-06 21:17
> GMT+01:00 Mariano Devoto <[hidden email]>:
>
> Hello again. This is a query which follows up on the very useful answer
> provided by Drew Tyre who suggested creating new variables and adding the
> random term you'll see in the model below. Quick reminder: the analysis
> aims at understanding how the presence of "bodyguard" ants regulates leaf
> damage caused by lepidopteran herbivores on a focal plant species. For
> this, I set up fourteen pairs of plants in a nature reserve and then
> assigned each plant in each pair to one of two treatments, either "with
> ants" or "without ants", by applying a physical barrier at the base of the
> plant. In the following nine weeks I measured four response variables once
> a week on a subsample of ten randomly chosen leaves of each plant. I thus
> have nine repeated measures on each plant. >From analyses done over the
> weekend I know excluding ants from the plants (variable "treat" in the
> dataset) has a positive effect on the number of butterfly eggs and larvae
> found on plants. Now I want to know if this results in a reduced damage to
> the leafs (variable "dam" in the data set), which is measured as the
> percentage of the area eaten by herbivores. My response variables are:
> ants: number of ants (this was measured just to check the physical barrier
> had worked OK) eggs: number of butterfly eggs larve: number of butterfly
> larvae dam: percent leaf damage (percentage eaten by larvae) My explanatory
> variables are: treat: treatment (two levels: "con" and "sin" mean with and
> without ants, respectively) pair: plant pair date #I provide a workable
> example below require(lme4); require(tidyverse); require(lubridate) #read
> data from Google drive id <- "0Bzd8I1jr8z_iU0h6R0hxaElseDA" # google file
> ID gaston <- read.table(sprintf(" https://docs.google.com/uc?id=
> %s&export=download", id), head=T) #create variables following suggestion
> by Drew Tyre gaston <- mutate(gaston, plant = paste0(pair,treat), date =
> dmy(date), day = as.numeric((date - min(date)))) #After a few hours of
> scavenging through books, articles, blogs and R-lists I ended up with the
> following model which turns the % damage into a proportion and then logit
> transforms it so I can use lmer. M1 <- lmer(logit(dam/100) ~ treat + day +
> (1|pair/plant), data=gaston) summary(M1) #Am I headed in the right
> direction? Any advice would be greatly appreciated. Best, Mariano *Dr.
> Mariano Devoto* Profesor Adjunto - Cátedra de Botánica General, Facultad de
> Agronomía de la UBA Investigador Adjunto del CONICET Av. San Martín 4453 -
> C1417DSE - C. A. de Buenos Aires - Argentina +5411 4524-8069
> http://www.agro.uba.ar/users/mdevoto/ [[alternative HTML version
> deleted]] _______________________________________________ R-sig-ecology
> mailing list [hidden email] https://stat.ethz.ch/mailman/
> listinfo/r-sig-ecology
>
> [[alternative HTML version deleted]] _______________________________________________
> R-sig-ecology mailing list [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Loading...