Hi all,
I am seeking advice on how to analyse my unbalanced, multi-nested multivariate data set. I realise there are many questions in this email and I would be willing to consult with someone privately on this if it is an option.

I am using abundance data for insect species (I have the same experimental design for reptiles and annual plants as well). I use Simpson's diversity as a univariate response and species composition as a multivariate response.

Experimental Design:
Plots are divided into three habitat types A, B, C based on vegetation.
Each habitat has 3 or 4 replicate control plots that are repeat sampled (one sample a year, always in spring).
In addition B and C have 3 or 4 treatment (vegetation removal) plots.
'S' plots are disturbed (trampling and off-road vehicles) but the disturbance is unquantified and I don't know the pre-disturbance habitat type.

The total data set is across a 12 year period, but the sampling was unbalanced for various reasons. I attach a png of the metadata of the plots over time to show the unbalanced sampling.
https://www.dropbox.com/s/7vxvo3x9lnywdbm/insects_years.gif?dl=0

Each year the sampling across plots was conducted at the same time, and so plots are comparable within a year.
In general, A's were sampled every year and are considered the 'target' habitat. B's were sampled in the earlier years and C's later on, and in the last couple of years all three types were sampled together.

The treatments on B & C were conducted using different methods and in different years, so in principle I should probably test each separately just against their own control pairs. However the hypothesis for both treatments is that treated plots will be more similar in composition to A plots than the paired control plots (if possible I want to check if they become more or less similar to A over time).

So in that regard I thought there might be a way to include all habitat types in one analysis?
Perhaps using time as "number of years since treatment" rather than a date? (Although I have no environmental data with which to standardise.) S dunes have no "pre-treatment" but the hypothesis is that S plots will be most similar to A compared to all other (treated and control) plot types. I am not sure how to include these plots in a testable model.

Questions regarding the design:

>> Can I use all the habitat types in one model (preferable!) or can I only test B treated against B control etc?

>> Must I remove data to create blocks of sampled or is 'all data useful'?
e.g. A's were the only plots sampled in 2010. Should I remove that year completely?
e.g. C1 & C5 were sampled in 2005 while the rest were not until 2011. Should I only include data from 2011 onwards for all C's?
e.g. Should I remove A4 completely since it's only sampled in the last few years, or is it still usable?

>> Can I include S in the analyses in order to compare them with B and C treated plots in relation to A plots?

I have already analysed my first research question
Q1) To understand the differences in diversity and composition across control habitats, irrespective of time.

The analysis approach I used for this is:
i) Mixed effect model: GLMM PQL (Penalised Quasi-Likelihood) using the MASS R package.
Diversity ~ fixed effect = habitat type + random effect = year, Family = poisson

ii) Pairwise permutational multivariate analysis of variance (PERMANOVA) with R code based on the adonis2 function, to determine if the composition among habitats (visualised in NMDS) was significantly different between habitats.

iii) RDA with habitat as explanatory and year as covariate to test explained variance.

Now I am trying to expand this analysis to include a temporal element to answer Q2 & Q3
Q2) to understand the trends in diversity and composition over time in control habitats
Q3) to understand the impact of treatment on diversity and composition (over time if possible?)

The addition of time into the analyses is a bit difficult for me to work out, due to the multi-nested and unbalanced design of the data; I am not sure what methods to use to include time as a variable for looking at a) diversity and b) composition.

Questions regarding analyses:

>1> Is there an appropriate mixed effect model I can use to look at differences in diversity on different control plots and include time as a factor (rather than as a random effect)?

>2> How can I appropriately test if different habitats exhibit different trends in composition over time (i.e. a multivariate approach)? For example, I might expect that A's will remain relatively stable over time, while C's will exhibit high turnover (fluctuation) across years, or that B's will slowly shift composition to be more similar to C. How can I test these directional hypotheses?

I thought to create a Principal Response Curve to see relative differences over time, but as far as I understand, I cannot use a permutation test here due to the unbalanced design. I also thought to take the scores on the first RDA axis as a univariate measure, and then plot this over time, but I'm not sure if it's an appropriate approach or how to then test this statistically.

I also thought to try and create some measure of "compositional temporal stability" for each plot and test this using ANOVA (like some sort of "multivariate Coefficient of Variation"). One such measure could be the distance of each plot-year from the habitat centroid in ordination space, but again, I'm not sure if this is an appropriate approach. Any suggestions for other measures would be welcome.

>3> Finally, can I extend these temporal analyses (of diversity and of composition) to look at response trends to treatments, given the structure of my data?
I would like to see if I can detect some form of resistance to, or recovery from, the treatment over time ...
But if not, can I test the overall treatment effect and use time as a random effect like I did for my first question?

Thank you for any suggestions of analyses and/or ways to subset the data that would allow me to answer these questions.

With kind regards,
Tania

PhD Student
Geo-Ecology Lab
Ben Gurion University

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
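
The "compositional temporal stability" idea in the message above (distance of each plot-year from its group centroid) can be made concrete with a small calculation. What follows is a minimal Python sketch, not the poster's analysis. It assumes a Hellinger transformation of the abundances, so that the centroid is simply the mean vector and plain Euclidean distance applies; the function names and the toy counts are hypothetical. In R, the vegan function betadisper computes distances to group centroids for a general dissimilarity matrix.

```python
import math
from collections import defaultdict


def hellinger(counts):
    """Square roots of relative abundances for one sample.

    Euclidean distance between two such vectors is proportional to the
    Hellinger distance between the samples.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no individuals")
    return [math.sqrt(c / total) for c in counts]


def distances_to_centroid(samples):
    """samples: list of (group, counts) pairs, one per plot-year.

    Returns a list of (group, distance) pairs in the same order, where
    distance is the Euclidean distance from the transformed sample to the
    mean transformed vector of its group.
    """
    transformed = [(group, hellinger(counts)) for group, counts in samples]

    by_group = defaultdict(list)
    for group, vector in transformed:
        by_group[group].append(vector)

    # Centroid = species-wise mean of the transformed vectors in each group.
    centroids = {
        group: [sum(column) / len(vectors) for column in zip(*vectors)]
        for group, vectors in by_group.items()
    }
    return [(group, math.dist(vector, centroids[group]))
            for group, vector in transformed]


def mean_distance_by_group(samples):
    """Average distance to centroid per group (larger = more variable)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, distance in distances_to_centroid(samples):
        sums[group] += distance
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


if __name__ == "__main__":
    # Hypothetical toy data: three species, two plot-years per habitat.
    toy = [
        ("A", [10, 10, 0]),
        ("A", [20, 20, 0]),   # same proportions as the first A sample
        ("C", [10, 0, 0]),
        ("C", [0, 10, 0]),    # complete turnover relative to the first C sample
    ]
    for group, value in mean_distance_by_group(toy).items():
        print(group, round(value, 4))
```

Grouping by plot rather than by habitat would give a per-plot stability value, which is closer to the "one value per plot, then compare habitats" use described in the message; which grouping is appropriate depends on the hypothesis being tested.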
Hm, this is a big job. The optimal solution is to see if your university
offers a statistical consulting service. I don't see any big conceptual problems, but getting a good analysis will take a bit of time and exploration. I think you can probably 'just' use a GLMM, but getting the right GLMM and deciding what a good model is will take time and some poking of the data.

Anyway, some answers below, which may (or may not) help.

On 02/27/2017 04:27 PM, Tania Bird wrote:
> Hi all,
>
> I am seeking advice on how to analyse my unbalanced, multi-nested
> multivariate data set. I realise there are many questions in this
> email and I would be willing to consult with someone privately on this
> if it is an option.
>
> I am using abundance data for insect species (I have the same
> experimental design for reptiles, and annual plants as well). I use
> Simpson's diversity as a univariate response and species composition
> as a multivariate response.
>
> Experimental Design:
> Plots are divided into three habitat types A, B, C based on vegetation.
> Each habitat has 3 or 4 replicate control plots that are repeat
> sampled (one sample a year always in spring).
> In addition B and C have 3 or 4 treatment (vegetation removal) plots.
> 'S' plots are disturbed( trampling and off-road vehicles) but the
> disturbance is unquantified and I don't know the pre-disturbance
> habitat type.
>
> The total data set is across a 12 year period, but the sampling was
> unbalanced for various reasons. I attach a png of the metadata of the
> plots over time to show the unbalanced sampling.
> https://www.dropbox.com/s/7vxvo3x9lnywdbm/insects_years.gif?dl=0
>
> Each year the sampling across plots was conducted at the same time,
> and so plots are comparable within a year.
> In general, As were sampled every year and are considered the 'target'
> habitat. B's were sampled in the earlier years and C's later on, and
> in the last couple of years all three types were sampled together.
>
> The treatments on B & C were conducted using different methods and in
> different years, so in principle I should probably test each
> separately just against their own control pairs. However the
> hypothesis for both treatments is that treated plots will be more
> similar in composition to A plots than the paired control plots (if
> possible I want to check if they become more or less similar to A over
> time).
>
> So in that regard I thought there might be a way to include all
> habitat types in one analysis? Perhaps using time as "number of years
> since treatment" rather than a date? (Although I have no environmental
> data with which to standardise). S dunes have no "pre-treatment" but
> the hypothesis is that S plots will be most similar to A compared to
> all other (treated and control) plot types. I am not sure how to
> include these plots in a testable model.
>
> Questions regarding the design:
>
>>> Can I use all the habitat types in one model (preferable!) or can I only test B treated against B control etc?

Yes you can. You obviously need a Treatment effect, and you should expect to have a Treatment by Habitat interaction.

There may also be some sort of interaction with time (either as Time, or Time Since Treatment).

>>> Must I remove data to create blocks of sampled or is 'all data useful'?
> e.g. A's were the only plots sampled in 2010- should I remove that
> year completely?
> e.g. C1 & C5 were sampled in 2005 while the rest were not until 2011,-
> should I only include data from 2011 onwards for all C's?
> e.g. Should I remove A4 completely since its only sampled in the last
> few years or its still useable?

No, you should be able to use all of the data; you just have to be a bit careful about how you model Time.

>>> Can I include S in the analyses in order to compare them with B and C treated plots in relation to A plots?

Yes, in principle. It just doesn't have a Habitat:Treatment interaction.

> I have already analysed my first research question
> Q1) To understanding the differences in diversity and composition
> across control habitats, irrelevant of time.
>
> The analysis approach I used for this is:
> i) Mixed effect model: GLMM PQL (Penalised Quasi-Likelihood) using
> MASS R package.
> Diversity ~ fixed effect = habitat type + random effect = year ,
> Family = poisson

There are better tools than glmmPQL nowadays. Have a look at the lme4 package, for example.

> ii) Pairwise permutational multivariate analysis of variance (MANOVA)
> with R code based on the adonis2 function, to determine if the
> composition among habitats (visualised in NMDS) were significantly
> different from each other.
>
> iii) RDA with habitat as explanatory and year as covariate to test
> explained variance.
>
> Now I am trying to expand this analysis to include a temporal element
> to answer Q2 & Q3
> Q2) to understand the trends in diversity and composition over time in
> control habitats
> Q3) to understand the impact of treatment on diversity and composition
> (over time if possible?)
>
> The addition of time into the analyses is a bit difficult for me to
> work out, due to the multi-nested and unbalanced design of the data; I
> am not sure what methods to use to include time as a variable for
> looking at a) diversity and b) composition

Take a look at repeated measures models. There are a few ways this could be set up, depending a bit on the data.

> Questions regarding analyses:
>> 1> Is there an appropriate mixed effect model I can use to look at differences in diversity on different control plots and include time as a factor (rather than as a random effect)?

There are probably several. :-) For example you could include Time as a continuous covariate, alongside the random effect. You could also just include it as a fixed effect, but that could get messy.

>> 2> How can I appropriately test if different habitats exhibit different trends in composition over time (ie. a multivariate approach).
>> For example, I might expect that A's will remain relatively stable over time, while C's will exhibit high turnover (fluctuation) across years, or that B's will slowly shift composition to be more similar to C. How can I test these directional hypotheses?
> I thought to create a Principle Response Curve to see relative
> differences over time, but as far as I understand, I cannot use a
> permutation test here due to the unbalance design. I also thought to
> take the scores on the first RDA axis as a univariate measure, and
> then plot this over time.. but I'm not sure if its an appropriate
> approach or how to then test this statistically.
>
> I also thought to try and create some measure of "compositional
> temporal stability" for each plot and test this using ANOVA (like some
> sort of "multivariate Coefficient of Variation"). One such measure
> could be distance of each plot-year from the habitat centroid in
> ordination space but again, I'm not sure if this is an appropriate
> approach. Any suggestions for other measures would be welcome.

That's essentially a question about the variance in responses. There are doubly hierarchical models that you could try, but you might not want to go there.

>> 3> Finally can I extend these temporal analysis (of diversity and of composition) to look at response trends to treatments, given the structure of my data?
> I would like to see if I can detect some form of resistance to, or
> recovery from, the treatment over time ... But if not, can I test the
> overall treatment affect and use time as a random effect like i did
> for my first question?
>
> Thank you for any suggestions of analyses and/or ways to subset the
> data that would allow me to answer these questions.

Essentially you need some structure on the time covariate. You could start by using time since treatment as a factor, and plot those estimates. Again, there should be a bit of playing around with the model, to see what makes sense.

Bob

--
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org
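
The "time since treatment as a factor" suggestion in the reply above needs a column derived from each sample's year and the year its plot was treated. A minimal Python sketch of that bookkeeping follows. The plot names and treatment years are invented for illustration. Collapsing all pre-treatment samples into a single "pre" level, and treating the treatment year itself as year 0, are assumptions rather than anything stated in the thread.

```python
def years_since_treatment(sample_year, treatment_year):
    """Years elapsed since treatment, or None for plots never treated."""
    if treatment_year is None:
        return None
    return sample_year - treatment_year


def time_factor_level(sample_year, treatment_year):
    """Label for a categorical 'time since treatment' variable.

    Assumed coding: untreated plots -> "control", samples taken before
    the treatment year -> "pre", otherwise "post_<years elapsed>".
    """
    elapsed = years_since_treatment(sample_year, treatment_year)
    if elapsed is None:
        return "control"
    if elapsed < 0:
        return "pre"
    return f"post_{elapsed}"


if __name__ == "__main__":
    # Hypothetical plots and treatment years (None = never treated).
    treatment_years = {"B_treated_1": 2006, "C_treated_1": 2012, "A1": None}

    # Hypothetical (plot, sampling year) records from an unbalanced design.
    samples = [
        ("B_treated_1", 2005),
        ("B_treated_1", 2006),
        ("B_treated_1", 2009),
        ("C_treated_1", 2014),
        ("A1", 2010),
    ]

    for plot, year in samples:
        print(plot, year, time_factor_level(year, treatment_years[plot]))
```

Because the variable is relative to each plot's own treatment year, B and C plots treated in different calendar years end up on a common scale, which is what lets them sit in one model. How control plots should enter (as their own level, or aligned to the treatment year of their paired treated plots) is a modelling decision this sketch leaves open.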
Many thanks for your useful advice Bob!
Unfortunately I did try to use my University's statistical consulting department, but they were not able to provide advice at this level for either the multivariate or mixed effect models. :(

I would be happy to consult with someone else if anyone is offering such a service?

Tania Bird