How to get the total variance explained from an envfit object?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How to get the total variance explained from an envfit object?

Couch, Claire Elizabeth
I am analyzing some microbiome data by using unconstrained ordination (PCA
or NMDS) followed by environmental vector fitting with the envfit function
in the vegan package. The output of envfit includes an r2 value for each
vector or factor included in the envfit model, but I am interested in the
total amount of variation explained by all the vectors/factors, rather than
just stand-alone variables. I presume I cannot simply add up the R2 values
assigned to each environmental variable, because there may be overlap in
the microbiome variation that is "explained" by each environmental
variable. However, there does not seem to be any way of accessing the total
r2 value for the model.

Using an example dataset, this is what I have tried so far:

library(vegan)
library(MASS)

data(varespec, varechem)
library(MASS)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit

This shows r2 for each environmental variable, but how do I extract the r2
value for the entire model?

I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
these all return NULL.

I would greatly appreciate any suggestions!

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Reply | Threaded
Open this post in threaded view
|

Re: How to get the total variance explained from an envfit object?

Juan Antonio Balbuena
Hi,

Unfortunately, I cannot give a direct answer to your question since I am
not familiar with envfit.

However, I wonder whether you have a specific reason to use
unconstrained ordination. If you use a constrained method, varpart will
produce exactly what you want. (See e.g.
https://www.davidzeleny.net/anadat-r/doku.php/en:varpart_examples).

Best

Juan A. Balbuena

El 07/11/2019 a las 21:02, Couch, Claire Elizabeth escribió:

> I am analyzing some microbiome data by using unconstrained ordination (PCA
> or NMDS) followed by environmental vector fitting with the envfit function
> in the vegan package. The output of envfit includes an r2 value for each
> vector or factor included in the envfit model, but I am interested in the
> total amount of variation explained by all the vectors/factors, rather than
> just stand-alone variables. I presume I cannot simply add up the R2 values
> assigned to each environmental variable, because there may be overlap in
> the microbiome variation that is "explained" by each environmental
> variable. However, there does not seem to be any way of accessing the total
> r2 value for the model.
>
> Using an example dataset, this is what I have tried so far:
>
> library(vegan)
> library(MASS)
>
> data(varespec, varechem)
> library(MASS)
> ord <- metaMDS(varespec)
> fit <- envfit(ord, varechem, perm = 999)
> fit
>
> This shows r2 for each environmental variable, but how do I extract the r2
> value for the entire model?
>
> I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
> these all return NULL.
>
> I would greatly appreciate any suggestions!
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
--

Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
Symbiont Ecology and Evolution Lab
University of Valencia http://www.uv.es/~balbuena 
<http://www.uv.es/%7Ebalbuena>
P.O. Box 22085 http://www.uv.es/cophylpaco
46071 Valencia, Spain
e-mail: [hidden email] <mailto:[hidden email]>tel. +34 963 543
658    fax +34 963 543 733
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*NOTE!*For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Reply | Threaded
Open this post in threaded view
|

Re: How to get the total variance explained from an envfit object?

Jari Oksanen
In reply to this post by Couch, Claire Elizabeth
Dear Claire Elizabeth Couch,

First let me explain how envfit() works: For continuous environmental variables it actually uses ordination to predict the environmental variable. Under the hood (bonnet), it fits a first degree trend surface (plane in 2D) for environmental variable over the ordination scores, and the R2 is the proportion that surface explains of the variable. The arrow shown is the gradient of this fitted trend surface and shows the direction to which the variable changes most rapidly in a first degree linear model.

Clearly you cannot add these R2 values, because your environmental variables can be (and normally are) inter-correlated.

It seems that you want to work into another direction than envfit: Predict ordination scores by a set of environmental variables.

There are many ways of doing this in R, although we do not provide canned tool for this. You can actually do this even with multiple linear model with function lm() of R::stats. Here is how to do this with vegan::rda() function:

library(vegan)
data(varespec, varechem)
ord <- metaMDS(varespec, trace = FALSE)
fit <- rda(scores(ord) ~ ., data = varechem)

The basic output of this will show you R2:

Call: rda(formula = scores(ord) ~ N + P + K + Ca + Mg + S + Al + Fe +
Mn + Zn + Mo + Baresoil + Humdepth + pH, data = varechem)

              Inertia Proportion Rank
Total         0.13905    1.00000
Constrained   0.11357    0.81681    2
Unconstrained 0.02547    0.18319    2
Inertia is variance

Here the “Proportion” for the Constrained component is the overall R2 = 0.81681. If you want to see the adjusted R2, this is found with

 RsquareAdj(fit)
$r.squared
[1] 0.8168084

$adj.r.squared
[1] 0.5318436

However, you only get the overall R2, but not partial R2 values for single variables. You can use anova(fit, by = “margin”) to find the (lacking) marginal significances of unique effects of the variables, though.

The regression coefficients can be found with command coef() — and probably you want them normalized:

> coef(fit, norm=TRUE)
                 RDA1        RDA2
N        -0.154511701  0.48579513
P         0.002463991 -0.13179802
...
pH        0.701009027 -0.66724274

You can also simplify this model in the usual way, for instance with the function ordistep() that uses permutation test to drop variables one by one (or you can build up this model adding variables one by one if you start with an empty model with ordistep() or ordiR2step()):

ordistep(fit)            # drop variables
m0 <- update(fit, . ~ 1) # m0 is an empty model
ordistep(m0, scope=fit)  # add variables to an empty model

(After these anova(…, by = “margin”) results also give significant effects.)

All this sounds a bit weird to me (or more than “a bit”), but it can be done. I guess there are some readers who get hiccups for using RDA on NMDS, but this can be done as the NMDS space is metric (the *transfer* function is non-metric from community dissimilarities to metric ordination space). It also sounds really odd to have an ordination scores as dependent data in RDA, but this exactly answers the problem you presented: predict ordination scores by a set of external variables.  After all, RDA is nothing but a linear regression for multivariate response data, and there is no need to think it as an ordination.

Cheers, Jari Oksanen

On 7 Nov 2019, at 22:02, Couch, Claire Elizabeth <[hidden email]<mailto:[hidden email]>> wrote:

I am analyzing some microbiome data by using unconstrained ordination (PCA
or NMDS) followed by environmental vector fitting with the envfit function
in the vegan package. The output of envfit includes an r2 value for each
vector or factor included in the envfit model, but I am interested in the
total amount of variation explained by all the vectors/factors, rather than
just stand-alone variables. I presume I cannot simply add up the R2 values
assigned to each environmental variable, because there may be overlap in
the microbiome variation that is "explained" by each environmental
variable. However, there does not seem to be any way of accessing the total
r2 value for the model.

Using an example dataset, this is what I have tried so far:

library(vegan)
library(MASS)

data(varespec, varechem)
library(MASS)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit

This shows r2 for each environmental variable, but how do I extract the r2
value for the entire model?

I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
these all return NULL.

I would greatly appreciate any suggestions!

[[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]<mailto:[hidden email]>
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Reply | Threaded
Open this post in threaded view
|

Re: How to get the total variance explained from an envfit object?

Couch, Claire Elizabeth
Dr. Oksanen, thank you so much for taking the time to offer such a clear
and detailed response. Very helpful!

On Fri, Nov 8, 2019 at 06:31 Jari Oksanen <[hidden email]> wrote:

> Dear Claire Elizabeth Couch,
>
> First let me explain how envfit() works: For continuous environmental
> variables it actually uses ordination to predict the environmental
> variable. Under the hood (bonnet), it fits a first degree trend surface
> (plane in 2D) for environmental variable over the ordination scores, and
> the R2 is the proportion that surface explains of the variable. The arrow
> shown is the gradient of this fitted trend surface and shows the direction
> to which the variable changes most rapidly in a first degree linear model.
>
> Clearly you cannot add these R2 values, because your environmental
> variables can be (and normally are) inter-correlated.
>
> It seems that you want to work into another direction than envfit: Predict
> ordination scores by a set of environmental variables.
>
> There are many ways of doing this in R, although we do not provide canned
> tool for this. You can actually do this even with multiple linear model
> with function lm() of R::stats. Here is how to do this with vegan::rda()
> function:
>
> library(vegan)
> data(varespec, varechem)
> ord <- metaMDS(varespec, trace = FALSE)
> fit <- rda(scores(ord) ~ ., data = varechem)
>
> The basic output of this will show you R2:
>
> Call: rda(formula = scores(ord) ~ N + P + K + Ca + Mg + S + Al + Fe +
> Mn + Zn + Mo + Baresoil + Humdepth + pH, data = varechem)
>
>               Inertia Proportion Rank
> Total         0.13905    1.00000
> Constrained   0.11357    0.81681    2
> Unconstrained 0.02547    0.18319    2
> Inertia is variance
>
> Here the “Proportion” for the Constrained component is the overall R2 =
> 0.81681. If you want to see the adjusted R2, this is found with
>
>  RsquareAdj(fit)
> $r.squared
> [1] 0.8168084
>
> $adj.r.squared
> [1] 0.5318436
>
> However, you only get the overall R2, but not partial R2 values for single
> variables. You can use anova(fit, by = “margin”) to find the (lacking)
> marginal significances of unique effects of the variables, though.
>
> The regression coefficients can be found with command coef() — and
> probably you want them normalized:
>
> > coef(fit, norm=TRUE)
>                  RDA1        RDA2
> N        -0.154511701  0.48579513
> P         0.002463991 -0.13179802
> ...
> pH        0.701009027 -0.66724274
>
> You can also simplify this model in the usual way, for instance with the
> function ordistep() that uses permutation test to drop variables one by one
> (or you can build up this model adding variables one by one if you start
> with an empty model with ordistep() or ordiR2step()):
>
> ordistep(fit)            # drop variables
> m0 <- update(fit, . ~ 1) # m0 is an empty model
> ordistep(m0, scope=fit)  # add variables to an empty model
>
> (After these anova(…, by = “margin”) results also give significant
> effects.)
>
> All this sounds a bit weird to me (or more than “a bit”), but it can be
> done. I guess there are some readers who get hiccups for using RDA on NMDS,
> but this can be done as the NMDS space is metric (the *transfer* function
> is non-metric from community dissimilarities to metric ordination space).
> It also sounds really odd to have an ordination scores as dependent data in
> RDA, but this exactly answers the problem you presented: predict ordination
> scores by a set of external variables.  After all, RDA is nothing but a
> linear regression for multivariate response data, and there is no need to
> think it as an ordination.
>
> Cheers, Jari Oksanen
>
> On 7 Nov 2019, at 22:02, Couch, Claire Elizabeth <[hidden email]>
> wrote:
>
> I am analyzing some microbiome data by using unconstrained ordination (PCA
> or NMDS) followed by environmental vector fitting with the envfit function
> in the vegan package. The output of envfit includes an r2 value for each
> vector or factor included in the envfit model, but I am interested in the
> total amount of variation explained by all the vectors/factors, rather than
> just stand-alone variables. I presume I cannot simply add up the R2 values
> assigned to each environmental variable, because there may be overlap in
> the microbiome variation that is "explained" by each environmental
> variable. However, there does not seem to be any way of accessing the total
> r2 value for the model.
>
> Using an example dataset, this is what I have tried so far:
>
> library(vegan)
> library(MASS)
>
> data(varespec, varechem)
> library(MASS)
> ord <- metaMDS(varespec)
> fit <- envfit(ord, varechem, perm = 999)
> fit
>
> This shows r2 for each environmental variable, but how do I extract the r2
> value for the entire model?
>
> I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
> these all return NULL.
>
> I would greatly appreciate any suggestions!
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
>
> --
Claire E. Couch (she/her/hers)
NSF Graduate Fellow
Department of Integrative Biology, Oregon State University
--
*Oregon State University in Corvallis, OR is located within the traditional
homelands of the Mary's River  or Ampinefu Band of Kalapuya.  Following the
Willamette Valley Treaty of 1855 (Kalapuya etc. Treaty), Kalapuya people
were forcibly removed to reservations in Western Oregon. Today, living
descendants of these people are a part of the Confederated Tribes of Grand
Ronde Community of Oregon (https://www.grandronde.org
<https://www.grandronde.org/>) and the Confederated Tribes of the Siletz
Indians (https://ctsi.nsn.us <https://ctsi.nsn.us/>).*

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology