Removing non significant response variable in rda analysis with forward selection?

Removing non significant response variable in rda analysis with forward selection?

amelie_can
CONTENTS DELETED
The author has deleted this message.

Re: Removing non significant response variable in rda analysis with forward selection?

elaliberte
Dear Amélie,

To me, the approach you're describing sounds like you're trying to
shoehorn your data to fit your predictions, which can be dangerous at
best and dishonest at worst.

My understanding is that your explanatory variable is a factor with
different groups. If you're interested in seeing which species best
discriminate between these a priori specified groups, then you may want
to use a canonical discriminant analysis, such as canonical analysis of
principal coordinates (CAP). Have a look at:

Anderson, M. J., and T. J. Willis. 2003. Canonical analysis of principal
coordinates: a useful method of constrained ordination for ecology.
Ecology 84:511-525.

I've only used this in PRIMER v6 / PERMANOVA, not in R. However, I
believe it is implemented in:

?capscale

but Jari and others will be more helpful there.

A somewhat related (but focusing on a different question) approach could
be the IndVal method described in:

Dufrêne, M., and P. Legendre. 1997. Species assemblages and indicator
species: the need for a flexible asymmetrical approach. Ecological
Monographs 67:345-366.

where you could look at which species are the best "indicators" that
characterize different groups of sites.
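
A minimal sketch of the IndVal calculation in base R, on made-up data. The group labels, species names and abundances are all assumptions for illustration; the index itself is specificity times fidelity, as defined by Dufrêne and Legendre. No permutation test is included here; packages such as labdsv provide an indval() function that adds one.

```r
# Made-up data: 12 sites in 3 groups, 4 species (all names are hypothetical).
set.seed(7)
grp <- factor(rep(c("A", "B", "C"), each = 4))
spp <- matrix(rpois(12 * 4, lambda = 2), nrow = 12,
              dimnames = list(NULL, paste0("sp", 1:4)))
spp[grp == "A", "sp1"] <- spp[grp == "A", "sp1"] + 10  # sp1 favours group A

# Specificity: mean abundance in a group divided by the sum of the group means.
grp_means <- apply(spp, 2, function(y) tapply(y, grp, mean))
specificity <- sweep(grp_means, 2, colSums(grp_means), "/")

# Fidelity: proportion of the group's sites in which the species occurs.
fidelity <- apply(spp > 0, 2, function(y) tapply(y, grp, mean))

# IndVal (as a percentage) for every group x species combination.
indval <- 100 * specificity * fidelity
round(indval, 1)
```

Each column of indval is one species and each row one group; the group with the largest value is the one that species indicates best.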

Hope that helps,

Etienne




 On Thursday, 29 July 2010 at 08:00 -0700, amelie_can wrote:

> Hello all,
>
> My problem is somewhat similar to the one Vit Syrovatka posted on July 23rd,
> titled “Species fit in ordination”.
>
> In my project, I am doing an rda between species abundances (response
> variable – about 130 species) and type of sites (explanatory/environmental
> variable – one variable). When I finish my analysis and plot it, I have a lot
> of species present, and I suspect that several of them do not contribute
> significantly to the analysis.
>
> Consequently, I decided to do a forward selection analysis. Usually, a
> forward selection analysis is used to remove environmental variables that
> don’t relate as well to the response variable. But in my case, I only have
> one environmental variable, so I basically switched around my response
> variable (which is now my types of sites) and my explanatory variables
> (which are now my species abundances) for the forward selection analysis. So,
> basically, the forward selection shows me which species significantly
> explain the types of sites found. Then I reran my rda analysis and
> found that including the 20 species that were significant in the forward
> selection would explain as much of the variation of my rda axis as when I had
> all of my species.
>
> Is this correct? My supervisor raised a question about the fact that I used my
> response variable in the forward selection instead of an environmental variable.
> If not, how can we remove species that are not significant?
>
> I thought of trying to find which species are correlated with one another. I
> know one can use the cor.test function or the vif function, but that is
> problematic for me, as we can only check two species per analysis. Since I
> have about 130 species, checking all of those pairs by hand would just
> take too long. I also thought about doing a partial rda analysis, one species at
> a time, to see its significance in the model, but again, that seemed too long.
>
> Thank you all for your time,
>
> Amelie D’Astous
> Laval university
> Quebec
>


--
Etienne Laliberté
================================
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: Removing non significant response variable in rda analysis with forward selection?

Gavin Simpson
In reply to this post by amelie_can
On Thu, 2010-07-29 at 08:00 -0700, amelie_can wrote:

> Hello all,
>
> My problem is somewhat similar to the one Vit Syrovatka posted on July 23rd,
> titled “Species fit in ordination”.
>
> In my project, I am doing an rda between species abundances (response
> variable – about 130 species) and type of sites (explanatory/environmental
> variable – one variable). When I finish my analysis and plot it, I have a lot
> of species present, and I suspect that several of them do not contribute
> significantly to the analysis.
>
> Consequently, I decided to do a forward selection analysis. Usually, a
> forward selection analysis is used to remove environmental variables that
> don’t relate as well to the response variable. But in my case, I only have
> one environmental variable, so I basically switched around my response
> variable (which is now my types of sites) and my explanatory variables
> (which are now my species abundances) for the forward selection analysis. So,
> basically, the forward selection shows me which species significantly
> explain the types of sites found. Then I reran my rda analysis and
> found that including the 20 species that were significant in the forward
> selection would explain as much of the variation of my rda axis as when I had
> all of my species.

*smacks head* To me, this makes no sense. Less than no sense, actually. If
you are interested in the community of species, why do you want to
ignore most of them?

What you did, in effect, is a multiple regression with 130 predictor
variables. Unless you have LOTS (read lots and lots, and lots, and then
some more) of sites/samples you might as well just do:

spp.want <- sample(130, 20)

and use spp.want to select which species to retain.
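
A minimal simulation of this point, with made-up numbers: 20 sites, two site types and 130 species of pure noise (all assumptions, as are the object names). Screening each species on its own is a simplified stand-in for a real forward selection, but it shows how many species can look "significant" by chance alone.

```r
# Hypothetical set-up: 20 sites in 2 site types, 130 species of pure noise.
set.seed(1)
n_sites <- 20
n_spp <- 130
site_type <- factor(rep(c("A", "B"), each = n_sites / 2))
spp <- matrix(rpois(n_sites * n_spp, lambda = 5), nrow = n_sites,
              dimnames = list(NULL, paste0("sp", seq_len(n_spp))))

# Screen each species on its own against site type (one-way ANOVA p-value).
p_vals <- apply(spp, 2, function(y) anova(lm(y ~ site_type))[["Pr(>F)"]][1])

# Species that look "significant" although none is related to site type.
false_hits <- names(p_vals)[p_vals < 0.05]
length(false_hits)
```

With 130 unrelated species and a 0.05 threshold, roughly six or seven false hits are expected on average, so a subset chosen this way need not be much better than a random draw.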

The way you describe it, it is as if you have the analysis the wrong way
round. You are analysing the species, therefore they cannot "contribute"
significantly or otherwise to the model. They are the things you are
trying to predict/model.

> Is this correct? My supervisor raised a question about the fact that I used my
> response variable in the forward selection instead of an environmental variable.
> If not, how can we remove species that are not significant?

If you can explain to me why you want to remove "not significant"
species then (and what "not significant" means in this case) maybe we
can try to help. If this is just because your plot is too crowded, then
I think you need to rethink what you are doing.

Following Jari's comments about well-fitted versus poorly fitted species,
you could plot only the well-fitted ones, à la recent Canoco/CanoDraw,
but again we need to know what you want to do.
See ?goodness.cca for some of these definitions.
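
A rough sketch of plotting only the well-fitted species, assuming the vegan package is installed. It uses vegan's dune example data, with Management standing in for site type, and the 0.4 cut-off is an arbitrary assumption rather than a recommended value.

```r
library(vegan)  # assumes the vegan package is installed

# Example data shipped with vegan; 'Management' stands in for site type.
data(dune)
data(dune.env)

mod <- rda(dune ~ Management, data = dune.env)

# Cumulative proportion of each species' variance explained, per constrained axis.
fit <- as.matrix(goodness(mod, display = "species", model = "CCA"))
spp_fit <- fit[, ncol(fit)]   # fit over all constrained axes

# Arbitrary cut-off (an assumption): label only species with fit above 0.4.
keep <- spp_fit > 0.4

plot(mod, type = "n")
points(mod, display = "sites")
text(mod, display = "species", select = keep, cex = 0.8)
```

All species still take part in the ordination; the threshold only decides which labels are drawn.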

> I thought of trying to find which species are correlated with one another. I
> know one can use the cor.test function or the vif function, but that is
> problematic for me, as we can only check two species per analysis. Since I
> have about 130 species, checking all of those pairs by hand would just
> take too long. I also thought about doing a partial rda analysis, one species at
> a time, to see its significance in the model, but again, that seemed too long.
>
> Thank you all for your time,

If you want to know which species are good indicators of site type, then I
think you might want to look at a classification tree to predict the
site type factor. But then to make these things work you'd need a lot of
samples.
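
A minimal sketch of such a tree, using the rpart package and simulated data. The 60 sites, three site types, ten species and the two species made to differ by type are all assumptions for illustration.

```r
library(rpart)  # recommended package distributed with R

# Made-up data: 60 sites, 3 site types, 10 species; only sp1 and sp2 differ by type.
set.seed(42)
site_type <- factor(rep(c("A", "B", "C"), each = 20))
spp <- as.data.frame(matrix(rpois(60 * 10, lambda = 3), nrow = 60))
names(spp) <- paste0("sp", 1:10)
spp$sp1 <- spp$sp1 + ifelse(site_type == "A", 8, 0)
spp$sp2 <- spp$sp2 + ifelse(site_type == "B", 8, 0)

dat <- data.frame(site_type = site_type, spp)

# Classification tree predicting site type from species abundances.
tree <- rpart(site_type ~ ., data = dat, method = "class")

print(tree)
tree$variable.importance   # species the splits (and surrogate splits) rely on
```

Here the species are the predictors and site type is the response, which is the reverse of the rda set-up.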

If you need further help, perhaps you can explain what it is you want to
do?

HTH

G

> Amelie D’Astous
> Laval university
> Quebec
>

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: Removing non significant response variable in rda analysis with forward selection?

Gavin Simpson
In reply to this post by elaliberte
On Fri, 2010-07-30 at 10:11 +1200, Etienne Laliberté wrote:

> Dear Amélie,
>
> To me, the approach you're describing sounds like you're trying to
> shoehorn your data to fit your predictions, which can be dangerous at
> best and dishonest at worst.
>
> My understanding is that your explanatory variable is a factor with
> different groups. If you're interested in seeing which species best
> discriminate between these a priori specified groups, then you may want
> to use a canonical discriminant analysis, such as canonical analysis of
> principal coordinates (CAP). Have a look at:
>
> Anderson, M. J., and T. J. Willis. 2003. Canonical analysis of principal
> coordinates: a useful method of constrained ordination for ecology.
> Ecology 84:511-525.
>
> I've only used this in PRIMER v6 / PERMANOVA, but not in R. However I
> believe it is implemented in:
>
> ?capscale

capscale() is like cca() but without the constraint of using the
chi-square metric (or like rda() without the Euclidean one). It still
takes a species response matrix to be predicted by a set of explanatory
variables, and these are fitted as linear combinations, just as in cca().
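
A short sketch of that shared interface, assuming the vegan package is installed and using its dune example data (Management stands in for site type; the Bray-Curtis distance is just one possible choice):

```r
library(vegan)  # assumes the vegan package is installed

data(dune)
data(dune.env)

# Same formula interface as rda()/cca(): species matrix on the left,
# explanatory variables on the right; only the dissimilarity changes.
mod_rda <- rda(dune ~ Management, data = dune.env)
mod_cap <- capscale(dune ~ Management, data = dune.env, distance = "bray")

# Permutation test of the constraint, as for any constrained ordination.
anova(mod_cap, permutations = 999)
```

In both calls the species remain the response, so neither model selects species.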

> but Jari and others will be more helpful there.
>
> A somewhat related (but focusing on a different question) approach could
> be the IndVal method described in:
>
> Dufrêne, M., and P. Legendre. 1997. Species assemblages and indicator
> species: the need for a flexible asymmetrical approach. Ecological
> Monographs 67:345-366.
>
> where you could look at which species are the best "indicators" that
> characterize different groups of sites.

Yes, I too think this might be a good way to go if the focus is on which
species seem to be associated with which site types. If the OP is
interested in her species as the response, and/or wants a less cluttered
plot, then something else will be required.

G


