twinspan classification rules as narrative

twinspan classification rules as narrative

Gonzalez-Mirelis, Genoveva
Dear all,

I am trying to understand the results from the twinspan function in the R package that has been recently developed (also named twinspan).

In particular, I would like to be able to derive the classification rules (indicator species and abundance values, or rather ranges) for each terminal group of the twinspan classification.

From this:

library(twinspan)
data(ahti)
twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75))
summary(twb)

I understand that, say, for group number 16 (the first terminal group encountered), the indicator species were Cladnigr and Cladgray.

I also understand that the indicator score threshold tells me which path to follow down the tree (left or right). But I struggle to understand what the indicator score (1) actually means. Can it be related to the original abundance values of those two species at the three relevant sites, namely Ster113, Ster097 and Ster098?

What would be a narrative way to describe this particular branch of the tree?

Many thanks in advance,

Genoveva

Genoveva Gonzalez Mirelis, Scientist
Institute of Marine Research
Nordnesgaten 50
5005 Bergen, Norway
Phone number +47 55238510

_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: twinspan classification rules as narrative

Jari Oksanen
Howdy,

TWINSPAN is not on CRAN. It seems that you found it on GitHub.

TWINSPAN is an old method, and it seems that people are forgetting how it works. Here is some narrative:

First, you have defined cut levels to transform your abundance data into binary indicator “pseudospecies”. You give these cut levels in your call. Each species is split by these cut levels into pseudospecies, and the number of the cut level is appended to the name of the species. In your example, the indicator pseudospecies at division 8 are actually Cladgray1 and Cladnigr1, where the appended ‘1’ only means that the species occurs, with any abundance value: there is no way of knowing its abundance except for the lower limit (>0). In division 1 you have, for instance, Cladmiti4, which means that the species occurs at least at cut level 4, that is, with an abundance of at least 5, but it can have any value above that limit.
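A minimal base-R sketch of the pseudospecies coding just described. It assumes that pseudospecies k is present when the abundance is at least the k-th cut level, with the first level (0) meaning simply "> 0". The helpers `pseudo_level()` and `pseudo_names()` are hypothetical illustrations, not functions of the twinspan package, and the abundances are made up.

```r
cutlevels <- c(0, 0.1, 1, 5, 25, 50, 75)   # cut levels used in the call above

# Hypothetical helper (not in the twinspan package): the highest pseudospecies
# level reached by each abundance. findInterval() counts the cut levels that are
# <= x, so an abundance of exactly 5 reaches level 4; absences get level 0.
pseudo_level <- function(x, cutlevels) {
  ifelse(x > 0, findInterval(x, cutlevels), 0L)
}

# Hypothetical helper: names of the pseudospecies a species contributes,
# e.g. "Cladmiti1" ... "Cladmiti4" for level 4.
pseudo_names <- function(species, level) {
  if (level == 0) character(0) else paste0(species, seq_len(level))
}

abund <- c(Cladmiti = 7, Cladgray = 0.05, Cladnigr = 0)   # made-up values
lev <- pseudo_level(abund, cutlevels)
lev                                   # Cladmiti 4, Cladgray 1, Cladnigr 0
pseudo_names("Cladmiti", lev[["Cladmiti"]])
```

A species at level 4 is therefore also "present" as Cladmiti1 to Cladmiti3, which is why a level-1 indicator only says that the species occurs.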

Now to the narrative for the division. The rule for division 8 (which you mention in your post) is actually “+Cladnigr1 +Cladgray1 < 1”. So both are at the lowest cut level 1 (present with any abundance), and the ‘+’ sign means that both are positive indicators: you add +1 to a plot's score for each of them that occurs in the plot. If the sign were ‘-’, you would add -1 for each presence, giving negative scores. Doing this for all indicator species gives you the indicator score: if both species are present, your score is 2; if one is present, your score is 1; and if neither is present, your score is 0. The condition is ‘< 1’, meaning that if neither is present (score 0) the condition is true and you go to final group 16, but if one or both are present (score 1 or 2) the condition is false and you continue to division 17. However, this is a tree, and this narrative rule only applies to division 8 and the 16 sampling units it contains: these are split by this rule. To get to this division with this rule, you must have satisfied the previous rules leading to this branch. You can see the branch structure using plot(twb): the internal divisions are shown in squares on the tree, and the final groups and their sizes as terminal leaves.
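The scoring rule for division 8 can be sketched in base R as below. The data frame `rule8` and the function `indicator_score()` are hypothetical encodings, not twinspan internals; the sketch assumes level k is present when abundance is at least the k-th cut level (level 1 meaning > 0), and the three plots are made up to show the scores 0, 1 and 2.

```r
cutlevels <- c(0, 0.1, 1, 5, 25, 50, 75)

# Hypothetical encoding of the division 8 rule "+Cladnigr1 +Cladgray1 < 1":
# one row per indicator pseudospecies, with its level and sign.
rule8 <- data.frame(species = c("Cladnigr", "Cladgray"),
                    level   = c(1, 1),
                    sign    = c(1, 1),
                    stringsAsFactors = FALSE)
threshold8 <- 1

# Indicator score of one plot: the sum of the signs of the indicator
# pseudospecies present. Level k counts as present when abundance >= cutlevels[k],
# level 1 when abundance > 0; species missing from `abund` count as absent.
indicator_score <- function(abund, rule, cutlevels) {
  present <- vapply(seq_len(nrow(rule)), function(i) {
    sp <- rule$species[i]
    a <- if (sp %in% names(abund)) abund[[sp]] else 0
    k <- rule$level[i]
    if (k == 1) a > 0 else a >= cutlevels[k]
  }, logical(1))
  sum(rule$sign[present])
}

# Made-up plots: neither, one or both indicators present
plots <- list(neither = c(Cladnigr = 0,   Cladgray = 0),
              one     = c(Cladnigr = 0.5, Cladgray = 0),
              both    = c(Cladnigr = 2,   Cladgray = 0.1))
scores <- sapply(plots, indicator_score, rule = rule8, cutlevels = cutlevels)
scores                                   # neither 0, one 1, both 2
ifelse(scores < threshold8, "group 16", "division 17")
```

Only the plot with score 0 satisfies "< 1" and ends up in group 16; the other two continue to division 17.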

The classification rules give you only the lower abundance limit of each indicator species, and depending on the indicator score threshold, some of these indicators may even be missing from a plot. However, you can use the function twintable to see the actual cut levels for each species. These serve as cover-class values, but do not give any more detail than the cut levels you defined.
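To see why a cut level carries no more detail than the cut levels themselves, here is a hypothetical base-R helper (not part of twinspan) that turns a pseudospecies level k, read as the highest level a species reaches in a plot, back into the abundance interval it stands for. It assumes the same "at least the k-th cut level" convention, with level 1 meaning any abundance above 0 but below the second cut level.

```r
cutlevels <- c(0, 0.1, 1, 5, 25, 50, 75)

# Hypothetical helper: abundance interval ("cover class") implied when level k
# is the highest pseudospecies level reached. Level 1 is (0, 0.1); level k > 1 is
# [cutlevels[k], cutlevels[k + 1]); the top level is open-ended.
level_interval <- function(k, cutlevels) {
  lower <- cutlevels[k]
  upper <- if (k < length(cutlevels)) cutlevels[k + 1] else Inf
  left  <- if (k == 1) "(" else "["
  sprintf("%s%g, %g)", left, lower, upper)
}

sapply(seq_along(cutlevels), level_interval, cutlevels = cutlevels)
```

For example, a species recorded at level 4 could have any abundance from 5 up to (but not including) 25.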

Cheers, Jari

> On 16 Dec 2019, at 16:44, Gonzalez-Mirelis, Genoveva <[hidden email]> wrote:
> [...]

Re: twinspan classification rules as narrative

Gonzalez-Mirelis, Genoveva
Hi Jari and all,

Thank you for your explanation, and indeed thank you for developing the R package, which I downloaded from your GitHub repository.

It seems like I picked a fairly easy example, so let me make sure that I have understood the rules correctly by describing here the previous division, namely division 4 (+Cladrang4 +Cladmero1 -Sterpasc5 -Empenigr3 +Callvulg2 < 2).

First of all, here are the species and abundance thresholds that need to be checked at this division (I refer to these as "species conditions" below): Cladrang abundance >= 5 (if so, +1), Cladmero abundance > 0 (if so, +1), Sterpasc abundance >= 25 (if so, -1), Empenigr abundance >= 1 (if so, -1) and Callvulg abundance >= 0.1 (if so, +1). As a reminder, I have used cut levels based on the Braun-Blanquet scale.
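A base-R sketch of how the division 4 rule translates into species conditions. `parse_rule()` is a hypothetical helper, not a twinspan function; it assumes each indicator is written as sign, species name and pseudospecies level (as in the rule quoted above), and that level k means an abundance of at least the k-th cut level, level 1 meaning simply present.

```r
cutlevels <- c(0, 0.1, 1, 5, 25, 50, 75)
rule4 <- "+Cladrang4 +Cladmero1 -Sterpasc5 -Empenigr3 +Callvulg2"

# Hypothetical parser: split each term such as "+Cladrang4" into sign, species
# name and level, then state the minimum abundance that level implies.
parse_rule <- function(rule, cutlevels) {
  terms <- strsplit(rule, " +")[[1]]
  m <- regmatches(terms, regexec("^([+-])([A-Za-z]+)([0-9]+)$", terms))
  parts <- do.call(rbind, m)          # columns: full match, sign, species, level
  level <- as.integer(parts[, 4])
  data.frame(species   = parts[, 3],
             level     = level,
             sign      = ifelse(parts[, 2] == "+", 1, -1),
             condition = ifelse(level == 1, "> 0",
                                paste(">=", cutlevels[level])),
             stringsAsFactors = FALSE)
}

parse_rule(rule4, cutlevels)
```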

See:

ahti[which(rownames(ahti)%in%c("Ster113","Ster097", "Ster098")),
     which(colnames(ahti)%in%c("Cladrang", "Cladmero", "Sterpasc", "Empenigr", "Callvulg"))]

Now (unlike the example in my previous post) there are multiple ways in which the total score can be < 2. For example, site Ster113 (one of the three members of the final group) has Sterpasc at an abundance >= 25 and Empenigr at an abundance >= 1; since the other three species conditions are not met, its total score is -2, which is < 2. Similarly, site Ster097 meets the conditions for the two negative indicators as well as the condition for Cladmero, which is present at an abundance > 0, giving a total score of -1. Etc. Correct?
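The arithmetic for the two sites can be checked with a small base-R sketch. The TRUE/FALSE flags below only restate the description of Ster113 and Ster097 given here; they are not read from the ahti data, and `sign4` is a hypothetical encoding of the division 4 signs.

```r
# Signs of the five division 4 indicators
# (+Cladrang4 +Cladmero1 -Sterpasc5 -Empenigr3 +Callvulg2)
sign4 <- c(Cladrang = 1, Cladmero = 1, Sterpasc = -1, Empenigr = -1, Callvulg = 1)

# TRUE where a species condition is met, as described for the two sites
met <- rbind(
  Ster113 = c(Cladrang = FALSE, Cladmero = FALSE, Sterpasc = TRUE,
              Empenigr = TRUE, Callvulg = FALSE),
  Ster097 = c(Cladrang = FALSE, Cladmero = TRUE, Sterpasc = TRUE,
              Empenigr = TRUE, Callvulg = FALSE))

# Indicator score: sum of the signs of the conditions met (logicals count as 0/1)
score <- drop(met %*% sign4[colnames(met)])
score                                   # Ster113 -2, Ster097 -1
score < 2                               # TRUE for both sites
```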

Then, would it also be fair to say that, if the total score threshold is "much" smaller than the total number of indicators (in this example the former is 2 and the latter is 5), it is more "important" that the conditions for the negative indicators are met?
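One way to explore this question is to enumerate every pattern of the five division 4 conditions being met or not, and count how many satisfy "< 2". This base-R sketch assumes the conditions can occur in any combination (real plots will not cover all 32 patterns), so it illustrates the arithmetic of the rule rather than giving a definitive answer.

```r
sign4 <- c(Cladrang = 1, Cladmero = 1, Sterpasc = -1, Empenigr = -1, Callvulg = 1)

# All 2^5 = 32 combinations of the five species conditions being met or not
combos <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(sign4))))
colnames(combos) <- names(sign4)

score <- drop(combos %*% sign4)            # indicator score of each combination
n_neg <- rowSums(combos[, sign4 < 0])      # negative conditions met: 0, 1 or 2

sum(score < 2)                             # 26 of the 32 combinations pass "< 2"
tapply(score < 2, n_neg, mean)             # share passing: 0.5, 0.875 and 1
```

In this enumeration, meeting both negative conditions always satisfies "< 2", whereas with no negative condition met, at most one of the three positive indicators may be present.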

Keep in mind that what I'm ultimately trying to do is to provide a summarized description of species composition for all sites in a given group (and that I don't want this description to be much longer than it needs to be!).

Thank you very much again!

G


-----Original Message-----
From: Jari Oksanen <[hidden email]>
Sent: 16. desember 2019 16:39
To: Gonzalez-Mirelis, Genoveva <[hidden email]>
Cc: [hidden email]
Subject: Re: [R-sig-eco] twinspan classification rules as narrative

[...]