log link versus log response


Tomas Easdale
Hi there,
 
I am using GLMs. Could someone please explain the difference between
(a) using a Gaussian family distribution with a LOG link function and
(b) LOG-transforming the response variable and using a normal
distribution (Gaussian family with identity link function)?
The outputs differ, and clearly one option or the other will give a
better fit depending on the dataset (everything else being equal), but
I want to understand why this is so.
 
Thanks in advance,
 
Tomás Easdale
Landcare Research, NZ
     
 



_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: log link versus log response

Simon Blomberg-4
In a), we have

log(mu_a) = X %*% beta, Y_i ~ N(mu_a[i], sigma^2)

i.e. we are modelling mu_a, the mean of Y on its original scale, in
terms of explanatory variables X and parameters beta, and the link
function operates on mu_a. In an intercept-only model, mu_a is
estimated by mean(Y_i).

in b) we have

mu_b = X %*% beta, log(Y_i) ~ N(mu_b[i], sigma^2)
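To see the two fits side by side, here is a minimal sketch in Python (standard library only) for a single explanatory variable x. It assumes strictly positive responses and uses made-up data; the helper names (wls_line, fit_log_link, fit_log_response) are hypothetical. Model a) is fitted by iteratively reweighted least squares (IRLS), the general algorithm behind R's glm(), and model b) is ordinary least squares on log(y). In R these roughly correspond to glm(y ~ x, family = gaussian(link = "log")) and lm(log(y) ~ x).

```python
import math


def wls_line(x, z, w=None):
    """Weighted least-squares fit of z = b0 + b1 * x; returns (b0, b1)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    b1 = sxz / sxx
    return zbar - b1 * xbar, b1


def fit_log_response(x, y):
    """Model b): Gaussian, identity link, fitted to log(y): mu_b = b0 + b1 * x."""
    return wls_line(x, [math.log(yi) for yi in y])


def fit_log_link(x, y, tol=1e-10, max_iter=100):
    """Model a): Gaussian GLM with log link, log(mu_a) = b0 + b1 * x, fitted by IRLS."""
    b0, b1 = fit_log_response(x, y)  # start from the log-response fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # For the log link dmu/deta = mu and the Gaussian variance function is 1,
        # so the IRLS weights are mu^2 and the working response is eta + (y - mu) / mu.
        w = [m * m for m in mu]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        new_b0, new_b1 = wls_line(x, z, w)
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            return b0, b1
    raise RuntimeError("IRLS did not converge")


# Made-up positive data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 8.3, 15.7, 33.0, 62.4]

print("a) log link:    ", fit_log_link(x, y))
print("b) log response:", fit_log_response(x, y))
```

On noisy data like these the two sets of coefficients differ: a) weights the large responses much more heavily (its IRLS weights are mu^2), whereas b) treats every point equally on the log scale. If y were exactly exp(b0 + b1 * x) with no noise, both would recover the same b0 and b1.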

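Now, in the intercept-only case mean(Y_i) estimates mu_a and
mean(log(Y_i)) estimates mu_b, so model a) puts log(mean(Y_i)) on the
linear-predictor scale while model b) puts mean(log(Y_i)) there. These
differ because mean(log(x)) != log(mean(x)); by Jensen's inequality,
mean(log(x)) <= log(mean(x)), with equality only when all the x are equal.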
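A quick numerical check of that inequality, again a Python sketch with made-up numbers: for y = 1, 10, 100 the arithmetic mean is 37, while the geometric mean, exp(mean(log(y))), is 10. Back-transforming fitted values from b) therefore estimates something like a geometric mean (the median, if log(Y) is normal) rather than the arithmetic mean that a) models.

```python
import math


def mean(values):
    return sum(values) / len(values)


# Illustrative positive responses (not from the thread).
y = [1.0, 10.0, 100.0]

log_of_mean = math.log(mean(y))               # linear-predictor target in a)
mean_of_log = mean([math.log(v) for v in y])  # linear-predictor target in b)

print("log(mean(y)) =", log_of_mean)  # log(37)
print("mean(log(y)) =", mean_of_log)  # log(10), up to rounding
# Back-transforming b) gives the geometric mean, not the arithmetic mean.
print("mean(y) =", mean(y), " exp(mean(log(y))) =", math.exp(mean_of_log))
```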

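So they are different models entirely. They also make different
assumptions about the errors: a) assumes normal errors with constant
variance on the original scale of Y, whereas b) assumes errors that are
normal on the log scale, i.e. multiplicative (lognormal) errors whose
spread grows with the mean of Y. Which one fits better therefore depends
on how the variability in your data behaves. Comparing these models is
slightly tricky, because taking log(Y_i) means that you need to use the
change of variable formula to make the likelihood in b) comparable to
the likelihood of a). For example, you can't just compare the AICs or
the deviances of the two fits directly.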
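A minimal sketch of the change-of-variable correction, assuming intercept-only versions of both models and made-up positive data (the function names are hypothetical). If Z = log(Y) has density f_Z, then Y has density f_Z(log y) / y, so the log-likelihood of b) on the scale of Y is the log-likelihood of the log-scale fit minus sum(log(y)). Only after that correction are the two AICs computed on the same data. In R the same adjustment can be applied to logLik(lm(log(y) ~ x)) by subtracting sum(log(y)).

```python
import math


def normal_loglik(values, mu, sigma2):
    """Gaussian log-likelihood of values with common mean mu and variance sigma2."""
    n = len(values)
    rss = sum((v - mu) ** 2 for v in values)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def mle_mean_var(values):
    """Maximum-likelihood mean and variance (divisor n, not n - 1)."""
    n = len(values)
    mu = sum(values) / n
    return mu, sum((v - mu) ** 2 for v in values) / n


def loglik_a(y):
    """Model a), intercept only: Y ~ N(mu_a, sigma^2) with log(mu_a) = b0."""
    mu, s2 = mle_mean_var(y)  # with only an intercept, the MLE of mu_a is mean(y)
    return normal_loglik(y, mu, s2)


def loglik_b(y):
    """Model b), intercept only: log(Y) ~ N(mu_b, sigma^2), reported on the scale of Y."""
    z = [math.log(v) for v in y]
    mu, s2 = mle_mean_var(z)
    # Change of variable: f_Y(y) = f_Z(log y) * |d log(y) / dy| = f_Z(log y) / y,
    # so the Jacobian term -sum(log(y)) is added to the log-scale log-likelihood.
    return normal_loglik(z, mu, s2) - sum(z)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Made-up positive data, for illustration only.
y = [1.2, 2.5, 0.8, 4.1, 3.3, 1.9, 7.4, 2.2]

# Each intercept-only model has two parameters: the intercept and sigma^2.
print("AIC a) log link:    ", aic(loglik_a(y), 2))
print("AIC b) log response:", aic(loglik_b(y), 2))
```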

hope this helps,

Simon.

On Thu, 2008-04-24 at 13:38 +1200, Tomas Easdale wrote:

> Hi there,
>  
> I am using glms. Could someone please explain what's the difference
> between (a) using a gaussian family distribution with a LOG link
> function and (b) LOG transforming the response variable with a normal
> distribution (Gaussian family distribution with identity link function).
> The outputs differ and clearly one option or the other will result in
> better fits depending on the dataset (everything else equal) but I want
> to understand why is this so.
>  
> Thanks in advance,
>  
> Tomás Easdale
> Landcare Research, NZ
--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.
