Quantile regressions across several predictors


Quantile regressions across several predictors

Peter Houk
Greetings -

I'm wondering if folks might be able to point out the best approach for
examining the influence of any particular quantile of many predictor
variables simultaneously?  For instance, the below data show three
potential predictors of a dependent variable, but in this case, we might
want to use the 50% quantile (i.e., mean) of each predictor.  I'm wondering
if there is any standard approach for dealing with multiple predictors,
that when binned, can no longer be contrasted in a single model.

Thanks for any discussion and guidance,

Peter


pred 1 pred 2 pred 3 dependent
2 14 4 800.5987
2 18 11 414.1341
11 15 12 825.5466
11 15 12 1143.972
11 14 3 904.4725
11 18 15 433.1852
11 22 14 726.6624
11 16 2 1450.15
12 20 2 670.4164
12 19 7 741.6311
12 15 7 1835.707
13 18 14 810.5779
13 22 5 418.6701
13 16 12 1127.189
13 20 1 782.0013
14 21 4 875.8959
14 16 13 1077.747
14 11 9 1949.56
15 15 14 972.0584
16 20 7 1048.716
16 11 8 689.4675
16 16 11 1523.632
16 21 11 816.4746
16 14 4 1303.638
16 21 13 1270.525
16 20 2 1174.816
15 13 5 1076.839
15 17 10 808.3099
15 15 9 1324.503
15 19 7 922.1628
15 16 6 1644.743
14 13 14 864.5559
13 19 10 119.296
13 19 12 659.5301
13 18 5 1214.279
13 20 5 1511.839
13 14 8 577.8826
12 12 2 1242.402
12 14 11 1422.48
12 19 6 210.9226
12 17 14 1982.219
11 9 12 1057.788
11 18 8 1723.669
11 10 3 2188.152
11 15 10 1240.588
10 16 1 1262.361
10 20 15 1092.262
10 15 4 813.7531
10 16 12 1423.387
9 15 10 1621.156
8 21 3 1184.342
8 21 5 935.7707
8 17 2 919.8948
8 15 1 960.7185
8 16 13 1041.912
7 16 8 1633.856
7 18 15 1276.876
7 18 8 1108.591
7 17 9 844.5977
7 10 6 1681.484
6 18 3 915.3588
6 21 11 938.9458
6 16 12 1309.535
6 20 3 881.339
6 17 15 952.1002
5 19 6 803.3203
5 16 13 826.4538
5 20 10 1382.564
5 21 2 851.8552
5 19 7 1400.708
4 19 14 1411.594

--

Peter Houk, PhD
Assistant Professor
University of Guam Marine Laboratory
http://guammarinelab.org/peterhouk.html


_______________________________________________
R-sig-ecology mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: Quantile regressions across several predictors

Drew Tyre
I'm not sure I understand what you want to do.

> but in this case, we might want to use the
> 50% quantile (i.e., mean) of each predictor.  

You mean, use the median of pred1 to predict variation in dependent? But then all rows would have the same value - just predicting using a constant.

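library(readr)
library(dplyr)
# Peter's header ("pred 1 pred 2 ...") contains spaces, so skip it and
# name the columns explicitly
test <- read_delim("test.txt", delim = " ", skip = 1,
                   col_names = c("pred1", "pred2", "pred3", "dependent"))
apply(test, 2, median)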
#    pred1     pred2     pred3 dependent
#  11.000    17.000     8.000  1048.716
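The point about predicting with the median of a predictor can be checked directly. A minimal base-R sketch, where the helper name median_only_fit is hypothetical: replacing pred1 by its median gives every row the same value, so the column is aliased with the intercept, lm() reports NA for its coefficient, and the fitted intercept is just the mean of the response.

```r
# Hypothetical helper: regress y on a predictor replaced by its median.
median_only_fit <- function(x, y) {
  x_med <- rep(median(x), length(x))  # every row gets the same value
  coef(lm(y ~ x_med))                 # x_med is collinear with the intercept -> NA
}

# Usage with the thread's data (assumes `test` was read as above):
# median_only_fit(test$pred1, test$dependent)
```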


> standard approach for dealing with multiple predictors, that when binned,

So by "binned" do you mean converting pred1 to a categorical predictor that divides the continuous pred1 into, say, two bins (e.g. above and below the median, or two equal-width intervals as cut() produces below)? And then a different predictor is pred1 cut into 4 bins, like this:

test <- mutate(test,pred1_2 = cut(pred1,2), pred1_4 = cut(pred1,4))
test
#Source: local data frame [71 x 6]

#   pred1 pred2 pred3 dependent  pred1_2    pred1_4
#  (int) (int) (int)     (dbl)   (fctr)     (fctr)
#1      2    14     4  800.5987 (1.99,9] (1.99,5.5]
#2      2    18    11  414.1341 (1.99,9] (1.99,5.5]
#3     11    15    12  825.5466   (9,16]   (9,12.5]
#4     11    15    12 1143.9720   (9,16]   (9,12.5]
#5     11    14     3  904.4725   (9,16]   (9,12.5]
#6     11    18    15  433.1852   (9,16]   (9,12.5]
#7     11    22    14  726.6624   (9,16]   (9,12.5]
#8     11    16     2 1450.1500   (9,16]   (9,12.5]
#9     12    20     2  670.4164   (9,16]   (9,12.5]
#10    12    19     7  741.6311   (9,16]   (9,12.5]
#..   ...   ...   ...       ...      ...        ...

And your question is how to compare the model dependent ~ pred1_2 with the model dependent ~ pred1_4? You don't want to include both in the same model because they are highly correlated. Assuming my interpretation of what you want is correct, I believe your best approach is to compare multiple models with AIC, which works with non-nested models.
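A minimal base-R sketch of the AIC comparison suggested above, assuming ordinary least-squares fits with lm() and the equal-width bins that cut() produces; the helper name compare_bins_aic is hypothetical. Lower AIC indicates the better-supported binning, and the models being compared must be fit to the same rows of the response.

```r
# Hypothetical helper: fit response ~ binned(var) for each number of bins
# and return the AIC of each least-squares fit (lower is better).
compare_bins_aic <- function(df, var, response = "dependent", n_bins = c(2, 4)) {
  aics <- vapply(n_bins, function(k) {
    binned <- cut(df[[var]], breaks = k)  # k equal-width intervals, like cut(pred1, k)
    fit <- lm(df[[response]] ~ binned)    # one estimated mean per bin
    AIC(fit)
  }, numeric(1))
  names(aics) <- paste0(var, "_", n_bins) # e.g. "pred1_2", "pred1_4"
  aics
}

# Usage with the thread's data (assumes `test` was read as above):
# compare_bins_aic(test, "pred1")
```

The same call with other predictors (e.g. "pred2") gives AIC values on the same response, so all of the binned models can be ranked together.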


-- 
Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: [hidden email]
http://snr.unl.edu/tyre
http://atyre2.github.io
ORCID: orcid.org/0000-0001-9736-641X


Re: Quantile regressions across several predictors

Cade, Brian
In reply to this post by Peter Houk
Peter:  Your question is not quite clear to me.  I thought at first you
might be talking about quantile regression, but then you mentioned the 50%
quantile of the predictor (which is the median, not the mean) and binning,
so I'm not sure exactly what you are after.  On the presumption that you
might really be thinking along the lines of quantile regression (which does
not require binning the predictors), I ran your example data through a
linear quantile regression with the quantreg package, where quantiles of
the continuous dependent variable are estimated conditional on an additive
effect of the three predictors.  Summary output is below for the 0.10,
0.25, 0.50, 0.75, and 0.90 quantiles.  Here it looks as if only pred2 has
a strong nonzero (negative) effect, and only for the upper quantiles (0.50,
0.75, and 0.90) of the dependent variable, based on 95% confidence
intervals that do not overlap zero.  If this is along the lines of what you
were thinking, then perhaps you can frame your question in a more focused
fashion and I might be able to provide better advice.  There is much more
that can be done with quantile regression; plotting this sort of summary
information across quantiles is especially useful.
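For background, rq() chooses coefficients that minimise a sum of asymmetric absolute ("check") losses rather than squared errors. A minimal base-R sketch of that loss in the intercept-only case shows why tau = 0.5 targets the median rather than the mean. The helper names are hypothetical, and this brute-force search is only an illustration, not quantreg's fitting algorithm.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Intercept-only quantile fit: the constant b minimising sum(rho_tau(y - b)).
# The loss is piecewise linear with kinks at the data, so a minimiser is
# always one of the observed values; searching those is enough.
fit_constant_quantile <- function(y, tau) {
  candidates <- sort(unique(y))
  losses <- vapply(candidates, function(b) sum(check_loss(y - b, tau)), numeric(1))
  candidates[which.min(losses)]
}

y <- c(1, 2, 3, 4, 100)
fit_constant_quantile(y, 0.5)  # 3, the median
mean(y)                        # 22, pulled up by the outlier
```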

Brian


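library(quantreg)
# example.data: Peter's table with columns pred1, pred2, pred3, dependent
example.qr.results <- rq(dependent ~ pred1 + pred2 + pred3, data = example.data,
                         tau = c(0.10, 0.25, 0.50, 0.75, 0.90))
summary(example.qr.results, se = "rank", iid = FALSE, alpha = 0.05)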

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
    0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.1

Coefficients:
            coefficients lower bd   upper bd
(Intercept) 1665.53049    -17.44156 2493.10597
pred1          8.81923    -40.77369   53.37269
pred2        -57.39947    -85.39144   23.59046
pred3        -19.74443    -60.76278   61.19992

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
    0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.25

Coefficients:
            coefficients lower bd   upper bd
(Intercept) 1231.52601    821.28092 1935.37219
pred1         -2.25995    -29.68130   30.79243
pred2        -20.83135    -62.10712    3.75916
pred3         -3.51839    -23.45116   13.38838

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
    0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.5

Coefficients:
            coefficients lower bd   upper bd
(Intercept) 1714.10796    729.52807 2553.46234
pred1          2.02560    -39.70704   29.34070
pred2        -41.81862    -81.38048   -4.06101
pred3          2.90515    -18.68419   21.02118

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
    0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.75

Coefficients:
            coefficients lower bd   upper bd
(Intercept) 2118.28691   1186.20556 3496.67829
pred1         17.75399    -38.41521   32.63466
pred2        -62.43047   -113.90480  -15.35846
pred3         10.53731    -41.48255   35.23541

Call: rq(formula = dependent ~ pred1 + pred2 + pred3, tau = c(0.1,
    0.25, 0.5, 0.75, 0.9), data = example.data)

tau: [1] 0.9

Coefficients:
            coefficients lower bd   upper bd
(Intercept) 2855.31941   1631.16351 4217.13007
pred1          1.31388    -71.21536   65.65507
pred2        -77.54635   -106.11297  -30.33534
pred3          1.74284    -63.49143   56.91477
Warning messages:
1: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
2: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
3: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  4.22535211267606 percent fis <=0
4: In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) :
  Solution may be nonunique
>
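The "Solution may be nonunique" warnings come from rq.fit.br, quantreg's default simplex-based fitting routine, and signal that more than one coefficient vector may attain the same minimum total check loss. A minimal base-R illustration of how that can happen in the simplest intercept-only case (not a diagnosis of the fit above; helper names are hypothetical): with four observations and tau = 0.5, every value between the two middle observations gives the same loss. Ties and discrete predictor values, like the integer-valued predictors here, can produce similar flat minima in a full regression.

```r
# Check (pinball) loss and the total loss of a constant fit b.
check_loss <- function(u, tau) u * (tau - (u < 0))
total_loss <- function(b, y, tau) sum(check_loss(y - b, tau))

y <- c(1, 2, 3, 4)
# Every b in [2, 3] minimises the median (tau = 0.5) loss, so the
# "solution" is an interval rather than a single point.
vapply(c(2, 2.5, 3), total_loss, numeric(1), y = y, tau = 0.5)  # 2 2 2
total_loss(1.5, y, 0.5)                                          # 2.5, worse
```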




Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  [hidden email] <[hidden email]>
tel:  970 226-9326


On Wed, May 25, 2016 at 4:43 PM, peterhouk1 . <[hidden email]> wrote:
