Saturday, July 05, 2008

Quantile Regression for Trend Analysis

Introduction
Quantile regression extends ordinary least-squares regression to quantiles of the response variable. Ordinary regression is a model for the conditional mean, where the mean is conditional on the value of the explanatory variable. Likewise, quantile regression is a model for the conditional quantiles. For trend analysis the explanatory variable is time. Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. The quantiles mark a set of ordered data into equal-sized data subsets.

The software is downloaded from the internet and installed on a computer. A data set from the internet is imported into a software session. An exploratory plot of the data is created to visualize the trends. A quantile regression model is run on the data to quantify the trends and determine their statistical significance.

Equipment


1. Computer running Macintosh, Linux/Unix, or Windows

2. Internet connection

3. R for statistical computing (http://www.r-project.org/) [1]

Time Needed
Approximately 20 minutes.

Procedure
1. Download and install R.
Tip: Only the base directory is needed.

2. Click on the icon to start R. With Linux/Unix, type the letter R from a command window.

3. Read the data into an R session by typing on the command line:
StormMax=read.csv("http://myweb.fsu.edu/jelsner/
extspace/extremedatasince1899.csv",T)

Caution: Type it all on a single line. The quotes must be bidirectional.

4. Subset the cyclones by basin and by year after 1977 (satellite era) by typing:
StormMaxBasin=subset(StormMax,Region=="Basin")
StormMaxBasin=subset(StormMaxBasin,Yr>1977)

5. Make the columns of the data set available by name by typing:
attach(StormMaxBasin)
6. Create an exploratory plot of the annual lifetime maximum wind speed (intensity) as a function of year by typing:
x=boxplot(Wmax~as.factor(Yr),plot=F)
boxplot(Wmax~as.factor(Yr),ylim=c(35,175),
xlab="Year",ylab="Intensity (kt)")
xx=1:32
abline(lm(x$stats[5,]~xx),col="red")
abline(lm(x$stats[4,]~xx),col="blue")
abline(lm(x$stats[3,]~xx),col="green")

7. Install and load the quantreg package developed by Roger Koenker [2]. Then print the reference citation.
install.packages("quantreg")
library(quantreg)
citation("quantreg")

8. Summarize the results of the quantile regression model at the upper quantiles 0.75, 0.9, and 0.95.
summary(rq(Wmax~Yr,tau=c(0.75,0.9,0.95)),se="iid")
Tip: The standard errors (se= argument) can be estimated using other methods, type ?summary.rq

9. Plot the model results.
model=rq(Wmax~Yr,tau=seq(0.2,0.8,0.1))
plot(summary(model,alpha=0.05,se="iid"),
parm=2,pch=19,cex=1.2,mar=c(5,5,4,2)+0.1,
ylab="Trend (kt/yr)",xlab="Quantile")

Troubleshooting

Make sure there is permission on the computer to install software.


Make sure the commands are typed exactly as shown with bidirectional double quotes.


If copy/paste is used, make sure to change the quotations to bidirectional.

Anticipated Results
The exploratory plot should verify the lack of trend in the median lifetime maximum intensity. It should show a tendency for the strongest cyclones (higher quantiles) to get stronger over time. The statistical significance of the trends is assessed with a quantile regression model and the results are plotted.


References


[1] R Development Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org (2006).


[2] Koenker, R. quantreg: Quantile Regression. R package version 4.10. http://www.r-project.org (2007)

Acknowledgments
Thanks go to all involved with the R project for statistical computing. Special thanks go to Thomas Jagger for his statistical help. The work is supported by the U.S. National Science Foundation, Risk Prediction Initiative of the Bermuda Institute for Ocean Studies, and the Florida Catastrophic Storm Risk Management Center of Florida State University.