Team 3 – Dimensionality Reduction Using Sliced Inverse Regression in Modeling Large Climate Data

Team Members: Ross Flieger-Allison1, Lois Miller2, Danielle Sykes3, and Pablo Valle4
Graduate Assistants: Sai K. Popuri3 and Nadeesri Wijekoon3
Faculty Mentor: Nagaraj K. Neerchal3
Client: Amita Mehta5

1Department of Computer Science and Department of Statistics, Williams College,
2Department of Mathematics, DePauw University,
3Department of Mathematics and Statistics, University of Maryland, Baltimore County,
4Department of Mathematical Sciences, Kean University,
5Joint Center for Earth System Technology (JCET)


About the Team

Team 3 consisted of Ross Flieger-Allison, Lois Miller, Danielle Sykes, and Pablo Valle. We worked with our faculty mentor, Dr. Nagaraj K. Neerchal, along with our graduate assistants, Sai K. Popuri and Nadeesri Wijekoon, who provided guidance and useful background knowledge throughout the research process. Our project focused on implementing and evaluating a weather forecasting model that uses a statistical tool called Sliced Inverse Regression (SIR) to reduce the dimensionality of a large set of covariates, then uses a modified Nadaraya-Watson estimator (NWE) to produce rainfall predictions.

Motivation

Climate conditions, especially rainfall, have an important impact on agricultural yields in several regions of the United States. The Missouri River Basin (MRB) is a significant agricultural region that is not irrigated and is thus highly dependent on rainfall. The UMBC-JCET team and the UMBC-REU teams of 2014 and 2015 previously used daily and monthly weather data provided by NASA, together with output from several Global Climate Models (GCMs), to produce predictions for a number of climate variables. Their precipitation predictions, however, proved inaccurate, both because precipitation data are semi-continuous (many exact zeros alongside positive amounts) and because the models used were simple linear regressions. This year's project focused on implementing a more flexible forecasting model (using SIR and NWE) in the hope of improving upon previous years' predictions.

Data

The data we used was provided by NASA and included weather observations made at approximately 21,000 locations across 57 years (1949-2005). Our model assumes that precipitation at any given location s depends on a large number of covariates: current and past values of monthly precipitation, sea-level pressure, relative humidity, and maximum/minimum temperature at s and its neighboring locations.

Methodology

We implemented a data-analytic tool called Sliced Inverse Regression (SIR), which reduces the dimensionality of the large set of covariates that influence precipitation in the Missouri River Basin. In the final stage, to obtain predictions from the GCM data, we used a simple non-parametric technique, the Nadaraya-Watson estimator (NWE), which we modified to account for the semi-continuous nature of the precipitation data.
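The two steps above can be illustrated with a minimal sketch. This is not the team's code: it assumes the textbook form of SIR (standardize the covariates, slice the sorted response, eigen-decompose the weighted covariance of the slice means) and one plausible two-part modification of the Nadaraya-Watson estimator, in which a kernel estimate of the probability of rain decides whether to predict zero. The function names, the Gaussian kernel, and the `bandwidth` and `threshold` parameters are all assumptions made for illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Basic SIR sketch: returns a (p, n_dirs) matrix of estimated directions.

    Assumes X has at least two columns and a positive-definite covariance.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the covariance, used to standardize X.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt
    # Slice the observations by sorted response into roughly equal groups.
    order = np.argsort(y, kind="stable")
    slices = np.array_split(order, n_slices)
    # Weighted covariance of the slice means of the standardized covariates.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_dirs]
    # Map the directions back to the original covariate scale.
    return inv_sqrt @ top

def nw_two_part(U_train, y_train, U_new, bandwidth=0.5, threshold=0.5):
    """Hypothetical two-part Nadaraya-Watson predictor for y >= 0 with many zeros.

    U_train and U_new are 2-D arrays of reduced covariates (rows = observations).
    """
    # Gaussian kernel weights between each new point and each training point.
    d2 = ((U_new[:, None, :] - U_train[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-0.5 * d2 / bandwidth ** 2)
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-300)
    # Kernel estimate of the probability that precipitation is positive.
    p_wet = W @ (y_train > 0).astype(float)
    # Kernel-weighted mean of the positive amounts only (zeros add nothing).
    amount = (W @ y_train) / np.maximum(p_wet, 1e-12)
    # Predict zero unless rain is estimated to be likely enough.
    return np.where(p_wet > threshold, amount, 0.0)
```

In use, one would project the covariates onto the SIR directions, for example `U = X @ sir_directions(X, y)`, and pass the projected training and new points to `nw_two_part`. The number of slices, number of directions, bandwidth, and threshold would all need to be tuned on the data.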

Results

Observed and predicted monthly precipitation, including the proportion of 0 values for both:
[Figure: observed and predicted monthly precipitation]

MSE from positive predictions for months with positive rainfall (true positives):
[Figure: MSE for true-positive predictions]
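As a minimal sketch of the metric named in the caption above, the function below computes the mean squared error over only those months where both the observed and the predicted precipitation are positive. The function name and the choice to return NaN when there are no such months are assumptions for illustration, not taken from the team's report.

```python
import numpy as np

def true_positive_mse(observed, predicted):
    """MSE restricted to cases where observed > 0 and predicted > 0."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # True positives: rain was observed and rain was predicted.
    mask = (observed > 0) & (predicted > 0)
    if not mask.any():
        return float("nan")
    return float(np.mean((observed[mask] - predicted[mask]) ** 2))
```

Restricting the error to true positives separates the question of how well the positive amounts are predicted from the question of how well zeros are detected, which the proportion-of-zeros comparison addresses.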

Parallelism

[Table: parallel speedup]
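For readers unfamiliar with the quantities such a table usually reports, the sketch below uses the standard definitions: speedup is the run time on one process divided by the run time on p processes, and efficiency is speedup divided by p. The function name is hypothetical, and no timings from the study are reproduced here.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for a run on n_procs processes.

    t_serial is the wall-clock time on one process, and t_parallel is the
    wall-clock time on n_procs processes, in the same units.
    """
    if t_serial <= 0 or t_parallel <= 0 or n_procs < 1:
        raise ValueError("times must be positive and n_procs at least 1")
    speedup = t_serial / t_parallel
    efficiency = speedup / n_procs
    return speedup, efficiency
```

An efficiency near 1 means the added processes are almost fully used. Efficiency that falls as p grows indicates that communication or serial overhead is starting to dominate.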

Conclusion

We have demonstrated that the SIR and NWE methods can be implemented to work on a large dataset, and we were able to improve upon the predictive accuracy of previous years' models. Additionally, we showed that parallelizing the SIR and NWE code greatly increases computational efficiency on the subregion and also improves efficiency for the entire MRB region, up to 16 processes. Further study is needed on applying SIR and NWE to daily precipitation data and on other methods that could continue to improve accuracy.

Links

Ross Flieger-Allison, Lois Miller, Danielle Sykes, Pablo Valle, Sai K. Popuri, Nadeesri Wijekoon, Nagaraj K. Neerchal, and Amita Mehta. Dimensionality Reduction Using Sliced Inverse Regression in Modeling Large Climate Data. Technical Report HPCF-2016-13, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2016. (HPCF machines used: maya.) Reprint in HPCF publications list.

Poster presented at the Summer Undergraduate Research Fest (SURF)
