Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency

Team Members:	Matthew G. Bachmann¹, Ashley D. Dyas², Shelby C. Kilmer³, and Julian Sass⁴
Graduate Research Assistant:	Andrew Raim⁴
Faculty Mentor:	Nagaraj K. Neerchal⁴ and Kofi P. Adragani⁴
Client:	George Ostrouchov⁵ and Ian F. Thorpe⁶

¹Department of Mathematics, Northeast Lakeview College,
²Department of Computer Science, Contra Costa College,
³Department of Mathematics, Bucknell University,
⁴Department of Mathematics and Statistics, University of Maryland, Baltimore County,
⁵Oak Ridge National Laboratory,
⁶Department of Chemistry and Biochemistry, University of Maryland, Baltimore County

About the Team

Our team, composed of Matthew Bachmann, Ashley Dyas, Shelby Kilmer, and Julian Sass, performed an efficiency study of pbdR (Programming with Big Data in R), a package for the popular statistical computing language R. This research took place at the UMBC REU Site: Interdisciplinary Program in High Performance Computing. Our faculty mentor, Dr. Nagaraj Neerchal, and our graduate assistant, Andrew M. Raim, assisted us in our research and provided insight and supervision. Our client, Dr. George Ostrouchov, Senior Research Staff Member at the Oak Ridge National Laboratory, proposed our project. Dr. Ian Thorpe provided data that we used in an application of our study.

Introduction to our Project

pbdR is an R package for high performance statistical computing on very large data sets. Our study examined how two factors affect computational efficiency: the block size used in the block cyclic distribution of the data, and the layout of the processor grid. We measured the impact of both factors by running the statistical method PCA (Principal Component Analysis).
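To make the two factors concrete, the following is a minimal sketch of how a two-dimensional block cyclic distribution assigns matrix entries to processes. It is written in plain Python for illustration and is not pbdR code; the function names `owner` and `entries_per_process`, and the matrix sizes in the example, are hypothetical. It assumes the usual round-robin rule, in which consecutive blocks along each dimension are dealt out to consecutive rows and columns of the process grid.

```python
from collections import Counter


def owner(i, j, block_rows, block_cols, grid_rows, grid_cols):
    """Return the (grid_row, grid_col) of the process holding entry (i, j).

    Indices are 0-based. Blocks are dealt out round-robin along each
    dimension of the process grid.
    """
    return (i // block_rows) % grid_rows, (j // block_cols) % grid_cols


def entries_per_process(n, k, block_rows, block_cols, grid_rows, grid_cols):
    """Count how many entries of an n-by-k matrix each process receives."""
    counts = Counter()
    for i in range(n):
        for j in range(k):
            counts[owner(i, j, block_rows, block_cols, grid_rows, grid_cols)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration: the same four
    # processes arranged as three different grid layouts.
    for grid in [(2, 2), (1, 4), (4, 1)]:
        print(grid, entries_per_process(16, 8, 2, 2, *grid))
```

Changing the block size changes how finely the matrix is cut before it is dealt out, while changing the grid layout changes how the same number of processes is arranged into rows and columns. Both choices affect how evenly the entries are spread across processes.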

Methods and Results

For our study, we implemented PCA on a randomly generated data set and recorded the time it took for the code to run. Our pilot study varied n and k, the dimensions of our data matrix. The results showed that the relationship between the dimensions of the matrix and the run time was predictable, which allowed us to keep n and k constant throughout the rest of our study.
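The timing procedure described above can be sketched on a single machine as follows. This is an illustrative stand-in and not the pbdR code used in the study: it is serial, pure Python, and finds only the leading principal component by power iteration on the sample covariance matrix, a technique swapped in here to keep the example self-contained. The function names, the seed, and the sizes in the example are all hypothetical.

```python
import random
import time


def random_matrix(n, k, seed=0):
    """Generate an n-by-k matrix of standard normal draws."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]


def covariance(data):
    """Return the k-by-k sample covariance matrix of the columns of data."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
            for a in range(k)]


def leading_component(cov, iterations=200):
    """Estimate the largest eigenvalue and its unit eigenvector by power iteration."""
    k = len(cov)
    vector = [1.0 / k ** 0.5] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vector


def time_pca(n, k, seed=0):
    """Time the covariance and power-iteration steps; data generation is excluded."""
    data = random_matrix(n, k, seed)
    start = time.perf_counter()
    eigenvalue, vector = leading_component(covariance(data))
    elapsed = time.perf_counter() - start
    return elapsed, eigenvalue, vector


if __name__ == "__main__":
    # Hypothetical sizes: vary n with k fixed, as in a small pilot run.
    for n in (100, 200, 400):
        elapsed, _, _ = time_pca(n, 5)
        print(n, 5, round(elapsed, 4))
```

Repeating such a run over a range of n and k gives the kind of run-time table from which a predictable relationship can be read off before fixing the dimensions for later experiments.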

When we varied grid layout and block size, we found that grid layout has less of an effect on the run time than block size does. The 8×8 block size was consistently faster than the other block sizes, no matter the n, k, or grid layout. We therefore conclude that block size has a clear effect on computational efficiency.

Applications of our Study

As an application of our study, we used data from the lab of Dr. Ian Thorpe describing the movement of amino acids in a protein. The data consisted of 3100 snapshots, each containing the x, y, and z coordinates of amino acids in different atoms of a protein. We performed PCA on the data matrix and also computed a correlation matrix from the data. We then created a level plot of the correlation matrix to see how different amino acids in various atoms correlate with each other.

We then greyed out the correlations that are not statistically significant and produced a level plot of the same data set. The number of data points drops considerably, showing that few of these correlations are statistically significant.
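The correlation and masking steps can be sketched as below. The source does not state which significance test or threshold was used, so this example makes two loudly labeled assumptions: a two-sided test based on the Fisher z transform with a normal approximation, and a level of 0.05. The function names are hypothetical, and masked entries are represented by `None` in place of grey cells in a plot.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return the matrix of pairwise correlations between the given columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def approx_p_value(r, n):
    """Two-sided p-value for zero correlation (assumed test: Fisher z, normal approx)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def mask_insignificant(corr, n, alpha=0.05):
    """Replace off-diagonal entries that are not significant at level alpha with None."""
    size = len(corr)
    return [[corr[a][b] if a == b or approx_p_value(corr[a][b], n) < alpha else None
             for b in range(size)] for a in range(size)]


if __name__ == "__main__":
    # Hypothetical toy columns; the real data had 3100 snapshots.
    x = [1, 2, 3, 4, 5, 6]
    y = [2, 4, 6, 8, 10, 12]
    z = [1, -1, 1, -1, 1, -1]
    for row in mask_insignificant(correlation_matrix([x, y, z]), n=len(x)):
        print(row)
```

With many snapshots, even small correlations can pass such a test, so the share of masked entries depends on both the threshold and the test chosen.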

Links

Matthew G. Bachmann, Ashley D. Dyas, Shelby C. Kilmer, Julian Sass, Andrew Raim, Nagaraj K. Neerchal, Kofi P. Adragani, George Ostrouchov, Ian F. Thorpe. Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency. Technical Report HPCF-2013-11, UMBC High Performance Computing Facility, University of Maryland, Baltimore County, 2013. Reprint in HPCF publications list

Poster presented at the Summer Undergraduate Research Fest (SURF)

