Teaching Big Data analysis in the Social Sciences using a HUBzero-based platform
Abstract
Large datasets culled from social media can easily grow beyond the analytical capability of common software tools. Undergraduate institutions such as comprehensive colleges often lack the computing infrastructure and support personnel that students and researchers need to create, manipulate, and analyze such large datasets. However, modern graduates in the social sciences are expected to be competent users of such data. To provide the infrastructure necessary to expose undergraduates in the social sciences to data-intensive computing, the State University of New York (SUNY) Oneonta teamed with the University at Buffalo's Center for Computational Research (CCR) to establish a collaborative virtual community focusing on data-intensive computing education.
The fruit of this collaboration, Virtual Infrastructure for Data Intensive Analysis (VIDIA), uses the HUBzero platform and is hosted at the University at Buffalo's CCR. Using VIDIA, SUNY Oneonta has integrated analysis of large datasets into coursework and research in its Sociology, Political Science, and Philosophy departments. The use of these datasets centers on a social-scientific examination of how moral claims are generated, sustained, and challenged in the electronic sphere. To perform these studies, students and faculty capture datasets using the Twitter Application Programming Interface (API), then analyze them using tools deployed on the VIDIA HUBzero instance.
In the first semester of this collaboration, CCR deployed three analysis tools and supported their use in three undergraduate classes at Oneonta. VIDIA supported up to 25 simultaneous sessions of data analysis tools such as RapidMiner; more than 900 tool sessions were run in a single one-week period. In Fall 2014, additional SUNY Oneonta courses will be supported, including courses in the Statistics department. The HUBzero platform helps foster a collaborative environment in which Oneonta students and faculty can conduct intensive data analysis of a kind that is not otherwise possible at their institution.
Bio
Jeanette Sperhac is a Scientific Programmer at the University at Buffalo’s Center for Computational Research (CCR). She holds an S.B. in Chemistry from the University of Chicago, an M.S. in Chemistry from the University of Colorado, and an M.S. in Computer Science from the University at Buffalo. Jeanette has worked as a software engineer and DBA in both academia and industry.
Jeanette supports a number of software engineering and web application projects at CCR, including two HUBzero instances: vidia.ccr.buffalo.edu, a collaboration between SUNY Oneonta and SUNY Buffalo, and hpc2.org, a partnership between Rensselaer Polytechnic Institute, Stony Brook University, Brookhaven National Laboratory, NYSERNet, and the University at Buffalo. She is a member of the XDMoD development team and maintains a portal for the Association of Academic Health Sciences Libraries.
Cite this work
Researchers should cite this work as follows: