Teaching Big Data analysis in the Social Sciences using a HUBzero-based platform

By Jeanette M Sperhac (1); Steve Gallo; Jim B Greenberg (2); Brian Lowe; Bill Wilkerson; Greg Fulkerson; Brett Heindl

1. CCR, University at Buffalo
2. SUNY Oneonta

Category

Seminars

Abstract

Large datasets culled from social media can easily outgrow the analytical capability of common software tools. Undergraduate institutions such as comprehensive colleges often lack the computing infrastructure and support personnel that students and researchers need to create, manipulate, and analyze such large datasets. Yet modern graduates in the social sciences are expected to be competent in working with such data. To provide the infrastructure needed to expose undergraduates in the social sciences to data-intensive computing, the State University of New York (SUNY) Oneonta teamed with the University at Buffalo's Center for Computational Research (CCR) to establish a collaborative virtual community focused on data-intensive computing education.

The fruit of this collaboration, Virtual Infrastructure for Data Intensive Analysis (VIDIA), is built on the HUBzero platform and hosted at the University at Buffalo's CCR. Using VIDIA, SUNY Oneonta has integrated analysis of large datasets into coursework and research in its Sociology, Political Science, and Philosophy departments. The datasets are used for a social-scientific examination of how moral claims are generated, sustained, and challenged in the electronic sphere. To perform these studies, students and faculty capture datasets using the Twitter Application Programming Interface (API), then analyze them using tools deployed on the VIDIA HUBzero instance.

In the first semester of this collaboration, CCR deployed three analysis tools and supported their use in three undergraduate classes at Oneonta. VIDIA supported up to 25 simultaneous sessions of data analysis tools such as RapidMiner, and more than 900 tool sessions were run in a single week. In Fall 2014, additional SUNY Oneonta courses will be supported, including courses in the Statistics department. The HUBzero platform helps foster a collaborative environment in which Oneonta students and faculty can conduct intensive data analysis of a kind that is not otherwise possible at their institution.

Bio

Jeanette Sperhac is a Scientific Programmer at the University at Buffalo’s Center for Computational Research (CCR). She holds an S.B. in Chemistry from the University of Chicago, an M.S. in Chemistry from the University of Colorado, and an M.S. in Computer Science from the University at Buffalo. Jeanette has worked as a software engineer and DBA in both academia and industry.

Jeanette supports a number of software engineering and web application projects at CCR, including two HUBzero instances: vidia.ccr.buffalo.edu, a collaboration between SUNY Oneonta and SUNY Buffalo; and hpc2.org, a partnership between Rensselaer Polytechnic Institute, Stony Brook University, Brookhaven National Laboratory, NYSERNet, and the University at Buffalo. She is a member of the XDMoD development team and maintains a portal for the Association of Academic Health Sciences Libraries.

Cite this work

Researchers should cite this work as follows:

  • Jeanette M Sperhac; Steve Gallo; Jim B Greenberg; Brian Lowe; Bill Wilkerson; Greg Fulkerson; Brett Heindl (2014), "Teaching Big Data analysis in the Social Sciences using a HUBzero-based platform," https://help.hubzero.org/resources/1235.


Slides

  • 1. Teaching Big Data analysis in the Social Sciences using a HUBzero-based platform
  • 2. Outline
  • 3. The Collaboration
  • 4. Adopting Social Media Analysis at Oneonta
  • 5. Social Sciences Course Goals
  • 6. Case Study: Society and Animals
  • 7. Case Study: Society and Animals
  • 8. Collaboration Goals
  • 9. vidia.ccr.buffalo.edu
  • 10. VIDIA
  • 11. Why HUBzero?
  • 12. VIDIA Hardware
  • 13. VIDIA HUBzero Instance
  • 14. Data Mining Workflow Tools
  • 15. Why RapidMiner?
  • 16. Why RapidMiner?
  • 17. Why RapidMiner?
  • 18. Curating Datasets
  • 19. How many tweets?
  • 20. Pitfalls: Purchasing Twitter Data
  • 21. Twitter Data Acquisition
  • 22. VIDIA: Spring 2014
  • 23. RapidMiner Sessions
  • 24. Challenges
  • 25. VIDIA: Fall 2014
  • 26. Plans
  • 27. SUNY
  • 28. The VIDIA Team
  • 29. Questions?