Teaching Big Data analysis in the Social Sciences using a HUBzero-based platform
Abstract
Large datasets culled from social media can easily grow beyond the analytical capability of common software tools. Undergraduate institutions such as comprehensive colleges often lack the computing infrastructure and support personnel needed to allow students and researchers to create, manipulate, and analyze such large datasets. However, modern graduates in the social sciences are expected to be competent at working with such datasets. To provide the infrastructure necessary to expose undergraduates in the social sciences to data intensive computing, the State University of New York (SUNY) Oneonta teamed with the University at Buffalo's Center for Computational Research (CCR) to establish a collaborative virtual community focusing on data intensive computing education.
The fruit of this collaboration, Virtual Infrastructure for Data Intensive Analysis (VIDIA), uses the HUBzero platform and is hosted at the University at Buffalo's CCR. Using VIDIA, SUNY Oneonta has integrated analysis of large datasets into coursework and research in its Sociology, Political Science, and Philosophy departments. The use of these datasets centers on a social-scientific examination of how moral claims are generated, sustained, and challenged in the electronic sphere. To perform these studies, students and faculty capture datasets using the Twitter Application Programming Interface (API), then analyze them using tools deployed on the VIDIA HUBzero instance.
In the first semester of this collaboration, CCR deployed three analysis tools and supported their use in three undergraduate classes at Oneonta. VIDIA supported up to 25 simultaneous sessions of data analysis tools such as RapidMiner, and more than 900 tool sessions were run in a single one-week period. In Fall 2014, additional SUNY Oneonta courses will be supported, including courses in the Statistics department. The HUBzero platform helps foster a collaborative environment in which Oneonta students and faculty can conduct intensive data analysis of a kind that is not otherwise possible at their institution.
Bio
Jeanette Sperhac is a Scientific Programmer at the University at Buffalo’s Center for Computational Research (CCR). She holds an S.B. in Chemistry from the University of Chicago, an M.S. in Chemistry from the University of Colorado, and an M.S. in Computer Science from the University at Buffalo. Jeanette has worked as a software engineer and DBA in both academia and industry.
Jeanette supports a number of software engineering and web application projects at CCR, including two HUBzero instances: vidia.ccr.buffalo.edu, a collaboration between SUNY Oneonta and SUNY Buffalo, and hpc2.org, a partnership between Rensselaer Polytechnic Institute, Stony Brook University, Brookhaven National Laboratory, NYSERNet, and the University at Buffalo. She is a member of the XDMoD development team and maintains a portal for the Association of Academic Health Sciences Libraries.
1. Teaching Big Data analysis in …
2. Outline
3. The Collaboration
4. Adopting Social Media Analysis…
5. Social Sciences Course Goals
6. Case Study: Society and Animal…
7. Case Study: Society and Animal…
8. Collaboration Goals
9. vidia.ccr.buffalo.edu
10. VIDIA
11. Why HUBzero?
12. VIDIA Hardware
13. VIDIA HUBzero Instance
14. Data Mining Workflow Tools
15. Why RapidMiner?
16. Why RapidMiner?
17. Why RapidMiner?
18. Curating Datasets
19. How many tweets?
20. Pitfalls: Purchasing Twitter D…
21. Twitter Data Acquisition
22. VIDIA: Spring 2014
23. RapidMiner Sessions
24. Challenges
25. VIDIA: Fall 2014
26. Plans
27. SUNY
28. The VIDIA Team
29. Questions?