cceHUB: An Environment for Collaborative Cancer Care Research
Abstract
cceHUB supports collaboration and community resource sharing for Cancer Care Engineering (CCE), a research project that links clinical teams at the IU Simon Cancer Center with science laboratories and statisticians at Purdue and Indiana Universities. The cceHUB environment encompasses the major elements of the cancer research workflow: (1) blood sample tracking and patient data collection, (2) annotation, upload, and storage of instrument-generated datasets, (3) discovery pipelines and modeling tools for data analysis and synthesis, and (4) data views to browse, search, sort, filter, link, and explore community-shared data. cceHUB is a unique research environment in that it directly connects data, tools, and the analysis process. cceHUB introduces a new hub-based data technology for sample tracking and analysis. Tracking follows samples from collection at the hospital through transfer, storage, and distribution to laboratories. Sample-handling protocols can be configured individually for each study, and the web forms for sample tracking are auto-generated. A key feature of sample tracking is that datasets generated by laboratory instruments are associated directly with the samples used in the analysis. This association guarantees a permanent electronic link between analysis results and the original samples. Large laboratory datasets (e.g., mass spectrometry for proteomics) are uploaded to repositories with annotations and provenance metadata, where they are available for exploration, analysis, and integrative modeling. Knowledge of laboratory workflows is built into the database and directly controls how laboratory data is uploaded and stored. Patient data encompasses clinical information and demographics; web forms are auto-generated from the data elements and data flows defined by the clinical team, and access to them is controlled for data contribution.
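The sample-tracking idea described above can be illustrated with a minimal sketch. It is not the cceHUB implementation: the protocol steps, the class and function names, and the identifiers (`S-001`, `P-042`, `MS-2031`) are all hypothetical, and an in-memory dictionary stands in for the hub database. It shows a per-study protocol that fixes the order of handling steps, and a link step that attaches an instrument dataset to the samples it came from.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical per-study protocol: the ordered handling steps for a sample.
STUDY_PROTOCOL = ("collected", "transferred", "stored", "distributed")


@dataclass
class Sample:
    sample_id: str
    patient_id: str
    events: list = field(default_factory=list)       # (step, UTC timestamp) pairs
    dataset_ids: list = field(default_factory=list)  # linked instrument datasets

    def record(self, step):
        """Record the next handling step, rejecting steps out of protocol order."""
        done = len(self.events)
        if done >= len(STUDY_PROTOCOL) or STUDY_PROTOCOL[done] != step:
            raise ValueError(f"step {step!r} is out of protocol order")
        self.events.append((step, datetime.now(timezone.utc)))

    @property
    def status(self):
        """Most recent handling step, or 'registered' if none was recorded."""
        return self.events[-1][0] if self.events else "registered"


def link_dataset(dataset_id, sample_ids, registry):
    """Link one instrument dataset to the samples it was generated from."""
    # Validate every sample first so a failed link leaves nothing half-attached.
    for sid in sample_ids:
        if sid not in registry:
            raise KeyError(f"unknown sample {sid!r}")
        if registry[sid].status != "distributed":
            raise ValueError(f"sample {sid!r} has not reached a laboratory")
    for sid in sample_ids:
        registry[sid].dataset_ids.append(dataset_id)


# Usage: follow one sample through the protocol, then attach a dataset to it.
registry = {"S-001": Sample("S-001", patient_id="P-042")}
for step in STUDY_PROTOCOL:
    registry["S-001"].record(step)
link_dataset("MS-2031", ["S-001"], registry)
```

Keeping the link on the sample record is one simple way to make a result traceable to its source material; a production system would more plausibly store the association in a database table with integrity constraints.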
Patient identifiers are carried throughout the research workflow, so that laboratory analysis and modeling tools can use patient information for phenotyping. A major component of the cceHUB data technology is its data viewer, which is used to explore clinical data, laboratory data collections, and collections of results from tool-generated analysis. cceHUB has been in operation for nearly two years, collecting and exploring clinical data for hundreds of patients, uploading and annotating thousands of laboratory datasets, and integrating a dozen modeling codes for integrative analysis. The small group of CCE researchers depends on the data-sharing capability of cceHUB: some modeling codes have run for many thousands of hours, and several of the clinical data views have been accessed tens of thousands of times.
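As a rough illustration of how a patient identifier carried through the workflow lets laboratory results be combined with clinical information, the sketch below joins the two by identifier and then filters and sorts the result, as a data view might. All field names, identifiers, and values are invented for the example and do not reflect the cceHUB schema or its data.

```python
# Hypothetical clinical records keyed by patient identifier.
clinical = {
    "P001": {"age": 61, "diagnosis": "case"},
    "P002": {"age": 54, "diagnosis": "control"},
}

# Hypothetical laboratory results; each row carries the patient identifier.
lab_results = [
    {"dataset_id": "D1", "patient_id": "P001", "marker_level": 2.4},
    {"dataset_id": "D2", "patient_id": "P002", "marker_level": 1.1},
    {"dataset_id": "D3", "patient_id": "P001", "marker_level": 0.9},
    {"dataset_id": "D4", "patient_id": "P999", "marker_level": 3.0},
]


def build_view(clinical, lab_results):
    """Join each laboratory row with the clinical fields of its patient."""
    rows = []
    for result in lab_results:
        patient = clinical.get(result["patient_id"])
        if patient is None:
            continue  # no matching patient record, so the row is left out
        rows.append({**result, **patient})
    return rows


def filter_sort(rows, predicate, key):
    """Keep the rows that satisfy the predicate, ordered by the given key."""
    return sorted((row for row in rows if predicate(row)), key=key)


# Usage: list the datasets from "case" patients, lowest marker level first.
view = build_view(clinical, lab_results)
cases = filter_sort(
    view,
    predicate=lambda row: row["diagnosis"] == "case",
    key=lambda row: row["marker_level"],
)
case_dataset_ids = [row["dataset_id"] for row in cases]
```

Here the unmatched row (`P999`) is silently dropped; a real system might instead flag such rows for review.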