Use of Hierarchical Keywords for Easy Data Management on HUBzero

By Gaurav Nanda¹; Jonathan Tan; Peter Auyeung; Bill Gaskill; Christopher A Smoak¹; Mark Lehto¹

¹ Purdue University





Since its implementation, HUBzero has been well accepted as a knowledge management and collaboration platform by the reliability engineering (RE) division of a large consumer goods company. Various RE tools are used in the organization in the form of spreadsheets. Automated workflows have been developed to collect high-quality RE files and publish them on HUBzero as resources for benchmarking and reuse. HUBzero's Web 2.0 features, such as tags, ratings, and reviews, help users browse efficiently through the various RE tool analyses. We are now working toward assigning keywords to each reliability tool spreadsheet intelligently and automatically. These keywords will be displayed on HUBzero much like tags, but with scores.

We have used a customized statistical approach to keyword extraction, based on term frequency, to identify keywords in the RE spreadsheets, whose data are structured in a unique manner. The keywords assigned to a particular RE file (resource) serve two purposes: they help users find RE files, and they give users an idea of a file's content without opening it. Hence, there are two types of keywords. Global keywords direct the user from the top level to a group of RE files and make it easier to browse related content. File keywords provide specific details about a particular RE file. Both types are determined from two scores associated with each word in an RE file: a file score and a global score. The file score indicates how strongly a word is associated with a particular file and will be displayed to the user along with the file information. The global score indicates how widely a word occurs across a group of files. We are in the process of implementing this approach.
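The abstract does not give the exact scoring formulas, so the following is only a minimal sketch of how two-level (global and file) keywords could be derived from term frequencies. It assumes that the file score is a word's term frequency in one file, normalized by that file's most frequent word, and that the global score is the fraction of files in a group that contain the word. The stop-word list, the thresholds (`min_global`, `k_file`, `k_global`), the function names, and the sample spreadsheet contents are all hypothetical, and each file is simplified to a list of cell strings.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "on", "is", "with"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text, keep runs of letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def file_scores(cells: List[str]) -> Dict[str, float]:
    """Assumed file score: term frequency normalized by the file's most frequent word."""
    counts = Counter(w for cell in cells for w in tokenize(cell))
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def global_scores(files: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed global score: fraction of files in the group that contain the word."""
    doc_freq: Counter = Counter()
    for cells in files.values():
        doc_freq.update({w for cell in cells for w in tokenize(cell)})
    n = len(files)
    return {w: c / n for w, c in doc_freq.items()}


def assign_keywords(
    files: Dict[str, List[str]], k_file: int = 3, k_global: int = 2, min_global: float = 0.5
) -> Tuple[List[str], Dict[str, List[Tuple[str, float]]]]:
    """Pick group-level (global) keywords, then per-file keywords with scores."""
    gs = global_scores(files)
    # Global keywords: words present in at least min_global of the files,
    # ranked by global score (ties broken alphabetically).
    global_kw = sorted((w for w, s in gs.items() if s >= min_global),
                       key=lambda w: (-gs[w], w))[:k_global]
    file_kw: Dict[str, List[Tuple[str, float]]] = {}
    for name, cells in files.items():
        fs = file_scores(cells)
        # Design choice (assumption): skip words already used as global keywords
        # so the file level adds detail instead of repeating the group level.
        ranked = sorted((w for w in fs if w not in global_kw), key=lambda w: (-fs[w], w))
        file_kw[name] = [(w, round(fs[w], 2)) for w in ranked[:k_file]]
    return global_kw, file_kw


# Hypothetical example: three small spreadsheets, each reduced to its cell text.
files = {
    "pump_fmea.xlsx": ["Seal leak", "Seal wear", "Motor failure"],
    "valve_fmea.xlsx": ["Valve leak", "Valve seal failure", "Valve stuck"],
    "fan_fmea.xlsx": ["Fan motor failure", "Bearing wear", "Bearing noise"],
}
global_kw, file_kw = assign_keywords(files, k_file=2)
print(global_kw)  # ['failure', 'leak']
print(file_kw["pump_fmea.xlsx"])  # [('seal', 1.0), ('motor', 0.5)]
```

Excluding the chosen global keywords from each file's list is one way to keep the two levels distinct; a TF-IDF-style weighting that discounts widely shared words would be a reasonable alternative.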


Gaurav Nanda is a PhD student in the School of Industrial Engineering at Purdue University, working with Professor Mark Lehto in the areas of text mining and collaborative knowledge management. Before joining the PhD program, he worked for five years at Infosys Technologies, designing and implementing large-scale software systems for retail banking. He received his bachelor's degree in Agricultural and Food Engineering and his master's degree in Water Resource Development and Management from the Indian Institute of Technology Kharagpur. His bachelor's and master's thesis work was in the area of non-conventional optimization.

Cite this work

Researchers should cite this work as follows:

  • Gaurav Nanda; Jonathan Tan; Peter Auyeung; Bill Gaskill; Christopher A Smoak; Mark Lehto (2013), "Use of Hierarchical Keywords for Easy Data Management on HUBzero,"




