Scientific Data Management
The Information Management group has been building new cyberinfrastructure that streamlines the creation, execution, and sharing of complex visualizations, data mining, and other large-scale data analysis applications. We developed VisTrails (www.vistrails.org), a new open-source scientific workflow and provenance management system designed to manage the rapidly evolving workflows common in exploratory applications. VisTrails provides novel mechanisms for capturing and interacting with provenance that greatly simplify the data exploration process. The system has been downloaded over 8,000 times since its beta release in January 2007. VisTrails has been adopted as part of the cyberinfrastructure in large scientific projects, and as a teaching and learning tool in graduate and undergraduate courses, both in the U.S. and abroad.
There has been explosive growth in the volume of structured information on the Web. This information often resides in the hidden (or deep) Web, stored in databases and exposed only through queries over Web forms. A recent study by Google estimates that there are several million such form interfaces. However, the high-quality information in online databases can be hard to find: it is out of reach for traditional search engines, whose indexes include only content in the surface Web.
Our group is combining techniques from machine learning, information retrieval, and databases to build infrastructure that automates, to a large extent, the process of discovering and organizing hidden-Web data sources, a necessary step toward large-scale retrieval and integration of Web information. This infrastructure will enable people and applications to more easily find the right databases and, consequently, the hidden information they are seeking on the Web. We have used our hidden-Web infrastructure to build DeepPeep (www.deeppeep.org), a new search engine for Web forms.

Synergistic Challenges in Data-Intensive Science and Exascale Computing
J. Chen, A. Choudhary, S. Feldman, B. Hendrickson, C.R. Johnson, R. Mount, V. Sarkar, V. White, D. Williams.
DOE ASCAC Data Subcommittee Report, Department of Energy Office of Science, March, 2013.
The ASCAC Subcommittee on Synergistic Challenges in Data-Intensive Science and Exascale Computing has reviewed current practice and future plans in multiple science domains in the context of the challenges facing both Big Data and Exascale Computing. The review drew from public presentations, workshop reports and expert testimony. Data-intensive research activities are increasing in all domains of science, and exascale computing is a key enabler of these activities. We briefly summarize below the key findings and recommendations from this report from the perspective of identifying investments that are most likely to positively impact both data-intensive science goals and exascale computing goals.

Scientific Discovery at the Exascale: Report from the DOE ASCR 2011 Workshop on Exascale Data Management, Analysis, and Visualization
S. Ahern, A. Shoshani, K.L. Ma, A. Choudhary, T. Critchlow, S. Klasky, V. Pascucci.
Department of Energy, February, 2011.

DEFOG: A System for Data-Backed Visual Composition
L. Lins, D. Koop, J. Freire, C.T. Silva.
SCI Technical Report, No. UUSCI-2011-003, SCI Institute, University of Utah, 2011.

Collaborative Monitoring and Analysis for Simulation Scientists
R. Tchoua, S. Klasky, N. Podhorszki, B. Grimm, A. Khan, E. Santos, C.T. Silva, P. Mouallem, M. Vouk.
In Proceedings of The 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010), pp. 235--244. 2010.
DOI: 10.1109/CTS.2010.5478506
Collaboratively monitoring and analyzing large-scale simulations from petascale computers is an important area of research and development within the scientific community. This paper addresses these issues when teams of colleagues from different research areas work together to help understand the complex data generated from these simulations. In particular, we address the issues when geographically diverse teams of disparate researchers work together to understand the complex science being simulated on high performance computers. Most application scientists want to focus on the science and spend a minimum amount of time learning new tools or adopting new techniques to monitor and analyze their simulation data. The challenge of eSimMon, our web-based system, is to decrease or eliminate some of the hurdles on the scientists' path to scientific discovery, and allow these collaborations to flourish.

User-Driven Application Development
E. Santos, L. Lins, J. Ahrens, J. Freire, C.T. Silva.
In IEEE Transactions on Visualization and Computer Graphics, Proceedings of the 2009 IEEE Visualization Conference, pp. (accepted). Sept/Oct, 2009.

Provenance Management: Challenges and Opportunities
J. Freire.
In Datenbanksysteme in Business, Technologie und Web (BTW), pp. 4. 2009.

Using Workflow Medleys to Streamline Exploratory Tasks
E. Santos, D. Koop, H.T. Vo, E. Anderson, J. Freire, C.T. Silva.
In 21st International Conference on Scientific and Statistical Database Management (SSDBM), pp. 292--301. 2009.

A First Study on Strategies for Generating Workflow Snippets
T. Ellkvist, L. Stromback, L. Lins, J. Freire.
In Proceedings of the ACM SIGMOD International Workshop on Keyword Search on Structured Data (KEYS), pp. 15--20. 2009.
ISBN: 978-1-60558-570-3

Using Mediation to Achieve Provenance Interoperability
T. Ellkvist, D. Koop, J. Freire, C.T. Silva, L. Stromback.
In Proceedings of the IEEE International Workshop on Scientific Workflows, pp. 291--298. 2009.
ISBN: 978-0-7695-3708-5