Feature Story
The VisTrails Group
Announcing VisTrails 1.0!

Project History

Professors Juliana Freire and Claudio Silva started this project in 2005, and it has been funded by grants and contracts from NSF, DOE, and IBM. The VisTrails team includes six Ph.D. students as well as M.S. and undergraduate students. Since its early development stages, the VisTrails system has been available to a number of external collaborators who have provided invaluable feedback. These include researchers at Cornell University, California Institute of Technology, Oregon Health & Science University, and the University of Utah.

A beta version of VisTrails was made available (as open source) in late January 2007. Since then, the system has been downloaded over 2000 times. VisTrails 1.0 will be released on October 31st.

On the Need of Provenance for Computational Tasks

Computing has been an enormous accelerator for science, leading to an information explosion in many different fields. Future scientific advances depend on our ability to comprehend the vast amounts of data currently being produced and acquired. However, to analyze and understand this data we must assemble complex computational processes and generate insightful visualizations, which often require combining loosely coupled resources, specialized libraries, and grid and Web services. Such processes generate yet more data, adding to the information overflow scientists currently deal with.

Today, the scientific community uses ad hoc approaches for data exploration, but such approaches have serious limitations. In particular, scientists and engineers must expend substantial effort to manage these data (such as scripts that encode computational tasks, raw data, data products, images, and notes) and to record provenance so that they can answer basic questions: Who created a data product and when? When was it modified, and who modified it? What process was used to create the data product? Were two data products derived from the same raw data? This process is not only time-consuming, but also error-prone.

Without provenance, it is difficult (and sometimes impossible) to reproduce and share results, solve problems collaboratively, validate results with different input data, and understand the process used to solve a particular problem. In addition, the longevity of data products becomes limited: without precise and sufficient information about how a data product was generated, its value diminishes significantly.

The lack of adequate provenance support in visualization and workflow systems motivated us to build VisTrails, an open source provenance management system that provides infrastructure for data exploration and visualization through workflows. VisTrails transparently records detailed provenance of exploratory computational tasks (see Fig. 1). This information not only enables the reproducibility of results, but also allows scientists to easily navigate through the space of workflows and parameter settings used in a given exploration task.

Powerful operations are also possible through direct manipulation of the provenance information. These include the ability to re-use workflows and workflow fragments through a mechanism for refining workflows by analogies; to explore a multi-dimensional slice of the parameter space of a workflow and generate a large number of data products through bulk updates; to analyze (and visualize) the differences between two workflows (see Fig. 2); and to support collaborative data exploration in a distributed and disconnected fashion. These operations, combined with an intuitive interface for comparing the results of different workflows, greatly simplify the scientific discovery process.

VisTrails provides a comprehensive provenance management infrastructure that can be combined with and extend existing workflow and visualization systems. Some distinguishing features of the system include:

Obtaining the Software

Visit http://www.vistrails.org to access the VisTrails community Web site. There, you will find information including instructions for obtaining the software, online documentation, video tutorials, and pointers to papers and presentations.

VisTrails is written in Python and uses the multi-platform Qt library for its user interface. The system is available as open source; it is released under the GPL 2.0 license. The pre-compiled versions for Windows, Mac OS X, and Linux come with an installer and include a number of packages, including VTK, matplotlib, and ImageMagick. Additional packages, including packages written by users, are also available (e.g., ITK, Matlab, Metro).
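
For readers who want a concrete picture of the provenance questions raised earlier in this article (who created a data product and when, what process produced it, and whether two products share the same raw data), the following is a minimal Python sketch. It does not use the VisTrails API or reflect how VisTrails stores provenance; the `ProvenanceRecord` class, the helper functions, and the sample log entries are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProvenanceRecord:
    """Hypothetical record of one creation or modification of a data product."""
    product: str          # name of the data product
    user: str             # who created or modified it
    timestamp: datetime   # when that happened
    process: str          # workflow or script that produced it
    parameters: dict = field(default_factory=dict)
    raw_inputs: set = field(default_factory=set)


def history(log, product):
    """Who touched a product and when, oldest record first."""
    return sorted((r for r in log if r.product == product),
                  key=lambda r: r.timestamp)


def share_raw_data(log, product_a, product_b):
    """True if the two products were derived from at least one common raw input."""
    def inputs(product):
        return set().union(*(r.raw_inputs for r in log if r.product == product))
    return bool(inputs(product_a) & inputs(product_b))


def parameter_diff(rec_a, rec_b):
    """Parameters whose values differ between two records, as (a, b) pairs."""
    keys = rec_a.parameters.keys() | rec_b.parameters.keys()
    return {k: (rec_a.parameters.get(k), rec_b.parameters.get(k))
            for k in sorted(keys)
            if rec_a.parameters.get(k) != rec_b.parameters.get(k)}


# Made-up sample log, for illustration only.
log = [
    ProvenanceRecord("salinity.png", "alice", datetime(2007, 10, 1, 9, 0),
                     "isosurface", {"isovalue": 0.5}, {"estuary.raw"}),
    ProvenanceRecord("salinity.png", "bob", datetime(2007, 10, 2, 14, 30),
                     "isosurface", {"isovalue": 0.7}, {"estuary.raw"}),
    ProvenanceRecord("velocity.png", "alice", datetime(2007, 10, 3, 11, 0),
                     "streamlines", {"seeds": 100}, {"estuary.raw"}),
]

for record in history(log, "salinity.png"):
    print(record.user, record.timestamp.isoformat(), record.process)
print(share_raw_data(log, "salinity.png", "velocity.png"))
```

Keeping such records by hand is exactly the time-consuming, error-prone bookkeeping the article describes; the sketch only shows what kind of information must be captured before those questions can be answered at all.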