Data science is a constantly evolving field, and as such, it is important to continually explore new ideas for improving the tools we use. In this talk, I will discuss two ideas that may change how and where we build these tools.
First, I will argue that data science should be interactive and live, with no wait time for changing filters or updating parameters. Slow analysis has been shown to have disadvantages and even dangers, yet few tools have been able to provide both a seamless user experience and the necessary performance. We will explore how web developers have already achieved this level of interactivity and demonstrate how the same experience can and should be delivered to data workers.
Second, we will examine how the browser is already the way that data scientists access many tools, such as Jupyter. However, delays caused by network connections create new challenges for tool builders. We will explore the opportunities that new technologies like WebAssembly, WebGPU, and Apache Arrow offer for analysis and machine learning completely in the browser.
Bio: Dominik Moritz is on the faculty at Carnegie Mellon University where he co-directs the Data Interaction Group (https://dig.cmu.edu/) at the Human-Computer Interaction Institute. His group’s research develops interactive systems that empower everyone to effectively analyze and communicate data. Dominik also manages the visualization team in Apple’s machine learning organization. His systems (Vega-Lite, Falcon, Draco, Voyager, and others) have won awards at academic venues (e.g. IEEE VIS and CHI) and are widely used in industry and by the Python and JavaScript data science communities. Dominik got his PhD from the Paul G. Allen School at the University of Washington, where he was advised by Jeff Heer and Bill Howe.
Posted by: Nathan Galli