Query, Analysis, and Visualization of Multidimensional Databases

Chris Stolte, Ph.D. Dissertation, Stanford University, June 2003.

Abstract:

In recent years, large multidimensional databases, or data warehouses, have become common in a variety of commercial and scientific applications. It is not unusual for these data warehouses to contain billions of tuples, each categorized by tens or hundreds of dimensions. A major challenge with these databases is to extract meaning from the important data they contain: to discover structure, find patterns, and derive causal relationships. A promising technique for the analysis of these multidimensional databases is visualization. To make visualization effective in this context, we need to develop tools that tightly integrate visual presentation and database queries, support interactive refinement of the display, and can visually present a large number of tuples and dimensions. This dissertation introduces a formal approach to building visualization systems that addresses these demands. The foundation of the dissertation is the Polaris formalism, a language for precisely describing a wide range of table-based graphical presentations of relational information. A key aspect of this formal language is the ability to compile visual specifications automatically into the precise queries and drawing commands necessary to generate the display. This ability enables us to design systems that closely integrate analysis and visualization. Using the Polaris formalism, we have built two interactive systems: the Polaris interface and a framework for multiscale visualization.

The Polaris interface for the exploration of multidimensional databases extends the popular Pivot Table interface to generate a rich, expressive set of graphic displays. The Polaris interface is simple and expressive because it is built upon the Polaris formalism. Analysts can incrementally construct complex queries, receiving visual feedback as they assemble and alter the query. The Polaris interface is a generally applicable tool that tightly integrates analysis with visualization. This dissertation also demonstrates how to use the Polaris formalism and data cubes to specify and implement domain-specific multiscale (pan-and-zoom) visualizations efficiently. The presented approach to multiscale visualization addresses several limitations of current approaches by introducing multiple zoom paths into the data and providing general mechanisms for abstraction.
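
As a rough illustration of what compiling a table-based visual specification into a database query can look like, here is a minimal Python sketch. It is not the Polaris formalism or its implementation: the TableSpec class, the compile_to_sql and render_panes functions, and the sales table are all hypothetical names invented for this example. It assumes a specification that names only row dimensions, column dimensions, and one aggregated measure.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical visual specification (an assumption, not Polaris syntax)."""
    rows: tuple              # dimensions laid out down the table
    columns: tuple           # dimensions laid out across the table
    measure: str             # quantitative field shown in each pane
    aggregate: str = "SUM"   # SQL aggregate applied to the measure


def compile_to_sql(spec, relation):
    """Compile the specification into one GROUP BY query.

    Field and relation names are interpolated directly, so this sketch
    assumes they come from a trusted schema, not from user input.
    """
    dims = list(spec.rows) + list(spec.columns)
    if not dims:
        raise ValueError("the sketch needs at least one dimension")
    select = ", ".join(dims + [f"{spec.aggregate}({spec.measure}) AS value"])
    group_by = ", ".join(dims)
    return f"SELECT {select} FROM {relation} GROUP BY {group_by} ORDER BY {group_by}"


def render_panes(spec, relation, conn):
    """Run the compiled query and key each result by its table pane."""
    n_rows = len(spec.rows)
    panes = {}
    for record in conn.execute(compile_to_sql(spec, relation)):
        *dims, value = record
        # A pane is addressed by (row dimension values, column dimension values).
        panes[(tuple(dims[:n_rows]), tuple(dims[n_rows:]))] = value
    return panes


def demo():
    """Build a tiny in-memory relation and lay it out as region x quarter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, profit REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "Q1", 10.0), ("East", "Q1", 5.0),
         ("East", "Q2", 7.0), ("West", "Q1", 3.0)],
    )
    spec = TableSpec(rows=("region",), columns=("quarter",), measure="profit")
    return render_panes(spec, "sales", conn)
```

Calling demo() returns a dictionary with one entry per non-empty pane of the region-by-quarter table. A real system would also have to choose marks, axes, and drawing commands for each pane, which this sketch leaves out.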

Dissertation:

Query, Analysis, and Visualization of Multidimensional Databases PDF (24 MB)

Products and Software:

A commercial product based on this dissertation is now available from Tableau Software.

Related Papers:

Multiscale Visualization Using Data Cubes
Chris Stolte, Diane Tang and Pat Hanrahan
Best Paper Award
Proceedings of the Eighth IEEE Symposium on Information Visualization, October 2002.

Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris
Chris Stolte, Diane Tang and Pat Hanrahan
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (extended paper)
Chris Stolte, Diane Tang and Pat Hanrahan
IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January 2002.

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
Chris Stolte and Pat Hanrahan
Proceedings of the Sixth IEEE Symposium on Information Visualization, October 2000.


cstolte@graphics.stanford.edu