Polaris: Database and Data Cube Visualization

Formalism
Hierarchies and Data Cubes
Pan and Zoom
Implementation and Release
People
Publications and Presentations

The Polaris project at Stanford University is no longer active. However, commercial products are now available from Tableau Software (www.tableausoftware.com).

In the last several years, large multi-dimensional databases have become common in a variety of applications such as data warehousing and scientific computing. Analysis and exploration tasks place significant demands on the interfaces to these databases. Because of the size of the data sets, dense graphical representations are more effective for exploration than spreadsheets and charts. Furthermore, because of the exploratory nature of the analysis, it must be possible for the analysts to change visualizations rapidly as they pursue a cycle involving first hypothesis and then experimentation.

The Polaris user interface.

Over the last several years, we have been developing Polaris, an interface for exploring large multi-dimensional databases that extends the well-known Pivot Table interface first popularized by Microsoft Excel. The novel features of Polaris include an interface for constructing visual specifications of table-based graphical displays and the ability to generate a precise set of relational queries from the visual specifications. The visual specification can be rapidly and incrementally developed, giving the analyst visual feedback as they construct complex queries and visualizations.

Formalism

Visualizing an event log from the execution of a parallel graphics application using Polaris.

The Polaris interface is simple and expressive because it is built on top of a formalism for describing table-based graphical representations of relational databases. Specifications in this language are compiled by our interpreter into a set of efficient queries and drawing operations to generate displays. The formalism precisely defines:

The mapping of data sources to layers. Multiple data sources may be combined in a single Polaris visualization. Each data source maps to a separate layer or set of layers.
The number of rows, columns, and layers in the table and their relative orders (left to right as well as back to front). The database dimensions assigned to columns are specified by the fields on the x shelf, rows by fields on the y shelf, and layers by fields on the layer (z) shelf. Multiple fields may be dragged onto each shelf to show categorical relationships.
The selection of records from the database and the partitioning of records into different layers and panes.
The grouping of data within a pane and the computation of statistical properties and aggregates. Records may also be sorted into a given drawing order.
The type of graphic displayed in each pane of the table. Each graphic consists of a set of marks, one mark per record in that pane.
The mapping of data fields to retinal properties of the marks in the graphics. The mappings used for any given visualization are shown in a set of automatically generated legends.
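
As a rough illustration of the items above, the following is a minimal sketch of how a visual specification might be turned into a relational query. It is not the Polaris compiler: the `VisualSpec` class, its field names, the `to_sql` function, and the `sales` table are all hypothetical, and the sketch emits a single SQL statement where Polaris generates a set of queries.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class VisualSpec:
    """Hypothetical, simplified stand-in for a visual specification."""
    table: str
    x_fields: list[str]                                       # fields on the x shelf
    y_fields: list[str]                                       # fields on the y shelf
    aggregates: dict[str, str] = field(default_factory=dict)  # measure -> SQL aggregate
    filters: list[str] = field(default_factory=list)          # predicates selecting records


def to_sql(spec: VisualSpec) -> str:
    """Build one SQL query whose result rows are grouped by the shelf fields."""
    dims = spec.x_fields + spec.y_fields
    measures = [f"{fn}({col}) AS {fn.lower()}_{col}"
                for col, fn in spec.aggregates.items()]
    columns = dims + measures or ["*"]
    sql = f"SELECT {', '.join(columns)} FROM {spec.table}"
    if spec.filters:
        sql += " WHERE " + " AND ".join(spec.filters)
    if dims and measures:
        # Aggregates are computed once per combination of shelf fields.
        sql += " GROUP BY " + ", ".join(dims)
    if dims:
        # A stable order stands in for the table's left-to-right layout.
        sql += " ORDER BY " + ", ".join(dims)
    return sql


spec = VisualSpec(table="sales", x_fields=["quarter"], y_fields=["market"],
                  aggregates={"profit": "SUM"}, filters=["year = 2001"])
print(to_sql(spec))
```

In this simplified reading, each distinct (quarter, market) combination in the result corresponds to one pane of the table, and each result row would be drawn as one mark.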

A key component of the formalism is the table algebra we have defined. Using the table algebra, an analyst or programmer can specify the configuration of a sophisticated table by simply providing three algebraic expressions: one for the x-axis of the table, one for the y-axis, and one that defines layering (or the z-axis). In the Polaris interface the user constructs these table expressions by dragging and dropping fields on shelves; programmers can directly write the expressions as part of an XML specification.
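
The flavor of such table expressions can be sketched in a few lines of Python. The sketch assumes three operators, concatenation, cross, and nest, in the spirit of those described in the Polaris papers listed under Publications; the function names, the tuple representation, and the sample data are hypothetical and much simpler than the actual algebra.

```python
def field_set(values):
    """Ordered set of one-element tuples for a field's domain."""
    return [(v,) for v in values]


def concat(a, b):
    """Concatenation: the entries of a followed by the entries of b."""
    return a + b


def cross(a, b):
    """Cross: every entry of a paired with every entry of b."""
    return [p + q for p in a for q in b]


def nest(records, outer, inner):
    """Nest: only the (outer, inner) pairs that actually occur in the data."""
    return sorted({(r[outer], r[inner]) for r in records})


quarters = field_set(["Q1", "Q2"])
products = field_set(["Coffee", "Tea"])

# One axis: a header for every quarter/product combination.
x_axis = cross(quarters, products)

# The other axis: one band per measure.
y_axis = concat(field_set(["Profit"]), field_set(["Sales"]))

# Months nested within quarters: Q1 never pairs with Apr.
records = [
    {"quarter": "Q1", "month": "Jan"},
    {"quarter": "Q1", "month": "Feb"},
    {"quarter": "Q2", "month": "Apr"},
]
nested = nest(records, "quarter", "month")

print(len(x_axis), len(y_axis), nested)
```

The difference between cross and nest is the point of the example: cross enumerates every combination of the two domains, while nest keeps only the combinations present in the data.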

Hierarchies and Data Cubes

Analysis of profit/sales data for a hypothetical coffee chain.

To support interactive analysis, many data warehouses are being augmented with hierarchical structures that provide meaningful levels of abstraction that can be leveraged by both the computer and analyst. These hierarchies can encode known semantic information about the underlying data warehouses or can be generated from algorithmic analysis such as classification or clustering. This hierarchical structure generates many challenges and opportunities in the design of systems for the query, analysis, and visualization of these databases. Our paper presented at KDD in July 2002 explains in detail how we extended the interface, formalism, and generation of data queries within Polaris to support hierarchically structured data warehouses.
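
To illustrate what a level of abstraction means here, the following minimal sketch aggregates a measure at a chosen level of a dimension hierarchy. The hierarchy (year, quarter, month), the `roll_up` function, and the sample records are assumptions made for illustration; a data warehouse would compute such aggregates inside the database or data cube rather than in application code.

```python
from collections import defaultdict

# Hypothetical time hierarchy, ordered from coarsest to finest level.
HIERARCHY = ["year", "quarter", "month"]


def roll_up(records, level, measure):
    """Sum `measure` over all records that share the same ancestors down to `level`."""
    keys = HIERARCHY[: HIERARCHY.index(level) + 1]
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[k] for k in keys)] += r[measure]
    return dict(totals)


records = [
    {"year": 2001, "quarter": "Q1", "month": "Jan", "profit": 10},
    {"year": 2001, "quarter": "Q1", "month": "Feb", "profit": 5},
    {"year": 2001, "quarter": "Q2", "month": "Apr", "profit": 7},
]

print(roll_up(records, "quarter", "profit"))  # {(2001, 'Q1'): 15, (2001, 'Q2'): 7}
print(roll_up(records, "year", "profit"))     # {(2001,): 22}
```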

Panning and Zooming

A compelling visualization architecture is pan-and-zoom. Most analysts start with an overview of the data before gradually refining their view to be more focused and detailed. Multiscale pan-and-zoom systems are effective because they directly support this approach. However, generating abstract overviews of large data sets is difficult, and most systems take advantage of only one type of abstraction: visual abstraction. Furthermore, these existing systems limit the analyst to a single zooming path on their data and thus a single set of abstract views.

Screenshots from three different multiscale visualization systems developed using the Polaris formalism.

In our paper to be presented at Infovis in October 2002, we present (1) a formalism for describing multiscale visualizations of data cubes with both data and visual abstraction, and (2) a method for independently zooming along one or more dimensions by traversing a zoom graph with nodes at different levels of detail. As an example of how to design multiscale visualizations using our system, we describe four design patterns using our formalism. These design patterns show the effectiveness of multiscale visualization of general relational databases.

The Polaris formalism is an important component of these multiscale systems: it is the mechanism we use to describe the individual nodes within the zoom graph. This work is a good example of how the Polaris formalism can be used separately from the Polaris interface as the foundation of a visualization system.
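
One plausible reading of a zoom graph with independent zooming per dimension is sketched below. It is an illustration only, not the structure defined in the paper: each node is assumed to pick one level of detail per dimension, each edge zooms in by one level along a single dimension, and the dimension names, level names, and `build_zoom_graph` function are hypothetical.

```python
from itertools import product


def build_zoom_graph(levels):
    """levels maps each dimension to its levels of detail, coarse to fine.

    Returns the dimension order and a dict mapping each node (a tuple of
    level indices, one per dimension) to the nodes reachable by zooming in
    one step along exactly one dimension.
    """
    dims = list(levels)
    nodes = product(*(range(len(levels[d])) for d in dims))
    edges = {}
    for node in nodes:
        neighbors = []
        for i, d in enumerate(dims):
            if node[i] + 1 < len(levels[d]):
                neighbors.append(node[:i] + (node[i] + 1,) + node[i + 1:])
        edges[node] = neighbors
    return dims, edges


def describe(node, dims, levels):
    """Human-readable label for a node, e.g. 'time=year, product=type'."""
    return ", ".join(f"{d}={levels[d][i]}" for d, i in zip(dims, node))


levels = {"time": ["year", "quarter", "month"], "product": ["type", "name"]}
dims, edges = build_zoom_graph(levels)

overview = (0, 0)
for target in edges[overview]:
    print(describe(overview, dims, levels), "->", describe(target, dims, levels))
```

Because each dimension is refined independently, there are several paths from the overview to the most detailed node, in contrast to the single zooming path described above for existing systems.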

Implementation and Release

The current implementation of Polaris is built within the Rivet visualization environment. Rivet is an environment for rapidly constructing visualizations from components. The majority of Polaris is written in C++ and OpenGL, with some pieces written in Tcl. Data can either be loaded directly into Rivet using built-in parsers (regular expressions, delimited files) or stored externally in a database or data cube and accessed through OLE DB. Rivet (and thus Polaris) runs on Windows, Linux, and Solaris.

We are currently working on a new version of Polaris and Rivet that we intend to release to the public. This will likely be a native Windows application and will be available (hopefully) sometime this fall.

People

Publications

Multiscale Visualization Using Data Cubes
Chris Stolte, Diane Tang and Pat Hanrahan
Proceedings of the Eighth IEEE Symposium on Information Visualization, October 2002. BEST PAPER AWARD.
(paper) (slides)

Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris
Chris Stolte, Diane Tang and Pat Hanrahan
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.
(paper) (slides)

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (extended paper)
Chris Stolte, Diane Tang and Pat Hanrahan
IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January 2002.
(paper)

Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
Chris Stolte and Pat Hanrahan
Proceedings of the Sixth IEEE Symposium on Information Visualization, October 2000.
(paper) (slides)

Presentations

"Multiscale Visualization Using Data Cubes"

"Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases" given by Chris Stolte at the DIMACS Workshop on Data Mining and Visualization, October 2002.

"Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases" given by Pat Hanrahan at IBM, September 2002.

"Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris" given by Chris Stolte at KDD 2002.

"Polaris: A System for Query, Analysis, and Visualization of Relational Databases" guest lecture given by Chris Stolte in CS345: "Database Systems: Foundations and Frontiers" at Stanford in May 2002.

"Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases" given by Chris Stolte at Infovis 2000.