Polaris: interactive database visualization
 
The Polaris project at Stanford University is no longer active. However, commercial products are now available from Tableau Software (www.tableausoftware.com).

 

In the last several years, large multi-dimensional databases have become common in a variety of applications such as data warehousing and scientific computing. Analysis and exploration tasks place significant demands on the interfaces to these databases. Because of the size of the data sets, dense graphical representations are more effective for exploration than spreadsheets and charts. Furthermore, because of the exploratory nature of the analysis, it must be possible for the analysts to change visualizations rapidly as they pursue a cycle involving first hypothesis and then experimentation.

The Polaris user interface.

Over the last several years, we have been developing Polaris, an interface for exploring large multi-dimensional databases that extends the well-known Pivot Table interface first popularized by Microsoft Excel. The novel features of Polaris include an interface for constructing visual specifications of table-based graphical displays and the ability to generate a precise set of relational queries from the visual specifications. The visual specification can be rapidly and incrementally developed, giving the analyst visual feedback as they construct complex queries and visualizations.
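As a rough illustration of how a visual specification could be turned into a relational query, the sketch below maps fields on the x and y shelves to GROUP BY columns and a measure to an aggregate. It is not the Polaris implementation: the `VisualSpec` class, the `to_sql` function, and the table and field names are all hypothetical, and it emits a single query where Polaris generates a set of them.

```python
from dataclasses import dataclass

@dataclass
class VisualSpec:
    """Hypothetical stand-in for a visual specification (not the Polaris format)."""
    table: str                 # source relation
    x_fields: list[str]        # categorical fields dropped on the x shelf
    y_fields: list[str]        # categorical fields dropped on the y shelf
    measures: dict[str, str]   # output name -> aggregate expression

def to_sql(spec: VisualSpec) -> str:
    """Build one aggregate query: shelf fields become the grouping columns."""
    group_by = spec.x_fields + spec.y_fields
    select = group_by + [f"{expr} AS {name}" for name, expr in spec.measures.items()]
    sql = f"SELECT {', '.join(select)} FROM {spec.table}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql

# Assumed example data set: a "sales" table with quarter, product_type, profit.
spec = VisualSpec("sales", ["quarter"], ["product_type"], {"total_profit": "SUM(profit)"})
query = to_sql(spec)
print(query)
```

Adding another field to a shelf only extends the grouping columns, which is one way to picture the incremental refinement of a specification.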

Formalism

The Polaris interface is simple and expressive because it is built on top of a formalism for describing table-based graphical representations of relational databases. Specifications in this language are compiled by our interpreter into a set of efficient queries and drawing operations to generate displays. The formalism precisely defines:

Visualizing an event log from the execution of a parallel graphics application using Polaris.

  • The mapping of data sources to layers. Multiple data sources may be combined in a single Polaris visualization. Each data source maps to a separate layer or set of layers.
  • The number of rows, columns, and layers in the table and their relative orders (left to right as well as back to front). The database dimensions assigned to rows are specified by the fields on the x shelf, columns by fields on the y shelf, and layers by fields on the layer (z) shelf. Multiple fields may be dragged onto each shelf to show categorical relationships.
  • The selection of records from the database and the partitioning of records into different layers and panes.
  • The grouping of data within a pane and the computation of statistical properties and aggregates. Records may also be sorted into a given drawing order.
  • The type of graphic displayed in each pane of the table. Each graphic consists of a set of marks, one mark per record in that pane.
  • The mapping of data fields to retinal properties of the marks in the graphics. The mappings used for any given visualization are shown in a set of automatically generated legends.

A key component of the formalism is the table algebra we have defined. Using the table algebra, an analyst or programmer can specify the configuration of a sophisticated table by simply providing three algebraic expressions: one for the x-axis of the table, one for the y-axis, and one that defines layering (or the z-axis). In the Polaris interface, the user constructs these table expressions by dragging and dropping fields on shelves; programmers can directly write the expressions as part of an XML specification.
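The following minimal sketch shows how table expressions might partition records into panes. It assumes the three operators described in the published Polaris papers (concatenation, cross, and nest), restricted here to ordinal fields, and it omits the layer (z) expression. The function names, the records, and the field names are invented for illustration and are not the Polaris API.

```python
from itertools import product

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"quarter": "Q1", "month": "Jan", "market": "East", "profit": 10},
    {"quarter": "Q1", "month": "Feb", "market": "West", "profit": 7},
    {"quarter": "Q2", "month": "Apr", "market": "East", "profit": 4},
    {"quarter": "Q2", "month": "Apr", "market": "West", "profit": 9},
]

def field(name):
    """An ordinal field evaluates to the ordered list of its distinct values."""
    seen = []
    for r in records:
        entry = ((name, r[name]),)
        if entry not in seen:
            seen.append(entry)
    return seen

def concat(a, b):
    """Concatenation: the entries of a followed by the entries of b."""
    return a + b

def cross(a, b):
    """Cross: every entry of a combined with every entry of b."""
    return [x + y for x, y in product(a, b)]

def nest(a, b):
    """Nest: like cross, but keep only combinations that occur in the data."""
    return [e for e in cross(a, b)
            if any(all(r[k] == v for k, v in e) for r in records)]

def pane_records(x_entry, y_entry):
    """Select the records that fall into the pane at (x_entry, y_entry)."""
    cond = x_entry + y_entry
    return [r for r in records if all(r[k] == v for k, v in cond)]

# x-axis: months nested within quarters; y-axis: one entry per market.
x_axis = nest(field("quarter"), field("month"))
y_axis = field("market")

# One aggregate (total profit) per pane of the resulting table.
table = {(x, y): sum(r["profit"] for r in pane_records(x, y))
         for y in y_axis for x in x_axis}
```

With this data the x-axis has three entries (Q1/Jan, Q1/Feb, Q2/Apr) rather than the six that a cross would produce, because nest drops quarter and month combinations that never occur together.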

Hierarchies and Data Cubes

Analysis of profit/sales data for a hypothetical coffee chain.

To support interactive analysis, many data warehouses are being augmented with hierarchical structures that provide meaningful levels of abstraction that can be leveraged by both the computer and analyst. These hierarchies can encode known semantic information about the underlying data warehouses or can be generated from algorithmic analysis such as classification or clustering. This hierarchical structure generates many challenges and opportunities in the design of systems for the query, analysis, and visualization of these databases. Our paper presented at KDD in July 2002 explains in detail how we extended the interface, formalism, and generation of data queries within Polaris to support hierarchically structured data warehouses.
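To make the idea of levels of abstraction concrete, here is a small sketch of aggregating the same records at different levels of a hierarchy. The Location hierarchy (state, market, all), the `roll_up` helper, and the sales figures are assumptions made for this example; they do not come from the Polaris system or the KDD paper.

```python
from collections import defaultdict

# Hypothetical Location hierarchy: each state rolls up to a market.
state_to_market = {"CA": "West", "WA": "West", "NY": "East", "MA": "East"}

# Invented (state, sales) records.
sales = [("CA", 120.0), ("WA", 80.0), ("NY", 150.0), ("MA", 50.0), ("CA", 30.0)]

def level_key(state, level):
    """Map a leaf value (a state) to its ancestor at the requested level."""
    if level == "state":
        return state
    if level == "market":
        return state_to_market[state]
    if level == "all":
        return "All"
    raise ValueError(f"unknown level: {level}")

def roll_up(rows, level):
    """Sum sales at the given level of the hierarchy."""
    totals = defaultdict(float)
    for state, amount in rows:
        totals[level_key(state, level)] += amount
    return dict(totals)

by_state = roll_up(sales, "state")
by_market = roll_up(sales, "market")
overall = roll_up(sales, "all")
```

Each level yields a coarser or finer summary of the same underlying records, which is what lets an analyst move between abstraction levels without changing the rest of the specification.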

Panning and Zooming

A compelling visualization architecture is pan-and-zoom. Most analysts start with an overview of the data before gradually refining their view to be more focused and detailed. Multiscale pan-and-zoom systems are effective because they directly support this approach. However, generating abstract overviews of large data sets is difficult, and most systems take advantage of only one type of abstraction: visual abstraction. Furthermore, these existing systems limit the analyst to a single zooming path on their data and thus a single set of abstract views.

Screenshots from three different multiscale visualization systems (map data, gene data, and network data) developed using the Polaris formalism.

In our paper to be presented at Infovis in October 2002, we present (1) a formalism for describing multiscale visualizations of data cubes with both data and visual abstraction, and (2) a method for independently zooming along one or more dimensions by traversing a zoom graph with nodes at different levels of detail. As an example of how to design multiscale visualizations using our system, we describe four design patterns using our formalism. These design patterns show the effectiveness of multiscale visualization of general relational databases.

The Polaris formalism is an important component of these multiscale systems: it is the mechanism we use to describe the individual nodes within the zoom graph. This work is a good example of how the Polaris formalism can be used separately from the Polaris interface as the foundation of a visualization system.
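One way to picture a zoom graph is sketched below, under the assumption that each node records a level of detail per dimension and each edge zooms in by one level along a single dimension. The dimensions, level names, and helper functions are hypothetical. In the systems described above, each node would carry a full Polaris specification, whereas here it is reduced to a tuple of level indices.

```python
from itertools import product

# Hypothetical dimensions, each listed from coarsest to finest level.
levels = {
    "time": ["year", "quarter", "month"],
    "location": ["market", "state"],
}

def build_zoom_graph(levels):
    """Return {node: [children]}; a child is one level finer in one dimension."""
    dims = list(levels)
    nodes = product(*(range(len(levels[d])) for d in dims))
    graph = {}
    for node in nodes:
        children = []
        for i, d in enumerate(dims):
            if node[i] + 1 < len(levels[d]):
                children.append(node[:i] + (node[i] + 1,) + node[i + 1:])
        graph[node] = children
    return graph

def describe(node):
    """Translate level indices back into level names."""
    return {d: levels[d][i] for d, i in zip(levels, node)}

def count_paths(graph, start, goal):
    """Count the distinct zoom paths from start to goal."""
    if start == goal:
        return 1
    return sum(count_paths(graph, nxt, goal) for nxt in graph[start])

graph = build_zoom_graph(levels)
overview = (0, 0)   # year x market
detail = (2, 1)     # month x state
n_paths = count_paths(graph, overview, detail)
```

Because zooming along time and along location are independent steps, there are several routes from the overview to the most detailed node, in contrast to a system with a single fixed zooming path.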

Implementation and Release

The current implementation of Polaris is built within the Rivet visualization environment. Rivet is an environment for rapidly constructing visualizations from components. The majority of Polaris is written in C++ and OpenGL, with some pieces written in Tcl. Data can either be loaded directly into Rivet using built-in parsers (regular expressions, delimited files) or stored externally in a database or data cube and accessed through OLE DB. Rivet (and thus Polaris) runs on Windows, Linux, and Solaris.

We are currently working on a new version of Polaris and Rivet that we intend to release to the public. This will likely be a native Windows application, and we hope to make it available sometime this fall.
