Design of Many-Camera Tracking Systems for Scalability and Efficient Resource Allocation

Xing Chen, Ph.D. Dissertation, Stanford University, June 2002.


Many applications, such as motion tracking for virtual reality, require knowledge of the location and/or orientation of people or objects moving through space. Systems using cameras as sensors have the advantages of non-intrusiveness and immunity to ferromagnetic distortion. A network of cameras allows for a larger working volume, higher tracking accuracy, more robustness and greater flexibility, but also introduces new design challenges. The architecture of such systems must be scalable to the number of cameras. Approaches are needed for scalable calibration of cameras into a single global coordinate frame, and also for the placement of cameras to optimize tracking performance.

We have designed and implemented M-Track, a scalable architecture for real-time motion tracking with tens of cameras. M-Track enables parallel processing of high-bandwidth image data. A Kalman-filter-based central estimator allows the asynchronous integration of information from camera-processor pairs. The architecture also accommodates cameras with different resolutions and frame rates, and supports tracking and automatic labeling of multiple features, even during temporary periods of occlusion. Three applications built upon this architecture demonstrate the usefulness of the system.
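The asynchronous-integration idea can be illustrated with a minimal sketch (this is not M-Track's actual implementation): a constant-velocity Kalman filter that folds in position measurements from several camera-processor pairs as they arrive. Each measurement carries its own timestamp and noise variance, so cameras with different frame rates and resolutions contribute whenever their data is ready.

```python
import numpy as np

class AsyncKalman1D:
    """Toy 1D constant-velocity Kalman filter fed by asynchronous measurements."""

    def __init__(self, q=1e-2):
        self.x = np.zeros(2)          # state: [position, velocity]
        self.P = np.eye(2) * 1e3      # large initial uncertainty
        self.q = q                    # process-noise intensity
        self.t = 0.0                  # time of last update

    def update(self, t, z, r):
        """Fold in a position measurement z taken at time t with variance r."""
        dt = t - self.t
        F = np.array([[1.0, dt], [0.0, 1.0]])       # predict forward to time t
        Q = self.q * np.array([[dt**3 / 3, dt**2 / 2],
                               [dt**2 / 2, dt]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        H = np.array([[1.0, 0.0]])                  # we observe position only
        S = H @ self.P @ H.T + r                    # innovation variance
        K = self.P @ H.T / S                        # Kalman gain
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
        self.t = t
        return self.x[0]

kf = AsyncKalman1D()
# A 30 Hz camera and a noisier 15 Hz camera report a target moving at 1 unit/s;
# measurements arrive out of lockstep, each with its own variance r.
for t, z, r in [(0.000, 0.00, 0.01), (0.033, 0.03, 0.01),
                (0.066, 0.07, 0.04), (0.100, 0.10, 0.01)]:
    kf.update(t, z, r)
```

Because the filter predicts forward to each measurement's own timestamp before updating, no camera synchronization is required; this is the essential property the central estimator exploits.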

Next, we present a scalable wide-area multi-camera calibration scheme. Many asynchronous cameras can be calibrated into a single consistent coordinate frame by simply waving a bright light in front of them, even when cameras are arranged with non-overlapping working volumes and without initial estimates of camera poses. The construction of a universally visible physical calibration object is not necessary, and the method is easily adaptable to working volumes of variable size and shape.
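One building block of merging many cameras into a single consistent frame can be sketched as follows (an illustrative simplification, not the dissertation's method): once groups of cameras with overlapping views have each locally reconstructed some of the waved light's 3D positions, the local reconstructions can be stitched together by estimating the rigid transform between shared points (Procrustes alignment via SVD) and chaining transforms across the camera graph.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch)."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

# Light positions shared by two local frames that differ by a rotation
# about z plus a translation (synthetic data for the sketch).
pts_a = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
theta = np.pi / 4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0., 0., 1.]])
pts_b = pts_a @ Rz.T + np.array([2., -1, 0.5])

R, t = rigid_align(pts_a, pts_b)          # recovers Rz and the translation
```

Chaining such pairwise alignments is what lets cameras with non-overlapping working volumes end up in one global coordinate frame, as long as the moving light passes through each overlap region.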

We then propose a quantitative metric for evaluating the quality of multi-camera placement configurations. Previous work evaluates quality using only the 3D uncertainty caused by limited camera resolution, ignoring occlusion. Our metric considers both camera resolution and the likelihood of target occlusion, and is based on a novel probabilistic model that estimates the dynamic self-occlusion of targets. We verify its validity through experimental data and analysis of various camera placement configurations.
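The intuition behind combining resolution and occlusion can be shown with a toy Monte Carlo scorer (the function and its parameters are hypothetical stand-ins, not the dissertation's metric): a feature on the target is visible to a camera only if it faces that camera, a crude proxy for probabilistic self-occlusion, and each visible camera contributes a resolution uncertainty that grows with distance.

```python
import numpy as np

def placement_score(cam_positions, target=np.zeros(3), pixel_angle=1e-3,
                    n_samples=2000, penalty=1.0, seed=0):
    """Lower is better: mean uncertainty over random feature orientations."""
    rng = np.random.default_rng(seed)
    cams = np.asarray(cam_positions, float)
    dirs = cams - target
    dist = np.linalg.norm(dirs, axis=1)
    dirs = dirs / dist[:, None]
    total = 0.0
    for _ in range(n_samples):
        n = rng.normal(size=3)
        n /= np.linalg.norm(n)           # random feature normal on the target
        visible = dirs @ n > 0.0         # feature faces the camera: unoccluded
        if visible.sum() < 2:
            total += penalty             # fewer than two views: no triangulation
        else:
            # best single-view resolution uncertainty among visible cameras
            total += (dist[visible] * pixel_angle).min()
    return total / n_samples

# Surrounding the target covers far more feature orientations than
# clustering all cameras on one side, even though per-view resolution
# is comparable.
opposed   = placement_score([[2, 0, 0], [-2, 0, 0], [0, 2, 0], [0, -2, 0]])
clustered = placement_score([[2, 0, 0], [2.1, 0.1, 0], [1.9, -0.1, 0], [2, 0.2, 0]])
```

A resolution-only metric would rate these two configurations similarly; penalizing self-occlusion is what separates them, which is the core argument for the combined metric.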


[PDF 26 MB]

Related Papers:

Wide Area Camera Calibration Using Virtual Calibration Objects
Xing Chen, James Davis and Philipp Slusallek
IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR00), June 2000.

Camera Placement Considering Occlusion for Robust Motion Capture
Xing Chen and James Davis
Stanford University Computer Science Technical Report, CS-TR-2000-07, December 2000.

Foveated Observation of Shape and Motion
James Davis and Xing Chen
IEEE Conf. on Robotics and Automation (ICRA), 2003.

LumiPoint: Multi-User Laser-Based Interaction on Large Tiled Displays
Xing Chen and James Davis
Displays, Elsevier Science, Volume 23, 2002.