Merging and Transformation of Raster Images for Cartoon Animation

Bruce A. Wallace
Vol 15, No. 3, August, 1981, pp. 253-262.


The task of assembling drawings and backgrounds together for each frame of an animated sequence has always been a tedious undertaking using conventional animation camera stands, and has contributed to the high cost of animation production. In addition, the physical limitations that these camera stands place on the manipulation of the individual artwork levels restrict the total image-making possibilities afforded by traditional cartoon animation. Documents containing all frame assembly information must also be maintained. This paper presents several computer methods for assisting in the production of cartoon animation, both to reduce expense and to improve the overall quality.

Merging is the process of combining levels of artwork into a final composite frame using digital computer graphics. The term "level" refers to a single painted drawing (cel) or background. A method for the simulation of any hypothetical animation camera set-up is introduced. A technique is presented for reducing the total number of merges by retaining merged groups consisting of individual levels which do not change over successive frames. Lastly, a sequence-editing system, which controls precise definition of an animated sequence, is described. Also discussed is the actual method for merging any two adjacent levels and several computational and storage optimizations to speed the process.

Additional information available:

The role of this paper in the history of digital compositing

written by Marc Levoy
October, 2001

Digital compositing is at the heart of all modern desktop publishing and film production systems. The history of this technique is complicated, and many researchers contributed to its development. From a mathematical standpoint, this history can be conveniently partitioned into three steps:

  1. In 1977, Alvy Ray Smith and Ed Catmull, then working at the New York Institute of Technology, invented a method for blending a partially opaque foreground image over a completely opaque background image. The opacity of each pixel in the foreground is given by a third, single-channel ("alpha") image, whose values range from 0.0 ("transparent") to 1.0 ("opaque"). (Eventually, Smith and Catmull began treating alpha as integral to an image, leading to the notion of a 4-channel image: red, green, blue, and alpha.) Their blending method, sometimes called "digital matting", employs the linear interpolation formula [1]
    Cout = (Cfgd * Afgd) + (1 - Afgd) * Cbkg
    Cfgd = red, green, blue of foreground
    Cbkg = red, green, blue of background
    Afgd = alpha of foreground

  2. In 1980, Bruce Wallace and Marc Levoy, then working at Hanna-Barbera Productions, derived a recursive blending method, in which two partially opaque images are combined to produce a partially opaque result. Their formula, described on page 257 of Wallace's Siggraph '81 paper [2] (which in retrospect should have included Levoy as a co-author), is
    Aout = (1 - (1 - Afgd) * (1 - Abkg))
    Cout = (Cfgd * Afgd + (1 - Afgd) * Cbkg * Abkg) / Aout
    Cfgd = red, green, blue of foreground
    Cbkg = red, green, blue of background
    Afgd = alpha of foreground
    Abkg = alpha of background

  3. In 1984, Tom Porter and Tom Duff, then working at Lucasfilm, showed that by pre-multiplying Cbkg and Cfgd by Abkg and Afgd, respectively, Wallace and Levoy's formulae could be simplified to [3]
    Aout = Afgd + (1 - Afgd) * Abkg
    Cout' = Cfgd' + (1 - Afgd) * Cbkg'
    Cfgd' = Cfgd * Afgd
    Cbkg' = Cbkg * Abkg
    Cout' = Cout * Aout
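
As a minimal sketch (written for this page, not taken from any of the cited papers), the three formulations above can be expressed in Python for a single color channel. The function names are hypothetical, and the choice to return a color of 0 when the result is fully transparent is an assumption, since the division by Aout is undefined in that case.

```python
def matte(cfgd, afgd, cbkg):
    """Step 1: partially opaque foreground over a fully opaque background."""
    return cfgd * afgd + (1.0 - afgd) * cbkg


def over_recursive(cfgd, afgd, cbkg, abkg):
    """Step 2: two partially opaque inputs, partially opaque output.

    Colors are NOT premultiplied by alpha, on input or on output.
    """
    aout = 1.0 - (1.0 - afgd) * (1.0 - abkg)
    if aout == 0.0:
        # Assumption: a fully transparent result has no meaningful color.
        return 0.0, 0.0
    cout = (cfgd * afgd + (1.0 - afgd) * cbkg * abkg) / aout
    return cout, aout


def over_premultiplied(cfgd_p, afgd, cbkg_p, abkg):
    """Step 3: the same blend with colors premultiplied by their alphas."""
    aout = afgd + (1.0 - afgd) * abkg
    cout_p = cfgd_p + (1.0 - afgd) * cbkg_p
    return cout_p, aout


# The premultiplied result should equal the recursive result times its alpha.
c, a = over_recursive(0.8, 0.5, 0.2, 0.5)
c_p, a_p = over_premultiplied(0.8 * 0.5, 0.5, 0.2 * 0.5, 0.5)
assert abs(a - a_p) < 1e-12 and abs(c * a - c_p) < 1e-12
```

Setting Abkg to 1.0 in the second function reproduces the first formula, which is one way to see that step 2 generalizes step 1.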

What is the significance of these differences in formulation?

Smith and Catmull's formulation describes how to composite together two "layers" (images). Using their formulation, three or more layers can also be composited together, but to produce the correct result, processing must occur in depth order from bottom to top. Formally, given three layers A, B, and C, where A is the foreground and C is the background, then these three layers must be composited together as A over (B over C). The recursive formulation of Wallace and Levoy, by contrast, permits layers to be composited in any order that obeys associativity. In other words, these three layers can alternatively be composited as (A over B) over C.
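
The associativity claim can be checked numerically. The following is an illustrative sketch only: it applies the recursive formula to one color channel of three made-up pixels A, B, and C, represented as hypothetical (color, alpha) pairs with non-premultiplied color, and compares the two groupings.

```python
def over(fg, bg):
    """Recursive blend of two (color, alpha) pixels, colors not premultiplied."""
    cfgd, afgd = fg
    cbkg, abkg = bg
    aout = 1.0 - (1.0 - afgd) * (1.0 - abkg)
    if aout == 0.0:
        return (0.0, 0.0)  # assumption: transparent result carries no color
    cout = (cfgd * afgd + (1.0 - afgd) * cbkg * abkg) / aout
    return (cout, aout)


# Made-up example values: A is the foreground, C is an opaque background.
A = (0.9, 0.3)
B = (0.5, 0.6)
C = (0.1, 1.0)

bottom_up = over(A, over(B, C))   # A over (B over C)
top_down = over(over(A, B), C)    # (A over B) over C

# The two groupings agree up to floating-point rounding.
assert abs(bottom_up[0] - top_down[0]) < 1e-12
assert abs(bottom_up[1] - top_down[1]) < 1e-12
```

The intermediate result (A over B) has an alpha of 0.72, so it cannot be represented by a formulation whose output is always fully opaque; this is why the second grouping needs the recursive form.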

Why is associativity important? In the context of Bruce Wallace's system, it conveys two advantages. First, if the foreground and midground layers are to be transformed by a common operator such as an image rotation, but the background is to be left alone, then the former two layers can be composited together before applying the rotation, thereby saving computation. Formally, given an expression containing two or more instances of the binary compositing operator (A over B) and two or more instances of a unary operator T(.), if the unary operator is distributive with respect to compositing, i.e. if T(A) over T(B) = T(A over B), then associativity permits the expression to be rewritten to reduce the number of unary operations. In other words, T(A) over (T(B) over C) can be rewritten as T(A over B) over C. Most image transformations are distributive with respect to compositing, including panning, zooming, rotation, and intensity fading [2]. Many lighting computations are also distributive in this way, a fact used to advantage in Rob Cook's shade tree system [4].
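
A small sketch of this rewriting, under stated assumptions: images are single rows of premultiplied (color, alpha) pixels, "panning" is modeled as a shift to the right by a whole number of pixels with transparent fill, and "intensity fading" is modeled as scaling color by a constant while leaving alpha unchanged. These models and all function names are illustrative choices, not the operators defined in [2].

```python
def over(fg, bg):
    """Premultiplied 'over' for one (color, alpha) pixel pair."""
    (cf, af), (cb, ab) = fg, bg
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def over_row(fg_row, bg_row):
    """Composite two rows of equal length pixel by pixel."""
    return [over(p, q) for p, q in zip(fg_row, bg_row)]


def pan(row, dx):
    """Shift a row right by dx >= 0 pixels, filling with transparent pixels."""
    return ([(0.0, 0.0)] * dx + row)[:len(row)]


def fade(row, factor):
    """Scale color by a constant factor, leaving alpha untouched."""
    return [(c * factor, a) for c, a in row]


def rows_close(r1, r2, tol=1e-12):
    return all(abs(c1 - c2) < tol and abs(a1 - a2) < tol
               for (c1, a1), (c2, a2) in zip(r1, r2))


A = [(0.3, 0.5), (0.0, 0.0), (0.2, 0.4)]   # foreground
B = [(0.1, 0.2), (0.6, 0.8), (0.0, 0.0)]   # midground
C = [(0.9, 1.0), (0.5, 1.0), (0.1, 1.0)]   # opaque background

# Two pans on the left-hand side, only one on the right-hand side.
assert rows_close(over_row(pan(A, 1), over_row(pan(B, 1), C)),
                  over_row(pan(over_row(A, B), 1), C))

# Distributivity of the fade: T(A) over T(B) = T(A over B).
assert rows_close(over_row(fade(A, 0.5), fade(B, 0.5)),
                  fade(over_row(A, B), 0.5))
```

A fade that scaled alpha as well as color would not pass the second check, so whether a given operator distributes depends on exactly how it is defined.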

The second use to which associativity is put in Wallace's system is that if the background layer changes on successive frames of an animation sequence, but the midground and foreground layers do not, then the latter two layers need to be composited together only once, yielding another savings in computation. Formally, if the frames of an animation are treated as statements in a computer program, then this simplification is equivalent to finding common subexpressions among statements - a common compiler optimization. The equivalence between shading (and compositing) expressions and programming language statements was first demonstrated by Ken Perlin [5]. It also forms the basis for the RenderMan shading language [6]. Finding common subexpressions in shading and compositing expressions is explored further in [7] and [8], respectively.
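
The saving can be illustrated with a short sketch that counts merges. It assumes single premultiplied (color, alpha) pixels standing in for whole levels, and the function names are hypothetical; it is not a description of how Wallace's system stored or scheduled its merged groups.

```python
def over(fg, bg):
    """Premultiplied 'over' for one (color, alpha) pixel pair."""
    (cf, af), (cb, ab) = fg, bg
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def render_naive(fg, mid, backgrounds):
    """Re-merge all three levels on every frame: two merges per frame."""
    frames, merges = [], 0
    for bg in backgrounds:
        frames.append(over(fg, over(mid, bg)))
        merges += 2
    return frames, merges


def render_cached(fg, mid, backgrounds):
    """Merge the unchanging levels once, then one merge per frame."""
    held = over(fg, mid)      # the common subexpression, computed once
    merges = 1
    frames = []
    for bg in backgrounds:
        frames.append(over(held, bg))
        merges += 1
    return frames, merges


fg, mid = (0.3, 0.5), (0.2, 0.4)
backgrounds = [(0.1, 1.0), (0.5, 1.0), (0.9, 1.0)]  # changes every frame

naive_frames, naive_merges = render_naive(fg, mid, backgrounds)
cached_frames, cached_merges = render_cached(fg, mid, backgrounds)
```

For N frames the naive loop performs 2N merges and the cached loop N + 1, and the two produce the same frames up to rounding because over is associative.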

The associativity of digital compositing is also important in modern volume rendering systems. As described in [9], some approximations to volume rendering can be computed using digital compositing. In this approximation, associativity permits slices of the volume, which are images with an opacity per pixel, to be composited from top to bottom as well as from bottom to top. In the context of a volume ray tracer, this ordering permits early ray termination, and it improves the performance of occupancy accelerations such as octrees [10]. It also permits disjoint sets of adjacent slices to be composited together to form intermediate images, which are then composited together to form a final image. This in turn facilitates parallel and hardware implementations of volume rendering. In general, associativity permits any sequence of compositing operations to be represented in a binary tree, e.g. (A over B) over (C over D), where the two interior nodes of the tree, (A over B) and (C over D), can be computed independently and in parallel.
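
Both ideas can be sketched for a single ray, assuming each slice contributes one premultiplied (color, alpha) sample and that samples are listed nearest first. The opacity cutoff of 0.99 and the function names are illustrative assumptions, not values or interfaces from [9] or [10].

```python
def over(fg, bg):
    """Premultiplied 'over' for one (color, alpha) pixel pair."""
    (cf, af), (cb, ab) = fg, bg
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def composite_front_to_back(samples, opacity_cutoff=0.99):
    """Accumulate samples nearest first; stop once the ray is nearly opaque.

    Returns the accumulated pixel and the number of samples actually used.
    """
    acc = (0.0, 0.0)          # fully transparent accumulator
    used = 0
    for sample in samples:
        if acc[1] >= opacity_cutoff:
            break             # early ray termination
        acc = over(acc, sample)
        used += 1
    return acc, used


def composite_tree(samples):
    """Composite as a balanced binary tree, e.g. (A over B) over (C over D).

    The two halves are independent and could be evaluated in parallel.
    """
    if not samples:
        return (0.0, 0.0)
    if len(samples) == 1:
        return samples[0]
    half = len(samples) // 2
    return over(composite_tree(samples[:half]), composite_tree(samples[half:]))


# Made-up samples along one ray, nearest first; the second one is opaque.
samples = [(0.2, 0.4), (0.5, 1.0), (0.3, 0.5), (0.1, 0.2)]

pixel, used = composite_front_to_back(samples)
tree_pixel = composite_tree(samples)
```

Here the front-to-back loop stops after two samples because everything behind the opaque sample is hidden, and the tree evaluation reaches the same pixel up to rounding.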

To finish our comparison, the formulation of Porter and Duff permits all four channels of an image (red, green, blue, and alpha) to be treated identically. This, combined with the fact that their formulation (like Wallace's) obeys associativity, facilitates implementation of digital compositing in hardware. Specifically, it permits two 4-channel images to be composited to yield a new 4-channel image, a basic operation in OpenGL [11] and therefore in most hardware-accelerated graphics systems. Finally, their formulation leads to an elegant algebra containing 12 operators, of which over is only one. OpenGL implements all 12 operators.

Summarizing, Smith and Catmull invented digital matting and the integral alpha channel; Wallace and Levoy showed how to blend partially opaque images recursively, thereby making compositing associative; and Porter and Duff introduced premultiplication by alpha, simplifying the computation and leading to a compositing algebra.

A didactic footnote

Although the compositing formulas are straightforward, it is surprisingly difficult to find a formal derivation of them, beginning strictly from the goal of an image-based method for combining superimposed layers of unknown, uncorrelated sub-pixel geometry.

Inspired by Bruce Wallace's and my work in 1981, I have worked out one such derivation. It begins by modeling pixels in a four channel image as small domains randomly covered by specks of color C with probability alpha, and it derives an expression for the expected color in a domain when two or more such images are superimposed. I teach this derivation in my Stanford course CS 248 (Introduction to Computer Graphics). Click here for my course notes giving the derivation.
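
The following sketch illustrates the speck model described above; it is not a reproduction of the course notes. It assumes that each layer covers a given point independently with probability alpha, that the viewer sees the frontmost covering layer, and it computes the expectation by enumerating every covered/uncovered outcome rather than by algebra. The function names are hypothetical.

```python
from itertools import product


def expected_composite(layers):
    """Expected visible color and coverage for independent layers.

    layers: list of (color, alpha) pairs, front to back, one color channel,
    color NOT premultiplied. Returns (expected color contribution, coverage).
    """
    exp_color = 0.0
    exp_cover = 0.0
    for outcome in product([True, False], repeat=len(layers)):
        # Probability of this particular pattern of covered/uncovered layers.
        prob = 1.0
        for (_, alpha), covered in zip(layers, outcome):
            prob *= alpha if covered else (1.0 - alpha)
        # The frontmost layer that covers the point is the one that is seen.
        visible = next((color for (color, _), covered in zip(layers, outcome)
                        if covered), None)
        if visible is not None:
            exp_color += prob * visible
            exp_cover += prob
    return exp_color, exp_cover


def over_premultiplied(fg, bg):
    (cf, af), (cb, ab) = fg, bg
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


# Two made-up layers: the expectation matches the premultiplied 'over'.
fgd, bkg = (0.8, 0.5), (0.2, 0.5)
exp_color, exp_cover = expected_composite([fgd, bkg])
cout_p, aout = over_premultiplied((fgd[0] * fgd[1], fgd[1]),
                                  (bkg[0] * bkg[1], bkg[1]))
assert abs(exp_color - cout_p) < 1e-12 and abs(exp_cover - aout) < 1e-12
```

Under these assumptions the expected color contribution is the premultiplied output color Cout' and the expected coverage is Aout, which is why the premultiplied form falls out of the model so directly.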

As noted earlier, there is an intimate connection between digital compositing and volume rendering, which at its heart consists of compositing together a large number of images. In particular, Blinn has used the expectation of discrete random variables to model the passage of light through a participating (i.e. attenuating and scattering) medium [12]. His derivation leads to a formula that looks a lot like digital compositing. Taking this one step further, beginning with the integro-differential equation governing light transport in such a medium [13], and by applying certain simplifying assumptions, one can directly derive the over operator! I use this approach when teaching volume rendering. Click here for my course notes giving this derivation.


[1] Smith, A.R., Painting Tutorial Notes, SIGGRAPH '79 course on Computer Animation Techniques, 1979, ACM.

[2] Wallace, B.A., Merging and Transformation of Raster Images for Cartoon Animation, Proc. ACM SIGGRAPH '81, Vol 15, No. 3, 1981, ACM, pp. 253-262.

[3] Porter, T., Duff, T., Compositing digital images, Proc. ACM SIGGRAPH '84, Vol. 18, No. 3, 1984, ACM, pp. 253-259.

[4] Cook, R., Shade Trees, Proc. ACM SIGGRAPH '84, Vol. 18, No. 3, July, 1984, pp. 223-231.

[5] Perlin, K., An Image Synthesizer, Proc. ACM SIGGRAPH '85, Vol. 19, No.3, July, 1985, pp. 287-296.

[6] Hanrahan, P., Lawson, J., A Language for Shading and Lighting Calculations, Computer Graphics (Proc. Siggraph), Vol. 24, No. 4, August, 1990, pp. 289-298.

[7] Guenter, B., Knoblock, T.B., Ruf, E., Specializing shaders, Proc. SIGGRAPH '95 (Los Angeles, CA, August 6-11, 1995). In Computer Graphics Proceedings, Annual Conference Series, 1995, ACM SIGGRAPH, pp. 343-350.

[8] Shantzis, M.A., A Model for Efficient and Flexible Image Computing, Proc. SIGGRAPH '94 (Orlando, Florida, July 24-29, 1994). In Computer Graphics Proceedings, Annual Conference Series, 1994, ACM SIGGRAPH, pp. 147-154.

[9] Drebin, R., Carpenter, L., and Hanrahan, P., Volume Rendering, Proc. ACM SIGGRAPH '88, Vol. 22, No. 4, August, 1988, pp. 65-74.

[10] Levoy, M., Efficient Ray Tracing of Volume Data, ACM Transactions on Graphics, Vol. 9, No. 3, July, 1990, pp. 245-261.

[11] OpenGL Reference Manual, third edition, Dave Shreiner (ed.), Addison-Wesley, 2000.

[12] Blinn, J., Light reflection functions for simulation of clouds and dusty surfaces, Proc. ACM SIGGRAPH '82, Vol. 16, No. 3, July, 1982, pp. 21-29.

[13] Krueger, W., The Application of Transport Theory to Visualization of 3D Scalar Data Fields, Proc. IEEE Visualization '90, October, 1990, pp. 273-280.

The invention of two-background matte extraction

In Smith and Blinn's Siggraph '96 paper on blue screen matting [1], they describe a technique for extracting an alpha matte by photographing an object in front of two backgrounds of different color. They call this "Solution 3: triangulation".

As it turns out, Bruce Wallace, Chris Odgers, and I had independently invented the same solution at Hanna-Barbera Productions about 15 years earlier. We used it to derive alpha channels for what the artists called "overlays", i.e. paintings made on acetate sheets. We photographed these overlays over a white background and over a black background, then applied equations equivalent to Smith and Blinn's in order to derive an alpha matte.
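
A minimal sketch of the idea for one pixel, under idealized assumptions: the black background is exactly 0, the white background is exactly 1 in every channel, and both photographs are perfectly registered and noise free. The equations follow from the matting formula under those assumptions; they are not copied from the thesis or from Smith and Blinn's paper, and the function names are hypothetical.

```python
def photograph(color, alpha, background):
    """Simulate one pixel of an overlay photographed over a flat background."""
    return tuple(c * alpha + (1.0 - alpha) * background for c in color)


def extract_matte(over_black, over_white):
    """Recover (color, alpha) from the two photographs of the same pixel.

    Over black the observation is C * alpha; over white it is
    C * alpha + (1 - alpha), so their difference is (1 - alpha) per channel.
    """
    diffs = [w - k for k, w in zip(over_black, over_white)]
    alpha = 1.0 - sum(diffs) / len(diffs)     # average over the channels
    alpha = min(1.0, max(0.0, alpha))         # guard against out-of-range input
    if alpha == 0.0:
        # Assumption: a fully transparent pixel has no recoverable color.
        return (0.0, 0.0, 0.0), 0.0
    color = tuple(k / alpha for k in over_black)
    return color, alpha


# Made-up overlay pixel: an orange paint stroke at 60% opacity.
true_color, true_alpha = (0.9, 0.4, 0.1), 0.6
shot_black = photograph(true_color, true_alpha, 0.0)
shot_white = photograph(true_color, true_alpha, 1.0)

color, alpha = extract_matte(shot_black, shot_white)
assert abs(alpha - true_alpha) < 1e-9
assert all(abs(c - t) < 1e-9 for c, t in zip(color, true_color))
```

With real photographs the per-channel differences would not agree exactly, which is why the sketch averages them; other ways of combining the channels are possible.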

Although the role of overlay artwork in the Hanna-Barbera system is described in Bruce Wallace's paper (see top of this web page), the two-background matte extraction was only described in Bruce's Master's thesis [2] (pages 51-53), not in his Siggraph paper. Smith and Blinn should therefore not be blamed for believing that their derivation of the technique was original. Our implementation of it was used in production from 1983 until 1996. In a humorous twist, I have for many years been assigning this technique as a homework problem to students in my computer graphics courses [3], [4].


[1] Smith, A.R., Blinn, J.F., Blue screen matting, Proc. SIGGRAPH '96 (New Orleans, LA, August 5-9, 1996). In Computer Graphics Proceedings, Annual Conference Series, 1996, ACM SIGGRAPH, pp. 259-268.

[2] Wallace, B.A., Automated production techniques in cartoon animation, Master's thesis, Cornell University, August, 1982.

[3] Levoy, M., Curless, B., CS 348B handout #28: homework assignment #2 (excerpt), Winter, 1992.
Click here for a PDF file

[4] Levoy, M., Curless, B., CS 348B handout #40: homework assignment #2 solutions (excerpt), Winter, 1992.
Click here for a PDF file

This page © Copyright 2001 by Marc Levoy
The paper © Copyright 1981 by ACM