Changes in sound over time cause changes in a scene. The purpose of this project is to map a sound into a scene. The audio and visuals are then played together, so that the sound is seen as well as heard. This is accomplished by generating a polygonal model of the sound for each frame, via a suitable mapping from Sound Into Graphics, and rendering the resulting model using scan conversion techniques.
Implications
The long-term goal is to synthesize in both directions, i.e. to map sounds into objects and objects into sounds. One could then hear changes in a scene that were hidden from the observer by other objects. A deaf person could watch music, and a blind person could hear graphics. Those with both senses intact could gain deeper insight into the nature of the signal the artist is trying to communicate.
Background
This project began in 1982 at the University of Illinois. The ideas were refined during discussions among Jim Bozek, a computer scientist and musician; Patrick Kane, an engineer and graphicist experienced in computer music; and myself, an engineering student and occasional musician. One of the problems was choosing a suitable visual model to connect the sound and visual spaces. One evening, after viewing the Van Cliburn piano competition, it became apparent that a modified piano key concept embodied the appropriate motion model. A piano 'key' is a displaced object (not a vibrating one) that produces a sound with a given duration. A convenient inverse map also existed; that is, a sound could be made to produce the displacement of a key, as in a player piano. This became labeled the 'keys' interpretation, and it seemed appropriate because it was rooted in everyday experience, so that its implications could be understood intuitively by a non-technical audience.
Technique
The fast Fourier transform (FFT) was used as the conversion between audio space and visual space. Music of interest was digitized at 40,000 samples per second, and the discrete samples were converted to characteristic frequency values via FFT for each frame of 'action'. At 24 frames per second, this meant that about 1667 samples (40,000 / 24) contributed to each frame. The characteristic frequency values were then clustered and mapped onto a grid that, in the 8 x 12 case, had a one-to-one correspondence with the keys of a piano.
Execution
Pat Kane and Jim Bozek did the FFTs in Illinois using the IEEE signal processing package and utility routines they wrote in the 'C' language. Pat Kane also produced single frames of bicubic patches using the Raster Test-Bed (RTB) by Turner Whitted and David Weimer. The author produced single frames of the "Piano Keys" polygon model using RTB and, subsequently, special-purpose rendering software. The University of Illinois CSO VAX 11/780 was used for the FFTs and for the work with bicubic patches. The Utah graphics VAX 11/750 was used for rendering.
Initial Goals
Five Second Leader
The Keys Interpretation
The Soap Film Interpretation
Another mapping of interest uses the sound transform as the z values of a forcing function acting on a soap film that spans a rectangular domain. The surface is generated by solving the boundary value problem given by Poisson's equation:
$$\nabla^2 z = \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = f(x, y)$$
in a rectangular domain, using the 'key' heights as the magnitude of the forcing function at particular points within the domain. The resulting surface is a 'minimal' surface in the mathematical sense, in that it has the minimum surface area that satisfies the constraints. It is, in effect, a stable global interpolant for the surface generated by the influence of the keys. This version was never done, although Pat Kane did test frames involving spline patches, one for each 'key'.
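Since this version was never done, the following is only a hypothetical C sketch of how the film surface could be computed. The key heights become point forces on a grid, and the discrete Poisson equation (the second differences of z in x and y summing to the force f) is relaxed with Gauss-Seidel iteration, a standard solver chosen here for brevity. The zero-height border, unit grid spacing, placement of one key every SPACING grid points, the sign convention, and all names are assumptions.

```c
#define KEY_ROWS 8
#define KEY_COLS 12
#define SPACING 4                        /* hypothetical cells between keys */
#define NY (KEY_ROWS * SPACING + 1)      /* 33 grid points */
#define NX (KEY_COLS * SPACING + 1)      /* 49 grid points */

/* Place each key height as a point force at its own grid node. The minus
   sign is an assumed convention so that positive key heights push the
   film upward (z > 0). */
void key_forcing(double keys[KEY_ROWS][KEY_COLS], double f[NY][NX])
{
    for (int j = 0; j < NY; j++)
        for (int i = 0; i < NX; i++)
            f[j][i] = 0.0;
    for (int r = 0; r < KEY_ROWS; r++)
        for (int c = 0; c < KEY_COLS; c++)
            f[SPACING / 2 + SPACING * r][SPACING / 2 + SPACING * c] = -keys[r][c];
}

/* Solve the discrete Poisson equation
       z[j][i-1] + z[j][i+1] + z[j-1][i] + z[j+1][i] - 4 z[j][i] = f[j][i]
   (unit grid spacing) with z = 0 on the border of the rectangle, using
   Gauss-Seidel relaxation for a fixed number of sweeps. */
void solve_soap_film(double f[NY][NX], double z[NY][NX], int sweeps)
{
    for (int j = 0; j < NY; j++)
        for (int i = 0; i < NX; i++)
            z[j][i] = 0.0;
    for (int s = 0; s < sweeps; s++)
        for (int j = 1; j < NY - 1; j++)
            for (int i = 1; i < NX - 1; i++)
                z[j][i] = 0.25 * (z[j][i - 1] + z[j][i + 1] +
                                  z[j - 1][i] + z[j + 1][i] - f[j][i]);
}
```

A single raised key produces a smooth bump whose peak sits at that key and falls to zero at the border, which is the "global interpolant" behavior described above. A production version would iterate to a tolerance rather than a fixed sweep count.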
The movie frames were generated using UNIX™ shell scripts. The ith frame of the 'keys' animation was produced by a single shell statement built from the following programs and files:
keyconv: generates the polygon model for the ith frame.
fft.i: the ith array of key altitudes.
apply: applies the 4 x 4 transformation in file pos.i to the object.
fb_clip: clips the polygons to the frame buffer dimensions.
scnv: scan converts the resulting polygon model.
dd: writes the resulting picture directly to magnetic tape.
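As a hedged illustration of what the apply and fb_clip stages do, here is a C sketch that transforms vertices by a 4 x 4 matrix (row-vector convention, an assumption), divides by w, and clips a convex polygon to the frame buffer rectangle. The clipping uses the Sutherland-Hodgman algorithm, a standard technique swapped in here because the original method is not stated. The types, function names, and MAX_VERTS limit are hypothetical.

```c
typedef struct { double x, y, z, w; } Vec4;     /* homogeneous vertex */
typedef struct { double x, y, z; } ScreenPt;    /* after perspective divide */

#define MAX_VERTS 64   /* assumed limit on vertices per clipped polygon */

/* apply-style step: transform n vertices in place, treating each as a
   row vector multiplied by the 4 x 4 matrix (v' = v M). */
void apply_matrix(double m[4][4], Vec4 *v, int n)
{
    for (int k = 0; k < n; k++) {
        double in[4] = { v[k].x, v[k].y, v[k].z, v[k].w };
        double out[4] = { 0.0, 0.0, 0.0, 0.0 };
        for (int c = 0; c < 4; c++)
            for (int r = 0; r < 4; r++)
                out[c] += in[r] * m[r][c];
        v[k] = (Vec4){ out[0], out[1], out[2], out[3] };
    }
}

/* Perspective divide to screen coordinates (assumes w > 0). */
ScreenPt to_screen(Vec4 v)
{
    ScreenPt p = { v.x / v.w, v.y / v.w, v.z / v.w };
    return p;
}

/* One Sutherland-Hodgman pass: keep the part of the polygon where
   sign * (coord - limit) <= 0, with coord = x (axis 0) or y (axis 1). */
static int clip_edge(const ScreenPt *in, int n, ScreenPt *out,
                     int axis, double limit, double sign)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        ScreenPt a = in[i], b = in[(i + 1) % n];
        double da = sign * ((axis ? a.y : a.x) - limit);
        double db = sign * ((axis ? b.y : b.x) - limit);
        if (da <= 0.0)
            out[m++] = a;                       /* a is inside: keep it */
        if ((da <= 0.0) != (db <= 0.0)) {       /* edge crosses boundary */
            double t = da / (da - db);
            ScreenPt p = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                           a.z + t * (b.z - a.z) };
            out[m++] = p;
        }
    }
    return m;
}

/* fb_clip-style step for a convex polygon of at most MAX_VERTS - 4
   vertices: clip to 0 <= x <= width, 0 <= y <= height. Returns the new
   vertex count; out must hold MAX_VERTS points. */
int clip_to_frame(const ScreenPt *poly, int n, ScreenPt *out,
                  double width, double height)
{
    ScreenPt a[MAX_VERTS], b[MAX_VERTS];
    int m = clip_edge(poly, n, a, 0, 0.0, -1.0);   /* x >= 0      */
    m = clip_edge(a, m, b, 0, width, 1.0);         /* x <= width  */
    m = clip_edge(b, m, a, 1, 0.0, -1.0);          /* y >= 0      */
    return clip_edge(a, m, out, 1, height, 1.0);   /* y <= height */
}
```

Clipping in screen space keeps each pass a simple comparison against one frame buffer edge; the depth value z is interpolated along with x and y so that a later z-buffer stage still has correct depths at the new vertices.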
The position files, which also include corrections for pixel aspect ratio, were built from the following representative transformations:
ident: generates a 4 x 4 identity matrix.
obj_pos: applies the translation and rotation to the current 4 x 4 matrix.
perspec: applies the perspective transformation.
fb_aspect: corrects for the nonsquare pixels on the output device.
pos.i: the resulting 4 x 4 transformation matrix.
The location values were computed using a view specification program that interpolated key frame values of these parameters. The view path program took the key frames and allowed the motion to be previewed on a line drawing display, using an iconic cube to represent the orientation of the keys platform with respect to the viewer. Correction for aspect ratio went as follows:
• picture tube was 3 in y by 4 in x
• pixels were 15 in x by 16 in y
So the aspect ratio was 3/4 x 15/16 = 45/64.
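The key frame interpolation used by the view specification program can be sketched as piecewise-linear interpolation of one parameter. This is a minimal C sketch under assumptions: the KeyFrame struct and lerp_keys name are hypothetical, key frames are sorted by strictly increasing frame number, and frames outside the range are clamped to the end values.

```c
typedef struct {
    int frame;      /* film frame at which the key value is specified */
    double value;   /* one view parameter, e.g. a rotation angle */
} KeyFrame;

/* Piecewise-linear interpolation of one view parameter at a given frame. */
double lerp_keys(const KeyFrame *k, int n, double frame)
{
    if (frame <= k[0].frame)
        return k[0].value;
    if (frame >= k[n - 1].frame)
        return k[n - 1].value;
    int i = 0;
    while (frame > k[i + 1].frame)   /* find the segment containing frame */
        i++;
    double t = (frame - k[i].frame) / (double)(k[i + 1].frame - k[i].frame);
    return k[i].value + t * (k[i + 1].value - k[i].value);
}
```

With keys (0, 0), (24, 10), and (48, 10), the parameter moves at 10/24 units per frame and then stops dead at frame 24; this jump in velocity at each key frame is characteristic of linear interpolation.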
Programming Timeline
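-----------+-----------------------------+-------------+--------
TASK       | PURPOSE                     | PROGRAM     | TIME
-----------+-----------------------------+-------------+--------
clean up   | scan conversion             | scnv.c      | 2 days
implement  | anti-aliasing               | scnv.c      | 4 days
rewrite    | polygon generation          | keyconv.c   | 4 days
write      | object transformation       | orient.c    | 2 days
write      | object clipping             | fb_clip.c   | 1 day
write      | matrix generation           | ident.c     | 1 day
write      | translation/rotation        | obj_pos.c   | 1 day
write      | perspective transformation  | perspec.c   | 1 day
write      | aspect correction           | fb_aspect.c | 1 day
write      | shell script                | movie       | 1 day
-----------+-----------------------------+-------------+--------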
Production Timeline
------------------+---------
TASK              | TIME
------------------+---------
frame generation  | 1 week
frame recording   | 1 week
film development  | ??
film evaluation   | 1 day
sound transfer    | 2 weeks
------------------+---------
Important Changes in Approach - 1983
It was found that parsing the polygon model using yacc was slow. For the 'city of keys', which contained approximately 4500 polygons, parsing and rendering the description file took two hours after the polygon description grammar had been compacted. Using the grammar to specify the model allowed a great deal of expressive power, since it was much easier to produce and verify a text description of the model than a binary one. To convert the input FFT and transformation matrix into rendered output images quickly, however, the debugged versions of the model generation program, the scan conversion program, and the object transformation program were combined into one 'mega' program. Times for the mega version varied from 1.5 to 4.5 minutes per frame, a much more reasonable time for animation.
Update - 1995
When the same 'mega' program was ported unchanged to a PowerPC Macintosh 8500/120, the frame time dropped to 1.6 seconds. Measured against the fastest 1983 frame time of 1.5 minutes (90 s / 1.6 s ≈ 56), this is roughly a 60-fold speedup in 12 years.
Shot List - 1983
Shot List - film for SIGGRAPH '83
'*' indicates photography completed
-------------+------------------------------+-------------+---+------------------
DURATION     | PICTURE                      | SOUND       | * | SHELL SCRIPT
-------------+------------------------------+-------------+---+------------------
1 sec        | second 9 of academy leader   | silence     | * | Leader/leader.sh
1 sec        | second 8 of academy leader   | silence     | * |
1 sec        | second 7 of academy leader   | silence     | * |
1 sec        | second 6 of academy leader   | silence     | * |
1 sec        | second 5 of academy leader   | silence     | * |
1 sec        | second 4 of academy leader   | silence     | * |
1 sec        | second 3 of academy leader   | silence     | * |
1/24 sec     | academy leader - "pop"       | silence     | * |
1 23/24 sec  | black screen                 | silence     | * |
-------------+------------------------------+-------------+---+------------------
9 sec        | LEADER TIME SUBTOTAL         | 0 fr. rndr + 169 fr. util + 216 fr. mtrx
-------------+------------------------------+-------------+---+------------------
4 sec        | Title: Sound into Graphics   | silence     | * | Title/title.sh
1/6 sec      | key only black screen        | silence     | * |
4 sec        | Title: Experiment by BKW     | silence     | * |
1/6 sec      | key only black screen        | silence     | * |
-------------+------------------------------+-------------+---+------------------
8 2/6 sec    | TITLE TIME SUBTOTAL          | 1 fr. rndr + 2 fr. util + 200 fr. mtrx
-------------+------------------------------+-------------+---+------------------
5 sec        | Title: Piano Map             | silence     | * | Title/scale.sh
1/6 sec      | key only black screen        | silence     | * |
3 sec        | Title: City of Keys          | silence     | * |
1/6 sec      | key only black screen        | silence     | * |
25 sec       | Piano Scale Up and Down      | scale       | * | Shoot/scale.sh
1/6 sec      | key only black screen        | silence     | * |
-------------+------------------------------+-------------+---+------------------
33 1/2 sec   | SCALE TIME SUBTOTAL          | 600 fr. rndr + 2 fr. util + 804 fr. mtrx
-------------+------------------------------+-------------+---+------------------
3 sec        | Title: 33.3                  | silence     | * | Title/33.sh
1/6 sec      | key only black screen        | silence     | * |
9 sec        | Tumbling Keys 1032 - 1247    | space bells |   | Shoot/harp
16 sec       | Peak Keys 1248 - 1631        | harpsichord |   |
10 sec       | Receding Keys 1632 - 1871    | space bells |   |
-------------+------------------------------+-------------+---+------------------
38 1/6 sec   | HARPSICHORD TIME SUBTOTAL    | 840 fr. rndr + 1 fr. util + 916 fr. mtrx
-------------+------------------------------+-------------+---+------------------
1/6 sec      | black screen                 | silence     |   | Shoot/credits
3 sec        | Title: Special Thanks to     | silence     |   |
1/6 sec      | black screen                 | silence     |   |
3 sec        | Title: UU CS Dept.           | silence     |   |
1/6 sec      | black screen                 | silence     |   |
3 sec        | Title: UI CSO & CS Dept.     | silence     |   |
1/6 sec      | black screen                 | silence     |   |
3 sec        | Title: NSF ARMY NAVY         | silence     |   |
1/6 sec      | black screen                 | silence     |   |
3 sec        | Title: lewie spence todd     | silence     |   |
1/6 sec      | black screen                 | silence     |   |
-------------+------------------------------+-------------+---+------------------
16 sec       | CREDITS TIME SUBTOTAL        | 0 fr. rndr + 5 fr. util + 384 fr. mtrx
-------------+------------------------------+-------------+---+------------------
GRAND TOTALS
-------------+----------------------+-------------+------------+-------------
SCREEN TIME  | MOVIE SECTION        | RENDER FRMS | UTIL FRMS  | MATRIX FRMS
-------------+----------------------+-------------+------------+-------------
9 sec        | LEADER               | 0           | 169        | 216
8 2/6 sec    | TITLE                | 1           | 2          | 200
33 1/2 sec   | SCALE                | 600         | 2          | 804
38 1/6 sec   | HARPSICHORD          | 840         | 1          | 916
16 sec       | CREDITS              | 0           | 5          | 384
-------------+----------------------+-------------+------------+-------------
105 sec      | TOTAL                | 1441        | 179        | 2520
-------------+----------------------+-------------+------------+-------------
RENDER CPU TIME: (1441 fr.) x (180 sec/fr.) / (3600 sec/hr.) = 72.1 CPU hours
CAMERA TIME: (2520 fr.) x (30 sec/fr.) / (3600 sec/hr.) = 21.0 con hours
At 40 frames per foot, the 2520 frames work out to 63.0 feet of 16mm color negative film.
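The totals above follow from 24 frames per second, the stated seconds per frame for rendering and camera work, and 40 frames per foot of 16mm film. The C sketch below is a hypothetical re-derivation (the Section struct and function names are invented for illustration) that reproduces the matrix-frame, render-frame, CPU-hour, camera-hour, and footage figures from the section durations.

```c
#define FPS 24               /* frames per second of projected film */
#define FRAMES_PER_FOOT 40   /* 16mm film */

typedef struct {
    const char *name;
    int whole_sec, num, den; /* screen time = whole_sec + num/den seconds */
    int render_frames;       /* frames that need the renderer */
} Section;

/* Frames of screen time; exact for the fractions used in the shot list. */
int section_frames(const Section *s)
{
    return s->whole_sec * FPS + s->num * FPS / s->den;
}

/* Machine time in hours for a number of frames at a given cost per frame. */
double hours(int frames, double sec_per_frame)
{
    return frames * sec_per_frame / 3600.0;
}

double feet_of_film(int frames)
{
    return (double)frames / FRAMES_PER_FOOT;
}

/* Section durations and render frame counts copied from the shot list. */
static const Section SHOTS[] = {
    { "LEADER",       9, 0, 1,   0 },
    { "TITLE",        8, 2, 6,   1 },
    { "SCALE",       33, 1, 2, 600 },
    { "HARPSICHORD", 38, 1, 6, 840 },
    { "CREDITS",     16, 0, 1,   0 },
};
#define NUM_SECTIONS ((int)(sizeof SHOTS / sizeof SHOTS[0]))
```

Summing section_frames over SHOTS gives the 2520 matrix frames, and passing the totals to hours with 180 and 30 seconds per frame gives about 72.1 CPU hours and 21.0 camera hours.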
Important Changes in Approach - Motion Control
After the first footage was processed, it became clear that the most pressing problem was the unnatural suddenness of the motion, an artifact of linearly interpolating the key frames. This was fixed by interpolating the key frames with cubic splines.
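The text does not say which cubic spline was used. As one possibility, here is a C sketch of a cubic Hermite interpolant with Catmull-Rom-style tangents: central differences at interior keys, and zero tangents at the first and last keys so motion starts and stops gently (an assumption). Unlike piecewise-linear interpolation, its velocity is continuous across key frames. The KeyFrame struct and spline_keys name are hypothetical.

```c
typedef struct {
    int frame;      /* film frame at which the key value is specified */
    double value;   /* one view parameter */
} KeyFrame;

/* Tangent (units per frame) at key i: central difference in the interior,
   zero at the first and last key so motion eases in and out. */
static double tangent(const KeyFrame *k, int n, int i)
{
    if (i == 0 || i == n - 1)
        return 0.0;
    return (k[i + 1].value - k[i - 1].value) /
           (double)(k[i + 1].frame - k[i - 1].frame);
}

/* Cubic Hermite interpolation between sorted key frames; frames outside
   the key range are clamped to the end values. */
double spline_keys(const KeyFrame *k, int n, double frame)
{
    if (frame <= k[0].frame)
        return k[0].value;
    if (frame >= k[n - 1].frame)
        return k[n - 1].value;
    int i = 0;
    while (frame > k[i + 1].frame)   /* find the segment containing frame */
        i++;
    double h = k[i + 1].frame - k[i].frame;
    double t = (frame - k[i].frame) / h;
    double t2 = t * t, t3 = t2 * t;
    return (2 * t3 - 3 * t2 + 1) * k[i].value          /* Hermite basis */
         + (t3 - 2 * t2 + t) * h * tangent(k, n, i)
         + (-2 * t3 + 3 * t2) * k[i + 1].value
         + (t3 - t2) * h * tangent(k, n, i + 1);
}
```

With keys (0, 0), (24, 10), and (48, 10), the value at frame 12 is 4.375 rather than the linear 5, because the motion accelerates out of the first key instead of starting at full speed. Catmull-Rom tangents can overshoot slightly past a held value, so a clamped or monotone variant may be preferable where that matters.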
Surprises
The application of perspective was vital to the correct look of the 'city of keys'. This was most likely due to the large number of rectilinear features, whose interpretation was enhanced by the transformation.
Other Tools Developed
A tool for specifying a standard motion picture academy leader was developed and subsequently improved (see plates). A swept-time analog frame counter, a center number, and crosshairs were placed inside a rainbow border, against a fractal background created especially for the project by Todd Fuqua.
Footage Processed
At present, five seconds of the 'city of keys' have been rendered and transferred to 16mm film. The results have been encouraging. The 'City of Keys' version was rendered using scan conversion software written by the author, which rendered one polygon at a time into a full-screen z-buffer for hidden surface elimination. A Phong lighting model was used, with the assumption of an infinite light source. A finite light source version, in which the light vector and the eye vector were recomputed for each visible point on the surface, was also tried, but the rendering times proved too long for practical animation.
Acknowledgments
Several individuals have provided assistance of various essential sorts, without which this short film could not have been made. Lewis Knapp, Spencer Thomas and Dino Schweitzer provided valuable technical assistance and kind criticism that was very helpful in improving the quality of the final product.
Date: 15 Jun 83 22:05 MDT
From: Lewie Knapp <knapp>
Subject: acknowledgement
Message-Id: <8306160406.AA14421@UTAH-GR.ARPA>
To: warren
_________________________________
>From RIESENFELD@UTAH-20 Wed Jun 15 17:36:20 1983
>Date: 15 Jun 1983 1730-MDT
From: RIESENFELD@UTAH-20 (Rich Riesenfeld)
Subject: ACK
To: knapp@UTAH-20, knapp@UTAH-GR
Title of Work*
* This work was supported in part by the National Science Foundation (MCS-8203692 and MCS-8121750) and the U.S. Army Research Office (DAAG29-81-K-0111 and DAAG29-82-K-0176) and the Office of Naval Research (N00014-82-K-0351).
-------
Looks like a mouthful. Small print, I guess.
I probably (definitely) won't make it in tonight.
Give me a call if you need anything.
LK
1995 Update
The preceding part of this document was written largely in 1983, with some later corrections and additions. During the computation of the movie, there was some conflict over how much CPU time the project was occupying; it was resolved just barely in time for the project to continue. The film was completed in time for the 1983 SIGGRAPH computer graphics conference, but the referees would not accept material that arrived on the day of the conference. It was formally refereed in winter 1984 and shown at the summer 1984 SIGGRAPH conference, where it was cheered by a technical audience of more than 8000. For reasons unknown, it was not included in the conference summary video, so no historical record of this work exists other than the 16mm original.
1995 Additions
This document is the collection point for important details regarding the making of the original film and the adaptation of its techniques to modern processing platforms. For convenience, these additions will be appended to this document.