Transducer: Digital Audio-Visual Performance System

Reed Kram and John Maeda

MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139 USA. {kram, jtm}



This paper describes Transducer, a prototype digital system for live audio-visual performance. The system allows a performer to build constructions of sampled audio and computational three-dimensional form simultaneously. Each sound clip is visualized as a "playable" cylinder of sound that can be manipulated both visually and aurally in real time. The Transducer system demonstrates a creative space with equal design detailing in both the construction and performance phases.


Entertainment, Interaction Design, Performance, Direct Manipulation, Real-time.


Figure 1: Sound Objects in Transition


The process of editing sounds or manipulating three-dimensional structures on a computer remains frustratingly rigid. Current tools for real-time audio or visual performance involve obtuse controls, either heavily GUI-laden or overstylized, making it difficult for an audience to understand exactly what the performer is doing. Considerable work has been done on gesture recognition for the manipulation of three-dimensional form, particularly by Zeleznik [2]. The most notable aspect of that approach is the sense of comprehensibility the system conveys in spite of its gestural complexity. We liken the simple interactive language of Transducer to that of a DJ's mixer, encapsulating the magic of disk-jockey performance with concise visuals that are clearly in tune with the music.

The intent of this system is not to act as a traditional tool for editing audio, nor as a three-dimensional modeler. Transducer asks one to envision a space where the process of editing and creating on a computer becomes a dynamic performance which an audience can easily comprehend. The content of this performance may be sufficiently complex to elicit multiple interpretations, but Transducer enforces the notion that the process of creation should itself be a fluid and transparent expression.

An event on a computer screen is seldom thought of as a performance, though for interactive systems the exact moment of activity or participation is of primary importance. Thus any interactive system that emphasizes the process of action or the event of interaction may be thought of as a "digital performance," though it may only have an audience of one. However, for the purpose of this paper we will focus on a performance system designed to be enjoyed by a larger audience.

Digital performance of this kind is a special CHI situation where the standard metrics of interface quality do not directly apply. An audience expects something out of the ordinary, immediate, live. We focus on a simple input grammar and a deceptively simple visual representation as the gateway to the environment.

Mouse action | Event
Click and drag vertically up or down | Increase/decrease sound frequency; increase/decrease object height
Click and drag horizontally toward or away from object center | Increase/decrease sound amplitude; move object toward/away from center; increase/decrease object transparency
Short click on an object | Stop playing the sound; return the object to the palette
Click anywhere not on an object | Bring up the palette of all sound objects, with active sounds visible behind

Table 1: Gestures for Sound Object Manipulation
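The coupled audio/visual mapping in Table 1 can be sketched in code. This is an illustrative reconstruction, not the actual Transducer implementation; the class, method names, and gain constant are all assumptions.

```python
# Hypothetical sketch of Table 1's drag gestures. A single drag
# updates audio and visual parameters together, so sound and form
# never fall out of step.

class SoundObject:
    def __init__(self):
        self.frequency = 1.0      # playback frequency scale
        self.amplitude = 1.0      # sound amplitude
        self.height = 1.0         # visual cylinder height
        self.distance = 1.0       # distance from scene center
        self.transparency = 0.0   # 0 = opaque, 1 = invisible

def handle_drag(obj, dx, dy, gain=0.01):
    """Map a mouse drag (dx, dy in screen pixels; dy negative when
    dragging up) onto coupled audio and visual parameters."""
    # Vertical drag: frequency and cylinder height change together.
    obj.frequency = max(0.0, obj.frequency - dy * gain)
    obj.height = max(0.0, obj.height - dy * gain)
    # Horizontal drag toward the center (positive dx here): amplitude,
    # distance, and transparency change together.
    obj.amplitude = max(0.0, obj.amplitude + dx * gain)
    obj.distance = max(0.0, obj.distance - dx * gain)
    obj.transparency = min(1.0, max(0.0, obj.transparency + dx * gain))
```

Because each gesture drives an audio parameter and its visual counterpart through the same delta, the audience can read the state of the sound directly off the shape.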


The performer and audience of the Transducer system view a video projection which illuminates a screen hanging from above and listen to audio from two speakers. The performer acts upon the system with a single-button mouse. There are no menus in the Transducer system. All interface actions occur on cylindrical sound objects.

The system first presents a palette of cylindrical objects (as at the top of Figure 1). As the user/performer moves the mouse over each cylinder, the sampled sound stream associated with that object is heard. Each object has a representative color and size corresponding to characteristics of its sound stream.

By clicking the left mouse button while over the object, the user selects the sound object to be manipulated. The palette of objects unfolds and drifts behind the camera (this transition can be seen in Figure 1), the sound associated with the chosen object begins to play, and the selected object moves to the "manipulation zone." In this area, four types of mouse actions control the sound object as shown in Table 1.

Additional sound objects can be previewed and any number of sound objects can be brought into the manipulation zone.

In this way a single user or performer is able to build simultaneous visual and audio constructions in real time. The user can layer and examine interrelationships between multiple, diverse sound sources and their corresponding visual forms.



Figure 2: Sound Object Structure



When audible, a given audio source is split into a set of pitch ranges. Each pitch range is represented as a single cylinder. The amplitude of a pitch range can be seen in the diameter of its corresponding cylinder. A complete sound stream forms a solid structure of combined cylinders arranged in increasing pitch from bottom to top. The sound structures shift and transform based on the changing audio source.
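One way to realize this mapping is to take a magnitude spectrum of each audio frame, average it into a small number of pitch bands, and let each band's energy set the diameter of one stacked cylinder. The sketch below is an assumption about how such a mapping could work, not the paper's implementation; the function name, band count, and scaling constants are all illustrative.

```python
import math

# Illustrative sketch: split a frame of samples into pitch bands via a
# naive DFT and map each band's average magnitude to the diameter of
# one cylinder, stacked low pitch (bottom) to high pitch (top).

def band_diameters(samples, n_bands=8, base=0.1, scale=1.0):
    n = len(samples)
    half = n // 2
    # Magnitude spectrum via a direct DFT (fine for a sketch; a real
    # system would use an FFT for real-time rates).
    mags = []
    for k in range(half):
        re = sum(s * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(math.hypot(re, im) / n)
    # Average the spectrum into n_bands pitch ranges; band b becomes
    # the b-th cylinder from the bottom of the stack.
    width = max(1, half // n_bands)
    return [base + scale * sum(mags[b * width:(b + 1) * width]) / width
            for b in range(n_bands)]
```

A low-pitched tone then produces a structure that is fat at the bottom and thin at the top, and the stack reshapes itself frame by frame as the source changes.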

By parameterizing each aspect of the performance, we can create continuous (rather than discrete) relationships between the different dataspaces (audio and visual). Since the interface is primarily gestural and our focus is on the ability to immediately conceptualize expression, we are not concerned with the precise representation of sound or visual structure. Rather, we are concerned with an acceptable simultaneous approximation of both that can be realized in real time. Along these lines, the system makes extensive use of interactive physics algorithms for scene transitions and effects.

The motion of each object is modeled with its own physics model. Thus two objects react differently to the same user input based on the internal "mass" and "drag" of each, as in Kram/Maeda [1]. This allows for smooth, dynamic reactivity of objects to both user input and a changing audio source.
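A minimal sketch of such per-object dynamics, in the spirit of the cited Kram/Maeda work, is given below. The class, time step, and damping scheme are assumptions for illustration, not details taken from Transducer.

```python
# Hypothetical per-object dynamics: each object carries its own
# "mass" (resistance to applied force) and "drag" (velocity damping),
# so identical gestures move different objects differently.

class DynamicObject:
    def __init__(self, mass, drag, pos=0.0):
        self.mass = mass    # higher mass -> slower response to force
        self.drag = drag    # fraction of velocity lost per step (0..1)
        self.pos = pos
        self.vel = 0.0

    def step(self, force, dt=1.0 / 30.0):
        """Advance one frame under an input force, e.g. from a mouse
        gesture or the changing audio source."""
        self.vel += (force / self.mass) * dt   # a = F / m
        self.vel *= (1.0 - self.drag)          # simple linear damping
        self.pos += self.vel * dt
```

Driving a light and a heavy object with the same force makes the light one surge ahead while the heavy one trails smoothly, which is what gives the scene its sense of weight.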

Transducer runs on two computers: a Silicon Graphics Octane and an Intel Pentium Pro 200. The Silicon Graphics machine handles visual computation and output; the Intel machine handles audio computation and output.


We intend to add microphone input so that live sound sources can drive additional sound objects. Using these basic geometric building blocks, we can also imagine bending, twisting, and tapering sound structures.


As always, many thanks to the Aesthetics & Computation Group.


1. Kram, R. and Maeda, J. Dynamic3: Interactive Physics and Physicality in Three Dimensions, in Visual Proceedings of SIGGRAPH 96, ACM Press, 140.

2. Zeleznik, R. SKETCH: An Interface for Sketching 3D Scenes, in Proceedings of SIGGRAPH 96, ACM Press, 163-170.