Interface Metaphors and Signal Representation for Audiovisual Performance Systems
by Golan Levin
Thesis Proposal for the degree of Master of Science in Media Arts and Sciences
This thesis proposes an investigation into the development of software environments that enable the simultaneous performance of moving image and sound. The goal of such systems is to provide easily apprehensible yet extremely malleable environments for creative expression and self-discovery. Current systems for audiovisual authoring have little or no knowledge about gestural data, and employ interaction metaphors and interface devices that substantially constrict the space of possible results. The proposed works will introduce new metaphors and new technologies for mapping between the dimensions of an audiovisual simulation, and signal representations of gestural information captured from a variety of physical interfaces. The thesis document will include an analysis and taxonomy of the instrument design space, an evaluation of the introduced techniques, and a condensation of design principles for audiovisual instruments generally.
"In the impossibility of replacing the essential element of color by words or other means lies the possibility of a monumental art. Here, amidst extremely rich and different combinations, there remains to be discovered one that is based upon the principle [that] the same inner sound can be rendered at the same moment by different arts. But apart from this general sound, each art will display that extra element which is essential and peculiar to itself, thereby adding to that inner sound which they have in common a richness and power that cannot be attained by one art alone." Wassily Kandinsky (1912)
A few weeks ago the New York Times reported the discovery of a 9,000-year-old flute in China. Remarkably enough, the flute was still playable. As I listened in awe to soundfiles of the flute that the Times had posted to the Web, I was struck by an awareness that the human drive toward creative expression, as it is realized through such vehicles as musical instruments and drawing materials, must be among the oldest and most universal of human desires. The thesis I propose here seeks to fulfill our will to creative expression, by making new expressions possible, and by advancing the state of the art in our contemporary means.
My focus is the design of systems which make possible the simultaneous performance of animated image and sound. I have chosen to implement these systems by making use of the digital computer's capacity to synthesize graphics and sound in response to real-time, high-bandwidth gestural inputs. I am not the first person to attempt to design such a system. In fact, the vision of a performance medium which unifies sound and image has a long history, as Wassily Kandinsky's quote suggests. Rather than claim to be first, I hope to bring to this history a provocative new set of questions and answers about the power, beauty, sophistication and personality that it is possible for an audiovisual instrument to have.
A successful work to emerge from this thesis would be a meta-artwork whose interface was supple and easy to learn, but which also yielded interesting, infinitely variable, and personally expressive performances in both the visual and aural domains. I hope to produce several examples of just such works, by bringing two things to bear on the problem space of audiovisual instruments: new technologies, such as knowledge representation and multi-dimensional gestural interfaces, and a new aesthetic, which seeks to substantiate such works with an underpinning of perceptual motivation. This work is important as it represents a vision for creative activity on the computer, in which uniquely ephemeral dynamic media blossom from the expressive "voice" of a human user. At the end of my design cycle, I intend to analyze my software artifacts in order to tease apart and taxonomize the elements of their design space, evaluate the success of the techniques I have introduced, and extract principles for the design of future audiovisual instruments. |
The synchrony of abstract image and sound, variably known as ocular music, visual music, color music, or music for the eyes, has a history that spans several centuries of work by dozens of gifted practitioners [17]. Despite the breadth and depth of this history, however, a casual Web search reveals an unfortunate ignorance of it, as numerous sites continue to advertise "an entirely novel concept, relating graphics and music" or something similar [5]. Adrien Bernard Klein, in his 1927 book Color-Music: the Art of Light, deftly characterized this myopia: "It is an odd fact that almost everyone who develops a color-organ is under the misapprehension that he, or she, is the first mortal to attempt to do so" [9]. The earliest known device for performing visual music was built in 1734 by the Jesuit priest and mathematician, Father Louis-Bertrand Castel. Castel's Ocular Harpsichord coupled the action of a harpsichord to the movement of transparent tapes, whose colors were believed by Castel to correspond to the notes of the occidental musical scale [14]. In 1789, Erasmus Darwin suggested that visual music could be produced by projecting light from oil lamps through colored glasses; his proposal was implemented in 1844 by D. D. Jameson, whose "color organ" filtered light through liquids of various colors and reflected it off metal plates onto a wall [14]. Thereafter followed a steady development of audiovisual instruments, employing a wide range of technologies and materials: Frederic Kastner's 1869 Pyrophone, for example, opened flaming gas jets into crystal tubes to create both sound and image [15], while an 1877 device by Bainbridge Bishop sat atop a pipe organ and produced light with a high-voltage electric arc [14]. An instrument patented by William Schooling in 1895 controlled the illumination of variously-shaped vacuum tubes with a keyboard and set of foot-pedals [14]. 
Other historic examples include George Hall's Musichrome (1930s), Morgan Russell and Stanton Macdonald-Wright's Kinetic Light Machine (1931), Charles Dockum's Mobile Color (1952), Gordon Pask and McKinnon Wood's Musicolour machines (1953), and Jordan Belson's liquid-based instruments from the late 1950s [13, 14, 15]. Apart from these, two twentieth-century instruments deserve special mention: Thomas Wilfred's Clavilux (1920s) [21], and Oskar Fischinger's Lumigraph (1948), both of which achieved considerable critical acclaim through international high-art performances. Both were optomechanical; the Clavilux filtered light through several stages of multicolored glass disks, while the Lumigraph interrupted colored beams of light with a flexible fabric surface. Interestingly, these instruments also became modest commercial successes as home entertainment systems, and as such penetrated the collective cultural consciousness to an unprecedented degree. While these innovators developed "real-time" tools for the performance of visual music, other pioneers composed elaborate visual statements in the off-line laboratory of the newly invented animation studio. Operating from deeply held beliefs in a "universal language of abstract form," animators like Walter Ruttmann, Viking Eggeling, Len Lye, Norman McLaren and Oskar Fischinger began systematic studies of abstract temporal composition in order to uncover "the rules of a plastic counterpoint" [18]. Landmark events in abstract cinema included the 1921 Frankfurt run of Ruttmann's short Lichtspiel Opus I, thought to have been the first screening ever of an abstract film for a general audience [18], and the 1924 release of Eggeling's Diagonal Symphony, which was the first entirely abstract film. The painstakingly constructed efforts of these and other artists dramatically expanded the language of dynamic visual form, at a time when the language of cinematic montage itself was only beginning to be understood.
Of particular inspiration to the work I propose in this thesis is the cinematic vocabulary developed by the New Zealand animator Len Lye (active 1930-1960), who explored the dynamic properties of cameraless animation techniques such as drawing, scratching and painting directly on celluloid. Lye's work vaults the gulf between the vitality of performance and the precision of composition, for even though his movies were meticulously constructed in his animation studio, his process of improvisation survives on-screen in frenetic and biomorphic works that are a direct connection to his own experience, thought and mark-making [20].
Physical color organs are burdened by an inherent tradeoff in their ability to yield specific versus general content [19]. The control of detailed or precise images requires a specificity of generative means, whereas the use of highly general means tends to produce amorphous and difficult-to-control results. To display the image of a triangle in the physical world, for example, requires a triangular chip of transparent material, or a triangular aperture, and that triangular element can do little else but make triangles. By projecting light through a tray of immiscible colored liquids, on the other hand, one can produce an infinity of outcomes, but the inchoate and complex results can be only vaguely directed.
Computer technology has made it possible for visual music designers to transcend the limitations of physics, mechanics and optics, and overcome the specific/general conflict inherent in electromechanical and optomechanical visual instruments. One of the first artists to take advantage of these means was the California filmmaker John Whitney, who began his studies of computational dynamic form in 1960 after twenty years of producing animations optomechanically.
Shortly thereafter, Myron Krueger made some of the most fundamental developments in the connection between interaction and computer graphics; his 1969 Video Place used information from motion capture to direct interactions with abstract forms [10]. Since that time, the fields of computer graphics and human-computer interaction have burgeoned considerably. Three important and relatively recent computational precursors to this research are Timepaint by John Maeda, the Motion Phone by Scott Snibbe [20], and Music Insects (later sold as SimTunes) by Toshio Iwai, all of which were developed in the early 1990's. John Maeda's Timepaint is a delicate illustration of the dynamic process by which apparently static marks are made: by extending a gesture's temporal record into the third dimension, Maeda's work can flip between a flat animated composition and a volumetric diagram of temporality. Snibbe's Motion Phone is an application for interactively authoring dynamic animations; it accretes recordings of gestures into an abstract animation loop, creating lively and rhythmic patterns of colorful triangles, squares, circles and lines. Although it produces no sound, it is nonetheless an important example of a purely "visual instrument." Music Insects, on the other hand, is a paint program in which the pixels deposited by the user operate as scorelike elements in a music-producing simulation. Music Insects differs from the work I propose here insofar as its users produce static rather than dynamic images, and in the fact that it treats user input as positional (discrete) rather than gestural (continuous) data. Over the past three years, I have developed approximately twenty small interactive systems which interpret the dynamism of two-dimensional gestures in abstract animation spaces. 
Six were developed when I was at the Interval Research Corporation in Palo Alto: Disctopia, Blebs, Streamer, Escargogolator, Schizosticks, and Polygona Nervosa, the last four of which were produced in collaboration with Scott Snibbe. The remainder were developed over the last year and a half at MIT: Molassograph, Splat, Stripe, Ribble, Telephone, Polka, Directrix, Meshy, Curly, Floccus, Floo, and Aurora. What these all share is the treatment of temporal gestures as inputs to dynamic simulations. My two most recent applications, Yellowtail and Loom, are my first to additionally permit the performance of synthesized sound, and represent early forays into the work I propose in this thesis. In the next section, I detail the principal design goals, and concomitant space of technical challenges, that I have crystallized from my three years' research into the development of digital performance systems. It is my hope that by applying new techniques in the service of a personal aesthetic, it shall be possible to provide provocative new answers to some very old questions. |
In the process of building the body of work which has
led to this proposal, and through conversations with numerous collaborators
and mentors, I have come to articulate a set of desiderata for the design
of audiovisual instruments. These stipulations, taken together with the
technical challenges they entail, comprise the methodology which will
direct the execution of my thesis work:
The above goals are chiefly aesthetic desiderata, but they entail many important technical questions. In the course of pursuing the above design goals, I expect to encounter and develop solutions to the following technological challenges:
In order to evaluate the success or failure of the proposed work, it is helpful to establish the context in which the work is positioned and according to whose standards it should be measured. As with many Media Laboratory theses, this is made difficult by the interdisciplinary nature of the work; the software systems that support this thesis inhabit a domain at the juncture of art, design, and the engineering of tools and instruments. As artworks, they fit within and extend an established twentieth-century tradition in which artworks are themselves generative systems for other media; in Marshall McLuhan's terms, such systems are characterized by an "outer medium" (in my case, gestural performance and interaction) whose forms make possible the articulation of yet other expressions in an "inner medium" (for this work, synthetic animation and sound). Distinguishing such meta-artworks from the kinds of artifacts we conventionally call "tools" or "instruments" is largely a question of semantics and context; certainly the works I propose fit well within the usual definitions of these categories. I take exception to these labels only insofar as they carry with them the implication that a given tool or instrument is successful only if it is held to be useful and desirable by a broad base of consumers. I am not developing these systems with an audience of general users in mind, but rather as vehicles through which I can explore and present a strictly personal vocabulary of design practice, and suggest new technological solutions for human-machine interaction. In this sense this thesis will bear greater similarity to a "Hyperinstruments model" of artistic activity and technological craft (i.e., one in which an artist originates specialized tools for himself or herself), than to a commercial, "Adobe model" of populist software development (i.e., one in which market-driven usability specialists refine plug-and-play solutions for efficiency-seeking consumers).
Thus, although my software may coincidentally have some potential marketability (an opinion drawn merely from my own observation that numerous people have enjoyed its use), I leave its evaluation by such metrics to those who are customarily concerned with maximizing this sort of value. Instead of the marketplace, I choose as contexts of evaluation the music hall and the art gallery, and submit that the software artifacts supporting this thesis should minimally be able to support (A) a public performance by expert users, and (B) an engaging experience for interested gallerygoers. In the next section of this proposal, Deliverables, I outline specific plans for just such situated review.
For software development I will need the regular use of one SGI Octane computer and one dedicated Windows NT computer. For sound performance and recording I will also need the occasional use of a small multi-channel mixer and an electronic reverb unit. All of these resources are currently in place or easily accessible.
John Maeda is Sony Career Development Professor of Media Arts and Sciences, Assistant Professor of Design and Computation at the MIT Media Laboratory, where he also directs the Aesthetics & Computation Group (ACG). His mission at MIT is to foster the development of individuals who can find the natural intersection between the disciplines of computer science and visual communication.
Tod Machover is Professor of Music & Media, Head of the Opera of the Future/Hyperinstruments Group, and Co-Director of the Things That Think (TTT) and Toys of Tomorrow (TOT) consortia at the MIT Media Laboratory. He is also a composer, respected for his innovative syntheses of music and novel technologies.
Marc Davis is Chairman and Chief Technology Officer of Amova.com. His mission is to revolutionize popular culture with highly personalized video media. Marc received his doctorate from the Machine Understanding Group of the Learning and Common Sense Section at the MIT Media Laboratory, and has a diverse background in literary theory, media technology, film theory, and artificial intelligence.