Media Streams:
An Iconic Visual Language for
Media Annotation and Retrieval

by Marc Davis



Over the next decades, we will take part in a revolution in media creation by people who previously could not afford to produce video in their homes, schools, and offices. Just as desktop publishing put the power of the printing press on consumers' desks (though it took the Internet, by supplying the missing distribution channel, to make everyone a publisher), and digital audio samplers gave birth to a whole new genre and population of music makers, home media tools will enable consumers to make video creation part of their daily communication and entertainment. In the spirit of garage bands, we can think of this new population of motion picture producers as practitioners of "Garage Cinema." These are the people who in the next century will be running a TV station/movie studio out of their homes.

For home media producers, one major obstacle is getting access to media content so that they can tell a wider range of stories than they can record or synthesize themselves. The other major challenge is having tools that enable them to manipulate media according to its content, rather than requiring the specialized skills of current motion picture and video production. Once these challenges have been addressed, we can imagine a world in which digital media are produced anywhere by anyone and are accessible to anyone anywhere.

In the future, annotation (the description of the structure of media content) will be fully integrated into the production, archiving, retrieval, and reuse of media data. However, many annotations will remain that computers cannot encode automatically. A central challenge for computational media technology is to develop a language of description that both humans and computers can read and write, and that will enable the integrated annotation, creation, and reuse of media data. To overcome the inherent limitations of current keyword-based video annotation and retrieval systems, we need representations that capture the temporal, semantic, and relational content of video data. These representations also need to be convergent and scalable to a global media archive. We have developed a language for the representation of video content that addresses these issues.

Our prototype system, Media Streams, is an iconic visual language for annotating, retrieving, and repurposing digital video and audio. Within Media Streams, the organization and categories of the Icon Space allow users to browse and compound over 7000 iconic primitives by means of a cascading hierarchical structure that supports compounding icons across branches of the hierarchy. A Media Time Line enables users to visualize, browse, annotate, retrieve, and repurpose streams of video and audio content. Media Streams was first developed at the MIT Media Laboratory by Marc Davis, Brian Williams, and Golan Levin.
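To make the ideas above concrete, the following is a minimal sketch (in Python, with hypothetical names; it is not the actual Media Streams implementation) of the two structures the paragraph describes: a cascading hierarchy of iconic primitives that can be compounded, and a timeline whose annotations attach compound icons to intervals of a stream, so that retrieval can match on content and time rather than on flat keywords.

```python
from dataclasses import dataclass, field

@dataclass
class Icon:
    """A node in a cascading hierarchy of iconic primitives (hypothetical)."""
    name: str
    children: list = field(default_factory=list)

    def descendants(self):
        """Yield this icon and every icon beneath it in the hierarchy."""
        yield self
        for child in self.children:
            yield from child.descendants()

@dataclass
class Annotation:
    icons: frozenset   # a compound of primitives, e.g. {"woman", "walking"}
    start: float       # seconds into the stream
    end: float

class MediaTimeLine:
    """A stream of media with multi-layered, time-indexed annotations."""
    def __init__(self):
        self.annotations = []

    def annotate(self, icons, start, end):
        self.annotations.append(Annotation(frozenset(icons), start, end))

    def retrieve(self, icons, start=0.0, end=float("inf")):
        """Return annotations whose compound contains all query icons and
        whose interval overlaps the query interval: a temporal, relational
        match rather than a flat keyword lookup."""
        query = frozenset(icons)
        return [a for a in self.annotations
                if query <= a.icons and a.start < end and a.end > start]

# Browsing one branch of the hierarchy:
action = Icon("action", [Icon("walking"), Icon("running")])
names = [i.name for i in action.descendants()]  # ["action", "walking", "running"]

# Annotating and retrieving from a stream:
tl = MediaTimeLine()
tl.annotate({"woman", "walking"}, 0.0, 4.5)
tl.annotate({"dog", "running"}, 3.0, 8.0)
hits = tl.retrieve({"walking"}, start=2.0, end=5.0)  # matches only the first
```

The point of the sketch is the shape of the representation: because each annotation is a compound of primitives bound to an interval, a query can ask "walking, between seconds 2 and 5" and get back only the overlapping shot content, something a keyword index over whole clips cannot express.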


For more information on Media Streams, please see the following articles:

  Davis, Marc. "Media Streams: An Iconic Visual Language for Video Representation." In: Readings in Human-Computer Interaction: Toward the Year 2000, ed. Ronald M. Baecker, Jonathan Grudin, William A. S. Buxton, and Saul Greenberg. 854-866. 2nd ed., San Francisco: Morgan Kaufmann Publishers, Inc., 1995.

Abstract:
In order to enable the search and retrieval of video from large archives, we need a representation language for video content. Although some aspects of video can be automatically parsed, a sufficient representation requires that video be annotated. We discuss the design of a video representation language with special attention to the issue of creating a global, reusable video archive. Our prototype system, Media Streams, enables users to create multi-layered, iconic annotations of streams of video data. Within Media Streams, the organization and categories of the Icon Space allow users to browse and compound over 3500 iconic primitives by means of a cascading hierarchical structure that supports compounding icons across branches of the hierarchy. A Media Time Line enables users to visualize, browse, annotate, and retrieve video content. The challenges of creating a representation of human action in video are discussed in detail, with focus on the effect of the syntax of video sequences on the semantics of video shots.

Keywords: video archiving, visual language, video indexing, video retrieval, knowledge representation, multimedia, visualization, iconic language, film theory, graphical user interface design, repurposing, computational cinema.


Davis, Marc. "Garage Cinema and the Future of Media Technology." Communications of the ACM (50th Anniversary Edition) 40, no. 2 (1997): 42-48.

Abstract:
The twentieth century saw the invention and development of two fundamental new technologies for creating and manipulating representations of the world: motion pictures and computation. Motion pictures gave us the ability to capture and construct sequences of moving images, enabling the creation of a new language of storytelling and visual experience. Computation provided us with a method of constructing universal machines which, by manipulating representations of processes and objects, can create new processes and objects, and even new machines. The deep integration of computation and motion pictures has not yet occurred. Their deeper integration over the next fifty years will have profound technological, linguistic, and social effects. This article traces part of the history and future of computational motion pictures, as well as the cultural factors this technology will draw on and foster.

Keywords: visual language, iconic language, semasiography, film theory, multimedia, computational cinema, mass media, popular culture, linguistics, repurposing.