Over the next decades, we will take part in a revolution in media creation
driven by people who could not previously afford to produce video in their
homes, schools, and offices. Just as desktop publishing gave consumers the
power of the printing press on their desks (though it took the Internet,
by supplying the missing distribution channel, to make everyone a publisher),
and digital audio samplers gave birth to a whole new genre and population
of music makers, home media tools will enable consumers to make video
creation a part of their daily communication and entertainment. In the
spirit of garage bands, we can think of this new population of motion
picture producers as practitioners of "Garage Cinema." These are the people
who in the next century will be running TV stations and movie studios out
of their homes.
For home
media producers, a major obstacle is gaining access to media content so
that they can tell a wider range of stories than they can record or
synthesize. The other major challenge lies in having tools that enable
them to manipulate media according to their content rather than requiring
the specialized skills of current motion picture and video production.
Once these challenges have been addressed, we can imagine a world in which
digital media are produced anywhere by anyone and are accessible to anyone
anywhere.
In the future,
annotation, the description of the structure of media content, will be
fully integrated into the production, archiving, retrieval, and reuse
of media data. However, there will remain many annotations that computers
will not be able to encode automatically. A central challenge for computational
media technology is to develop a language of description which both humans
and computers can read and write and which will enable the integrated
annotation, creation, and reuse of media data. In order to overcome the
inherent limitations of current keyword-based video annotation and retrieval
systems, we need representations that capture the temporal, semantic,
and relational content of video data. These representations also need
to be convergent and scalable to a global media archive. We have developed
a language for the representation of video content that addresses these
issues.
Our prototype
system, Media Streams, is an iconic visual language for annotating,
retrieving, and repurposing digital video and audio. Within Media Streams,
the organization and categories of the Icon
Space allow users to browse and compound over 7000 iconic primitives
by means of a cascading hierarchical structure that supports compounding
icons across branches of the hierarchy. A Media
Time Line enables users to visualize, browse, annotate, retrieve,
and repurpose streams of video and audio content. Media Streams
was first developed at the MIT Media Laboratory by Marc Davis, Brian Williams,
and Golan Levin.
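To make the annotation model concrete, here is a minimal sketch in Python of how compound iconic annotations on a Media Time Line might be represented. The class names, icon categories, and retrieval predicate are hypothetical illustrations of the approach, not the actual Media Streams implementation:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Icon:
        """A primitive icon from one branch of the Icon Space hierarchy."""
        category: str   # e.g. "character", "action", "location" (assumed names)
        path: tuple     # position within the cascading hierarchy

    @dataclass(frozen=True)
    class CompoundIcon:
        """An icon compounded from primitives across hierarchy branches."""
        parts: tuple    # e.g. (a character icon, an action icon)

    @dataclass
    class Annotation:
        """A compound icon attached to a time interval of a stream."""
        icon: CompoundIcon
        start: float    # seconds from the start of the stream
        end: float

    @dataclass
    class MediaTimeLine:
        """Multi-layered annotations over one stream of video or audio."""
        layers: dict = field(default_factory=dict)  # layer name -> [Annotation]

        def annotate(self, layer, icon, start, end):
            self.layers.setdefault(layer, []).append(Annotation(icon, start, end))

        def retrieve(self, predicate):
            """All annotations, in any layer, whose content matches a predicate."""
            return [a for anns in self.layers.values() for a in anns if predicate(a)]

    # Hypothetical usage: annotate "an adult waving" and retrieve by content.
    timeline = MediaTimeLine()
    waving_adult = CompoundIcon((Icon("character", ("human", "adult")),
                                 Icon("action", ("arm", "wave"))))
    timeline.annotate("characters", waving_adult, start=12.0, end=15.5)
    hits = timeline.retrieve(
        lambda a: any(p.category == "action" for p in a.icon.parts))

The point of the sketch is that retrieval operates over structured, temporal annotations rather than flat keywords, which is what allows queries by content rather than by text match.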
Davis, Marc. "Media Streams: An Iconic Visual Language for Video
Representation." In Readings in Human-Computer Interaction: Toward the
Year 2000, ed. Ronald M. Baecker, Jonathan Grudin, William A. S. Buxton,
and Saul Greenberg, 854-866. 2nd ed. San Francisco: Morgan Kaufmann
Publishers, Inc., 1995.
Abstract:
In order to enable the search and retrieval of video from large
archives, we need a representation language for video content. Although
some aspects of video can be automatically parsed, a sufficient
representation requires that video be annotated. We discuss the
design of a video representation language with special attention
to the issue of creating a global, reusable video archive. Our prototype
system, Media Streams, enables users to create multi-layered, iconic
annotations of streams of video data. Within Media Streams, the
organization and categories of the Icon Space allow users to browse
and compound over 3500 iconic primitives by means of a cascading
hierarchical structure that supports compounding icons across branches
of the hierarchy. A Media Time Line enables users to visualize,
browse, annotate, and retrieve video content. The challenges of
creating a representation of human action in video are discussed
in detail, with focus on the effect of the syntax of video sequences
on the semantics of video shots.
Keywords:
video archiving, visual language, video indexing, video retrieval,
knowledge representation, multimedia, visualization, iconic
language, film theory, graphical user interface design, repurposing,
computational cinema.
Davis, Marc. "Garage Cinema and the Future of Media Technology."
Communications of the ACM (50th Anniversary Edition) 40, no. 2 (1997):
42-48.
Abstract:
The twentieth century saw the invention and development of two fundamental,
new technologies for creating and manipulating representations of
the world: motion pictures and computation. Motion pictures gave
us the ability to capture and construct sequences of moving images
that enabled the creation of a new language of storytelling and
visual experience. Computation provided us a method of constructing
universal machines which, by manipulating representations of processes
and objects, can create new processes and objects, and even new
machines. The deep integration of computation and motion pictures
has not yet occurred. The implications of their deeper integration
over the next fifty years will have profound technological, linguistic,
and social effects. This article traces part of the history and
future of computational motion pictures as well as the cultural
factors this technology will draw on and foster.
Keywords:
visual language, iconic language, semasiography, film theory,
multimedia, computational cinema, mass media, popular culture, linguistics,
repurposing.