The Open Video Project’s Dual Purposes

posted November 14, 2012

The Open Video Project’s rich and varied collection of video clips continues to make for fascinating browsing. Here, a screen grab from a 1944 film of Apa Tani shamans and their rituals from the Digital Himalayas project of the University of Cambridge and Yale University. The footage was recorded in Arunachal Pradesh (“land of the dawn-lit mountains”), a state in the far northeast of India, most of whose residents are Tibeto-Burman.

Sooner or later, we will all have access to large digital libraries of video.

The dream is for such access not only to come about, but to be easy. Free would be nice. Many historical, research, and educational collections already provide unhindered public access, at least for viewing and sometimes for download.

So, three core factors are being negotiated in these still-early days of Internet librarianship: cost, access, and collections. This is all, of course, to state the obvious. Less clear is how quickly and smoothly technical means will evolve to facilitate the process. For how long will users – the casually curious public, the research community, the commercial sector – be patient while the kinks are worked out?

Mindful of the challenge of easy access, in 1998 the Baltimore Learning Community Project at the University of Maryland set up the Open Video Project, an effort to standardize access to online moving-image content. The project’s initial goal was to gather video of various kinds, digitize it, and offer it to researchers interested in developing standard practices and protocols for accessing it – ways to categorize material, shape and present it, search it.

The story of why that task remains incomplete makes for a cautionary tale of standardization – one that resembles early railroad building, where poor coordination left clashing track gauges meeting at borders, causing delay and added cost. These breaks of gauge necessitated various forms of jury-rigging such as piggybacking, and nothing much has changed in the annals of large-scale engineering endeavors.

The material in the Open Video Project’s shared-video collection is available to researchers in digital video, multimedia retrieval, and digital libraries to help them study such issues as algorithms for automatic segmentation, summarization, and creation of surrogates that describe video content. Researchers can also use the collection to experiment with the development of face-recognition algorithms, or to create and evaluate interfaces for displaying search results.

A screen grab from a 45-second silent film from 1902, from the Edison Video Special Collection. It shows dancers on New York’s Bowery Street, a man and woman in ragged clothes and caps. The man appears to assault the woman, but they then launch into a cocky strut, a jitterbugging parody of a waltz. From the Library of Congress, Motion Picture, Broadcasting, and Recorded Sound Division.

The collection’s founders reasoned that if researchers and developers working on such issues could proceed from a standard collection of video material, they could more readily compare their approaches, and work together to arrive at optimal solutions. Then, says Gary Marchionini, the project’s current co-director, the collection could serve as a test corpus that would be “a standard so people doing video retrieval, algorithms, or techniques could do their work and then compare it to other people’s work; because so much is dependent on the content and quality of the video.”

The Project set about gathering material from a variety of collections and, with assistance from the National Science Foundation, a large U.S. governmental funding agency, soon had video from several sources. The repository began with the development of a basic framework and the digitization of initial content of about 195 video segments. Carnegie Mellon University’s Informedia Project, the Howard Hughes Medical Institute, and the Prelinger Archives were among the donors of video. Major additions then came from various U.S. government agencies such as the U.S. National Archives and Records Administration and the National Aeronautics and Space Administration (NASA). All contributors of materials agreed to an open-access model that would facilitate researchers’ use of the collection. U.S. governmental contributions of videos had been prepared with U.S. taxpayers’ money, making public access to them largely automatic.

How has the project fared? As is so often the case, not as its founders had envisaged. Says Marchionini, a professor of Information and Library Science at the University of North Carolina: “By the time Open Video got put up here at Carolina, in late 2000, it quickly became clear that people just wanted to use the video. So even though there have been examples of the collection being used as a test collection to create algorithms and do research on video retrieval, by far the greater use has been by teachers and individuals who wanted to download or use video in instruction or for artistic purposes or what have you.”

It may initially seem odd that many of the videos that the Project provides are also available through various free-standing collections. Says Marchionini: “That’s the case, now, but it wasn’t necessarily the case in 2000. For example, the NASA videos, which have been one of our more popular collections, we actually got those videos from NASA and digitized a bunch of them, and then they eventually sent us digital videos once they shot in digital. Those, we were one of the first to put them up. We then shared them.”

Later, he says, “Google contacted me before they bought YouTube. They had the Google retrieval system, and they wanted open access to video, so I sent them the entire NASA collection of about 1,500 videos. So those got onto Google and ultimately YouTube when they bought YouTube.”

He adds: “Those have propagated all over. The Internet Archive has taken some of them, as well. Some of the Internet Archive things that were in Rick Prelinger’s collection, he actually gave us tapes to digitize back in 1999 or 2000, and we just couldn’t digitize them fast enough. So he joined with Brewster Kahle, [founder] of the Internet Archive, and they were able to get a lot of those out much more quickly than we were. And we went and grabbed big chunks of the Prelinger Collection and added them to our site. So there is a lot of this back and forth. Also, there’s an open-source group in the U.K. that does educational videos, and we provided them with most of the NASA videos and some others.”

A screen grab from a kinescope of the John F. Kennedy and Richard M. Nixon presidential debate of 1960. The approximately 15-minute black-and-white film is from the Internet Archive.

Organizations called from all over, offering video collections, small or large. Those included Densho: The Japanese American Legacy Project, which provided several videos relating to the internment of Japanese Americans during World War II. Also contributing was the Association for Computing Machinery, which provided, for example, juried videos shown over the last 15 years at its Conference on Computer Supported Cooperative Work and Social Computing. Says Marchionini: “Because I was active in those communities, we took those videos that we got as part of the conference and put them up with ACM saying ‘yeah, it’s OK.’ Those, I would be less inclined to say you can have access to because something that came out of Xerox PARC, for example, clearly copyrighted by Xerox PARC, they would like these things to be shown to people, but they don’t want anybody making money from them.”

Much of the content is copyright-free. But there are wrinkles, says Marchionini: “We get a lot of requests from commercial entities that want to use the video, but I always say ‘No.’ They can download it, but they need to clear the rights, because even though most of the video is not under copyright because it was produced by, let’s say, a government agency in the U.S., there still might be music excerpts or particular shots done by a photographer who wants to claim rights. So we say this is available under Creative Commons, with attribution, but we make no claims on these videos. A lot of them come from government agencies, but not all. They come from a variety of sources, so as a university we were not willing to say we would own this or you can claim rights to it.”

To a degree, then, the Project has been overtaken by other collectors who have been working overtime to join the boom in online provision of digital video films, clips, and files. The Project’s web portal [http://www.open-video.org/index.php] has attracted as many as 60,000 unique users a month. It is stable, and only sometimes under active curation – when a graduate student at the University of North Carolina is interested in working on, say, how to organize, store, and present metadata: data about the collection’s contents.

The project’s contents have served as a test corpus, as intended, but no centralized test corpus has gained wide enough acceptance to do what organizers hoped. Approaches have, instead, proceeded in a piecemeal manner. Development is taking place, for example, through such agencies and groups as the U.S. National Institute of Standards and Technology (NIST), which has been running an annual Text REtrieval Conference (TREC) since 1992 to facilitate work in computational linguistics. That is an interdisciplinary field that since the 1950s has tackled the thorny issue of how to create algorithms (sets of instructions for calculations or problem-solving operations) and software for intelligently processing language data. The competition was initially for text retrieval but during the last decade added a video component, too.

Other groups create their own test collections, but there is no standard one. Is that still a problem? “I think so,” says Marchionini. “We put up some high-definition videos with well-defined kinds of distortions, and then published each of the different versions on Open Video so people could go in and do tests to detect video alterations. That was a very specialized case. But otherwise to my knowledge there’s not a concerted effort to build a test collection.”

– Peter Monaghan
