The dream is for such access not only to come about, but to be easy. Free, would be nice. Many historical, research, and educational collections are already providing unhindered public access, at least for viewing and sometimes for download.
So, three core factors are being negotiated in these still-early days of Internet librarianship: cost, access, and collections. This is all, of course, to state the obvious. Less clear is how quickly and smoothly technical means will evolve to facilitate the process. For how long will users – the casually curious public, the research community, the commercial sector – be patient while the kinks are worked out?
Mindful of the challenge of easy access, in 1998 the Baltimore Learning Community Project at the University of Maryland set up the Open Video Project, an effort to standardize access to online moving-image content. The project’s initial goal was to gather video of various kinds, digitize it, and offer it to researchers interested in developing standard practices and protocols for accessing it – ways to categorize material, shape and present it, search it.
The story of why that task remains incomplete makes for a cautionary tale of standardization – one that resembles early railroad coordination, or rather the lack of coordination that left clashing track gauges meeting at borders, causing delay and added cost. These breaks of gauge necessitated various forms of jury-rigging such as piggybacking, and nothing much has changed in the annals of large-scale engineering endeavors.
The material on the Open Video Project’s shared-video collection is available to researchers in digital video, multimedia retrieval, and digital libraries to help them to study such issues as algorithms for automatic segmentation, summarization, and creation of surrogates that describe video content. Users can also use the collection to experiment with the development of face-recognition algorithms, or to create and evaluate interfaces for displaying search results.
The Project set about collecting material from a variety of collections, and soon had material from several sources with assistance from the National Science Foundation, a large U.S. governmental funding agency. The repository began with the development of a basic framework and the digitization of initial content of about 195 video segments. Carnegie Mellon University’s Informedia Project, the Howard Hughes Medical Institute, and the Prelinger Archives were among the donors of video. Major additions then came from various U.S. government agencies such as the U.S. National Archives and Records Administration and the National Aeronautics and Space Administration (NASA). All contributors of materials agreed to an open-access model that would facilitate researchers’ use of the collection. U.S. governmental contributions of videos had been prepared with U.S. taxpayers’ money, making public access to them largely automatic.
How has the project fared? As is so often the case, not as its founders had envisaged. Says Marchionini, a professor of Information and Library Science at the University of North Carolina: “By the time Open Video got put up here at Carolina, in late 2000, it quickly became clear that people just wanted to use the video. So even though there have been examples of the collection being used as a test collection to create algorithms and do research on video retrieval, by far the greater use has been by teachers and individuals who wanted to download or use video in instruction or for artistic purposes or what have you.”
Later, he says, “Google contacted me before they bought YouTube. They had the Google retrieval system, and they wanted open access to video, so I sent them the entire NASA collection of about 1,500 videos. So those got onto Google and ultimately YouTube when they bought YouTube.”
He adds: “Those have propagated all over. The Internet Archive has taken some of them, as well. Some of the Internet Archive things that were in Rick Prelinger’s collection, he actually gave us tapes to digitize back in 1999 or 2000, and we just couldn’t digitize them fast enough. So he joined with Brewster Kahle, [founder] of the Internet Archive, and they were able to get a lot of those out much more quickly than we were. And we went and grabbed big chunks of the Prelinger Collection and added them to our site. So there is a lot of this back and forth. Also, there’s an open-source group in the U.K. that does educational videos, and we provided them with most of the NASA videos and some others.”
Much of the content is copyright-free. But there are wrinkles, says Marchionini: “We get a lot of requests from commercial entities that want to use the video, but I always say ‘No.’ They can download it, but they need to clear the rights, because even though most of the video is not under copyright because it was produced by, let’s say, a government agency in the U.S., there still might be music excerpts or particular shots done by a photographer who wants to claim rights. So we say this is available under Creative Commons, with attribution, but we make no claims on these videos. A lot of them come from government agencies, but not all. They come from a variety of sources, so as a university we were not willing to say we would own this or you can claim rights to it.”
To a degree, then, the Project has been overrun by other collectors who have been working overtime to join the boom of online provision of digital video films, clips, and files. The Project’s web portal [http://www.open-video.org/index.php] has attracted as many as 60,000 unique users a month. It is stable, and only sometimes under active curation – when a graduate student at the University of North Carolina is interested in working on, say, how to organize, store, and present metadata: data about the content’s contents.
The project contents have served as a test corpus, as intended, but no centralized test corpus has gained wide enough acceptance to do what organizers hoped. Approaches have, instead, proceeded in a piecemeal manner. Development is taking place, for example, through such agencies and groups as the U.S. National Institute of Standards and Technology (NIST), which has been running an annual Text REtrieval Conference (TREC), a combined competition and conference, since 1992 to facilitate work in computational linguistics. That is an interdisciplinary field that since the 1950s has tackled the thorny issue of how to create algorithms (sets of instructions for calculations or problem-solving operations) and software for intelligently processing language data. The competition was initially for text retrieval but during the last decade started a video component, too.
Other groups will create their own test collection, but there’s no standard test collection. Is that still a problem? “I think so,” says Marchionini. “We put up some high-definition videos with well-defined kinds of distortions, and then published each of the different versions on Open Video so people could go in and do tests to detect video alterations. That was a very specialized case. But otherwise to my knowledge there’s not a concerted effort to build a test collection.”
– Peter Monaghan