The start of an ambitious project: Archiving the Web

October 15, 2011

Ever wondered how to preserve that Pulitzer prize-worthy e-mail for posterity? The American non-profit Internet Archive is now able to provide libraries and other institutions with the tools to preserve what it calls “the ephemera of the Web” – websites and their various documents, images, videos and links. Internet Archive hosts collections of archived websites for more than 60 different American colleges and universities.

The American University in Cairo, for example, has a collection called ‘2011 Egyptian Revolution’, which includes blogs, Twitter feeds, photos, videos and online news coverage of the political tumult that engulfed the Egyptian capital earlier this year. Internet Archive’s main task, however, is to copy and preserve entire domains that researchers can navigate just as they would have at any point in the site’s history – even if it moves, changes or disappears. Though it sounds complex, Internet Archive works in much the same way that libraries have long preserved newspapers via microfilm.

According to Robert Wolven, Associate University Librarian for bibliographic services and collection development at Columbia University in Manhattan, NY, USA, the Archive has become necessary because the Internet is increasingly society’s medium of record, and authors of scholarly papers now commonly cite Web content that has no corresponding print document. Increasingly, too, Web addresses cited in footnotes point to a website that has expired, changed or moved. Archiving the Web requires skill and patience. Websites are fluidly ‘3-D’: new content is added while the old disappears, seemingly without a trace.

Websites are also notoriously unstable: Internet Archive estimates that the average lifespan of a website is between 44 and 75 days. Archiving the pages, files and embedded objects that make up websites can take weeks. Librarians at Columbia, one of Internet Archive’s more active partners, use an open-source ‘crawling’ tool called Heritrix to copy certain websites once every three months. Field dispatches, commission findings, annual reports and press releases are examples of content that is potentially valuable to scholars in a number of fields, including history, international affairs, sociology, law, political science and social work, says Wolven. Those working with Internet Archive point out, though, that its work will only really start showing its value decades down the line, when it offers access to archived versions of extinct websites that otherwise would have been lost to history.

Category: e-Education, Spring 2011

About the Author

News posts added for Independent Education by Global Latitude DMA
