
StevenClark.com.au


The Internet Archive Bollocks

Step right up, step right up, ladies and gentlemen of the Internet. The frantic effort is underway to archive the Internet warts-and-all so that future digital archaeologists can turn back time and rediscover our 2011 web design rock stars in the far-off year of 2070. Or, more modestly, so people like Jeremy Keith can still find their social networking identities long after Zeldman has left the building.

Jeremy Keith tells me Pruning is Bollocks

Here’s my premise: I would suggest that the biggest problem we have with archiving anything substantial from the Internet isn’t mass archiving. Our problem is digging out what should be archived and what should be left to fade away gracefully into the vacuum of history.

Jeremy Keith, in a tweet, says: “That’s utter bollocks… in my humble opinion.”

Somehow I doubt that opinion is humble on the subject of Internet archival, and I’d humbly reply that, in my opinion and regardless of my lack of rock-star conference-speaking and book-authoring status, when Jeremy Keith says it’s about archiving everything (culture), then bollocks to that. Double bollocks. The problem is bigger than bookmarks, LOLcats or relying on businesses stupid enough – like Delicious – to think their business model of FREE was going to take them anywhere in the long term.

But I have a number of comments and questions regarding the arrogance of a total archival of the Internet (a term we’re really using as a synonym for the World Wide Web, which is an application that runs on the Internet). Let’s start with the value of information.

The Quality & Value of Information going into an Archive

My comment about sifting out the crap to find the value relies on a scientific fact of life: any repository of data (and therefore of information, which is data plus context) is only as valuable as the quality of that data. Any business knows that data becomes outdated because people change addresses and telephone numbers change. Hey, people even lie about stuff, so there’s more crap going into any repository. My suggestion that pruning is part of any challenge to archive the Internet is based on that premise… a big load of rubbish pushed into an unordered box without pruning would have limited value. It would be information soup.

Leading to my next question… even if you did work out a specific time and date to take a snapshot of the entire web, you would have to accept that the errors in that snapshot devalue its worth. Wikipedia pages are dynamic, not static… you are going to snapshot half-edited and incorrect content at an arbitrary time. My own posts are often edited within 24 hours of publication. A snapshot archive has no corrections or retractions and loses much of the relevance that makes the Internet of NOW valuable.

Then I wonder about the business case: who is going to use this archive? The assumption is that someone, at some time, will immerse themselves in this ‘old archived Internet’, but who? And why? How will this make money? Who will maintain it in perpetuity, and where? Is this a private organisation? Who will have access to that information? Because if some of that information is Jeremy Keith’s, then I can assure you some of it must be mine and yours.

The Privacy & Legal Issues of Archiving ‘Our’ Information

I do not believe any private citizen has the right to make choices on my behalf about what is kept in an archive. If I am as mad as hell about Facebook’s perception of my right to control my own information, then I certainly don’t think a group of private citizens on an obsessive collection bee should be allowed to hold AND share that information. Who are these archive gatekeepers? If I delete something from Facebook, then not only should my data no longer be publicly visible online, but it should ALSO be deleted from Facebook’s database, AND any Jeremy Keith archive had better be willing and able to scour it out of their system.

Which raises the question: any information in that snapshot that is later found to be libelous must be removed… so how will this affect a snapshot of everything on the Internet? Because if you disseminate libelous statements then you are, from my understanding, committing an offence. The next question is: an offence in what country and jurisdiction? That can be a dangerous legal minefield. And what about the copyright of that archived content? What right does a private individual have to make and keep copies of websites full of proprietary data? Our relationship is with Facebook, NOT Jeremy friggen Keith.

Leading to my next issue… it’s inevitable that in the creation of an Internet archive there will be infringements of people’s human rights. A common example would be workplace or schoolyard bullying via Facebook. An example on the radio this morning was of a large woman whose workmates posted unflattering photos of her on their Facebook profiles for ridicule. The photos were removed within 24 hours, but any snapshot of the Internet taken for posterity during that period would store HER embarrassment in the archive. That is simply wrong. And it would be as wrong as storing other lies or rubbish written about people (particularly today’s school children) for posterity. How does an archive deal with this bullying content, or does it just assume everything on the Internet is golden dust? Dare we mention issues such as the archival of pornography or criminal scams?

How dare any private citizens making an Internet archive believe they can scrape all of our children’s content for their own purposes? I’m surprised that this hasn’t been raised earlier. How secure will it be? Who will it be shared with? Why can’t these people just omit personal information from this archive as an ethical imperative? I’m lost for words.

The Traditional Garbage In / Garbage Out Problem

You should start to see a pattern here… this mass archive offers up a traditional database problem we should be aware of – GIGO (Garbage In / Garbage Out). It may sound romantic to ‘save a copy of our beloved Internet’ or any of the free social services we’ve come to exploit en masse, but somebody isn’t thinking this through. Somebody is so fixated on their loss of a trivial bookmarking service and an inflated social self that they are willing to overlook privacy, data accuracy and social decency. So no, Jeremy Keith, I have no doubt that you’re a JavaScript rock star (my office bookshelf has several of your books for reference)… but on this occasion you are a misguided human being with a personal goal that I simply disagree with. Your bollocks may be well licked by the in-crowd, but from my understanding of an Internet archive of everything, I can only see peril and danger for the private citizens of the world.

It’s bad enough that we live in an electronic era where everything is monitored by Big Brother and the content that we’ve trusted to Facebook and other platforms will never be deleted off their servers (only from public view). To have private citizens wanting to grab that mass of spam and troll comments and everything we’ve said drunk, sober or sideways is, to my ears, repugnant. Save your own bookmarks… leave my bookmarks to die in the wilderness as fully intended when I used the service several years ago.

The Eventual Next Archive… and the Next

Which brings me to the next issue: how often does this archival process occur? Every 12 months? Because tomorrow will have a different culture, and so will the year after that. Multiple versions of this archive would therefore probably be made to contain the snapshots of this Internet culture. Be sure and certain when I tell you one thing… I want and expect much of what I have online to fade away into obscurity over time. That’s the best outcome for me as a human being. Although the data is publicly available, I do not believe you have the moral right to take it, much like people taking photographs through your kitchen window because they can see in from the roadway.

Yes, on the surface it does sound like a noble pursuit to archive the Internet… and what’s the harm? But just think about those important topics for a few minutes and tell me there isn’t a problem on Space Ship Adactio. He says we don’t need to sort this stuff out before it goes into the archive… damn right it needs to be sorted. If not, I hope the project gets sued out of existence sometime in the near future. By all means, save your own content for posterity. And by all means take a snapshot of certain things relevant to humanity that do not impinge on others’ rights and liberties. But don’t impinge on them simply because you’re technically savvy enough to do so and have lost respect for other people’s data and privacy along the way.

I have stated a number of reasons why I find it abhorrent that we’re only talking about the difficulty of mass archiving the Internet. Pruning is key… must be key. I don’t see how any of those points amount to ‘bollocks’.



About the Author


My name is Steven Clark (aka nortypig) and I live in Southern Tasmania. I have an MBA (Specialisation) and a Bachelor of Computing from the University of Tasmania. I'm a photographer making pictures with film. A web developer for money. A business consultant for fun. A journalist on paper. Dreams of owning the World. Idea champion. Paradox. Life partner to Megan.
