Feb 26, 2010
Happy Friday! I am declaring today to be particularly good, even as Happy Fridays go, since (1) I went for a run this morning, (2) the cubefarm has exploded into the amusing chaos of Yet Another Desk Reshuffle and (3) the first person I saw as I walked towards my building this morning was oliverm, smiling and waving at me. Good things.
There will be a Day 2 of Happiness Posting – in fact I will post it tonight, because tonight I miraculously have a night to myself which I intend to use for blogging and painting and the like. (I’ll also take this moment to point out that I never used the words ‘consecutive days’. And in the words of Nick Hornby, yes, that is a sneaky lawyer’s trick.)
In the meantime, have a little of my pontification on one of the Many Shiny Things I am excited about. This particular shiny thing is data. And datasets. Datamining. Visualisation. Information. I accept that this is a field that is dead sexy only to a very specific sub-set of people. However – trust me on this, unbelievers – for those of us wired in that particular way, it can be an intricate, exquisite, fascinating thing.
The web is beginning to engage with data in increasingly interesting ways. For one thing, free datasets are becoming more and more accessible and people are using them in ways that are sometimes artistic, sometimes functional, and very often both. While the plague of inaccessible data, siloed in institutions and organisations, still represents an incredible waste of potential, the situation is certainly improving. And, from an entirely different direction, Web 2.0 technology has delivered the tools to easily collect one’s own raw data.
On the latter point, I’ve been running a small personal data collection project recently. Applications such as MapMyRide, FourSquare, Last.fm, LibraryThing, Sleep Cycles for iPhone and Delicious track a whole lot of stats already in a fairly passive, low-effort manner. In addition to those, I’ve adopted Your Flowing Data (YFD) to aggregate information on a number of other variables. There’s a YFD iPhone app, and a spiffy hack for Latitude users. (The very pretty Daytum tool also provides similar functionality.)
In a move that most consider an odd choice, I’ve made most of my staggeringly banal YFD data public, on a page called Banalytics (yes, I’m proud of that one). The reasons I’ve decided to open it up are various, but I’m particularly interested in the way it massively reduces my tendency to tell small, pointless lies, and feels like a gesture towards understanding that people will choose to like me or not like me just as I am. (And of course there are the cynical days when I wonder whether maintaining privacy for the sake of privacy is a drain on my resources, and no more than a shared delusion.)
On a less personal and more academic level, I’m utterly fascinated by people who create large scale projects of this kind. Nicholas Felton is one of the best-known examples, and his personal Annual Reports have received plenty of coverage. (He also posts some really lovely stuff over at Tumblr!) His passion for design, information, for the appreciation of the very small – these things resonate with me and I can lose myself for great lengths of time in the existential detail of his work.
I struggle to find the right words to explain why I find this field so enchanting; it is a discipline of numbers and forms, not well suited to words. The attraction for me has much to do with shapes and patterns and relationships. Both the analysis and the visualisation are acts of beauty; acts of untangling immense webs, and of deft slicing and assembly. They are acts of perceiving the interconnectedness of things, and acts of holding that up and saying ‘see what I have found; see that it has meaning’. And they are the great heart – each heartbeat counted and illustrated – of the intersection between the analytical and the designed.
For anyone interested in reading further, see below for a rambling assortment of the data blogs, tools, resources and datasets currently available on the web.
Australian Bureau of Statistics
UK Data Archive
WHO Data and Statistics
UCI Machine Learning Repository
Time Series Data Library
Statistical Data Mining Tutorials