<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>tales from urban dilettantia &#187; datamining</title>
	<atom:link href="http://flyingblogspot.com/tag/datamining/feed/" rel="self" type="application/rss+xml" />
	<link>http://flyingblogspot.com</link>
	<description></description>
	<lastBuildDate>Mon, 23 Aug 2010 08:19:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Strange Attraction</title>
		<link>http://flyingblogspot.com/2010/02/117/</link>
		<comments>http://flyingblogspot.com/2010/02/117/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 09:13:06 +0000</pubDate>
		<dc:creator>Helen</dc:creator>
				<category><![CDATA[geeking it up]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://flyingblogspot.com/2010/02/117/</guid>
		<description><![CDATA[Happy Friday! I am declaring today to be particularly good, even as Happy Fridays go, since (1) I went for a run this morning, (2) the cubefarm has exploded into the amusing chaos of Yet Another Desk Reshuffle and (3) the first person I saw as I walked towards my building this morning was oliverm, [...]]]></description>
			<content:encoded><![CDATA[<p>Happy Friday! I am declaring today to be particularly good, even as Happy Fridays go, since (1) I went for a run this morning, (2) the cubefarm has exploded into the amusing chaos of Yet Another Desk Reshuffle and (3) the first person I saw as I walked towards my building this morning was <a href="http://oliverm.livejournal.com/"><strong>oliverm</strong></a>, smiling and waving at me.  Good things.</p>
<p>There <em>will </em>be a Day 2 of Happiness Posting &#8211; in fact I will post it tonight, because tonight I miraculously have a night to myself which I intend to use for blogging and painting and the like. (I&#8217;ll also take this moment to point out that I never used the words &#8216;consecutive days&#8217;. And in the words of Nick Hornby, yes, that <em>is </em>a sneaky lawyer&#8217;s trick.)</p>
<p>In the meantime, have a little of my pontification on one of the Many Shiny Things I am excited about. This particular shiny thing is data. And datasets. Datamining. Visualisation. Information. I accept that this is a field that is dead sexy only to a <em>very </em>specific sub-set of people. However &#8211; trust me on this, unbelievers &#8211; for those of us wired in that particular way, it can be an intricate, exquisite, fascinating thing.</p>
<p>The web is beginning to engage with data in increasingly interesting ways. For one thing, free datasets are becoming more and more accessible and people are using them in ways that are sometimes artistic, sometimes functional, and very often both. While the plague of inaccessible data, siloed in institutions and organisations, still represents an incredible waste of potential, the situation is certainly improving. And, from an entirely different direction, Web2.0 technology has delivered the tools to easily collect one&#8217;s own raw data.</p>
<p>On the latter point, I&#8217;ve been running a small personal data collection project recently.  Applications such as <a href="http://www.mapmyride.com/">MapMyRide</a>, <a href="http://www.foursquare.com/">FourSquare</a>, <a href="http://www.last.fm/">Last.fm</a>,  <a href="http://www.librarything.com/">LibraryThing</a>, <a href="http://www.lexwarelabs.com/sleepcycle/">Sleep Cycles for iPhone</a> and <a href="http://www.delicious.com/">Delicious</a> track a whole lot of stats already in a fairly passive, low-effort manner.  In addition to those, I&#8217;ve adopted <a href="http://your.flowingdata.com/">Your Flowing Data</a> (YFD) to aggregate information on a number of other variables.  There&#8217;s a <a href="http://flowingdata.com/2009/08/27/your-flowingdata-gets-an-upgrade-free-iphone-app/">YFD iPhone app</a>, and <a href="http://technicalcredit.blogspot.com/2009/07/location-data-on-yourflowingdata.html">a spiffy hack for Latitude users</a>.  (The very pretty <a href="http://daytum.com/">Daytum</a> tool also provides similar functionality.)</p>
<div><a href="http://www.flickr.com/photos/flyingblogspot/4324122811/"><img src="http://farm5.static.flickr.com/4042/4324122811_c3d53dc01c_o.jpg" alt="Obsessed, any?" /></a></div>
<div><em>Obsessed, any?</em></div>
<p>In a move that most consider an odd choice, I&#8217;ve made most of my staggeringly banal YFD data public, on a page called <a href="http://your.flowingdata.com/flyingblogspot/page/822/">Banalytics</a> (yes, I&#8217;m proud of that one). The reasons I&#8217;ve decided to open it up are various, but I&#8217;m particularly interested in the way it massively reduces my tendency to tell small, pointless lies, and feels like a gesture towards understanding that people will choose to like me or not like me just as I am. (And of course there are the cynical days when I wonder whether maintaining privacy for the sake of privacy is a drain on my resources, and no more than a shared delusion.)</p>
<p>On a less personal and more academic level, I&#8217;m utterly fascinated by people who create large-scale projects of this kind.  <a href="http://feltron.com/">Nicholas Felton</a> is one of the best-known examples, and his personal Annual Reports have received <a href="http://infosthetics.com/archives/2009/01/2008_feltron_annual_report.html">plenty</a> of <a href="http://www.geekosystem.com/felton-report-2009/">coverage</a>. (He also posts some <a href="http://feltron.tumblr.com/">really lovely stuff</a> over at Tumblr!) His passion for design, for information, for the appreciation of the very small &#8211; these things resonate with me and I can lose myself for great lengths of time in the existential detail of his work.</p>
<p>I struggle to find the right words to explain why I find this field so enchanting; it is a discipline of numbers and forms, not well suited to words. The attraction for me has much to do with shapes and patterns and relationships. Both the analysis and the visualisation are acts of beauty; acts of untangling immense webs, and of deft slicing and assembly. They are acts of perceiving the interconnectedness of things, and acts of holding that up and saying &#8216;see what I have found; see that it has meaning&#8217;. And they are the great heart &#8211; each heartbeat counted and illustrated &#8211; of the intersection between the analytical and the designed.<br />
<em><br />
For anyone interested in reading further, see below for a rambling assortment of the data blogs, tools, resources and datasets currently available on the web.</em></p>
<p><strong>Data &amp; Visualisation Blogs:</strong><br />
<a href="http://www.datawrangling.com/">Data Wrangling</a><br />
<a href="http://flowingdata.com/">Flowing Data</a><br />
<a href="http://www.guardian.co.uk/news/datablog">DataBlog (The Guardian)</a><br />
<a href="http://www.informationisbeautiful.net/">Information is Beautiful</a><br />
<a href="http://infosthetics.com/">Infosthetics</a></p>
<p><strong>Datasets:</strong><br />
<a href="http://www.abs.gov.au/websitedbs/D3310114.nsf/home/home?opendocument">Australian Bureau of Statistics</a><br />
<a href="http://www.data.gov/">Data.gov (US)</a><br />
<a href="http://www.data-archive.ac.uk/">UK Data Archive</a><br />
<a href="http://data.un.org/">UN Data</a><br />
<a href="http://www.who.int/research/en/">WHO Data and Statistics</a><br />
<a href="http://stats.oecd.org/Index.aspx">OECD.Stat Extracts</a><br />
<a href="http://numbrary.com/">Numbrary</a><br />
<a href="http://infochimps.org/">Infochimps</a><br />
<a href="http://dbpedia.org/About">DBPedia</a><br />
<a href="http://archive.ics.uci.edu/ml/datasets.html">UCI Machine Learning Repository</a><br />
<a href="http://www.robjhyndman.com/TSDL/">Time Series Data Library</a></p>
<p><strong>Meta-lists of Datasets:</strong><br />
<a href="http://www.datawrangling.com/some-datasets-available-on-the-web">DataWrangling List</a><br />
<a href="http://www.kdnuggets.com/datasets/index.html">Datasets for Data Mining</a></p>
<p><strong>Techniques:</strong><br />
<a href="http://www.autonlab.org/tutorials/">Statistical Data Mining Tutorials</a></p>
]]></content:encoded>
			<wfw:commentRss>http://flyingblogspot.com/2010/02/117/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Links, Many Links!</title>
		<link>http://flyingblogspot.com/2010/01/81/</link>
		<comments>http://flyingblogspot.com/2010/01/81/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 09:27:55 +0000</pubDate>
		<dc:creator>Helen</dc:creator>
				<category><![CDATA[links]]></category>
		<category><![CDATA[craft]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[flickr]]></category>
		<category><![CDATA[foursquare]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[link spammage]]></category>
		<category><![CDATA[photography]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://flyingblogspot.com/?p=81</guid>
		<description><![CDATA[It appears to be Random Links-I-Like Round-up Wednesday here at The Flying Blogspot, as I have some random linkage for you. Also, it happens that today is a Wednesday.  (Happy Hump Day, hipikat!) The Voynich Manuscript &#8211; I love this mystery and all the theories that have grown up around it. Light-bulb terrariums &#8211; these [...]]]></description>
			<content:encoded><![CDATA[<p>It appears to be Random Links-I-Like Round-up Wednesday here at The Flying Blogspot, as I have some random linkage for you. Also, it happens that today is a Wednesday.  (Happy Hump Day, <a href="http://hipikat.livejournal.com/"><strong>hipikat</strong></a>!)</p>
<div style="padding-left: 30px;">
<p><a href="http://en.wikipedia.org/wiki/Voynich_manuscript"><strong>The Voynich Manuscript</strong></a> &#8211; I love this mystery and all the theories that have grown up around it.</p>
</div>
<div style="padding-left: 30px;">
<p><a href="http://www.apartmenttherapy.com/sf/gardening/lightbulb-terrariums-and-planters-102264"><strong>Light-bulb terrariums</strong></a> &#8211; these are so very pretty, and I do have some old incandescents sitting around the craft room.  Something else for my infinitely expandable maybe-someday list?</p>
</div>
<div style="padding-left: 30px;">
<p><a href="http://www.farbeyondthestars.com/?p=825"><strong>The Ultimate Guide to the Minimalist Workweek</strong></a> &#8211; a nice reminder for the start of a new work year; while not all of these suggestions can be applied in every workplace, many of them are broadly applicable.</p>
</div>
<div style="padding-left: 30px;">
<p><a href="http://158.130.17.5/%7Emyl/languagelog/archives/000350.html"><strong>The word &#8216;snowclones&#8217;</strong></a> &#8211; although the word was coined in 2004, I only discovered it recently; there&#8217;s also a nice list of common snowclones and their sources <a href="http://en.wikipedia.org/wiki/User:JackSchmidt/List_of_snowclones?oldid=209006404">here</a>.</p>
<p><a href="http://www.flickr.com/photos/edrabbit/galleries/72157623103181304"><strong>&#8216;Looking Into The Past&#8217; Flickr gallery</strong></a> &#8211; check out the way these images mash up time, narrative and geography; they make me simultaneously want to research and to photograph more.</p>
<p><a href="http://userscripts.org/scripts/show/38475"><strong>Facebook Event to Google Calendar button Greasemonkey script</strong></a> &#8211; this is a nice, time-saving little script; I found I had to write an extra &lt;br&gt; into the code to get it to position the button correctly.</p>
<p><a href="http://infochimps.org/"><strong>Infochimps</strong></a> &#8211; masses and masses of beautiful public datasets; I&#8217;ll post more on the beauty of datamining shortly.</p>
<p><a href="http://foursquare.com/"><strong>foursquare</strong></a> (and on Wikipedia <a href="http://en.wikipedia.org/wiki/Foursquare_%28service%29">here</a>) &#8211; I bypassed foursquare originally, as it was restricted to specific cities and because I wasn&#8217;t seeing the functionality. However, it offers similar basic geolocation functionality to BrightKite and (in some respects) Google Latitude, but combines this with a focus on discovering the urban landscape and populating the map with useful information about your area.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://flyingblogspot.com/2010/01/81/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
