Data Science Dust Off

Last month’s Briefing in the Economist covered Artificial Intelligence; more specifically, the supervised machine learning technique broadly termed artificial neural networks. Machine Learning, Artificial Intelligence, Data Science; call it what you will, to the general public, indeed to the majority of tech professionals, they are one and the same.

Data, and the application of evolutionary-type statistical tools to those data, are hot property right now. It’s seen me dusting off my undergraduate Information Science degree, ashamed on reflection of the hard time* I gave my professors over the practical utility of what we were being taught.

I shall not bore you with too many of the details of artificial neural networks… just yet. I commend the Economist, May 9th 2015, for being ballsy enough to attempt to explain them to a mainstream (albeit a certain type of ‘mainstream’) audience.

Many of the terms we use in a technical context may carry quite different meanings to a layman. Let’s start with one of those. An artificial neural network passes information through layers of neurons, strengthening or weakening those neural connections based on the feedback of a corresponding known output.

Our concept of information is a little different to yours. Take these three lines, which all carry the same amount of information:

  • The quick brown fox jumped over the lazy dog
  • Le rapide renard marron saute par-dessus le chien paresseux
  • 20,8,5,,17,21,9,3,11,,2,18,15,23,14,,6,15,24,,10,21,13,16,5,4,,15,22,5,18,,20,8,5,,12,1,26,25,,4,15,7

The first line above was probably immediately recognisable to you because your brain has an a priori model for that information input. You’ve probably got a rough model for what French looks like too, so taking the second line and sticking it into your search engine of choice will give you the information in English.

The third one will be of great interest to any 10-year-olds out there. It’s in a code that can be cracked, down to the numeric index (1-26) of each letter, in ten minutes flat. Now, merely by virtue of appearing in the same list above, a connection will be made in Bing, in Google, between those three phrases. Neither search engine will understand my primitive code; rather they’ll draw and encode the correlation from their appearing in that same list. That last item, then, is my contribution to the ‘intelligence’ of the machine. ‘Google Bombing’ is the pop-culture manifestation of this trick. All three phrases carry the same information; knowledge is represented by the correlation between them.
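For anyone without a 10-year-old to hand, here is a minimal Python sketch of that letter index code. It assumes letters map to their position in the alphabet (a = 1 through z = 26) and that a space between words is written as an empty field, which is what produces the doubled comma. The function names `encode` and `decode` are my own for illustration, and dropping punctuation is an assumption of the sketch rather than a rule of the code.

```python
import string

# Map each letter to its 1-based position in the alphabet: a -> 1, ..., z -> 26.
LETTER_TO_INDEX = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
INDEX_TO_LETTER = {i: ch for ch, i in LETTER_TO_INDEX.items()}


def encode(phrase: str) -> str:
    """Encode a phrase: letters become indices, a space becomes an empty field."""
    fields = []
    for ch in phrase.lower():
        if ch == " ":
            fields.append("")  # empty field, so a word break shows up as ",,"
        elif ch in LETTER_TO_INDEX:
            fields.append(str(LETTER_TO_INDEX[ch]))
        # Assumption: anything else (punctuation, digits) is simply dropped.
    return ",".join(fields)


def decode(code: str) -> str:
    """Reverse encode(): empty fields turn back into spaces."""
    return "".join(INDEX_TO_LETTER[int(f)] if f else " " for f in code.split(","))


message = "The quick brown fox jumped over the lazy dog"
print(encode(message))
print(decode(encode(message)))  # the message again, in lower case
```

Note that the round trip loses capitalisation: the code carries the letters, not the typography.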

Machine learning is a process of correlation of information. Hyper-dimensional, hugely complex, often inconceivable to an organic brain, sure, but just statistical correlation nonetheless. Feed it enough examples written in my code above and you’ll be able to ask “What is The Quick Brown Fox Jumped Over The Lazy Dog in Letter Index Code?”. It’s easy enough to learn. But ask “Is The Quick Brown Fox Jumped Over The Lazy Dog more useful in French? Or in English?” and you are in different territory.

The inductive reasoning required to infer that “the utility lies in the appearance of every letter of the alphabet in a comprehensible sentence” is another matter entirely. That level of learning is a long way off. Bringing a descriptive, inductive frame to the output of the data science process remains a human endeavour.
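The easy half of that contrast can be sketched in a few lines of Python. This is a toy illustration and not how a search engine or a neural network actually works: it assumes each training pair is already aligned letter for letter, and it "learns" the code purely by counting which number turns up alongside which letter. The training pairs and the names `learn_mapping` and `encode_with` are hypothetical, invented for the example.

```python
from collections import Counter, defaultdict


def learn_mapping(pairs):
    """Count how often each letter lines up with each number; keep the winner."""
    counts = defaultdict(Counter)
    for plain, coded in pairs:
        letters = [ch for ch in plain.lower() if ch.isalpha()]
        numbers = [int(f) for f in coded.split(",") if f]
        # Assumption: the n-th letter corresponds to the n-th number.
        for letter, number in zip(letters, numbers):
            counts[letter][number] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def encode_with(mapping, phrase):
    """Encode using only what was learned; unseen letters become '?'."""
    fields = []
    for ch in phrase.lower():
        if ch == " ":
            fields.append("")  # word break, written as an empty field
        elif ch.isalpha():
            fields.append(str(mapping.get(ch, "?")))
    return ",".join(fields)


# Hypothetical training examples, written in the same letter index code.
examples = [
    ("lazy dog", "12,1,26,25,,4,15,7"),
    ("the fox", "20,8,5,,6,15,24"),
    ("over", "15,22,5,18"),
]
mapping = learn_mapping(examples)
print(encode_with(mapping, "the dog"))  # 20,8,5,,4,15,7
print(encode_with(mapping, "quick"))    # ?,?,?,?,? (none of these letters were seen)
```

The sketch answers "what is this phrase in Letter Index Code?" for any letters it has seen paired up. Nothing in those counts speaks to whether the phrase is more useful in French or in English.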

The Data Scientist is there to perform the science (part statistician, part hacker), but also to interpret that science with creativity and flair. Rarer than rocking-horse shit, Data Scientists are about the only thing in this big data area not in abundant supply and trending towards cost = $0. We’ll come back to that skill set in later posts.

The tale above also holds an important lesson for the here-and-now. The tools of Data Science allow us to extract, from information, knowledge that we had no idea was there. You’d have had no idea about my code unless I told you about it (or you had a 10-year-old code breaker). There will often be relationships in the data whose existence is far from intuitive … until, of course, we reveal them, and then, in retrospect, they seem to have made perfect sense all along.

In classic business intelligence processes, we extract the right data from our operational systems into our analytic systems so that we can ask the questions we want of that data. As the costs of capturing, storing and processing data continue to fall, an intuition-driven mind-set will see us disposing of data from which knowledge could cost-effectively be revealed.

If you’re currently throwing that potential asset away, then we need to talk again. As Tim Berners-Lee put it, “Data is a precious thing and will last longer than the systems themselves.” But that’ll be the topic for post #2.

* My other undergrad degree was in Law so I could be quite the opinionated person...

Posted by: Chris Auld, Chief Technology Officer, Executive Director | 15 June 2015

Tags: Data Visualisation, Data, Big Data, Data Intelligence, Machine Learning
