
Musings from a Big Data conference

Recently I attended the annual Big Data conference held in Auckland. Attendance was three times that of the previous year's conference, reflecting the increasing global interest in big data.

Intergen is currently delivering a number of big data projects, and this conference provided an opportunity to listen to and engage with New Zealand organisations and vendors: to hear how organisations are approaching big data, the benefits being realised and the challenges faced.

Session highlights

A range of topics was presented by various speakers. Highlights of the day were Chris La Grange from Kiwibank, on how to develop a convincing business case for big data; Renee Styles from Russell McVeagh, on the importance of understanding privacy issues associated with big data; and Roberto Garrido, a Senior Data Insight & Modelling Analyst from New Zealand Post, on the importance of sound statistical approaches when analysing big data datasets. I particularly enjoyed Renee's presentation, as it raised my awareness of the privacy issues that arise when combining internal data with external data sources: the resulting insights may breach privacy law. This is a situation I believe many organisations will neglect to address, and one that may cause inadvertent privacy breaches.

Listening to the feedback: differentiating between the hype and the real benefits

Feedback during the conference matched what we are seeing in the market: as an industry there is a lot of interest in big data and how to leverage it, but there is little evidence of 'real' big data projects underway. It would seem the hype is at risk of exceeding the real benefits of harnessing big data. Big data is real and is likely to deliver competitive advantage when harnessed effectively, but as an industry we still have a way to go to make big data a commodity and translate that potential into a workable business case. This is not to say we cannot leverage the value of big data now – we certainly can – but making it happen, and justifying its value against the cost and risks, requires careful consideration.

How should we define big data?

It is evident from my experience so far, and from feedback during the conference, that our industry is struggling to categorise what big data actually is. For many it is simply a very large dataset, but this is only one aspect of big data. Gartner helpfully categorised big data into the 3 Vs (Volume, Variety and Velocity), which the industry has adopted and which is becoming more prevalent in descriptions of big data opportunities. Additional Vs have been added by various commentators, including Veracity, which I feel warrants inclusion as well. I recently found the image below, which I think summarises the 4 Vs nicely.

Defining big data

In more detail:

  • Volume – The most common term used to describe a big data opportunity. Enterprises in all industries must handle the ever-increasing volume of data created by everyday processes, people and systems.
  • Velocity – The frequency at which data is generated, captured and shared. The faster we can collect and process data, the more opportunity we have to leverage that information for competitive advantage. Traditional BI approaches commonly do not address an organisation's need to collect, analyse and disseminate insight in near-real time.
  • Variety – We are seeing an ever-increasing range of digital assets, each containing potentially valuable information. Obvious examples include the 80% of non-structured and semi-structured information and intellectual property stored internally in our organisations in the form of documents, emails, intranets, video, voice and so on. These datasets do not fit into our well-structured traditional data warehouses, as the data is constantly changing, non-exact and often unpredictable. Other emerging data types include geo-spatial and location data, log data, machine data, metrics, mobile, RFID, search, streaming data, social and text. Consider this for a moment: globally, in the short term, we expect to see 30B RFID sensors, 4.6B smart phones with cameras and GPS, hundreds of millions of GPS devices and 200M smart meters, to name a few – all of which can be utilised in various ways, supplementing our typical structured datasets with additional metadata and providing unique insight.
  • Veracity – Big data is often not verified, verifiable or validated (yet more Vs!). Analysis can't always be duplicated easily as the data keeps growing and changing, and duplication, omission and general incompleteness are to be expected. This is an important characteristic of big data that needs to be addressed early, in terms of both how we deal with it and the type of insight we expect to gain from it.
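The veracity issues above – duplication, omission and general incompleteness – can be made concrete with a few basic profiling checks run over incoming data before it is analysed. Here is a minimal sketch in Python; the record structure and field names (`id`, `timestamp`, `value`) are illustrative assumptions, not from any particular system:

```python
# Minimal data-quality profiling sketch: count duplicates, missing
# fields and implausible values in a batch of raw event records.
# Field names and the 0-100 plausibility range are hypothetical.

def profile_records(records):
    seen_ids = set()
    report = {"total": len(records), "duplicates": 0,
              "missing_fields": 0, "out_of_range": 0}
    for rec in records:
        rec_id = rec.get("id")
        if rec_id in seen_ids:
            report["duplicates"] += 1
        seen_ids.add(rec_id)
        if rec.get("timestamp") is None or rec.get("value") is None:
            report["missing_fields"] += 1
        elif not (0 <= rec["value"] <= 100):
            report["out_of_range"] += 1
    return report

sample = [
    {"id": 1, "timestamp": "2013-08-19T10:00", "value": 42},
    {"id": 1, "timestamp": "2013-08-19T10:00", "value": 42},   # duplicate
    {"id": 2, "timestamp": None, "value": 7},                  # incomplete
    {"id": 3, "timestamp": "2013-08-19T10:05", "value": 250},  # implausible
]
print(profile_records(sample))
# → {'total': 4, 'duplicates': 1, 'missing_fields': 1, 'out_of_range': 1}
```

Even a crude report like this helps set expectations about what kind of insight a dataset can support before heavy analysis begins.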

Where do the big data opportunities lie?

Big data opportunities stand out where datasets characterised by one or more of the above Vs challenge traditional business intelligence approaches and architectures – primarily around collection, management and analysis. And whilst big data requires a different approach, importantly it does not live in isolation from the rest of your BI investment. Big data is just another tool in your toolbox and needs to be considered as such: not a replacement for your traditional BI stack, but a complementary solution. A new term, BI 2.0, has emerged to represent traditional and big data co-existing in one architecture and approach – I'll leave this topic for another day, but it's worth a look.

As we commonly see with new concepts, the big data phenomenon is not entirely new: as SAS presented at the conference, a number of vendors have been doing big data analysis for years. The difference is that affordable technology, both hardware and software, now lets us capture and process big data datasets in ways that were previously inhibited by cost, complexity and effort. Adding to the momentum, we are also seeing an unparalleled increase in new datasets exhibiting the 4 V characteristics, forcing us to reconsider our approach and adapt.

What are the challenges?

A number of challenges were discussed throughout the conference which I believe are important considerations when you decide to embark on your big data journey:

  • Skills shortage – There is currently a serious lack of capability within the industry to deliver big data projects, and it is proposed that this will continue to worsen over the next one to three years as demand for these skillsets increases. It is estimated that in the United States alone there is a current shortfall of more than 120,000 vacancies. The big data skillset is different to that of the traditional BI professional, and this is the challenge: it is not an easy evolution of those core skills, but a new skill to invest in.
  • There is a lack of real-world examples to learn from and discuss. This was highly evident in the conference where a lot of theory was presented, but currently very little implementation evidence.
  • Many organisations do not have the capability (hence the emerging role of the data scientist) to derive quality information and insight from big data sources. Many big data datasets require strong statistical analysis, which is often very different from structured data analysis.
  • Organisations attempting to leverage big data do not have the BI maturity to extract the value that this data presents. Many New Zealand organisations have a low BI maturity and it is likely that big data will deliver limited value until these organisations actively address their maturity. Data governance, a vital component in our traditional BI solutions, is even more essential when considering big data.
  • Big data does not live in isolation from your existing BI architecture and capability. It is a component that needs to fit seamlessly into that architecture to deliver value. As such, organisations need to give careful consideration to how this will occur and what it will look like.
  • The current cost of building big data solutions is high (people and training costs are the primary costs) and the technology is still evolving. The decision to commence a big data project needs to carefully assess the expected total cost of ownership, taking into account that the technology used today may look quite different in 1-2 years’ time and may have to be re-implemented. Chris La Grange proposed that an organisation should take an overly pessimistic approach when building the business case for big data and only if the business case still measures up at that point should you accept it.
  • “Greater volumes of data does not equate to greater insight”. There are many instances where collecting greater volumes of data will deliver limited improvements in insight. You need to understand your data well and qualify that the data is delivering insight worthy of the investment.
  • “All problems look like big data solutions through a big data lens.” A number of customers I have worked with have eagerly looked to use big data techniques to resolve an issue where a traditional approach was more suitable. In these cases a big data solution would have delivered an outcome, but would have cost a great deal more and introduced unnecessary complexity and risk. It is best to qualify early on whether the problem could be solved with a traditional approach at lower risk and cost before committing to a big data solution.
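The qualification step in the last two points can be made a little more mechanical. As a rough sketch only – the thresholds and signals below are illustrative assumptions, not recommendations – a triage check along the lines of the 4 Vs might look like this in Python:

```python
# Rough triage sketch: flag whether a dataset plausibly warrants a
# distributed big data stack, or whether traditional single-machine
# BI tooling will do. All thresholds here are hypothetical.

def suggest_approach(dataset_gb, arrival_rate_mb_per_s=0.0,
                     unstructured=False, single_node_capacity_gb=500):
    """Return a coarse recommendation plus the reasons behind it."""
    reasons = []
    if dataset_gb > single_node_capacity_gb:
        reasons.append("volume exceeds single-node capacity")
    if arrival_rate_mb_per_s > 50:
        reasons.append("velocity requires streaming ingestion")
    if unstructured:
        reasons.append("variety: unstructured data")
    if reasons:
        return ("big data stack", reasons)
    return ("traditional BI tooling", ["fits existing architecture"])

print(suggest_approach(dataset_gb=20))
# → ('traditional BI tooling', ['fits existing architecture'])
```

The point is not the specific numbers but the discipline: write down why the traditional approach fails before accepting the extra cost and risk of a big data solution.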

Where to from here?

I have presented above a number of big data challenges, which may imply that I am not an advocate of the emerging big data opportunity – but this is certainly not the case. We live in a world in which information is growing at an ever-increasing rate, and data sources exist now that no one would have imagined ten years ago; we cannot ignore the fact that this information may be full of rich insight.

If you can utilise this untapped information to drive positive business outcomes, then big data presents another way to drive competitive advantage. Just make sure you check and double-check that your problem is indeed a big data problem (have a read of the excellent post “Most data isn't ‘big,’ and businesses are wasting money pretending it is”), that you have the BI maturity to use this information effectively, and that these new datasets do actually contain valuable information. If you do this, you should be rewarded with additional insight that could not have been harnessed through traditional approaches.

In my opinion, time is on your side: start with a strong business case, prove the key concepts and iteratively deliver small projects, and you'll be in a good position to realise the value big data offers.

If your organisation is interested in understanding more about big data or implementing a project leveraging these latest techniques, platforms and tools, come and talk to us.

Posted by: Tim Mole | 19 August 2013

Tags: Business Intelligence, Microsoft, Intergen, BI, Big Data

