Following on from a recent blog post by Craig Carpenter, VP of Marketing at Recommind, I wanted to expand on how organisations can extract maximum value from Big Data, and in particular on the importance of unstructured information.
We all know that data growth is accelerating rapidly, and storage capacity seems to be moving in lockstep with it – but there’s no value in simply storing data, and in fact there are considerable costs to that approach.
Most companies today have the ability to analyse data to identify facts – e.g., we sold this product to this person on this date. But without considering the relationships between those facts, organisations are missing valuable information. Through machine learning and automatic categorisation technology, organisations can identify the relationships between entities such as people, titles, instances, dates and departments. From those relationships they can better predict future trends: figuring out why someone did something, rather than just what they did, and thus ascertaining whether the same conditions may arise in the future and, if so, what the result will be.
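To make the idea of moving from facts to relationships concrete, here is a minimal sketch in Python. It is not Recommind's technology – the records, field names and counting approach are all illustrative assumptions – but it shows the basic step of looking at which entity values recur together across records, rather than at each record in isolation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction "facts" — each record links entities
# (a person, a product, a department, a date).
records = [
    {"person": "A. Smith", "product": "Widget", "department": "Sales", "date": "2013-01-10"},
    {"person": "A. Smith", "product": "Widget", "department": "Sales", "date": "2013-02-11"},
    {"person": "B. Jones", "product": "Gadget", "department": "Support", "date": "2013-01-15"},
]

def relationship_counts(records):
    """Count how often pairs of entity values co-occur across records.

    Pairs that recur frequently suggest a relationship worth examining —
    the 'why' behind the individual facts, not just the facts themselves.
    """
    pairs = Counter()
    for rec in records:
        values = sorted(rec.values())          # canonical order so (a, b) == (b, a)
        for a, b in combinations(values, 2):
            pairs[(a, b)] += 1
    return pairs

pairs = relationship_counts(records)
# ("A. Smith", "Widget") co-occurs in two records — a recurring relationship.
```

A real system would of course extract these entities from unstructured text first and weight the relationships statistically, but the principle – counting co-occurrence rather than isolated facts – is the same.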
Big data is valuable (when analysed) because it’s different from what was considered data before. Data now is not only numbers; it’s words and numbers (and potentially images, audio and video too). Those words could be regular words, but they could also include slang and vernacular, and they could be in multiple languages. Put those together with the numerical data and you have a much bigger, if sometimes messier, picture of what’s going on. Today’s tools are starting to be able to extract value from that messier data and to include data sources that weren’t previously analysed using software.
Let’s look at a couple of examples of where big data analytics is being put to work.
According to recent statistics from WIPO, the number of patent applications filed worldwide each year is almost 1.5 million, with more than half a million patents granted in the same period. Without the tools to accurately index and categorise information, R&D departments may find it difficult to cross-check the patents they hold against potential breaches. Similarly, when searching for prior art they need to pull in all sorts of data sources in various formats to make sure they’re not about to infringe on other companies’ patents – and recent high-profile cases have shown how costly that can be. This automation of the patent analysis process is big data analytics in action.
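A tiny sketch of the prior-art cross-checking idea: compare the text of a new claim against an indexed portfolio using bag-of-words cosine similarity. The portfolio entries, the threshold and the word-level matching are all simplifying assumptions for illustration – production patent search uses far richer indexing and categorisation than this.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical patent portfolio: id -> claim text.
portfolio = {
    "P-001": "method for wireless charging of a mobile device battery",
    "P-002": "compression algorithm for streaming video over a network",
}

def flag_similar(new_claim, portfolio, threshold=0.3):
    """Return portfolio patents whose text is suspiciously close to a new claim."""
    return [pid for pid, text in portfolio.items()
            if cosine_similarity(new_claim, text) >= threshold]

flag_similar("wireless battery charging method for mobile devices", portfolio)
# flags P-001, not P-002
```

Even this toy version shows why automation matters: checking one claim against 1.5 million filings a year is not something a human team can do by reading.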
Fraud in healthcare is a huge problem worldwide, but especially in the US. The US Federal Bureau of Investigation estimates that between $70 billion and $234 billion is lost to healthcare fraud annually – effectively stolen money that results in higher healthcare costs for the rest of the population. Software exists today that can accurately index and categorise information and, ultimately, identify relationships between entities that indicate fraudulent acts.
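As a simple illustration of relationship-based fraud detection, consider double billing: the same provider submitting the same procedure for the same patient on the same date more than once. The claim records and field names below are invented for the example; real fraud analytics looks at many more relationship patterns than this one.

```python
from collections import defaultdict

# Hypothetical healthcare claims — each links a provider, patient, procedure and date.
claims = [
    {"provider": "Clinic X", "patient": "P1", "procedure": "MRI",   "date": "2013-03-01"},
    {"provider": "Clinic X", "patient": "P1", "procedure": "MRI",   "date": "2013-03-01"},
    {"provider": "Clinic Y", "patient": "P2", "procedure": "X-ray", "date": "2013-03-02"},
]

def duplicate_claims(claims):
    """Group claims by (provider, patient, procedure, date).

    Any group appearing more than once may indicate double billing —
    a relationship between entities, not a fact visible in any single claim.
    """
    seen = defaultdict(int)
    for c in claims:
        seen[(c["provider"], c["patient"], c["procedure"], c["date"])] += 1
    return {key: n for key, n in seen.items() if n > 1}

duplicate_claims(claims)  # flags Clinic X's duplicated MRI billing
```

The point is the same as in the patent example: no individual record is suspicious on its own; only the relationship between records reveals the problem.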
By surfacing information that resides in countless repositories and extracting intelligence that aligns with business goals, organisations can reduce fraud or identify new opportunities for valuable intellectual property. And there are many other examples.
By embracing technology to help them identify these opportunities, businesses can take advantage of the data explosion rather than hiding from what could otherwise become a data minefield. In the third instalment of this series we will look at some of the risks associated with big data – and how best to mitigate them.