Predictive analysis in the age of data

Predictions are booming! They are revolutionizing business and redefining key decisions in everyday life. 

Predictive analysis gives us a peek into the future through historical data and statistical algorithms – so, while predicting the future is still a distant dream, making educated guesses about what may happen is very much a reality. Using the power of predictive analysis, businesses are forecasting outcomes like customer behavior and risk more accurately and to their advantage. This insight into the future is making a strong business case for enterprise adoption: a Forrester report indicates a doubling of investments in enterprise predictive analytics over the last two years. 

The bottom-line – everyone is using it!

The elephant in the room

But predictive analysis is only as good as the data feeding the algorithm. Big data is often seen as the be-all and end-all of analytics. However, that is far from the case. Having more data can be a huge asset, but only if that data is accurate, which is often not the case. In effect, it’s not really the quantity but the quality of data that matters when making predictions. But how do you know whether the data you’ve collected is in fact high-quality? 

In order for predictive analysis to work for your organization, considerable effort needs to go into measuring and optimizing the quality of the data. Here are three factors to help you get started: 

  1. When it comes to the level of detail (LOD), more is always better

When analyzing a data set, it’s important to go into minute details to correctly represent patterns in the data. This includes historical time series data, categorical data, text-based data and other forms of data. This will allow decision makers to react better, predict smarter, and plan the next course of action with a greater degree of trust.  

To understand why the level of detail is so important in predictive analysis, let’s take the example of a doctor-patient relationship. A doctor who is aware of the ins and outs of their patient – their medical history, lifestyle choices, habits good and bad, and other behavioral details – will be able to provide a more comprehensive diagnosis of the patient’s medical problems. The more detail, the stronger the diagnosis. 

The same logic applies when understanding the customer journey or when mapping levels of granularity (figure right), with geographic detail moving from less granular to more granular.   

Data collection is a critical component here, as it allows for greater accuracy in predictive analytics. It’s important to remember that an analytics framework has no inherent sense of the context of its data. Predictive analytics is context agnostic and will not ask for more data when what it has doesn’t serve the end goal. Therefore, it’s imperative to collect the right data – both contextually relevant and detailed – in order to get the most accurate predictions. 
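To make the level-of-detail point concrete, here is a minimal sketch (all names and figures are made up) of rolling the same sales records up at different levels of geographic granularity. The country-level total hides exactly the variation that a more granular view exposes:

```python
# Hypothetical sales records captured at the most granular level available.
records = [
    {"country": "US", "state": "CA", "city": "San Jose", "sales": 120},
    {"country": "US", "state": "CA", "city": "Fresno",   "sales": 30},
    {"country": "US", "state": "TX", "city": "Austin",   "sales": 90},
    {"country": "US", "state": "TX", "city": "El Paso",  "sales": 10},
]

def rollup(records, *keys):
    """Aggregate sales at the chosen level of granularity."""
    totals = {}
    for r in records:
        level = tuple(r[k] for k in keys)
        totals[level] = totals.get(level, 0) + r["sales"]
    return totals

# Less granular: a single national figure hides the regional pattern.
by_country = rollup(records, "country")
# More granular: the state-level view reveals where sales concentrate.
by_state = rollup(records, "country", "state")
```

Because the raw records were collected at city level, any coarser view can be derived later; data collected only at country level can never be drilled back down.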

  2. The need for data connections through data governance 

Connecting data across systems and databases is a key step in building a strong predictive analysis framework. Having data sitting idle in silos is a waste. Any data, irrespective of its source, can be an asset worth analyzing for its correlation to the predicted outcome. For instance, if your dataset consists of four variables that together drive the outcome, considering only one variable in the equation gives rise to skewed predictions (to say the least). 


In a real-life setting, these variables could be indicative of things like geography, market value, buyer persona, customer interest, and more. Looking at any of these data points individually will invariably lead to inaccurate predictions; a combined analysis on the other hand will give rise to powerful foresight.

However, data connections also bring challenges of their own. The key is to create these connections through data governance. This allows data to be gathered from various sources, including EAM/ERP/CMMS/IIoT systems, while ensuring a single source of truth and the accuracy of the variables being factored in.

Additionally, data governance can provide the necessary framework to ensure that data connections are established and managed in a controlled and secure manner. For example, the data governance framework can establish data standards, ensuring accuracy and consistency across connections. 
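As a sketch of what enforcing such data standards can look like at the point of connection, the snippet below declares a standard per field and flags incoming records that break it. The field names and rules are hypothetical, not from any specific governance product:

```python
# Hypothetical governance-defined data standard for records arriving
# from different source systems (e.g. an EAM or IIoT feed).
DATA_STANDARD = {
    "asset_id":  lambda v: isinstance(v, str) and v.startswith("AST-"),
    "reading":   lambda v: isinstance(v, (int, float)),
    "timestamp": lambda v: isinstance(v, str) and len(v) == 10,  # "YYYY-MM-DD"
}

def violations(record):
    """Return the fields in a record that are missing or break the standard."""
    return [f for f, ok in DATA_STANDARD.items()
            if f not in record or not ok(record[f])]

clean = {"asset_id": "AST-001", "reading": 7.2, "timestamp": "2024-01-31"}
dirty = {"asset_id": "001", "reading": "7.2"}  # bad id, wrong type, no timestamp

bad_fields = violations(dirty)
```

Validating at the connection, rather than downstream in the model, keeps every source consistent with the same single source of truth.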

  3. Data completeness 

The last piece of the predictive analysis puzzle is data completeness. Incomplete data can lead to inaccurate and incomplete insights, which in turn lead to poor predictions. Ironically, a study published on ScienceDirect shows that incomplete data is often a result of human error. On closer inspection, much of that study points to human biases that, for various reasons, prevent data from being analyzed in its completeness at a given time. To understand this human element a bit better, let’s look at an example.

Let’s assume you’re conducting a survey to identify the best possible market for a skincare product. From a pool of, say, 200 respondents, if 20 individuals have declined to share their contact information, it doesn’t impact the completeness of the survey. However, if critical fields like age, location or gender are missing, that will negatively affect your ability to identify a target demographic for your product. Depending on the specific requirements of your business, you’ll have a threshold beyond which incomplete data renders the survey results unusable for your particular use case. 

Similarly, different fields within a particular data set will have differing levels of impact on the outcome of predictive models. In our previous example, while the primary purpose of the survey was to identify a potential market for a skincare product, the results would also be used to shape sales and marketing strategies for that product: what is the gender and age composition of the target market, what are their existing skincare preferences, how do these preferences shift with age and geographic location, and other considerations.

Evaluating data completeness is by no means an easy task, and businesses have to determine whether a gap stems from a data entry error or from respondents being uncomfortable sharing that particular data in the first place. Depending on the importance of that data for predictive analysis, businesses will have to either adopt a different approach or re-evaluate the accuracy of the predictive model in the absence of the required data.
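The survey example above can be sketched as a per-field completeness check against thresholds that reflect each field's importance. The records, fields, and thresholds here are illustrative:

```python
# Hypothetical survey responses; None marks a missing value.
records = [
    {"age": 34,   "location": "Austin", "gender": "F",  "contact": None},
    {"age": 29,   "location": "Boston", "gender": "M",  "contact": "a@b.co"},
    {"age": None, "location": "Denver", "gender": "F",  "contact": None},
    {"age": 41,   "location": None,     "gender": None, "contact": "c@d.co"},
]

# Critical demographic fields must be nearly complete; contact info is
# optional, so its threshold is far more forgiving.
THRESHOLDS = {"age": 0.9, "location": 0.9, "gender": 0.9, "contact": 0.5}

def completeness(records, field):
    """Fraction of records in which the field is present and non-null."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

# Fields whose missing-data rate exceeds what this use case can tolerate.
failing = [f for f in THRESHOLDS if completeness(records, f) < THRESHOLDS[f]]
```

A field landing in `failing` is the signal to dig into the human element: was the gap a data entry fault, or did respondents decline to answer?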

Next Steps

For businesses dealing with hundreds of thousands of data sets daily, it’s not easy to connect disparate data sources while simultaneously evaluating the parameters for detail and completeness. Data integration plays a key role in connecting predictive analysis models to an ever-increasing list of data sources. 

Data has an incredibly short shelf life, and modern data integration tools make it possible to extract business-ready insights from the available data in near real time. Powering business excellence with predictive analytics requires the right technologies and the right technology team working together. As such, consulting a seasoned veteran in the sphere of data and data integration can help businesses identify mission-critical areas and avoid common pitfalls in their bid to unlock predictive analysis through data transformation.