Data Preparation:
Refining Raw Data into Value


A modern approach to data preparation is needed

The increasing digitalization of business processes is making it necessary for companies to enable as many users as possible to gain insights from data (democratization of analytics). Many companies today view data preparation as the key to increasing their ability to efficiently use data in a distributed manner to optimize business processes, or to enabling new, innovative business models in the first place.

In today’s economy, achieving efficient and agile data preparation is of utmost importance. Increasingly volatile and saturated markets create a complex business environment where the ability to differentiate by leveraging the power of analytics is vital. Organizations struggle to keep up with the demand for data for analytics to gain insight into changing market conditions. The pressure on analytical landscapes to provide data for in-depth analysis is high and addressing these needs requires skilled personnel and a modern approach to data preparation.


What is data preparation (definition)

Data preparation is the process of preparing and providing data for data discovery, data mining and advanced analytics.

The goal of data preparation is to support business analysts and data scientists by preparing different kinds of data for their analytical purposes. The preparation of data can take place either in business departments or be performed centrally by IT. Data preparation is a sub-domain of data integration that can be executed with dedicated tools or traditional tools for data integration like ETL tools, data virtualization or data warehouse automation.

To find out more about current thinking on data preparation, BARC conducted an independent survey of 695 BI professionals from a range of industries worldwide. The BARC Survey “Data Preparation - Refining Raw Data into Value” is one of the largest studies focusing on the conditions, benefits and challenges of data preparation.

Data preparation Challenges

Data preparation serves real business needs and is widely used

Top 3 drivers for data preparation


Today’s businesses face great challenges, as they have done throughout history. What is new is that the ability to use data systematically has become a decisive competitive advantage. Many companies have recognized this and are striving to solve many of their data usage problems by introducing or improving data preparation. The main drivers behind projects show that the hype around data preparation, which undoubtedly exists, is backed by "concrete" requirements.

High expectations of the benefits of analytics and the need for agility are driving the use of data preparation. The share of companies already using data preparation to quench their thirst for information is correspondingly high. Almost 70 percent of respondents reported they already use data preparation.

Data preparation benefits

Expectations already exceeded – broad benefits from data preparation

When companies adopt trending technologies, they often have high expectations. This is also the case with data preparation. Data preparation tools and methods are used to tackle major challenges, and our survey results show they indeed provide benefits.

The already high expectations of users are consistently exceeded. If suitably embedded in the organization, data preparation offers a real opportunity to provide data for analytics in better shape and faster, and thus generate immediate benefits for the enterprise.

Expected and achieved benefits from data preparation


Data preparation usage

Companies are still searching for the right implementation

The magic formula for anchoring data preparation within the organization is yet to be found. Although the use of data preparation methods and tools is now widespread, a clear division of tasks between IT and business users has yet to crystallize. Respondents to our survey indicate that organizations have adopted a broad range of approaches.

IT plays an important role in the two most popular approaches (i.e. in at least half of the companies surveyed). The proportion of enterprises in which business users are actively working on preparing data is also high. Company size is less decisive in determining who performs data preparation, and how, than the technical complexity of the requirements and the skills profile that exists in each department. There is therefore no sign of a one-size-fits-all magic formula, not least because the data preparation tools market is evolving rapidly.

Use of data preparation



Problems with data preparation

The value of data is recognized but the skills to make data useful are in short supply

Data preparation especially supports business departments and data scientists when it comes to preparing data for data discovery or advanced analytics, data mining and data science. The ability to gain valuable insights from data through efficient and largely independent data preparation in business departments is currently seen as patchy. Therefore, there is an urgent need for training and coaching in order to be able to implement sophisticated digitalization strategies. To do this, dedicated resources and budgets need to be allocated.

According to survey participants, these are among the greatest challenges in setting up data preparation initiatives. This highlights the lack of focus on data preparation from management and the inadequate strategic anchoring of initiatives for the systematic use of data.

As with many aspects of data management, data preparation cannot be done "in passing". It has to be viewed as a valuable step in the process of creating value from data: not a one-off project that can be outsourced and completed externally, but an ongoing endeavor that requires a high degree of competence. Therefore, data preparation needs to be deeply and extensively embedded in the organization.

Challenges when using data preparation


Data preparation Collaboration

How can companies maximize the benefits?

1. Collaboration between IT and business departments

A promising approach emerging in the context of data preparation for analytics is the division of labor across departmental boundaries.

The companies reporting the greatest benefit and highest satisfaction with data preparation are those that have made it a shared task between IT and business departments. In these companies, users with a strong sense of the business prepare data largely independently, with the support of technical experts who ensure compliance with standards. This enables business questions to be answered quickly when they arise. Developed solutions are then automated by technical specialists and made available to a wider audience. This can prevent the formation of separate solutions and silos caused by the isolated preparation of data.

To achieve their objectives more reliably, it is not enough for business users and IT to cooperate. Management must also provide the appropriate resources for training (especially for business users) and suitable tools, and in doing so strengthen its own position as a driver of improvements in the area of analytics.

Benefits by form of data preparation


Importance of Data Governance for Data Preparation

2. Data governance

Data governance uses standards and rules to enable efficient value creation from data. The prerequisite for creating value from data and therefore the task of data governance is to establish clear responsibilities, define transparent goals and structures, provide standard definitions, and ensure data quality and data security.

Our survey participants are aware of the relevance of data governance for data preparation, in particular to ensure data quality, data security and the use of standard definitions. However, the need for action still seems to be high in order to implement these points satisfactorily.

This is not always a simple undertaking as a successful implementation often requires adjustments to the organization. In the context of data preparation, it is primarily about finding the right balance between central stability and the desired level of decentralized flexibility in data preparation.

Importance of and satisfaction with data governance


Tools for Data Preparation

3. The right tools

The use of Excel is widespread and it also seems to be the first-choice tool for data preparation today. It should be clear to everyone that Excel does not support complex data preparation, which also explains why data preparation is predominantly used to implement simple applications.

Spreadsheets lack functionality for advanced integration tasks as well as the ability to map automated, high-performance and stable processes for data preparation. The key question here is whether Excel is inhibiting the development of data preparation's potential due to its limited scope of functions, or whether there is simply a lack of use cases that require specialized tools for data preparation.
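To illustrate what an automated, repeatable preparation step can look like outside a spreadsheet, the following is a minimal sketch using only the Python standard library. It is not taken from the survey: the sample data, the column names (`customer`, `revenue`, `region`) and the function `prepare` are all assumptions made up for this illustration.

```python
import csv
import io

# Hypothetical raw export; column names and values are illustrative only.
RAW = """customer, revenue ,region
 Acme ,1200.50,EMEA
Globex,,emea
Acme,1200.50,EMEA
Initech,n/a,APAC
"""


def prepare(raw_text):
    """Trim whitespace, standardize region codes, drop exact duplicates,
    and set aside rows whose revenue is not a valid number."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    clean, rejected = [], []
    for row in reader:
        # Strip stray whitespace from both column names and cell values.
        record = {key.strip(): (value or "").strip() for key, value in row.items()}
        # Apply one standard definition for region codes (upper case).
        record["region"] = record["region"].upper()
        try:
            record["revenue"] = float(record["revenue"])
        except ValueError:
            # Keep invalid rows for review instead of silently dropping them.
            rejected.append(record)
            continue
        fingerprint = (record["customer"], record["revenue"], record["region"])
        if fingerprint in seen:
            continue  # skip exact duplicates
        seen.add(fingerprint)
        clean.append(record)
    return clean, rejected


clean, rejected = prepare(RAW)
print(len(clean), "clean rows,", len(rejected), "rows set aside for review")
```

Because the rules live in a function rather than in manual spreadsheet edits, the same steps can be rerun unchanged on every new export, which is the kind of stable, automated process the paragraph above refers to.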

Overall, the performance of standard tools seems to be underestimated when it comes to data preparation, either as an instrument for IT-centric approaches or to provide direct support in business departments.

Tools in use for data preparation or planned for use


For further findings and BARC recommendations, download the entire study

The BARC Survey “Data Preparation - Refining Raw Data into Value” is one of the most comprehensive studies into the current conditions, benefits and challenges of data preparation. The global version of the study has been made available to readers free of charge thanks to sponsorship from Denodo, SAS, Tableau and TimeXtender. The German version can be downloaded free of charge thanks to sponsorship from Alteryx, SAS and TimeXtender. The interactive version is also available free of charge thanks to the generosity of Tableau.

The online user survey was conducted worldwide in March and April 2017. BARC promoted this survey through websites, at events and in email newsletters. A total of 695 people participated in the survey. Most participants came from Europe (71 percent), with a further 18 percent coming from North America and 7 percent from Asia Pacific.

Respondents came from a wide range of industries, most notably manufacturing (18 percent), IT (15 percent), financial services (12 percent) and retail (12 percent).