Make Data Accessible to Everyone with Data Democratization
A Quick Guide for transforming your workforce into data scientists
Data is more than just the currency of business, it’s the lifeblood of growth, innovation & profitability. As enterprise data and analytics emerge as critical drivers of competitive advantage and strategic performance, ensuring that all employees, not just IT analysts and data scientists, have access to actionable data is more important than ever. This is where the concept of data democratization comes in.
Until relatively recently, most companies relied on traditional Master Data Management (MDM) frameworks to store and provide access to enterprise data. These approaches typically fell into one of two categories:
- The IT-Ownership Model: All data-related requests, and their natural follow-up queries, must flow through the IT department. Even where data has been unified, it’s kept in unstructured formats and in environments that are difficult for non-technical users to work in.
- The Data Silo Model: Data is created, collected and stored within siloed systems that are usually accessible and purposed for role-specific tasks. Each business unit works off of a fragment of the big picture and lacks the ability to benefit from data generated by the others.
However, companies are starting to ask: if data is their greatest asset, shouldn’t everyone be using it?
Simply put, data democratization is the process of making data accessible to everyone. As Bernard Marr, bestselling author of “Big Data in Practice,” explains in an article he wrote for Forbes:
“Data democratization means that everybody has access to data and there are no gatekeepers that create a bottleneck at the gateway to the data. The goal is to have anybody use data at any time to make decisions with no barriers to access or understanding.”
Beyond providing non-technical users access to enterprise data systems and data lakes, the real goal of data democratization is to transform all employees into data scientists, a pivotal step towards establishing a data-centric enterprise. Accomplishing this requires more than just implementing new, user-friendly self-service analytics dashboards. It also necessitates a comprehensive cultural transformation and reskilling of talent to ensure data literacy.
The 6 Pillars of Data Democratization
The Road to Self-Service Business Analytics
As defined by Gartner, self-service analytics “is a form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support.” It is the key enabling technology of data democratization as it allows business users to build and manipulate their own reports with little to no training.
Self-service analytics goes beyond simple BI reporting by enabling the user to interact and engage with data on their own terms while also protecting it against corruption or misuse.
However, accomplishing this requires years of careful planning and a complete reimagining of enterprise data systems and architecture. Enterprise Data Management (EDM), the process of inventorying and governing the entirety of an organization’s data, is one common approach to ensuring enterprise data is properly controlled, integrated and usable.
A robust enterprise data management strategy enables data democratization by:
- Breaking down data silos by supporting the creation of a single enterprise destination for high-quality, secure, trusted data
- Balancing access and control
- Unifying data from disparate sources into common enterprise data models and entity hierarchies
- Promoting a data-centric culture
According to Miratech, the 4 pillars of successful EDM are:
- Data governance – a collection of standardized policies and processes that control the management, usage and accessibility of data within an organization.
- Data integration – the process of consolidating data from different sources into one centralized system.
- Data accessibility – the consolidation of data into one easy-to-access location or view.
- Data security – the process of protecting data from unauthorized access and data corruption throughout its lifecycle.
Data Democratization Tools & Technology
Enabling data democratization, at least from a technical standpoint, starts with the analytics stack. An analytics stack is an integrated system of applications that collect, combine, analyze, and realize the value of data. It includes all of the tools an organization uses to transform raw data into transformational insights. Ensuring that every element of the analytics stack is integrated and aligned with the strategic objectives of the business is paramount to achieving data democratization.
Common tools included in analytics stacks that promote increased visibility and accessibility include:
Data Warehouses & Data Lakes
A data warehouse is a system that pulls together data from many different sources within an organization for reporting, analysis and decision making. Modern data warehouses do much more than simply “store” historical data. They also structure data (in other words, gather, contextualize, and enrich it) to ensure its analytical readiness. Because data is processed before it is stored, data warehouses are best suited for stable but complex analytical queries. For example, data warehouses are ideal for any sort of recurring report.
Data warehouses integrate data using ETL, which stands for extract, transform and load. ETL works by collecting (extract) data from a source system, converting (transform) it into a format that can be analyzed, and storing (load) it in a data warehouse or other system.
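The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the CSV content, the `orders` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source export; a real pipeline would read from an operational system.
RAW_ORDERS_CSV = """order_id,region,amount
1,north, 120.50
2,South,80
3,north,not_a_number
"""

def extract(csv_text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: normalize region names, parse amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean

def load(rows, conn):
    """Load: write the already-structured rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_ORDERS_CSV)), conn)

# A recurring report can now query clean, typed data directly.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)
```

Because the cleaning happens before the load, every downstream report sees the same structured table, which is why this pattern suits stable, recurring queries.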
Some of the leading data warehouse solution providers include Snowflake, Oracle, Talend and Amazon Redshift.
Data lakes are similar to data warehouses in that they are capable of storing vast amounts of data. However, unlike in a data warehouse, this data is unstructured, disparate and left in its native format. Many data lakes leverage object storage and open formats to allow multiple applications to access and act on data at once. In other words, processing and analytics layers are built on top of a centralized, format-agnostic data repository.
Data lakes typically integrate data using ELT. An alternative to ETL, ELT (extract, load, transform) pushes the transformation step downstream, so that data is only “transformed” on demand once it is inside its final destination. As storage is separate from computation, data lakes tend to be easier to scale and adjust based on real-time needs, making them ideal solutions for ad hoc reports.
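To make the ELT ordering concrete, here is a small Python sketch in which raw records are loaded untouched and the transformation is defined as a SQL view that only runs when someone queries it. The table, view and sample rows are hypothetical, and an in-memory SQLite database stands in for the lake's storage and query layer.

```python
import sqlite3

# Hypothetical raw records, kept exactly as the source produced them.
RAW_ROWS = [
    ("1", "north", " 120.50"),
    ("2", "South", "80"),
    ("3", "north", "not_a_number"),
]

conn = sqlite3.connect(":memory:")

# Load: store everything as untyped text, with no cleaning up front.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)

# Transform: expressed as a view, so the work happens at query time.
conn.execute("""
    CREATE VIEW clean_orders AS
    SELECT CAST(order_id AS INTEGER)  AS order_id,
           LOWER(TRIM(region))        AS region,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_orders
    WHERE TRIM(amount) GLOB '[0-9]*'  -- keep only amounts that start with a digit
""")

# An ad hoc report triggers the transformation on demand.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM clean_orders GROUP BY region"))
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(totals, raw_count)
```

Note that the malformed row is still present in `raw_orders`; it is filtered out only by this particular view, so a different team could define a different transformation over the same raw data.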
However, without proper data governance and management practices in place, a “data lake” can easily turn into a “data swamp” where valuable data is lost in a sea of corrupt or unusable data. To avoid this, data governance should be incorporated into data lake design so that only high-quality data enters the lake. In addition, successful data lakes include data catalogues that use metadata to organize and classify data sets and enhance their discoverability.
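One simple way to picture governance at the point of ingestion is a quality gate that rejects records missing required fields. The sketch below is illustrative only: the `QualityGate` class, the field names and the sample records are hypothetical, and real governance programs cover far more than completeness checks.

```python
from dataclasses import dataclass, field

@dataclass
class QualityGate:
    """Hypothetical ingestion check: required fields must be present and non-empty."""
    required_fields: tuple
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def ingest(self, record: dict) -> bool:
        # Collect every required field that is absent or blank in this record.
        missing = [f for f in self.required_fields if record.get(f) in (None, "")]
        if missing:
            self.rejected.append((record, missing))  # keep the reason for review
            return False
        self.accepted.append(record)
        return True

gate = QualityGate(required_fields=("customer_id", "event_type", "timestamp"))
gate.ingest({"customer_id": "c-1", "event_type": "login", "timestamp": "2021-03-01T09:00:00Z"})
gate.ingest({"customer_id": "", "event_type": "login", "timestamp": "2021-03-01T09:05:00Z"})

print(len(gate.accepted), len(gate.rejected))
```

Keeping rejected records alongside the reason they failed, rather than silently discarding them, gives data owners something to act on.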
Though data lakes theoretically make data more accessible to regular business users, this doesn’t always turn out to be true in practice. Many organizations that adopt a data lake approach still rely on data scientists and engineers to run queries and build reports, as doing so still requires significant technical training. In addition, data warehouses are becoming increasingly user friendly, with built-in interfaces designed for the average, non-technical business user.
With that in mind, instead of choosing one or the other, organizations are increasingly opting to use both, as the two technologies complement each other quite well. While data lakes support newer, more experimental use cases such as streaming analytics and machine learning, data warehouses remain ideal for producing standardized BI reports, dashboards and OLAP (online analytical processing).
Some of the major data lake solution providers include Microsoft Azure, Qlik and IBM.
*Image sourced from "Data Lake vs Data Warehouse: Which one should you go for?", https://www.grazitti.com/blog/data-lake-vs-data-warehouse-which-one-should-you-go-for/
Dashboards & Data Visualization
Data visualization is the representation of data in a chart, graph or other visual format. Data dashboards, on the other hand, are tools that provide a centralized, interactive means of monitoring, measuring, analyzing, and extracting relevant business insights from different datasets in key areas while displaying information in an intuitive, visual way. In other words, dashboards use data visualization techniques to communicate data-driven insights.
Dashboards and data visualization solutions are the primary tools used to convert raw data into the “language of the business” making them indispensable to data democratization efforts. In fact, one could say they are the “face” of self-service BI and analytics. While other data management tools may make data more accessible, dashboards and data visualization are all about making data-driven insights understandable and actionable to everyone.
Tableau, Sisense and Microsoft Power BI are some of the leading vendors in this space.
Machine Learning and AI Automation
Data democratization, Artificial Intelligence (AI) and other cognitive computing tools are inherently intertwined. One of the long-term goals of data democratization is to, essentially, crowd-source innovation. If everyone at an organization is trained to be a “data scientist,” in addition to making more evidence-driven business decisions, at least some of these people should be able to make meaningful contributions to the development of better, more accurate AIs. Or so goes the thinking.
On the flip side, AI and machine learning are also key enabling technologies of data democratization. By automating tasks that once took one or more experts to accomplish, AI and machine learning not only make self-service analytics easier to use but more powerful as well. For example, next-gen tools such as predictive and prescriptive analytics leverage machine learning algorithms to analyze historical data and formulate outputs. AI-powered text mining tools such as Natural Language Processing (NLP) help analytics systems understand and work with unstructured data contained in documents, transcripts and other written sources. And the list goes on.
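As a rough illustration of what “analyzing historical data to formulate outputs” means at its simplest, the Python sketch below fits a straight line to past values and extrapolates one step ahead. It is not how any particular vendor’s product works; the function name and the sales figures are hypothetical, and real predictive tools use far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: units sold in months 1 through 4.
months = [1, 2, 3, 4]
units = [100, 110, 120, 130]

slope, intercept = fit_line(months, units)
forecast_month_5 = intercept + slope * 5  # extrapolate the trend one month ahead
print(round(forecast_month_5, 1))
```

With this perfectly linear sample history, the fitted trend adds 10 units per month, so the forecast for month 5 is 140 units. A self-service tool hides this arithmetic behind a button, which is exactly what makes it usable by non-specialists.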
Some key vendors in this space include DataRobot, dotData and Indico.
Enterprise Data Catalogues
Similar to the data lake catalogue mentioned before, an Enterprise Data Catalogue is an inventory of all data assets across an organization. More than just a simple index of data sources, EDCs also include information on how to properly access and work with data.
Automated enterprise data catalogues automate the curation of and access to enterprise data. Using AI and machine learning, these tools continuously scan internal systems, looking for and indexing new data sources. They then classify and organize data assets based on metadata, making them easily discoverable and interoperable.
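At its core, a catalogue is an index of data assets keyed by metadata. The Python sketch below shows that idea in miniature; the `DataCatalog` class, the asset names, locations, owners and tags are all hypothetical, and commercial catalogues add automated scanning, lineage and access policies on top.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    name: str
    location: str   # where the asset lives
    owner: str      # who to ask about it
    tags: frozenset # metadata used for classification and search

class DataCatalog:
    """Hypothetical in-memory catalogue of data assets."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location, owner, tags):
        # Normalize tags so that searches are case-insensitive.
        normalized = frozenset(t.lower() for t in tags)
        self._entries[name] = CatalogEntry(name, location, owner, normalized)

    def search(self, tag):
        tag = tag.lower()
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

catalog = DataCatalog()
catalog.register("crm_contacts", "s3://example-bucket/crm/contacts/", "sales-ops", ["Customer", "PII"])
catalog.register("web_clickstream", "s3://example-bucket/web/clicks/", "marketing", ["customer", "behavioral"])
catalog.register("gl_ledger", "warehouse.finance.ledger", "finance", ["finance"])

print(catalog.search("customer"))
print(catalog.search("PII"))
```

A business user who searches for “customer” finds both the CRM and clickstream assets without needing to know which system holds them, which is the discoverability benefit described above.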
Notable EDC solution providers include Informatica, Alation, Oracle and Microsoft.
What is Data Literacy and Why Does it Matter?
Providing unfettered access to high quality data means little if your workforce doesn’t know what to do with it.
The ability to decipher, derive meaning from and use data as information, much like one would do with knowledge gleaned from reading a book, is known as data literacy. As Gartner defines it, “Data literacy is the ability to read, write and communicate data in context, with an understanding of the data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case application and resulting business value or outcome.”
Unquestionably, data literacy is one of the most critical, in-demand skillsets of our current era. Despite this, according to Gartner, about 50% of organizations lack the AI and data literacy skills necessary to achieve business value.
In addition, a 2020 Accenture survey report found that:
- 75% of employees are uncomfortable working with data
- 1/3 of employees have taken at least one sick day due to data-induced stress
- A lack of data literacy costs employers 5 days of productivity per employee per year. For large companies, this equates to billions of dollars in losses.
To combat this, many organizations are developing more comprehensive data literacy training programs. In addition, solution providers such as Experian, Tableau and Qlik are ramping up their customer-facing data literacy programs to help ensure their subscribers and users get as much value as possible out of their products.
Data Democratization Use Cases & Success Stories
Cisco won the 2020 SuperNova Award for its data democratization efforts. In a statement announcing the prize, Abra Le, Senior Manager, Change Management, at Cisco, stated, “At Cisco we strive to enable data democratization while simultaneously ensuring our data is properly governed. To succeed, we needed a platform that supported a people-first approach and provided business units with the ability to utilize data to drive their business forward. With Alation we consistently deliver high-quality governed data with context in a single platform, providing our business users with visibility into what data exists and where it resides, allowing them to generate more value from that data. We are honored to be named a winner in the Constellation SuperNova Awards for pursuing digital safety, governance and privacy.” You can watch a video outlining Cisco’s data democratization efforts below.
Airbnb trained thousands of employees to become “citizen data scientists.” To do so, they built their own Data University tasked with ensuring Airbnb employees were equipped with the skills necessary to analyze the data independently no matter what department they worked in. In just the first six months the program was in operation, weekly active users on its analytic systems increased by 66%.
Pitney Bowes wanted to better use data to understand and serve clients. But to do that, they realized every department needed to be able to tap into and employ data.
Boehringer Ingelheim successfully democratized data by creating a shared framework of metadata across clinical trial phases to give researchers real-time data and ensure a seamless exchange of information across the entire process. In doing so, it was able to achieve better, faster data flow in its clinical trial processes to accelerate its drug pipeline.
The Royal Bank of Scotland embraced data democratization to uncover new ideas on how to deliver seamless, fully integrated customer experiences. “Raising visibility from our digital marketing platform and data-driven strategies was vital to the shift,” the bank’s head of analytics, Giles Richardson, told InfoWorld. “We had to have concrete, measurable insights and ways for our cross-functional teams to act on them to propel RBS into its next chapter.”
At Popsugar, democratized analytics tools, including customizable data dashboards that made the right data available to the right people at the right time, enabled content strategists to create a continuous flow of social media content, native advertising videos, and events.
How is your organization approaching data democratization? Take 3 minutes to complete our survey and share your perspective.
Alternatively, you can access the survey here: https://www.surveymonkey.com/r/T5LHM5H