Harnessing and Actioning the Power of Data

Elizabeth Mixson | 12/14/2020

Data is no longer merely an asset; it’s the lifeblood of the modern digital economy.

As legacy businesses increasingly evolve into data-centric organizations, investment in data and analytics infrastructure is on the rise. According to IDC, the Big Data and business analytics industry reached $189 billion in 2019 and is projected to reach $274 billion by 2022. Furthermore, a recent industry survey conducted by Sisense revealed that 49% of business leaders say that analytics initiatives have taken on increased importance since the onset of the COVID-19 pandemic.

However, when we talk about “data,” what exactly do we mean? What many articles, surveys and reports neglect to mention is that there are two types of data that leading-edge organizations use to deliver advanced analytical insights: internal and external data. 

 

What is internal data and where does it come from?

Internal data refers to data that a company generates, collects and controls on its own. Basically any information pulled from internal databases, software, customers, and reports would be considered internal data. 

Examples of internal data include:

Financial and transactional data

  • Purchase orders
  • Payroll
  • Billing & accounting

Customer data

  • Purchase history
  • Demographics
  • Preferences
  • Incidents
  • Clicks or website behavior

Business Systems & Applications

  • Collaboration software
  • Project Management Tools 
  • Marketing and CRMs

Operational data

Employee data

  • Timesheets
  • Salary
  • Work history
  • Performance reports

Internal Documents & Archives

  • Emails
  • Memos
  • Invoices & receipts
  • Contracts

IoT sensor data

  • Space usage
  • Temperature tracking
  • Production data

 

Companies can accomplish a lot using internal data alone. For decades, organizations have relied on internal data to power continuous improvement methodologies such as Lean Six Sigma. By identifying performance gaps and areas of waste, the insights gleaned from enterprise data can help businesses dramatically reduce overhead costs and boost productivity.

Companies are now taking things one step further by using internal data not only to optimize processes but also to drive innovation. For example:

  • Tesla is using the massive amount of real-world road data it collects through its robust enterprise data architecture to improve existing car features and develop new products such as car insurance.
  • In the wake of COVID-19, hospitals are using internal data to optimize both clinical and operational processes, not only to reduce costs but also to improve resource planning so that they can fulfill all aspects of their mission.
  • Zappos tapped into its vast customer data to build an ML-powered predictive model that accurately predicts shoe size. Since its inception, this new feature has dramatically decreased returns due to sizing issues.
  • Insurance companies can use internal data to identify and mitigate potential incidents of fraud and agent gaming.

From using process data to drive operational efficiency to transforming customer insights into data monetization strategies, internal business intelligence (BI) is a powerful and reliable transformational tool. However, internal data and analytics don’t always tell the whole story. What leading-edge companies are beginning to realize is that, in order to unlock the power of predictive and prescriptive analytics, they will need to start incorporating external data sources into their analytics frameworks as well.

 

What is external data?

External data is data that is generated, collected and stored outside of the business. Often unstructured, it can include anything from public, government-issued information to user-generated social media content. While internal data helps organizations run their business and optimize operations, external data is often used to build more complex predictive models and gain deeper insight into customer behavior and competitive landscapes.

Generally speaking, there are three types of external data: open data, paid data and shared data.

 

Open Data

Information that's freely available to everyone to access, use and share. Typically this data is either unstructured or semi-structured. Examples of open data include. 

 

Paid Data

Data that is packaged and sold by third-party vendors. Paid data runs the gamut when it comes to quality, type and price. Organizations such as Thomson Reuters, Morningstar and FactSet provide finance organizations with a vast array of global market insights to guide investment-related decision making. Acxiom collects, analyzes and sells customer and business information used to create targeted advertising campaigns. Nielsen gathers and sells all types of consumer data ranging from shopping habits to media consumption trends.

And the list goes on. According to Gartner, data service providers tend to fall into one of three categories:

  • Simple data services. Data brokers collect data from multiple sources and offer it in collected and conditioned form. The data is used as additional input to a decision process by a person, an application system, or a device in an IoT ecosystem.
  • Smart data services (DaaS). Data is enhanced by applying analytical rules and calculations. The results often take the form of scores or the tagging of objects, similar to the services provided by marketing data providers and credit ratings agencies.
  • Adaptive data services. Customers submit data pertaining to specific analytical requests. Providers combine that data with data from other sources.

Paid data is generally purchased in one of two ways:

  • Database level - meaning a company purchases or licenses access to an entire database
  • Column level - meaning they only buy a small amount or a single column of targeted data 

 

As you can imagine, purchasing at the database level is far more expensive than purchasing at the column level, especially considering that customers, on average, use less than 20% of the data they buy.
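To make the arithmetic concrete, here is a minimal Python sketch of what low utilization does to the effective price of a data purchase. The license price below is a hypothetical placeholder, not a vendor quote; only the 20% utilization figure comes from the text above.

```python
def effective_cost(price_paid: float, fraction_used: float) -> float:
    """Price paid divided by the share of the purchased data that is actually used."""
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    return price_paid / fraction_used

# Hypothetical annual license price for an entire database (illustrative only).
database_price = 100_000.0

# At 20% utilization, the buyer effectively pays about five times the
# nominal price for the portion of the data that gets used.
multiplier = effective_cost(database_price, 0.20) / database_price
print(f"Effective cost multiplier: {multiplier:.1f}x")
```

Because the source figure is "less than 20%," a multiplier of five is a lower bound under this simple model.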

 

Shared Data

Data that is shared between two or more entities. For example, pharmaceutical companies might share data with partners across the supply chain, such as shipping providers, customers, warehouses and manufacturers, to track high-value products as they travel from the manufacturing stage to customer delivery.

Complementary companies may also form an “informational partnership” whereby they exchange raw data. For example, healthcare organizations might share clinical data to help accelerate the development of new treatments.

Companies can share data in a number of ways: data sharing platforms, portals or even Excel spreadsheets. One solution for data sharing across entities is blockchain. Blockchain platforms allow organizations to safely and securely share data by creating a database, known as a distributed digital ledger, with an immutable record of every transaction that has ever taken place.

Due to its high level of accuracy, companies are starting to export static and real-time blockchain data into their analytics and data visualization platforms. From developing enhanced fraud detection capabilities to predicting customer behavior, blockchain and similar technologies are poised to play a pivotal role. 
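The "immutable record" idea can be illustrated with a minimal, single-process Python sketch of a hash-chained ledger. It is a toy under stated assumptions: it is not distributed, has no consensus mechanism, and does not represent any particular blockchain platform. The function names and the shipment records are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, payload, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps({"index": index, "payload": payload, "prev": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(ledger, payload):
    """Append a record that is cryptographically linked to the one before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    index = len(ledger)
    ledger.append({"index": index, "payload": payload, "prev": prev_hash,
                   "hash": block_hash(index, payload, prev_hash)})

def is_valid(ledger):
    """Recompute every hash; editing any earlier record breaks the chain."""
    prev_hash = GENESIS
    for block in ledger:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True

ledger = []
append_block(ledger, {"shipment": "A-1", "event": "left factory"})
append_block(ledger, {"shipment": "A-1", "event": "arrived at warehouse"})
print(is_valid(ledger))  # True

ledger[0]["payload"]["event"] = "tampered"  # simulate an after-the-fact edit
print(is_valid(ledger))  # False
```

Because each block's hash covers the previous block's hash, changing any historical record is detectable by anyone who re-verifies the chain. Real platforms add replication across participants so that no single party can quietly rewrite it.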

 

Other types of external data

Social media posts, publicly accessible websites and published documents are also potential sources of external data. Because this data is unstructured, complex and often highly dynamic, transforming it into actionable insights can be especially difficult.

To extract data from these sources, most companies use tools such as web crawlers, RPA bots and social media scrapers, often in combination with natural language processing (NLP) or computer vision. The data extracted could include text, images, likes, shares and more. Once the data is harvested and cleansed, it can be converted into a numerical format and analyzed using data mining techniques.
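As a rough illustration of the "converted into a numerical format" step, the sketch below turns short texts into word-count vectors (a bag-of-words representation) using only the Python standard library. The sample posts are invented, and production pipelines typically rely on dedicated NLP libraries and richer representations; this is one simple option, not a description of what any particular company does.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Build a shared vocabulary and one count vector per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

# Invented example posts (already harvested and cleansed).
posts = [
    "Love the new shoes, love the fit",
    "The fit is too small",
]
vocabulary, vectors = bag_of_words(posts)
print(vocabulary)  # ['fit', 'is', 'love', 'new', 'shoes', 'small', 'the', 'too']
print(vectors)     # [[1, 0, 2, 1, 1, 0, 2, 0], [1, 1, 0, 0, 0, 1, 1, 1]]
```

Once every post is a vector of numbers, standard data mining techniques such as clustering or classification can be applied to it.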

 

The Challenges of Using External Data

Despite its promise, acquiring and leveraging external data is not without its challenges. 

To start, simply identifying what external data is needed and where it can be reliably sourced can be quite difficult. Secondly, paid data solutions can be prohibitively expensive. It’s not unusual for blue-chip data purchases to cost hundreds of thousands of dollars per year.

The data vendor market is also incredibly complex and can be difficult to navigate. As the industry is rather new, many companies are still figuring out how best to work with external data vendors and negotiate favorable purchase and liability terms. Plus, performing due diligence on data quality remains difficult, if not impossible, for many companies, making such purchases a true gamble.

In addition, before external data can be used in any capacity, it must be fully integrated into business intelligence systems. Though there is no single standardized approach to external data integration, techniques such as middleware, application-based integration (for example, via APIs) and uniform access integration can be used to translate external data into a format that is compatible with internal systems. Also, because it is often of lower quality than internal data, external data must be thoroughly tested and cleansed before it can be integrated.
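A minimal Python sketch of the translate-and-cleanse step might look like the following. The vendor field names, the internal column names and the date format are all assumptions made up for illustration; a real integration would follow the vendor's documented schema.

```python
from datetime import datetime

# Hypothetical mapping from a vendor's field names to internal column names.
FIELD_MAP = {"CustName": "customer_name", "Zip": "postal_code", "SignupDt": "signup_date"}

def to_internal(record):
    """Rename fields and normalize values; return None if the row is unusable."""
    row = {}
    for external_key, internal_key in FIELD_MAP.items():
        value = record.get(external_key)
        if value is None or str(value).strip() == "":
            return None  # reject incomplete rows rather than guessing
        row[internal_key] = str(value).strip()
    try:
        # Assumption: the vendor sends MM/DD/YYYY; store ISO 8601 internally.
        parsed = datetime.strptime(row["signup_date"], "%m/%d/%Y")
    except ValueError:
        return None  # reject rows with malformed dates
    row["signup_date"] = parsed.date().isoformat()
    row["customer_name"] = row["customer_name"].title()
    return row

external_rows = [
    {"CustName": "  jane doe ", "Zip": "10001", "SignupDt": "03/15/2020"},
    {"CustName": "John Roe", "Zip": "", "SignupDt": "04/01/2020"},      # missing postal code
    {"CustName": "Ann Poe", "Zip": "94105", "SignupDt": "2020-13-40"},  # malformed date
]
clean = [row for row in map(to_internal, external_rows) if row is not None]
print(clean)
# [{'customer_name': 'Jane Doe', 'postal_code': '10001', 'signup_date': '2020-03-15'}]
```

Rejecting bad rows outright is only one possible policy. Depending on the use case, it may be preferable to quarantine them for review instead.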

There are also legal implications to consider. For example, certain external data may be subject to different privacy laws than internal data. Establishing a game plan for how external data should be stored, catalogued and leveraged is critical to preventing the misuse or mishandling of potentially sensitive data. 

Last but not least, external data must be just as readily available and accessible as internal data. User-friendly, centralized data storage solutions such as databases, warehouses and lakes ensure that data is not only protected but also easily retrieved via queries.
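To show what "easily pulled via queries" can mean in practice, here is a small Python sketch that stores one internal table and one external table side by side in an in-memory SQLite database and joins them in a single query. The table names, columns and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
    CREATE TABLE internal_sales (store_id TEXT, sale_date TEXT, units INTEGER);
    CREATE TABLE external_weather (store_id TEXT, obs_date TEXT, max_temp_c REAL);
""")
conn.executemany("INSERT INTO internal_sales VALUES (?, ?, ?)", [
    ("S1", "2020-07-01", 120),
    ("S1", "2020-07-02", 95),
])
conn.executemany("INSERT INTO external_weather VALUES (?, ?, ?)", [
    ("S1", "2020-07-01", 31.0),
    ("S1", "2020-07-02", 24.5),
])

# One query returns internal sales enriched with the external weather reading.
rows = conn.execute("""
    SELECT s.store_id, s.sale_date, s.units, w.max_temp_c
    FROM internal_sales AS s
    JOIN external_weather AS w
      ON w.store_id = s.store_id AND w.obs_date = s.sale_date
    ORDER BY s.sale_date
""").fetchall()
print(rows)
# [('S1', '2020-07-01', 120, 31.0), ('S1', '2020-07-02', 95, 24.5)]
conn.close()
```

The same pattern applies to a data warehouse or lake: once external data lands in the same store, with keys that line up with internal records, analysts can combine the two without custom plumbing.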

Despite these challenges, companies are still investing in new external data sources. In fact, as of 2018, 46% of companies were leveraging external data in some capacity. In a different survey, 92% of data analytics professionals said their firms needed to increase their use of external data sources.

 

External Data Use Cases

Remember, it’s not internal vs. external data. Rather, external data + internal data = actionable intelligence.

The use of external data can enhance a company’s wider data science approach in a number of ways. Chief among them is the development and use of AI-powered predictive modeling. By integrating internal data with external data using AI and ML, organizations can deliver more accurate, meaningful predictive and prescriptive insights.

Below are a few examples of how companies are harnessing the power of external intelligence to deliver deeper, more meaningful insights:

  • In January 2020, Delta announced that it was creating a full-scale, AI-enabled digital simulation environment for its global operation. Among other things, this new data science ecosystem combines internal data (e.g. flight schedules, aircraft positions, customer service data) and external data (e.g. weather data, airport conditions, geopolitical events) to help Delta’s professionals make critical decisions before, during and after large-scale disruptions.

 

  • Spotify uses natural language processing (NLP) to rapidly and continuously scrape the internet for articles, blogs and metadata to identify which new artists and songs are trending. It then uses this information to build playlists and provide users with personalized music recommendations. Spotify also uses NLP and convolutional neural networks (CNNs) to “listen” to music and, based on what it “hears,” automatically categorize genres and group together similar titles.

 

  • Using data collected by the Weather Channel, Pantene and Walgreens were able to predict when women would buy the most anti-frizz hair products. To capitalize on this trend, they developed a “Your Local Haircast” marketing campaign that targeted these consumers with personalized discounts and social media content. This resulted in a 10% increase in sales of Pantene at Walgreens for the months of July and August that year.

 

  • Walmart is using open source intelligence to crowdsource information on the competitive landscape, spot emerging trends and understand how its policies impact the world at large. For example, a few years back, Walmart announced it was increasing salaries to above the federal minimum wage. Using social analytics, the company was able not only to better understand how the announcement was received by the public and news outlets, but also to see how it impacted global commodity markets.

 

  • Tesco uses weather data to help predict sales volumes and stock requirements. According to reports, this approach saves them an estimated £6m ($7.5m) per year and “reduced out-of-stock by 30% on special offers.”
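Several of the examples above come down to relating an internal measure, such as sales, to an external signal, such as weather. The Python sketch below fits a straight line by ordinary least squares to show the basic idea. The numbers are made up, and this is not a description of how Tesco or any other company named here actually builds its models, which are likely far more sophisticated.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up history: daily maximum temperature (external) and units sold (internal).
temps = [15.0, 20.0, 25.0, 30.0]
units = [40.0, 50.0, 60.0, 70.0]

slope, intercept = fit_line(temps, units)
forecast = slope * 28.0 + intercept  # expected units on a 28-degree day
print(slope, intercept, forecast)    # 2.0 10.0 66.0
```

A forecast like this could feed a stock-ordering decision, which is the kind of use the weather-driven examples describe.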

 


 

How is your organization leveraging internal and external data? Please take 2 minutes to participate in our survey to let us know.

 


 

Can't view the survey above? Access it here: https://www.surveymonkey.com/r/X2V57WW
