Since its rise to prominence in the early 2000s, Amazon has been a pioneer of enterprise data & analytics innovation, especially when it comes to transforming customer behavior into strategic decision making.
It’s well documented that Amazon collects data on everything from website clicks to Alexa voice commands to biometric data, and that’s just on the consumer retail side of the house. Behind the scenes, Amazon also collects ample data on its employees, AWS clients and third-party sellers. It even filed a patent a few years back for a “surveillance as a service” product that would allow people to schedule drones to keep tabs on their property and, as an added bonus, enable Amazon to collect massive amounts of aerial footage on roads, topography, etc.
With over 175 fulfilment centers around the world, over 1 million employees and close to 200 million website visitors per month, Amazon has one of the largest and most complex data footprints out there. So how does Amazon wrangle this massive amount of data into actionable insights? Let’s take a look.
Amazon’s Galaxy Data Lake
Given that Amazon hosts an estimated 1 trillion gigabytes of data across more than 1,400,000 servers, it’s no surprise that standard data management and analytics tools don’t really cut it. In addition, given Amazon’s scale, minor inefficiencies add up fast and result in millions of dollars’ worth of losses. Even small data inaccuracies and processing errors (e.g., in cost-per-unit figures or from delayed data) could have catastrophic consequences.
To address these issues, Amazon’s finance team developed and released its Galaxy data lake in 2019.
According to a blog post by Amazon’s CTO, Werner Vogels, Amazon’s Galaxy data lake has helped them:
- Break down data silos - A data lake unites all the data in one central location
- Analyze diverse data sets - Because there is no predefined schema, users can import any amount of data in any format
- Manage data access - All data is stored in one centralized location, so users need only one set of credentials
- Accelerate machine learning - Because ML and AI thrive on large, diverse datasets, data lakes make it much easier to combine datasets to train and deploy more accurate models
*Image sourced from "All Things Distributed," https://www.allthingsdistributed.com/2020/01/aws-datalake.html
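The "no predefined schema" point in the list above is often described as schema-on-read: raw records are stored as they arrive, and each consumer applies its own structure when it reads them. The sketch below illustrates that idea in Python under simplifying assumptions. The records, field names and the read_with_schema helper are all hypothetical, and nothing here reflects how Galaxy itself is built.

```python
import json

# Hypothetical raw records, stored as-is in a "lake" (here, a list of JSON lines).
# Each source uses its own fields; nothing is validated at write time.
raw_lake = [
    '{"source": "web", "customer_id": 1, "clicked_sku": "B001"}',
    '{"source": "voice", "customer_id": 1, "utterance": "reorder coffee"}',
    '{"source": "orders", "customer_id": 2, "sku": "B002", "price": 19.99}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only records that have every requested field."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Two consumers project different views from the same raw data.
orders_view = list(read_with_schema(raw_lake, ["customer_id", "sku", "price"]))
activity_view = list(read_with_schema(raw_lake, ["source", "customer_id"]))

print(orders_view)
print(len(activity_view))
```

The trade-off is that data quality checks move from write time to read time, so each consumer has to decide what to do with records that do not fit its view.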
Amazon’s Personalization Engine
As mentioned before, Amazon collects customer data across multiple touchpoints and, using AI, machine learning and item-to-item collaborative filtering, transforms that data into predictive insights that reveal what the customer is likely to search for or purchase next. In addition to onsite customer behavior (such as clicks and product reviews), Amazon also analyzes auxiliary data such as address and age to estimate income level and other factors that could shape a customer’s buying habits.
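To make "item-to-item collaborative filtering" a little more concrete, here is a minimal sketch of the general technique in Python. It scores items by how often the same customers buy them together (cosine similarity over purchase sets), then recommends items similar to what a customer already owns. The purchase data, function names and scoring details are illustrative assumptions, not Amazon’s production system.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "alice": {"kettle", "tea", "mug"},
    "bob": {"kettle", "tea"},
    "carol": {"tea", "mug"},
    "dave": {"kettle", "toaster"},
}


def item_similarities(purchases):
    """Cosine similarity between items, based on which customers bought both."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                if overlap:
                    sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(customer, purchases, sims, top_n=2):
    """Score items the customer has not bought by similarity to items they have."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, sim in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))
```

Because similarities are computed between items rather than between customers, they can be precomputed offline and looked up quickly at request time, which is one reason this family of methods suits very large catalogs.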
However, that’s not all. Amazon also uses customer data to develop dynamic pricing models whereby prices are set according to a customer’s activity on the website, competitors’ pricing, product availability, item preferences, order history, expected profit margin, and so on. Because the system works on real-time data, Amazon changes product prices every 10 minutes on average.
Supply Chain & Logistics Optimization
One of Amazon’s most disruptive and celebrated achievements is its free, one-day delivery model. However, without advanced data and analytics capabilities, this perk wouldn’t be possible.
Amazon’s data-driven supply chain system automatically selects the warehouse closest to the vendor and/or the customer. This simple strategy enables the company to ensure swift delivery times and reduce shipping costs by more than 50%. In addition to predicting what individuals are likely to purchase next, Amazon is also adept at using advanced analytics to predict when a product is likely to sell out. This allows the company to proactively stock shelves and advise shoppers to purchase low-stock items ASAP.
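The two ideas in this section, picking the closest warehouse and estimating when an item will sell out, can be sketched in a few lines of Python. This is a deliberately naive illustration. The warehouse locations, function names and the average-sales forecast are assumptions made for the example, and a real fulfilment system would also weigh inventory, capacity and carrier costs.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical warehouses with (latitude, longitude) in degrees.
warehouses = {
    "seattle": (47.61, -122.33),
    "dallas": (32.78, -96.80),
    "newark": (40.74, -74.17),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_warehouse(customer, warehouses):
    """Return the name of the warehouse with the shortest distance to the customer."""
    return min(warehouses, key=lambda name: haversine_km(warehouses[name], customer))


def days_until_sellout(units_in_stock, recent_daily_sales):
    """Naive estimate: stock divided by average recent daily sales (non-empty list)."""
    avg = sum(recent_daily_sales) / len(recent_daily_sales)
    return float("inf") if avg == 0 else units_in_stock / avg


customer_in_philadelphia = (39.95, -75.17)
print(nearest_warehouse(customer_in_philadelphia, warehouses))
print(days_until_sellout(120, [30, 50, 40]))
```

A low days-until-sellout figure is the kind of signal that could trigger both a restocking order and a "low stock" notice to shoppers.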
When does competitive advantage become unfair advantage?
Amazon’s comprehensive and innovative approach to data analytics has fueled its rise to become one of the most valuable companies in the world. Though Amazon’s approach to acquiring and using customer data has drawn its fair share of scrutiny, Amazon’s use of shopper data has, so far, remained above board.
However, more than 50% of sales on Amazon come from third-party sellers. Though those sellers are Amazon customers, in a sense, they are also the competition. This raises the question: does Amazon’s access to nonpublic third-party seller data, such as order volume history, shipping data and sellers’ past performance, amount to an unfair competitive advantage?
Though the U.S. government has yet to file antitrust charges, a number of small business associations are lobbying for it to do so. In addition, in November 2020, the European Commission launched an investigation into whether Amazon’s use of third-party seller data violates antitrust law.
According to the European Commission’s competition chief, Margrethe Vestager, “Our investigation shows that very granular, real-time business data relating to third party sellers’ listings and transactions on the Amazon platform systematically feed into the algorithm of Amazon’s retail business. It is based on these algorithms that Amazon decides what new products to launch, the price of each individual offer, the management of inventories, and the choice of the best supplier for a product.”
Online retailers are increasingly embracing the third-party seller model to boost revenues. Walmart, ASOS, Costco and Target are just some of the well-known brands expanding into this arena. How the Amazon antitrust investigation plays out will have massive implications for these retailers as well as the small businesses that use these platforms to sell merchandise.
In addition, as more and more companies develop data monetization schemes, conflicts of interest will arise. Though, historically speaking, regulatory bodies have focused less on data usage and more on personal data protection, this is clearly changing. As companies become more proficient at using data to drive competitive advantage, calls for accountability will only increase.
That being said, incorporating data ethics into data governance frameworks is about more than just avoiding future litigation. By minimizing data collection and usage, organizations can focus on the insights that deliver the most value while also maximizing agility, ROI and consumer trust.