Customer Data Platform (CDP) and what it can do for you

I was reflecting on the recent acquisitions of CDP companies Boxever and Zaius and the launch of Salesforce CDP, and I thought it would be useful to offer a primer on what a customer data platform is and why it is a critical component in a modern architecture for managing digital customer experiences.

A Customer Data Platform (CDP) is software that collates data about customers and their activities across channels and combines it into unified customer profiles which can be used to target marketing more precisely and improve return on marketing spend.



Single View of the Customer

A CDP's primary goal is to generate a consolidated, comprehensive store of customer profile data called a "single view of the customer". The single view of the customer is a treasured goal in retail businesses because they can use this data to improve the targeting of marketing and other interactions with customers. More precisely targeted experiences encourage each customer to spend more money, more frequently, improving the return on investment (ROI) of marketing spend. Targeting includes presenting the most relevant products, offers and services to each customer. The more data that is collected in the single view of the customer, the more insight can be derived from it, and therefore the better the results can be.

There are several barriers in front of a retailer trying to create a single view of the customer. Retailers interact with their customers across multiple channels, including social, web, App, email, contact center and store. In order to build a complete picture of the customer, a retailer must be able to join together all of this information from these different channels into a unified view. It is not always obvious which interactions in all these channels, over an extended period of time, belong to the same person. The CDP overcomes these challenges.

CDP Key Functions

The key functions of a CDP are:

  • A data model in which unified customer profiles can be stored and maintained
  • Many mechanisms (JavaScript, SDK, API, batch interfaces, etc.) to consume any type of data in order to build up the most complete picture of each customer
  • Data matching and identity resolution - determining which customer profile a new piece of information belongs to, and creating new customer profiles as necessary
  • Grouping customers into segments (or audiences) and assigning customers to segments fluidly as data about them changes
  • Recording a history of changes to a customer profile and segment membership over time
  • Open interfaces allowing external systems to access customer profile data, especially marketing execution engines that deliver marketing messages and experiences to customers
  • Pre-built connectors for the most popular of these marketing engines, to make integration of the CDP simpler and quicker
  • Reporting and analytics functions to understand data, trends and statistics
  • Management screens to manage all of the above, putting as much power as possible in the hands of marketing teams rather than needing IT intervention

CDP Identity Resolution

One of the key functions of the CDP is to resolve multiple identities of visitors into a single view of the customer. This is surprisingly hard to do. Customers use multiple devices with multiple browsers and multiple applications on each device. They interact with a retailer across multiple channels, including social, web, App, email and store, and over an extended period of time measured in days, weeks, months and years. Also, their core data (e.g. name, address, age), interests and behaviors change over that time. In order to build a complete picture of the customer, the CDP keeps track of all interactions people have with the business, whether those people are known to the business or anonymous, and attempts to detect when activities originate from the same person so that it can collect more complete profile data.

Identity Resolution Example

The retailer in this example is capturing information about visitors to its website, users of its mobile App, and recipients of emails it sends. This is a simple, common scenario. 

CDP - Identity Resolution Example  (Copyright Full On eCommerce)

Without a CDP, the retailer might see some or all of these interactions over time as belonging to three or more customers. In fact, in this example they all belong to one customer. Here is how the CDP detects this and resolves these multiple identities into a single customer view. Each numbered event in the timeline above could be separated by seconds, hours, or days.

  1. There is an anonymous visit from a browser on a cellphone. It is anonymous because this is someone using Incognito mode or just a first-time visitor to the site. The website drops a cookie on the browser once permission has been given.
  2. Later, someone downloads the retailer's mobile App and registers as a new user. The App does not have access to the cookie in step 1, so the CDP does not know this activity is from the same person.
  3. Next, a first-time visit is made from a desktop PC browser to the retailer website. At this point the three activities so far are seen as three different people, since there is nothing to suggest they are the same person.
  4. A second visit is made to the website from the same cellphone browser as in step 1. Since the cookie is in place, the CDP knows this is the same person as in activity 1 and can add to the single view of that particular visitor. Even though it still does not know the visitor's name, it knows what products or content were viewed across both visits (1 and 4).
  5. The welcome email that was sent when the mobile App registration was performed is opened and clicked from web mail using the same desktop browser as in step 3. Since the cookie was set in step 3 and the clicked email link uniquely identified the App registration, the CDP now knows activities 2, 3 and 5 are all from the same customer. But it still does not know that activities in steps 1 and 4 are also the same person. Any marketing activity that is conducted now will target this known customer based on their App registration data and their website visits using this desktop browser, but will not be able to take into account any product interest shown during the visits using the cellphone browser.
  6. The visitor continues to browse the website on their desktop and this activity is also recorded against the same customer profile identified in the previous step.
  7. Finally, the mobile browser used in steps 1 and 4 is used to log in to the retailer website using the email and password set up when the mobile App was first used. The CDP can now finally detect that all seven activities belong to the same individual. Marketing that takes place from now on will use the rich, unified customer profile that includes all of the interactions this customer has had with the retailer.

The above is a simple example, but shows the principle of exact matching of different streams of activity, so that over time more and more becomes known about a customer, and previously isolated sources of knowledge are combined into a single view. Having this rich view of a customer gives targeting tools like product recommendation engines more power to deliver messages and products that maximize the likelihood of the customer making a purchase.
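The exact-matching principle in the timeline above can be sketched in a few lines of Python. This is only an illustration, not how any particular CDP is implemented: it assumes every event carries one or two identifiers (a cookie, an App user ID, an email address), and the class name, method names and identifier strings are all made up for the example. Whenever two identifiers appear together in one event, their profiles are merged.

```python
from collections import defaultdict


class IdentityGraph:
    """Hypothetical union-find structure over customer identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, identifier):
        # Register unseen identifiers as the root of their own profile.
        self.parent.setdefault(identifier, identifier)
        # Walk up to the root, shortening the path as we go.
        while self.parent[identifier] != identifier:
            self.parent[identifier] = self.parent[self.parent[identifier]]
            identifier = self.parent[identifier]
        return identifier

    def link(self, a, b):
        # An event carrying both identifiers shows they are the same person.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def profiles(self):
        # Group every known identifier under its current root.
        groups = defaultdict(set)
        for identifier in list(self.parent):
            groups[self.find(identifier)].add(identifier)
        return list(groups.values())


def replay_example_timeline():
    """Replay the seven steps and record how many profiles exist."""
    graph = IdentityGraph()
    counts = []
    graph.find("cookie:phone")                           # steps 1 and 4
    graph.link("app:user-1", "email:jo@example.com")     # step 2
    graph.find("cookie:desktop")                         # step 3
    counts.append(len(graph.profiles()))
    graph.link("cookie:desktop", "app:user-1")           # step 5 (step 6 adds no new link)
    counts.append(len(graph.profiles()))
    graph.link("cookie:phone", "email:jo@example.com")   # step 7
    counts.append(len(graph.profiles()))
    return counts
```

Calling replay_example_timeline() returns [3, 2, 1], mirroring the three apparent customers collapsing into one. Note that this sketch only merges identities; it has no way to undo a link that later turns out to be wrong.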

Data Matching

While the streamed data in the above example is exactly matched and consolidated, some data sources require approximate matching to previous records, for example matching on name and address. Fuzzy logic and match scoring are applied to each potential match to estimate the probability that a record from a source system belongs to an individual profile already in the CDP. The CDP can be configured to only match records with a high probability of being the same person. Even then errors can be made, and it should be noted that all CDP identity resolution is done at a logical level. If further information is revealed that proves a previous assumption incorrect, the CDP can reorganize the data quickly to correct each customer's profile, in essence breaking previously formed links and relationships and reforming them in response to the new information received.
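As a rough illustration of match scoring, the Python sketch below compares a new record with existing profiles on name and address and only accepts a match above a configurable threshold. It uses the standard library's difflib.SequenceMatcher as a stand-in for whatever fuzzy logic a real CDP applies. The field names, the equal weights and the 0.9 threshold are assumptions made for the example, and the score is a string similarity rather than a calibrated probability.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lower-case, replace punctuation with spaces and collapse whitespace.
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())


def match_score(record, profile, weights=None):
    # Weighted average of per-field similarity ratios (0.0 to 1.0).
    weights = weights or {"name": 0.5, "address": 0.5}
    score = 0.0
    for field, weight in weights.items():
        ratio = SequenceMatcher(
            None, normalize(record[field]), normalize(profile[field])
        ).ratio()
        score += weight * ratio
    return score


def best_match(record, profiles, threshold=0.9):
    # Return the highest-scoring profile, or None if nothing clears the bar.
    if not profiles:
        return None
    best = max(profiles, key=lambda profile: match_score(record, profile))
    return best if match_score(record, best) >= threshold else None
```

With these assumptions, a record for "Jon Smith" at "12 High St., Columbus OH" would be matched to a profile for "John Smith" at "12 High St, Columbus, OH", while a record that clears no profile's threshold would lead to a new profile being created.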

Customer Events in a CDP Example  (Source: Acquia)

CDPs also provide the ability to group individual customers into households, which can be used to reduce direct mail spend.

Household Profile in a CDP Example  (Source: Acquia)
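One simple way to picture householding is to group profiles that share a normalized postal address, so that a direct mail campaign sends one piece per household rather than one per customer. The sketch below assumes exactly that rule. Real CDPs may use additional signals, and the field names here are hypothetical.

```python
from collections import defaultdict


def household_key(profile):
    # Assumed rule: same normalized street address and postcode = same household.
    address = " ".join(profile["address"].lower().split())
    postcode = profile["postcode"].replace(" ", "").upper()
    return (address, postcode)


def group_households(profiles):
    # Map each household key to the customer IDs that live there.
    households = defaultdict(list)
    for profile in profiles:
        households[household_key(profile)].append(profile["customer_id"])
    return dict(households)
```

If three customers live at two addresses, group_households returns two entries, so two mail pieces are sent instead of three.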

CDP Data Sources

The above examples are not the only sources of data that are consumed by a CDP to form a single view of the customer. The CDP is specifically designed to be configured by a retailer to connect multiple, diverse data sources, including structured and unstructured data, batch data and streamed events, through its own APIs and custom connectors.

The following are common data sources that a CDP consumes.

Transactional Data

One of the key indicators of a customer's propensity to buy is what they have bought before (and when), so sales transactions are key sources to connect to the CDP. This includes ecommerce sales transactions, add-to-basket and checkout events, in-store purchases, and order returns and exchanges.

Behavioral Data

Vast amounts of data are generated about a customer's behavior. Digital channels in particular are rich sources of data about the customer. Customers' website visits and their usage of the mobile App speak volumes about their interests in products, product categories, brands, price points and promotional offers. Clicks on items of interest, searches they perform, data entered into forms, newsletter sign-ups, even scrolling and mouse movements, can all be collated and interpreted by the CDP to reveal important insights into a customer. Website behavioral data is collected by inserting a simple JavaScript file into the HTML of every page of the website. Usage of Apps and other applications is tracked by implementing calls via an SDK provided by the CDP vendor.

Profile Data

Once customers make themselves known to a retailer they reveal or offer up data such as name, billing and shipping addresses, payment types, birthday, gender, interests and any responses to surveys they are sent. Permissions they grant or revoke must also be stored and honored. This data is captured and stored by an App or website and a copy is sent as a batch to the CDP to augment the customer profile.

First party data captured by the retailer can be supplemented with external, third-party data such as demographic data, based either on the individual or their neighborhood, to enrich the data model being assembled by the CDP.

Social Data

Customers reveal much about themselves as they traverse social networking sites and generate content for others to consume. Customers who like a retailer's Facebook page and who subsequently log in to the retailer website using Facebook Connect are providing valuable social graph data to the business. This can all be consumed by the CDP.

Non-Customer Data

Depending on the CDP, how it has been configured and the role it will play, the CDP might also need to consume and manage data sets that are not specific to a customer, but that allow the marketing systems consuming CDP data to operate more effectively. Examples include product data, such as brand, price and other attributes, and stock levels by location. This data can be used by the marketing execution system to ensure that an offer being sent to the customer is for a product they do not already own, at a price point they are likely to be attracted to, and that the product is in stock for delivery or in the location where they live.
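The three checks described above (not already owned, attractive price point, in stock locally) can be sketched as a simple filter. The data shapes are assumptions made for the illustration: the customer profile is assumed to hold a set of owned SKUs, a preferred price range and a location, and stock is assumed to be keyed by SKU and location.

```python
def eligible_offers(customer, products, stock_by_location):
    """Return the SKUs that pass all three checks for this customer."""
    low, high = customer["price_range"]
    offers = []
    for product in products:
        sku = product["sku"]
        if sku in customer["owned_skus"]:
            continue  # the customer already owns this product
        if not low <= product["price"] <= high:
            continue  # outside the price range the customer usually buys in
        if stock_by_location.get((sku, customer["location"]), 0) <= 0:
            continue  # not in stock where the customer lives
        offers.append(sku)
    return offers
```

A marketing execution system would run a check like this just before sending an offer, which is why the product and stock data need to be as current as the customer data.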

Predictive Metrics

One of the more advanced features of a CDP is the prediction of customer metrics based on the information known about them. The CDP predicts, based on observation, history and machine learning, metrics for each individual customer, such as:

  • next purchase probability, date, value and channel
  • churn probability by date
  • upsell / cross-sell probability, by product category / product

The retailer can intervene based on these predicted metrics, to change the outcome. For example a telco could send an attractive renewal offer to customers predicted to churn, or a retailer could send a discount code limited to a larger basket size to a customer who is predicted to buy, to increase their spend. Here is how these interventions are managed:

  • Predictive metrics are generated and maintained automatically by the CDP
  • Marketing actions to take (offers, experiences, etc.) are designed by the retailer's marketing and commercial teams
  • Marketing communications are enacted by the marketing execution engines, e.g. to store personalized offers, send push notifications and emails

CDP Prediction of Churn  (Source: Treasure Data)

Segmentation, Audiences and Experimentation

The CDP is used to segment the customer profiles it builds. As data is received, a customer's membership of segments can change in real-time. This ensures any marketing system reading data from the CDP is always doing so based on the most up-to-date insight.

The differentiation between the terms segment and audience is subtle and not universally applied. Sometimes segment is used to mean a subset of a broader audience. Sometimes audience refers to a grouping of customers to which marketing communication will be sent whereas segment is any other grouping.

The term audience is widely used among providers of Data Management Platforms (DMPs). A DMP is a class of software that has some overlap with a CDP but is designed primarily to identify which customers to target with online campaigns on third-party websites. A DMP consumes mostly third-party data, and does not build persistent customer profiles or support the much broader uses that a CDP does.

Your CDP vendor will use and define these terms or others, so adopt whatever terminology the product uses.

Rule-based segments are defined through rules designed by the retailer, for example VIP customers aged between 18 and 30 living in Ohio. The CDP provides a rule builder to enable these rules to be defined and the CDP will maintain membership of these segments as information about customers changes.
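A rule builder ultimately produces something like a list of conditions that every member must satisfy. The following Python sketch expresses the example segment that way. The rule format, operator names and profile fields are invented for the illustration and will differ from product to product.

```python
import operator

# Hypothetical operator vocabulary for the rule builder.
OPERATORS = {"eq": operator.eq, "gte": operator.ge, "lte": operator.le}

# "VIP customers aged between 18 and 30 living in Ohio" as (field, operator, value) rules.
VIP_YOUNG_OHIO = [
    ("tier", "eq", "VIP"),
    ("age", "gte", 18),
    ("age", "lte", 30),
    ("state", "eq", "OH"),
]


def in_segment(profile, rules):
    # A customer belongs to the segment only if every rule holds.
    for field, op, value in rules:
        if field not in profile or not OPERATORS[op](profile[field], value):
            return False
    return True
```

Re-running in_segment whenever a profile changes is what keeps membership fluid: a VIP customer in Ohio is in the segment at age 30 and drops out once their age is updated to 31.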

Predictive segments are defined not by the customer's known attributes, but by their expected or predicted behavior. By defining an outcome and a likelihood of that outcome being reached by each customer, the CDP can identify a group of customers that can be targeted. For example, by defining the outcomes "customer cancels subscription" and "customer does not renew subscription", and a likelihood of 80%, the retailer can direct messaging or offers at these customers to reduce their likelihood of churning, while saving the spend or margin that would be wasted by targeting customers unlikely to churn.
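Assuming the CDP has already produced a probability per customer for each outcome, building the predictive segment is a matter of applying the threshold. In the sketch below, the outcome keys and the shape of the predictions dictionary are hypothetical, and a customer is treated as a member if any of the chosen outcomes reaches the threshold.

```python
def predictive_segment(predictions, outcomes, threshold=0.8):
    """Return customer IDs whose predicted likelihood of any outcome meets the threshold."""
    members = []
    for customer_id, probabilities in predictions.items():
        if any(probabilities.get(outcome, 0.0) >= threshold for outcome in outcomes):
            members.append(customer_id)
    return sorted(members)


# Example input: per-customer probabilities for the two churn outcomes.
example_predictions = {
    "c1": {"cancels": 0.85, "does_not_renew": 0.10},
    "c2": {"cancels": 0.20, "does_not_renew": 0.30},
    "c3": {"does_not_renew": 0.80},
}
churn_risk = predictive_segment(example_predictions, ["cancels", "does_not_renew"])
```

Here churn_risk contains c1 and c3, so the retention offer is sent to them and not to c2.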

Static segmentation is also normally supported, where a query is run in the CDP at a specific point in time and an audience of customers is extracted for a specific communication or campaign. Customers whose data has been lost in a data breach would be an extreme example of a static segment.

Event-triggered segmentation is evaluated in response to a specific event. For example, a segment that uses geofencing to identify customers near a specific store might be evaluated the first time a customer visits the website and no more frequently than daily thereafter. Once a customer is in this segment, they remain in it until the next event triggers a re-evaluation of whether they should remain in it or be removed.

Scheduled or recurring segmentation is evaluated on the basis of a fixed schedule, such as daily. Each day the CDP can be set to evaluate which customers have a birthday today, or identify customers who purchased a consumable supply a month ago but have not re-ordered a replenishment since. Another common segment calculated on a scheduled basis is recency, frequency, monetary (RFM) segmentation, where the most recent purchase, the number of purchases in a defined period and total spend in that period are all calculated for every customer and customers can be segmented accordingly.

RFM analysis from CDP  (Source: Exacaster)
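The RFM calculation described above is straightforward to sketch. The function below assumes purchases arrive as (customer ID, purchase date, amount) tuples, which is an assumption about the data shape rather than any vendor's format, and computes the three metrics for a defined period.

```python
from datetime import date


def rfm(purchases, period_start, today):
    """Compute recency (days), frequency and monetary value per customer."""
    metrics = {}
    for customer_id, purchased_on, amount in purchases:
        if not period_start <= purchased_on <= today:
            continue  # ignore purchases outside the defined period
        entry = metrics.setdefault(
            customer_id, {"recency_days": None, "frequency": 0, "monetary": 0.0}
        )
        days_ago = (today - purchased_on).days
        # Recency is the age of the most recent purchase in the period.
        if entry["recency_days"] is None or days_ago < entry["recency_days"]:
            entry["recency_days"] = days_ago
        entry["frequency"] += 1
        entry["monetary"] += amount
    return metrics


example_purchases = [
    ("c1", date(2021, 5, 1), 40.0),
    ("c1", date(2021, 5, 20), 60.0),
    ("c2", date(2021, 1, 10), 15.0),
]
example_rfm = rfm(example_purchases, date(2021, 1, 1), date(2021, 5, 29))
```

In this example c1 last bought 9 days ago, twice in the period, for a total of 100.0, while c2 bought once, 139 days ago. A scheduled job would recompute these values daily and assign customers to segments by ranking or banding each metric.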


Experimentation, in the form of A/B testing, split testing, multivariate testing, or similar techniques, can be a feature within the CDP, or this activity can be delegated to a specific A/B testing solution. Some vendors have both CDP and A/B testing software in their portfolio, e.g. Acquia and Bloomreach.

The idea is to test hypotheses made by the marketing team, to assess whether segment definitions and marketing activities are producing the expected results, or whether changes in definition and execution will produce a greater ROI. By testing a marketing action on a small group from the segment that has been defined, the results can be assessed and only successful activities launched for all customers in the segment. 
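One common way to pick the small test group is to hash the customer ID together with an experiment name, so that the assignment is stable across sessions and independent between experiments. The sketch below shows this approach. It is an illustration rather than the method of any particular CDP or testing tool, and the function name and the 10% default are assumptions.

```python
import hashlib


def in_test_group(customer_id, experiment, test_fraction=0.1):
    """Deterministically place roughly test_fraction of a segment in the test group."""
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < test_fraction
```

Customers for whom in_test_group returns True receive the new marketing action. If the results beat the rest of the segment, the action can then be launched for everyone.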

Use of CDP for Loyalty Programs

A CDP can be used as a key component of a customer loyalty program, since it has access to purchase, contact and permission information. It can be used to maintain rules on eligibility and membership and calculate tier levels. It can also be used to assign point values for other events it is aware of, such as mobile App registration or newsletter sign-up.

The CDP can also be used to motivate customers to ascend tiers, by segmenting customers who are close and offering incentives for actions they can take to reach the next level in the program. By automating these types of marketing activities, a retailer can make their program both more successful and cheaper to run.

CDP Loyalty Campaign  (Source: SessionM)
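Tier calculation and the "close to the next tier" segment can be sketched as follows. The tier names, point thresholds and the 100-point window are hypothetical values chosen for the example, and points balances are assumed to be non-negative.

```python
# Hypothetical tiers: (minimum points, tier name), in ascending order.
TIERS = [(0, "Bronze"), (500, "Silver"), (2000, "Gold")]


def tier_for(points):
    # The current tier is the highest one whose threshold has been reached.
    current = TIERS[0][1]
    for threshold, name in TIERS:
        if points >= threshold:
            current = name
    return current


def points_to_next_tier(points):
    # Return (points still needed, next tier name), or None at the top tier.
    for threshold, name in TIERS:
        if points < threshold:
            return threshold - points, name
    return None


def close_to_next_tier(balances, within=100):
    # Segment of customers who need at most `within` points to move up.
    segment = {}
    for customer_id, points in balances.items():
        gap = points_to_next_tier(points)
        if gap is not None and gap[0] <= within:
            segment[customer_id] = gap
    return segment
```

A customer on 450 points is Bronze and 50 points short of Silver, so they fall into the segment and can be offered an incentive, such as bonus points for a newsletter sign-up, to reach the next level.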

CDP Build Versus Buy

CDP is a relatively new class of software application and many consumer-facing businesses have built software that does much of what a commercial off-the-shelf (COTS) product would do. The business case to rip out this bespoke software and replace it with a commercial CDP solution may be marginal, depending on the sophistication already in place or the issues the current solution has.

For businesses without a CDP and no significant CDP-like functionality in their existing systems, the question arises whether to build a CDP or to buy a package. This evaluation is similar to any other build vs buy analysis. The optimum route will depend on factors such as the skills of the IT team, the investment required, time-to-value, strategic importance of this capability and the retailer's view on the strengths of commercial packages available and uniqueness of their business model.

CDP and Web Analytics

A CDP offers a view of individual customers and collates them into segments (or audiences) that have particular characteristics or intents. This segment information can be pushed into the retailer's web analytics tool (e.g. Google Analytics) in order to report on web traffic at the segment level. The CDP is far superior to the web analytics tool because it can consume and consider far more sources of data. Google Analytics has since 2019 allowed mobile App usage to be combined with web traffic for reporting purposes, but a CDP also has much better identity resolution tools, which give more complete customer profile data and hence more accurate, more useful segments.

The other major difference between web analytics and CDP is that the CDP is designed with open interfaces that allow other systems to access customer profile data, whereas a web analytics tool's output is reports and visualization of aggregated insights. Marketing execution engines such as product recommendations, email marketing or ad serving technologies can more easily be integrated with a CDP, to make their marketing efforts more effective. 

Tealium Universal Data Hub Screenshot  (Source: Tealium)

CDP Vendors

CDP is a relatively new category of software. Vendors are still emerging, whether bootstrapped or funded by VC or corporate capital, and momentum in the market is shifting rapidly. The following is a list of Customer Data Platform software vendors at various scales and levels of maturity: Adobe Experience Platform, Acquia AgilOne, Bloomreach Exponea, BlueConic, Lytics, mParticle, Salesforce Interaction Studio (Evergage), SAP (Emarsys and Gigya), SALESmanago, SessionM, Simon Data, Tealium AudienceStream & EventStream (part of Universal Data Hub), Twilio Segment, Treasure Data.

This article was updated on May 29, 2021

M Ryan

M Ryan is an ecommerce consultant with twenty years' experience working with retailers, consumer brand manufacturers and other consumer-facing businesses, helping them to develop their ecommerce strategy, implement ecommerce technology and improve their ecommerce operations. He works extensively throughout the US and Europe, with clients including global brands, large retailers and household names in consumer goods.