What Are Data Contracts & How Do They Work?
Master the art of Data Contracts to safeguard your data quality and improve the impact data has on your business.
Craig Dennis
February 13, 2024
8 minutes
What are Data Contracts?
Data contracts are formal agreements that contain validation rules to define how your data is structured before it gets sent to other systems. These contracts help you enforce standards and improve your data quality to ensure you don’t corrupt tables in your data warehouse.
Within an organization, data providers (e.g., product teams building a website) and consumers (e.g., analysts, marketers, etc.) collaborate to create data contracts. Data contracts are the cleanest and most reliable way to ensure you maintain high data quality standards within your organization. These contracts allow you to define the parameters for what data you collect and what format you collect it in.
How do Data Contracts Work?
Data contracts require you to specify the format of the data you’re going to capture for each event you’re tracking. You declare the properties that are collected and the format each one should take. When a user triggers an event, the event data is validated against your data contracts. If any data is invalid, the enforcement rules you’ve set flag it so it can be dealt with accordingly.
To help you understand, picture a sample of event data with the fields user_id, existing_customer, product_id, and value, and ask yourself the following questions:
- Can the user_id be empty?
- Can existing_customer be a character such as “Y”?
- Can the product_id contain letters?
- Can value contain a string?
This is the initial thinking required to set up and build a data contract properly. When it’s clear what the format of your data SHOULD look like, you can create data contracts. In this example, the following should be correct:
- user_id should always contain a value
- existing_customer should be a boolean (TRUE or FALSE)
- product_id can contain letters and numbers
- value should always be a number
If this data is captured via event collection and the data contract is enforced, the empty field, the “Y” character, and the number in a string will be flagged. Then, you’ll either be notified of these errors, or the data will be blocked for you to deal with accordingly.
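The four rules above can be sketched as a small check function. This is a minimal illustration rather than a production validator: the function name check_event and the sample values are hypothetical, chosen only to reproduce the three problems described above (an empty user_id, a “Y” in place of a boolean, and a number stored as a string).

```python
import re


def check_event(event: dict) -> list[str]:
    """Return the contract violations for one event (empty list if valid)."""
    violations = []

    # user_id should always contain a value
    if not event.get("user_id"):
        violations.append("user_id is empty")

    # existing_customer should be a boolean
    if not isinstance(event.get("existing_customer"), bool):
        violations.append("existing_customer is not a boolean")

    # product_id can contain letters and numbers
    product_id = event.get("product_id")
    if not (isinstance(product_id, str) and re.fullmatch(r"[A-Za-z0-9]+", product_id)):
        violations.append("product_id is not alphanumeric")

    # value should always be a number (bool is excluded because it subclasses int)
    value = event.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        violations.append("value is not a number")

    return violations


# Hypothetical events, invented for illustration
bad_event = {"user_id": "", "existing_customer": "Y", "product_id": "AB123", "value": "49.99"}
good_event = {"user_id": "u_42", "existing_customer": True, "product_id": "AB123", "value": 49.99}

bad_result = check_event(bad_event)
good_result = check_event(good_event)
print(bad_result)
print(good_result)
```

The first event produces three violations and the second produces none. Whether a violation blocks the event or only raises a warning is a separate decision that belongs to your enforcement rules.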
How to Create Data Contracts
There are two ways to create data contracts: you can build them yourself or use a managed solution. Let’s look at both methods.
Custom Data Contracts
The benefit of building your own data contracts is that you can create them to the exact specifications of your use case. This allows you to control how you enforce your data contracts and what happens to any flagged data. Custom builds come with trade-offs, though: development time, bug fixing, schema management, and general maintenance.
You can write data contracts in dbt, Protobuf, or JSON, and the approaches are very similar. Here’s an example of a data contract written in JSON.
{
  "type": "object",
  "properties": {
    "cost": {
      "type": "number",
      "description": "cost of item"
    },
    "coupon": {
      "type": "string",
      "description": "discount code"
    },
    "num_items": {
      "type": "number",
      "description": "number of items ordered"
    }
  },
  "required": [
    "cost",
    "num_items"
  ]
}
This data contract includes three properties: cost, coupon, and num_items. The cost and num_items properties are numbers, and coupon is a string. Each property has a description for clarity. The cost and num_items properties are required, meaning they must always be captured when the event happens.
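To show how an event might be checked against the contract above, here is a minimal sketch that handles only the two keywords the contract uses, "type" and "required". The validate function and the sample events are assumptions made for illustration. A custom build would more likely rely on a full JSON Schema validator library than on hand-written checks like these.

```python
import json

# The contract from above, with descriptions omitted for brevity
CONTRACT = json.loads("""
{
  "type": "object",
  "properties": {
    "cost": {"type": "number"},
    "coupon": {"type": "string"},
    "num_items": {"type": "number"}
  },
  "required": ["cost", "num_items"]
}
""")

# Maps the contract's type names to Python checks; covers only the types used here
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate(event: dict, contract: dict) -> list[str]:
    """Return a list of error messages; an empty list means the event is valid."""
    errors = []
    # Required properties must be present
    for name in contract.get("required", []):
        if name not in event:
            errors.append(f"missing required property: {name}")
    # Properties that are present must have the declared type
    for name, spec in contract.get("properties", {}).items():
        if name in event and not TYPE_CHECKS[spec["type"]](event[name]):
            errors.append(f"{name} should be of type {spec['type']}")
    return errors


# Hypothetical events: coupon is optional, so the first one is valid
valid_errors = validate({"cost": 19.99, "num_items": 2}, CONTRACT)
invalid_errors = validate({"cost": "19.99", "coupon": "SAVE10"}, CONTRACT)
print(valid_errors)
print(invalid_errors)
```

The second event fails twice: num_items is missing, and cost arrives as a string instead of a number.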
Hightouch Events
Hightouch Events is an event tracking SDK for web, mobile, and server-side event collection that allows you to centralize your behavioral data in your warehouse. It has built-in data contract setup and management, and creating effective data contracts within the Hightouch user interface takes three steps:
- Create Your Contract: These contracts allow you to plan, manage, and enforce the data you collect through event collection. For each contract, you decide which sources to assign the contract to.
- Specify Your Events: An event can be anything you want to capture (e.g., a form submission, a signup, or an order completion). This is where you decide the format for each data point you collect. These events can be applied across all your environments, so you’re not duplicating work.
- Set Up Your Enforcement Rules: You can decide how to handle undeclared event types, undeclared fields, and invalid fields: either block the data or receive a warning so the event data flow isn’t stopped.
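The block-or-warn choice in the last step can be illustrated with a short sketch. This is not the Hightouch API or its configuration format. The Enforcement class, the enforce function, the "Order Completed" event name, and the contract layout are all hypothetical, and the sketch only shows how the three violation kinds could each map to their own action.

```python
from dataclasses import dataclass


@dataclass
class Enforcement:
    # Each setting is either "block" or "warn" (hypothetical names and defaults)
    undeclared_event: str = "block"
    undeclared_field: str = "warn"
    invalid_field: str = "block"


# Hypothetical contract: event name -> declared fields and accepted Python types
CONTRACT = {"Order Completed": {"cost": (int, float), "num_items": (int,)}}


def enforce(name: str, props: dict, rules: Enforcement) -> tuple[bool, list[str]]:
    """Return (accepted, messages) for a single event."""
    findings = []  # pairs of (action, message)
    declared = CONTRACT.get(name)
    if declared is None:
        findings.append((rules.undeclared_event, f"undeclared event type: {name}"))
    else:
        for field, value in props.items():
            if field not in declared:
                findings.append((rules.undeclared_field, f"undeclared field: {field}"))
            # bool is rejected explicitly because it subclasses int in Python
            elif isinstance(value, bool) or not isinstance(value, declared[field]):
                findings.append((rules.invalid_field, f"invalid field: {field}"))
    # An event is dropped only if at least one finding maps to "block"
    accepted = all(action != "block" for action, _ in findings)
    return accepted, [message for _, message in findings]


rules = Enforcement()
warned = enforce("Order Completed", {"cost": 20.0, "num_items": 2, "referrer": "ad"}, rules)
blocked = enforce("Order Completed", {"cost": "20", "num_items": 2}, rules)
unknown = enforce("Page Scrolled", {}, rules)
print(warned)
print(blocked)
print(unknown)
```

With these defaults, the event carrying an extra field still flows through with a warning, while the event with an invalid cost and the undeclared event type are both blocked.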
Is There a Difference Between SLAs and Data Contracts?
A Service Level Agreement (SLA) defines the level of service expected from a service provider. It documents the metrics that will be measured and the consequences if those standards aren’t met. The main purpose of an SLA is to ensure that both parties have clear expectations for their responsibilities.
In contrast, a data contract is a set of conditions you set up to prevent downstream consumers from having data quality issues, often within one company or service. All data contracts contain specific logic to identify “bad data.” When your data cannot meet these standards, you can flag it and decide whether to amend, delete, leave it, or notify your data engineering team of potential issues.
Why Data Contracts Matter
The main purpose of data contracts is to catch bad events before they corrupt tables in your data warehouse and lead to inaccuracies. Bad data can reduce your reporting accuracy and lead to ill-informed decisions that hurt your business. If tables in your data warehouse become corrupted, resolving the problem can involve extensive manual work or hacks, such as changing a column to varchar when there's a data type mismatch.
Imagine you’re collecting transactional data from an iOS and an Android app, but the value of the transaction is a number on iOS and a string on Android. If this data is ingested into your data warehouse, it can lead to query performance issues, errors in applications reading the data, and inaccurate analysis. Data contracts highlight any violations so you can rectify data issues and prevent future problems.
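The iOS and Android mismatch is easy to reproduce in a few lines. The events and the total_revenue helper below are hypothetical. The sketch shows why a mixed-type column breaks a simple aggregation, and how a type check at collection time points to the source responsible.

```python
# Hypothetical transaction events: iOS sends a number, Android sends a string
events = [
    {"platform": "ios", "value": 25.0},
    {"platform": "android", "value": "25.00"},
]


def total_revenue(rows):
    """Sum the transaction values; this assumes every value is numeric."""
    return sum(row["value"] for row in rows)


try:
    total_revenue(events)
    sum_failed = False
except TypeError as exc:
    # Adding a float to a string raises TypeError
    sum_failed = True
    print(f"cannot sum mixed types: {exc}")

# A contract check at collection time surfaces the offending source before ingestion
violating = [
    event["platform"]
    for event in events
    if isinstance(event["value"], bool) or not isinstance(event["value"], (int, float))
]
print(violating)
```

Here the aggregation fails outright, and the check identifies the Android events as the ones sending a string.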
Benefits of Data Contracts
Data contracts act as the frontline defense for data quality in your business, which can lead to improved decision-making, optimized marketing campaigns, time-savings, and more personalized customer experiences.
- Improved Data Confidence: When all your teams trust the data they’re leveraging for data analysis and Data Activation, they have higher confidence, knowing that data is reliable for operational use cases. For example, if you’re analyzing a sum of revenue, you want to be confident you’re querying numbers and not strings that can’t be summed.
- Time Savings: If you had started collecting event data last year and discovered a problem today, fixing it could be time-consuming. Deleting everything and starting from scratch might not be well-received by management. Setting up data contracts before you begin collecting data will save you more time and money in the long run.
- Consistent Customer Experiences: You want all your customers to have a positive experience, and corrupt data can lead to bad experiences for your customers. If you’ve ever seen an email addressed to “first_name,” you’ve experienced this problem firsthand.
Final Thoughts
Data contracts create an agreement between your data providers and consumers to ensure your data has no discrepancies and aligns with the schema in your data warehouse. The impact of bad data can ripple throughout your business and ultimately result in negative customer experiences.
If you’re looking for a solution to streamline your event collection and enforce strong data quality, book a demo with one of our solution engineers to learn how you can implement Hightouch Events to automate your event collection and build and manage your data contracts in one reliable interface.