What is Entity Resolution?
Learn why and how you should deduplicate and consolidate data for the key concepts that drive your business.
Nate Wardwell
May 30, 2023
Common B2B entities include:
- Users
- Teams
- Companies
Why Does Entity Resolution Matter?
Companies need accurate, comprehensive data about their essential entities (such as user accounts) to drive revenue and reduce costs. A well-known concept that summarizes these benefits is the “1:10:100 rule”. It might cost you $1 to correct a data record as it is ingested. Waiting to clean that data up later takes more effort and might cost $10. If you do nothing and never resolve the duplicate data, you could miss out on $100 simply through bad decision-making and hard-to-use data. Not performing entity resolution is the equivalent of leaving dollars on the table.
Source: https://www.grepsr.com/blog/1-10-100-rule-data-quality/
In addition to driving simple analytics for clear decision-making, having clean data for each entity is also a prerequisite for machine learning (ML) and artificial intelligence (AI) applications. If you want to predict behaviors and outcomes for a specific entity, you’ll need to model your data so that your ML algorithms can digest all the essential information about that entity. Entity resolution supports this because it produces the consolidated, per-entity records that feature engineering for ML and AI applications depends on.
Entity Resolution Use Cases
These examples provide details on a few of the many ways that entity resolution can empower companies to make better decisions and earn more revenue:
- Unify customer records: Whether a “customer” is a person, a household, or an entire company, entity resolution helps you understand the actions your customers are taking. This enables you to personalize experiences with your customers based on their activities to maximize future revenue and avoid churn.
- Unify product records: By holistically understanding each product’s performance, you can better match the products you sell to your customers, personalize product offerings, and improve future products based on the impact of your existing ones.
- Unify account records: You can understand the multiple accounts that each customer may have (e.g., checking and savings accounts if you’re a bank) and analyze the performance of each account separately. You can link these account records to your unified customer records to better serve your customers, and you can treat the accounts themselves as discrete product entities to offer a more cohesive product experience and optimize your overall account offerings.
B2B Entity Resolution Sample Use Case
Entity resolution is essential to understanding how customers interact with your product. It requires resolving every entity and aggregating those entities to both parent and related entities.
For example, the team at Hightouch measures the following entities:
- Product Entities:
  - Sources
  - Models
  - Destinations
  - Syncs
- User Entities:
  - Users
  - Accounts
  - Workspaces
  - Organizations
To fully understand how an organization is using Hightouch, the company needs to perform entity resolution on all of the more granular entities for both products and users. Hightouch rolls that product and user information up to the organization entity, which is the level at which the company ultimately books deals. Entity resolution ensures that everyone at the company uses the same terminology and metrics for each level of the entity pyramid, and that there’s a full 360° view of each user, workspace, and organization.
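As a rough illustration of this kind of roll-up, the sketch below counts resolved syncs per organization. The IDs, the workspace-to-organization mapping, and the function name are all hypothetical; this is not Hightouch's actual data model, only an assumption about how a child entity might be aggregated to its parent.

```python
from collections import Counter

# Hypothetical mapping from each workspace to its parent organization.
workspace_to_org = {"ws_1": "org_a", "ws_2": "org_a", "ws_3": "org_b"}

# Hypothetical resolved product entities: (sync_id, workspace_id) pairs.
syncs = [
    ("sync_1", "ws_1"),
    ("sync_2", "ws_1"),
    ("sync_3", "ws_2"),
    ("sync_4", "ws_3"),
]


def syncs_per_org(syncs, workspace_to_org):
    """Roll sync counts up from the workspace level to the organization level."""
    counts = Counter()
    for _sync_id, workspace_id in syncs:
        counts[workspace_to_org[workspace_id]] += 1
    return dict(counts)


org_counts = syncs_per_org(syncs, workspace_to_org)
```

The roll-up only works if every sync and workspace has already been resolved to a single, stable ID, which is why resolution of the granular entities comes first.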
What is an Entity Relationship Diagram?
An Entity Relationship Diagram (ERD) is a diagram of an entity-relationship model: it maps the relationships between the different entities you care about. You can use an ERD to make a conceptual plan for the entities your company cares about and ultimately figure out how you want to use the data from these interrelated entities to inform your record linking.
For example, let’s say you run a business selling plants online. In this case, the primary entity you care about is a customer who can buy your plants. You will also want to track a separate entity for each plant a customer has purchased. This will allow you to personalize future offers to that user for related plants or products that will help them care for their existing plants.
Finally, you’ll want to tie the individual events that the user has generated back to the user entity. An ERD for this business shows the relationships between a user entity, a plant entity, and discrete user events such as products viewed on the website.
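The relationships in the plant-shop example can also be sketched in code. The class names, field names, and sample values below are assumptions made for illustration; the article does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plant:
    plant_id: str
    name: str


@dataclass
class Event:
    event_type: str  # hypothetical values: "product_viewed", "purchase"
    plant_id: str    # reference to the Plant the event is about


@dataclass
class User:
    user_id: str
    email: str
    # One user has many events; each event points at one plant.
    events: List[Event] = field(default_factory=list)


def purchased_plant_ids(user: User) -> List[str]:
    """Follow the user -> event -> plant relationship for purchases only."""
    return [e.plant_id for e in user.events if e.event_type == "purchase"]


fern = Plant("p1", "Boston Fern")
user = User("u1", "ana@example.com")
user.events.append(Event("product_viewed", fern.plant_id))
user.events.append(Event("purchase", fern.plant_id))
```

Here `purchased_plant_ids(user)` walks from the user to the plants they bought, which is the link you would need to recommend related plants or care products.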
What’s the difference between Deterministic and Probabilistic Entity Resolution?
Deterministic entity resolution, also known as “rules-based matching,” relies on defining precise rules, such as exact matches on specific fields, that can be used to unify and deduplicate existing records. Deterministic entity resolution is relatively straightforward and quick to implement and works best in simple use cases where your data follows a similar structure. For example, matching household records on zip code is a good use case for rules-based entity resolution.
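A minimal sketch of a rules-based match follows, assuming household records with `last_name`, `street`, and `zip` fields. The field names, the normalization steps, and the choice to compare only the first five zip digits are illustrative assumptions, not a recommended rule set.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def household_key(record: dict) -> tuple:
    """The deterministic rule: same last name, street, and 5-digit zip means same household."""
    zip5 = record["zip"].strip()[:5]
    return (normalize(record["last_name"]), normalize(record["street"]), zip5)


def resolve_households(records):
    """Group record IDs whose keys are exactly equal."""
    groups = defaultdict(list)
    for record in records:
        groups[household_key(record)].append(record["id"])
    return list(groups.values())


records = [
    {"id": 1, "last_name": "Rivera", "street": "12 Oak St", "zip": "02139"},
    {"id": 2, "last_name": "rivera ", "street": "12  oak st", "zip": "02139-1234"},
    {"id": 3, "last_name": "Chen", "street": "9 Elm Ave", "zip": "94110"},
]
households = resolve_households(records)
```

Records 1 and 2 land in the same group because their normalized keys are identical. A misspelling such as "Rivra" would not match, which is the main limitation of purely rules-based approaches.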
Probabilistic entity resolution, or “fuzzy matching,” relies on machine learning, AI, or predictive models to identify records that likely refer to the same entity and unify them. In many entity resolution use cases, data is stored in many different formats and locations, and it would be impossible to define precise rules for unifying records in advance. Most entity resolution at enterprise-scale companies relies on fuzzy matching logic.
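To make the idea of a match score concrete, here is a deliberately simple sketch. It does not use a trained model; it swaps in the string-similarity ratio from Python's standard `difflib` module as a stand-in. The field names, the equal weights, and the 0.85 threshold are arbitrary assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_probable_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Combine per-field similarities into one score and compare it to a threshold."""
    name_score = similarity(rec_a["name"], rec_b["name"])
    email_score = similarity(rec_a["email"], rec_b["email"])
    score = 0.5 * name_score + 0.5 * email_score  # equal weights, an assumption
    return score >= threshold


a = {"name": "Jonathan Smith", "email": "jon.smith@example.com"}
b = {"name": "Jonathon Smith", "email": "jon.smith@example.com"}
c = {"name": "Maria Garcia", "email": "mgarcia@example.org"}

same_person = is_probable_match(a, b)       # misspelled name, identical email
different_person = is_probable_match(a, c)  # unrelated name and email
```

Unlike an exact-match rule, the score tolerates the "Jonathan" versus "Jonathon" misspelling. In a production system the weights and threshold would typically be learned or tuned against labeled examples rather than fixed by hand.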
How Does Entity Resolution Work?
At a high level, entity resolution consists of four steps:
- Ingestion: Ensuring that all of your data is accessible in one place to your entity resolution programs or machine-learning models. Unifying data in a data warehouse is often the starting point for entity resolution.
- Deduplication: Consolidating any records that are true copies of each other to reduce the complexity and redundancy of each entity.
- Record Linkage: Using rules-based or fuzzy-matching logic from within the remaining data to identify which records relate to the same entity but contain distinct data, such as different interactions on different days.
- Canonicalization: Unifying and consolidating your data from the previously linked records to store all related data points within that entity.
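The four steps above can be sketched end to end on a toy dataset. Everything here is an assumption made for illustration: the two in-memory "sources" stand in for warehouse tables, linkage uses a single rule (normalized email), and canonicalization keeps the first non-empty value per field.

```python
from collections import defaultdict

# Hypothetical source extracts; in practice these would be warehouse tables.
crm_rows = [
    {"email": "Ana@Example.com", "name": "Ana Diaz", "phone": ""},
    {"email": "Ana@Example.com", "name": "Ana Diaz", "phone": ""},  # exact copy
]
billing_rows = [
    {"email": "ana@example.com", "name": "", "phone": "555-0100"},
    {"email": "li@example.com", "name": "Li Wei", "phone": ""},
]


def ingest(*sources):
    """Step 1: bring all rows into one place."""
    return [dict(row) for source in sources for row in source]


def deduplicate(rows):
    """Step 2: drop rows that are true copies of an earlier row."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def link(rows):
    """Step 3: group distinct rows that refer to the same entity (rule: same email)."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["email"].strip().lower()].append(row)
    return clusters


def canonicalize(clusters):
    """Step 4: merge each cluster into one record, keeping the first non-empty value."""
    entities = []
    for key, rows in clusters.items():
        merged = {"email": key}
        for row in rows:
            for field_name, value in row.items():
                if field_name != "email" and value and not merged.get(field_name):
                    merged[field_name] = value
        entities.append(merged)
    return entities


entities = canonicalize(link(deduplicate(ingest(crm_rows, billing_rows))))
```

The result contains one record per person: Ana's record combines the name from the CRM rows with the phone number from billing. Swapping the `link` step for a fuzzy matcher would change how records are grouped without affecting the other three steps.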
What Companies and Tools Can Help Implement Entity Resolution?
If your company has a robust data team, you can resolve entities directly in your data warehouse. Numerous solutions groups, such as Big Time Data, can also assist with implementation. Hightouch has built a robust rules-based identity resolution feature that can resolve any entity you define, allowing users to resolve profiles in a code-free interface within their data warehouse.
Several machine learning algorithms are publicly available to assist with entity resolution, including:
Depending on your use cases or implementation needs, several companies also offer software to assist with entity resolution, including:
Finally, regardless of the state of your underlying data structures, data activation platforms like Hightouch enable you to extract the best value from your data and sync that data to downstream business tools. You can define models from multiple tables with a SQL-based interface or join related models and entities in a no-code schema builder to curate datasets based on linked entities for marketing teams to build audiences.
Final Thoughts
Entity resolution is the foundation that companies rely on to understand the essential things they care about, such as customers, households, and products. Whether companies build their own custom solutions for entity resolution or leverage third-party algorithms or platforms, they need a 360° view of the entities that drive their business.
Finally, companies need to act on their entity data. Hightouch activates data directly from company data stores to tools that support business use cases. Hightouch can help data teams link entity data from disparate sources and create syncs to 200+ tools that business users rely on. To learn more about how Hightouch can help, talk to a Hightouch Solutions Engineer to build a plan to model and activate your data.