What is Data Curation? (Examples and Use Cases)

Learn the 8-step data curation framework to ensure your data assets are of high quality, usable, and accessible across teams and organizations.

Craig Dennis

May 22, 2023

7 minutes

  • What: What data is being used?
  • Where: Where is the data located?
  • When: When is it needed? How frequently does it need to be updated? How soon does the report/dashboard need to be created? How often does data need to be synced?
  • Who: Who needs access to the data?
  • Why: Why do they need the data?
  • How: How do they need to access it?

    The 8 Steps of Data Curation

    The purpose of data curation is to remove complexity from your data stack so you can maintain end-to-end visibility over each individual component in your data flows. There are eight steps to data curation, and each one depends heavily on the one before it.

    • Step 1 - Collection: Gathering data from various sources such as databases, files, or external data providers.
    • Step 2 - Selection: Identifying the relevance and suitability of the data for a particular use case.
    • Step 3 - Validation: Assessing the collected data for its accuracy, completeness, and consistency to suit its intended use.
    • Step 4 - Transformation and Modeling: Shaping the data into a useful format by addressing errors, missing values, and inconsistencies, then merging and aggregating data sources into a single cohesive model.
    • Step 5 - Documentation: Creating metadata and documentation that describe the curated data’s characteristics, structure, and meaning so others can understand it.
    • Step 6 - Digital Preservation: Implementing strategies such as version control, recovery procedures, and adherence to data governance policies to safeguard the curated data over time.
    • Step 7 - Access and Sharing: Making relevant data available to stakeholders and users according to their role, with access controls in place to protect confidential data.
    • Step 8 - Lifecycle Management: Managing data throughout its lifecycle by updating documentation and conducting quality assurance to keep data relevant and up-to-date with the changing business needs.
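
To make steps 1 through 5 more concrete, here is a minimal Python sketch that uses plain dictionaries. The sample records, field names (`email`, `plan`, `mrr`), and helper functions (`select`, `validate`, `transform`, `document`) are hypothetical illustrations, not part of any specific tool. Steps 6 to 8 (preservation, access, and lifecycle management) depend on your storage and governance setup, so they are left out.

```python
# Hypothetical raw records from two sources (Step 1: Collection).
crm_rows = [
    {"email": "Ada@Example.com ", "plan": "pro", "signup_date": "2023-01-05", "notes": "called twice"},
    {"email": None, "plan": "free", "signup_date": "2023-02-11", "notes": ""},
]
billing_rows = [{"email": "ada@example.com", "mrr": 49.0}]

# Hypothetical set of fields this use case needs.
RELEVANT_FIELDS = ("email", "plan", "signup_date")


def select(rows, fields):
    """Step 2 (Selection): keep only the fields the use case needs."""
    return [{field: row.get(field) for field in fields} for row in rows]


def validate(rows, required):
    """Step 3 (Validation): split rows into complete and incomplete records."""
    valid, rejected = [], []
    for row in rows:
        if all(row.get(field) for field in required):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


def transform(rows, billing):
    """Step 4 (Transformation and Modeling): normalize emails and merge billing data."""
    mrr_by_email = {r["email"].strip().lower(): r["mrr"] for r in billing}
    model = []
    for row in rows:
        email = row["email"].strip().lower()
        model.append({**row, "email": email, "mrr": mrr_by_email.get(email, 0.0)})
    return model


def document(model):
    """Step 5 (Documentation): record the row count and observed types per column."""
    column_types = {}
    for row in model:
        for column, value in row.items():
            column_types.setdefault(column, set()).add(type(value).__name__)
    return {
        "row_count": len(model),
        "columns": {column: sorted(types) for column, types in column_types.items()},
    }


selected = select(crm_rows, RELEVANT_FIELDS)
valid, rejected = validate(selected, RELEVANT_FIELDS)
curated = transform(valid, billing_rows)
metadata = document(curated)
print(f"rejected={len(rejected)} curated={curated} metadata={metadata}")
```

Each step consumes the previous step's output, which mirrors how each stage of curation depends on the one before it: a record rejected during validation never reaches the merged model or its documentation.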

    Benefits of Data Curation

    Data curation can benefit your business in several ways, including:

    • Data Discovery: Data discovery is the process of identifying patterns, relationships, and insights in your data. It helps you understand your data better, so you know what’s needed to power your use cases and can find relevant data assets.
    • Data Quality: Data quality is the degree to which data fits the requirements and expectations of its intended purpose. The better the quality, the less time is needed to transform the data, so more time can be spent building models to power dashboards and downstream use cases.
    • Automation: Data curation introduces standardized processes and tools you can use to automate various components in your data flows, allowing your team to focus on driving outcomes rather than maintaining data.
    • Data Confidence: Data confidence is the level of trust and certainty in the accuracy and relevance of data. When you can trust that the data is error-free, consistent, and up-to-date, you can act on it with more confidence.
    • Data Compliance: Knowing that the data you collect is properly managed and organized makes it easier to comply with regulatory requirements and data protection laws such as HIPAA, GDPR, and CCPA.

    Data Curation Tools

    Data curation can be a challenging problem to tackle on your own, but a number of data management tools specialize in this exact problem.

    Monte Carlo

    Monte Carlo is a data observability tool that helps increase data trust and reduce data downtime by giving you a 360-degree view of your data ecosystem. It automatically monitors for problems that might arise during data curation.

    Monte Carlo offers features such as machine learning-based data anomaly detection and data lineage to help find the root cause of a problem. It can also provide insights into your data quality to help prevent poor-quality data.
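
The sketch below is not how Monte Carlo works internally; it only illustrates the two ideas above in generic form. A simple statistical volume check (a z-score against recent history, standing in for machine learning-based detection) flags an unusual load, and a breadth-first walk over a lineage graph lists the upstream tables worth inspecting for the root cause. The table names, row counts, and the threshold of 3 standard deviations are hypothetical.

```python
import statistics
from collections import deque

# Hypothetical daily row counts for fct_orders; the last value is today's load.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 3_200]

# Hypothetical lineage graph: each table maps to the tables it reads from.
UPSTREAM = {
    "dashboard_revenue": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_payments"],
    "stg_orders": ["raw_orders"],
    "stg_payments": ["raw_payments"],
}


def is_volume_anomaly(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > threshold


def upstream_of(table, graph):
    """List every table upstream of `table`, nearest first (breadth-first search)."""
    seen, order, queue = set(), [], deque([table])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


history, latest = daily_row_counts[:-1], daily_row_counts[-1]
if is_volume_anomaly(history, latest):
    # The upstream tables are the candidates to inspect for the root cause.
    print("Anomaly in fct_orders; check upstream:", upstream_of("fct_orders", UPSTREAM))
```

Walking lineage nearest-first means the tables most directly feeding the broken one are checked before more distant raw sources.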

    Alation

    Alation is a data catalog tool that helps you organize, understand, and manage your data, bringing better governance to it. Alation uses automation to make your data easier to understand by mapping technical terms in your data to a business glossary.

    Alation provides natural language search, so anyone in the business can find data without knowing technical terms. It can speed up curation by making data discovery easier than writing SQL queries, all within a user-friendly interface.

    Informatica

    Informatica is a data integration platform that offers a variety of features, including one for managing data catalog content. This product uses artificial intelligence to help with data discovery. Informatica can help you discover, inventory, and organize your data and give you a single view of all of it.

    Informatica helps you locate the data you need with confidence by clarifying where data comes from and who owns it, which makes that data easier to use for analytics and activation.

    Secoda

    Secoda is a data discovery tool that brings all your data into one place, giving you a searchable and collaborative platform for it. When you collect so much data, it can be tough to know what data exists, how to use it, and whether you can trust it. Secoda helps you answer these questions whether or not you have technical knowledge.

    Secoda makes searching your data as easy as a Google search, and curation gets easier when you can find the data you need.

    dbt

    dbt is a data transformation tool that lets you reliably build, orchestrate, and run SQL-based transformation jobs in your data warehouse. The platform reduces the need to write ad-hoc SQL, so your teams can work from the same coherent models and understand exactly how they relate to one another.
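
dbt works out how models relate by reading the `ref()` calls in each model's SQL and then runs the models in dependency order. The Python sketch below imitates that idea with a regular expression and the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not dbt's implementation, and the model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models written in the style of dbt SQL files that use ref().
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": (
        "select o.*, c.region from {{ ref('stg_orders') }} as o "
        "join {{ ref('stg_customers') }} as c using (customer_id)"
    ),
    "revenue_by_region": (
        "select region, sum(amount) as revenue "
        "from {{ ref('fct_orders') }} group by region"
    ),
}

# Matches {{ ref('model_name') }} or {{ ref("model_name") }} and captures the model name.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def dependency_graph(models):
    """Map each model to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Order models so that each one runs after every model it depends on."""
    return list(TopologicalSorter(dependency_graph(models)).static_order())


print(build_order(MODELS))
```

Because dependencies are declared in the models themselves rather than in a separate script, the run order stays correct as models are added, which is what lets teams share one consistent set of models.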

    Final Thoughts

    Implementing a robust data curation framework not only helps you maintain visibility over every component in your data stack but also helps you understand your entire data lifecycle, from the point your data is collected to the point where your stakeholders consume it. The result is greater trust and confidence in your data.

    Want to get value from your curated data? Book a demo with Hightouch and find out how you can get fresh, accurate customer data into your business tools in under 23 minutes.
