What is Data Observability and How Does it Impact Data Teams?
Learn how data observability can produce better data products and reduce data downtime.
Craig Dennis
February 27, 2023
10 minutes
Monte Carlo
Monte Carlo is a data observability tool trusted by CNN, Fox, Intercom, and many others. The company's goal is to increase trust in data and reduce data downtime by monitoring freshness, volume, distribution, schema, and data lineage. Some of its features that help with data observability are:
- Set up in 20 minutes and complete field lineage mapping within 24 hours
- Machine-learning-enabled data anomaly detection and targeted alerting
- Data lineage to assess the impact and fix root causes
- Helps reduce infrastructure costs by identifying high consumption and deteriorating queries
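
To make the freshness and volume checks mentioned above more concrete, here is a minimal sketch in Python. It is not Monte Carlo's implementation or API; the function names, the 24-hour freshness window, the z-score threshold, and the sample row counts are all assumptions chosen for illustration. A production tool would typically learn these thresholds from historical metadata instead of hard-coding them.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_loaded_at, now, max_age):
    """Freshness check: True if the table was not updated within max_age."""
    return now - last_loaded_at > max_age


def is_volume_anomaly(history, today, threshold=3.0):
    """Volume check: True if today's row count sits more than `threshold`
    sample standard deviations away from the mean of past counts."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # history is constant, so any change is unusual
    return abs(today - mu) / sigma > threshold


# Hypothetical daily row counts for one table over the past week.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_030]

print(is_volume_anomaly(daily_row_counts, 10_100))  # a typical day
print(is_volume_anomaly(daily_row_counts, 4_200))   # a sudden drop in volume
print(is_stale(datetime(2023, 2, 25), datetime(2023, 2, 27), timedelta(hours=24)))
```

In this sketch, the typical day is not flagged, while the sudden drop and the table last loaded two days ago both are. Those are the kinds of signals an observability tool would turn into an alert.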
Datafold
Datafold takes a unique approach to data observability: it helps you see how code changes can impact the data, so rather than finding out about data quality issues when they happen, Datafold can show you before there is an issue. One of Datafold's key features is data diffing, which identifies whether a change will alter any values. These changes may not cause a direct problem, but they might be something you should look into.
Some other Datafold features are:
- Integrates with dbt Cloud and Core
- Produces impact reports before any deployments to avoid any surprises
- Easy-to-use UI
- View how changes can impact downstream data applications
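
The data diffing idea described above can be sketched in a few lines of Python. This is a simplified illustration, not Datafold's implementation or API: the `diff_rows` function, the `prod` and `dev` tables, and their columns are all hypothetical. It compares the output of a model before and after a code change, keyed on a primary key, and reports which rows were added, removed, or changed.

```python
def diff_rows(before, after, key):
    """Compare two versions of a table (lists of dicts) by primary key."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())

    changed = {}
    for k in sorted(old.keys() & new.keys()):
        columns = sorted(set(old[k]) | set(new[k]))
        # Record (old value, new value) for every column that differs.
        delta = {
            c: (old[k].get(c), new[k].get(c))
            for c in columns
            if old[k].get(c) != new[k].get(c)
        }
        if delta:
            changed[k] = delta

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical model output in production vs. on a development branch.
prod = [
    {"id": 1, "plan": "free", "mrr": 0},
    {"id": 2, "plan": "pro", "mrr": 50},
    {"id": 3, "plan": "pro", "mrr": 50},
]
dev = [
    {"id": 1, "plan": "free", "mrr": 0},
    {"id": 2, "plan": "pro", "mrr": 45},
    {"id": 4, "plan": "team", "mrr": 200},
]

report = diff_rows(prod, dev, key="id")
print(report)
```

Here the report shows that row 4 would be added, row 3 would disappear, and the `mrr` value for row 2 would change from 50 to 45. None of these is necessarily a bug, but each is worth reviewing before the change is deployed.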
Final Thoughts
If you want more confidence in your data products, then a data observability solution is key to catching the problems that can arise with data. Currently, data observability is largely reactive: it lets you know as soon as a problem has occurred.
The future of data observability lies in getting deeper into the data lifecycle, moving from reactive to proactive. This means you’d get alerts about data problems before they make it to your data warehouse. With more companies using a tool like Hightouch to activate their data, you’d be able to avoid sending poor-quality data to your downstream tools and impacting your customers.