New metadata-focused integrations enable data teams to detect, resolve, and prevent data quality issues across data lake and data lakehouse environments.
Monte Carlo, the data reliability company, today announced integrations with Delta Lake and Databricks’ Unity Catalog, becoming the first provider of end-to-end data observability across these data lake and lakehouse environments, down to the BI layer.
Traditionally, data lakes held raw data in its native format and were known for their flexibility, speed, and open-source ecosystem. By design, data was less structured, with limited metadata and no atomicity, consistency, isolation, and durability (ACID) guarantees.
As a result, data quality has been a particular challenge in data lake environments, which often hold large amounts of unstructured data, making data issues difficult to detect, resolve, and prevent.
Delta Lake and Unity Catalog enable Databricks users to add more structure and metadata to their data lake and lakehouse deployments. The Monte Carlo data observability platform can now leverage that metadata to automatically detect data freshness, volume, and schema anomalies across structured and unstructured data in their environment via machine learning.
Additional opt-in monitors can provide more granular and customized coverage for key assets and critical tables – monitoring data distributions and statistics.
“With Monte Carlo, my team is better positioned to understand the impact of a detected data issue and decide on the next steps like stakeholder communication and resource prioritization. Monte Carlo’s end-to-end lineage helps the team draw these connections between critical data tables and the Looker reports, dashboards, and KPIs the company relies on to make business decisions,” said Satish Rane, head of data engineering, ThredUp. “I’m excited to leverage Monte Carlo’s data observability for our Databricks environment.”
With these integrations, Databricks customers can now:
- Achieve end-to-end data observability across the lake or lakehouse. Get end-to-end data observability for Databricks data pipelines with a quick, no-code implementation process. Access out-of-the-box visibility into data freshness, volume, distribution, schema, and lineage just by plugging Monte Carlo into Databricks metastores, Unity Catalog, or Delta Lake.
- Know when data breaks, as soon as it happens. Monte Carlo continuously monitors your Databricks assets and proactively alerts stakeholders to data issues. Monte Carlo’s machine learning-first approach gives data teams broad coverage for common data issues with minimal configuration, and business-context-specific checks layered on top ensure coverage at each stage of the data pipeline.
- Find the root cause of data quality issues, fast. Monte Carlo gives teams a single pane of glass to investigate data issues, drastically reducing time to resolution. By bringing all information and context for pipelines into one place, teams spend less time firefighting data issues and more time improving the business.
“Metadata is a data lake’s secret weapon, and Monte Carlo is thrilled to be partnering with Databricks to help our mutual customers take advantage of it and bring their data reliability to the next level,” said Lior Gavish, co-founder and CTO, Monte Carlo. “When you combine the performance and flexibility of the data lake with high levels of data trust, it becomes a powerful foundation from which data teams can launch incredible projects and data products.”
Later this year, Monte Carlo plans to introduce support for end-to-end field-level Spark lineage, which maps how data assets are connected within the Databricks environment. With it, teams can gain full visibility into their pipelines for root cause analysis and see how those pipelines impact downstream reports and dashboards.
To learn more about Monte Carlo’s data lake integrations, visit our developer hub or website.
About Monte Carlo
As businesses increasingly rely on data to drive better decision making, it’s mission-critical that this data is accurate and reliable. Billed by Forbes as the New Relic for data teams and backed by Accel, GGV Capital, Redpoint Ventures, ICONIQ Growth, Salesforce Ventures, GIC Singapore, and IVP, Monte Carlo solves the costly problem of broken data through its fully automated, SOC-2 Type II certified Data Observability platform.
View source version on businesswire.com: https://www.businesswire.com/news/home/20220628005261/en/
Contacts
Molly Vorwerck
(949) 230-4860
mvorwerck@montecarlodata.com