Data & Analytics

Soda

4.38

Soda launched in 2020 out of Brussels, Belgium, with an open-source approach to data quality that resonated with the dbt-centric data community. The company’s core insight was simple: data quality checks should live alongside your data transformations, written in a language that both engineers and analysts can read.

Soda’s open-source tool, Soda Core, introduced SodaCL (Soda Checks Language), a human-readable, YAML-based language for defining data quality checks. Instead of writing SQL assertions or building complex testing frameworks, you write declarations like “row_count > 0” or “invalid_percent(email) < 5%”. These checks run against your data warehouse on a schedule or as part of CI/CD pipelines.

Soda Cloud is the commercial platform that adds a management layer on top of the open-source engine. It provides a web interface for monitoring check results, managing incidents, tracking data quality over time, and collaborating across teams. Automated anomaly detection watches metrics without requiring explicit thresholds. Data contracts let teams define SLAs between data producers and consumers.

The platform connects to major data warehouses and lakes, including Snowflake, BigQuery, Databricks, Redshift, and PostgreSQL. Integration with dbt is particularly tight: Soda checks can run as part of dbt workflows, validating data quality after each transformation.

Soda targets data engineering teams that want data quality embedded in their development workflow rather than bolted on as an afterthought. The open-source foundation means teams can start for free and adopt the commercial platform when they need collaboration and governance features. The Brussels-based team has grown quickly, backed by venture funding and a community of data professionals frustrated with learning about data problems from angry stakeholder messages.
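To make the idea of declarative checks concrete, here is a minimal Python sketch that evaluates the two example declarations quoted above against a few in-memory rows. It is a toy illustration only, not Soda Core's implementation or API: the function names (run_check, row_count, invalid_percent), the simplified email pattern, and the sample rows are all assumptions made for this example.

```python
import operator
import re

# Hypothetical toy evaluator for checks shaped like "metric(column) < threshold".
# This is NOT Soda Core; it only illustrates the declarative style.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}

# Deliberately simple email pattern, assumed for this sketch only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Parses e.g. "row_count > 0" or "invalid_percent(email) < 5%".
CHECK_RE = re.compile(
    r"^\s*(\w+)(?:\((\w+)\))?\s*(<=|>=|<|>|=)\s*([\d.]+)\s*%?\s*$"
)


def row_count(rows, column=None):
    """Number of rows in the dataset."""
    return len(rows)


def invalid_percent(rows, column):
    """Percentage of rows whose column value is not a plausible email."""
    if not rows:
        return 0.0
    invalid = sum(
        1 for r in rows if not EMAIL_RE.match(str(r.get(column) or ""))
    )
    return 100.0 * invalid / len(rows)


METRICS = {"row_count": row_count, "invalid_percent": invalid_percent}


def run_check(check, rows):
    """Return True if the check passes for the given rows."""
    m = CHECK_RE.match(check)
    if m is None:
        raise ValueError(f"unsupported check: {check!r}")
    metric, column, op, threshold = m.groups()
    value = METRICS[metric](rows, column)
    return OPS[op](value, float(threshold))


# Sample data (made up): one of three emails is malformed.
rows = [
    {"email": "a@example.com"},
    {"email": "not-an-email"},
    {"email": "b@example.com"},
]
print(run_check("row_count > 0", rows))                # True
print(run_check("invalid_percent(email) < 5%", rows))  # False
```

The point of the sketch is the separation of concerns: the check is a short, readable declaration, while the logic for computing each metric lives elsewhere. In a real deployment the metrics would be computed by queries against the warehouse rather than over Python lists.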

Tech Pioneers