
A large enterprise is in the middle of a major modernization initiative, moving from a legacy data warehouse and ETL environment to a modern cloud-based platform leveraging dbt, SQL pipelines, and Airflow. Hundreds of models and tables will be transitioned within the next few months, and the engineering team needs additional hands to ensure the new environment is correct, consistent, and ready for production use.
This role is focused entirely on data validation, reconciliation, certification, and delivery assurance—ensuring the new system behaves exactly as expected before go-live.
You will join a dedicated migration team responsible for validating that newly developed data models are accurate reproductions of their legacy counterparts. The focus is on delivering fast, reliable validation coverage—without sacrificing confidence in production data.
This engagement requires strong SQL skills, a deep understanding of data quality best practices, and the ability to automate validation quickly and pragmatically.
Develop and execute a repeatable data validation framework that includes (see the SQL sketch after this list):
- Table-level row and record count checks
- Aggregate and metric comparisons
- Key field and column-level matching
- Targeted record sampling and side-by-side diffs
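As a minimal sketch of what these checks can look like, assuming a Postgres-style dialect and hypothetical `legacy.orders` / `analytics.orders` tables standing in for a legacy table and its migrated counterpart:

```sql
-- Row count check: totals should match exactly between environments.
SELECT
  (SELECT COUNT(*) FROM legacy.orders)    AS legacy_rows,
  (SELECT COUNT(*) FROM analytics.orders) AS new_rows;

-- Aggregate comparison: key metrics should reconcile across systems.
SELECT 'legacy' AS source,
       SUM(order_total)            AS revenue,
       COUNT(DISTINCT customer_id) AS customers
FROM legacy.orders
UNION ALL
SELECT 'new',
       SUM(order_total),
       COUNT(DISTINCT customer_id)
FROM analytics.orders;

-- Key-level matching: keys present in the legacy table but not the new one.
SELECT order_id FROM legacy.orders
EXCEPT
SELECT order_id FROM analytics.orders;

-- Side-by-side diff on matched keys: any row returned is a mismatch.
SELECT l.order_id,
       l.status AS legacy_status,
       n.status AS new_status
FROM legacy.orders l
JOIN analytics.orders n ON l.order_id = n.order_id
WHERE l.status IS DISTINCT FROM n.status
LIMIT 100;
```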
Write and run SQL test cases that confirm data accuracy, completeness, and fidelity.
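In practice, many such test cases reduce to queries that must return zero rows (or a zero count) to pass; the tables and columns below are illustrative placeholders:

```sql
-- Completeness: required business keys must never be null.
SELECT COUNT(*) AS null_keys
FROM analytics.orders
WHERE order_id IS NULL;

-- Fidelity: primary keys must remain unique after migration.
SELECT order_id, COUNT(*) AS dupes
FROM analytics.orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Accuracy: derived columns should match the legacy transformation logic
-- (net_total, gross_total, and discount_amount are placeholder names).
SELECT COUNT(*) AS mismatches
FROM analytics.orders
WHERE net_total IS DISTINCT FROM gross_total - discount_amount;
```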
Build lightweight automation using tools such as (a dbt example follows the list):
- dbt tests
- SQL scripts
- Python notebooks
- Data-diff utilities
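As one illustration, a dbt singular test is simply a SQL file under the project's tests/ directory that fails the build if it returns any rows; the model and table names here are placeholders:

```sql
-- tests/assert_orders_reconcile_with_legacy.sql
-- dbt singular test: the run fails if this query returns any rows.
-- {{ ref('orders') }} is the migrated model; legacy.orders stands in for
-- the legacy table (in practice it would be declared as a dbt source).
SELECT COALESCE(l.order_id, n.order_id) AS order_id
FROM legacy.orders l
FULL OUTER JOIN {{ ref('orders') }} n
    ON l.order_id = n.order_id
WHERE l.order_id IS NULL
   OR n.order_id IS NULL
```

Generic tests such as `unique` and `not_null` can also be declared in model YAML for broad column-level coverage, leaving heavier reconciliation logic to singular tests, scripts, or notebooks.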
Collaborate closely with engineers to:
- Understand legacy transformation logic
- Communicate discrepancies quickly
- Align on remediation timelines
Maintain structured reporting on:
- Validation progress
- Defects and issue owners
- Turnaround and release readiness
Produce clear documentation that helps downstream analysts trust the new data environment.