Data Quality Engineer (Databricks, Kafka, AWS)

Plugins Inc

Dallas, TX (In Person)

Full-Time

Posted 2 days ago (Updated 6 hours ago) • Actively hiring

Expires 7/4/2026

Job Description

We are looking for a Data Quality Engineer to own validation across batch and streaming data pipelines. The role is responsible for data correctness, reliability, and performance on platforms built with Databricks, Kafka, AWS, SQL, and Python. It is hands-on: you will build scalable data validation frameworks and ensure that data systems meet production-grade standards.

Key Responsibilities

End-to-End Data Validation

Validate data pipelines for accuracy, completeness, consistency, and timeliness
Build SQL-based validations for business rules and transformations
Implement reconciliation between source and downstream systems
Ensure data lineage and traceability

ETL / ELT & Spark Testing

Test pipelines built on AWS (Glue, Lambda, EMR, Step Functions)
Validate transformations using SQL and Python
Test ingestion, transformation, aggregation, and serving layers
Handle backfills, reprocessing, and historical data loads
Validate Spark pipelines (PySpark/Scala) on Databricks

Streaming (Kafka)

Validate data integrity, ordering, and delivery guarantees
Test producer and consumer logic and serialization formats (Avro, JSON, Protobuf)
Validate topics, partitions, offsets, retention, and schema evolution
Simulate late events, duplicates, and failure scenarios

Automation & Frameworks

Build Python-based data testing frameworks
Develop reusable validation utilities and synthetic datasets
Integrate data tests into CI/CD pipelines
Enable automated alerts for data quality issues

Performance & Reliability

Validate throughput, latency, and concurrency at scale
Test retry logic, idempotency, and recovery mechanisms
Perform regression, soak, and failover testing

Observability

Validate logs, metrics, and alerts using tools such as CloudWatch, Prometheus, and Grafana
Define and monitor data SLAs and SLOs
Support incident response, root cause analysis, and postmortems

Required Qualifications & Experience

7+ years of total experience in QA, SDET, or Data Quality Engineering
Minimum 4-6 years of hands-on experience working with data platforms, data pipelines, or data engineering ecosystems
3+ years of hands-on experience with Databricks and Apache Spark
Strong SQL skills for data validation, reconciliation, and complex analysis
Proficiency in Python for automation and data validation
Experience testing ETL/ELT pipelines (batch and streaming)
Hands-on experience with Kafka or similar streaming platforms
Strong understanding of AWS data services (S3, Glue, Lambda, Redshift, Athena)
Experience working with large-scale distributed data systems
Strong debugging, analytical, and problem-solving skills

Nice to Have

Experience with data quality or observability tools such as Great Expectations or Monte Carlo
Knowledge of schema registry and data contracts
Experience with CI/CD tools such as GitHub Actions or Jenkins