Data Reliability Engineer

BridgeView

Littleton, CO (In Person)

Full-Time

Posted 2 weeks ago (Updated 1 week ago) • Actively hiring

Expires 6/23/2026

Job Description

The Data Reliability Engineer ensures the reliability, stability, and operational excellence of an AWS-based data platform. The role owns production data pipelines, monitors SLAs, diagnoses incidents, and implements durable fixes. It also collaborates with engineering teams to improve design and operational practices.

Key Responsibilities:

Own the reliability and stability of production data pipelines and platform services.
Define and enforce data SLAs/SLOs for batch and streaming products.
Diagnose and resolve pipeline failures, delays, and data quality issues in production.
Investigate issues across distributed data systems, including Spark/EMR, ingestion pipelines, and warehouse performance.
Lead or support incident response, including triage, mitigation, and long-term resolution.
Perform root cause analysis and implement durable fixes to prevent recurrence.
Design and enhance monitoring, alerting, and observability for data systems.
Develop automation and tooling to reduce operational toil and improve resilience.
Contribute to disaster recovery planning, including backup validation and recovery workflows.
Partner with engineering teams to improve pipeline design, reliability, and readiness.
Create and maintain runbooks, SOPs, and operational documentation.
Participate in occasional off-hours support for production data systems when required.

Qualifications:

Bachelor's degree in Computer Science, Information Systems, Data Science, or a related field.
5+ years in data engineering or analytics platform roles, with 3+ years operating production cloud data warehouses (Redshift, Snowflake, etc.).
3+ years building AWS data pipelines and managing them through production.
3+ years working with production data platforms in AWS, focusing on anomaly detection, reconciliation, and end-to-end validation.
3+ years of experience with Python and SQL in real data systems.
Hands-on experience troubleshooting distributed data processing systems such as Spark/EMR, Redshift, and streaming systems.
Proven ability to debug and resolve production issues in data pipelines and platforms.
Experience with AWS data services (EMR, Redshift, DynamoDB, S3, or similar).
Proven ability to handle production incidents and perform root cause analysis.
Strong problem-solving mindset and ability to work through ambiguous production issues.

Preferred Skills:

Experience handling real-world data issues such as pipeline delays or failures.
Experience with data backfills and reprocessing.
Experience influencing or guiding data pipeline reliability and operational practices.
Exposure to streaming/event-driven systems (Kafka, Kinesis, CDC patterns).

Qualifications:

You have 5+ years of data engineering experience in production AWS environments.
You troubleshoot Spark/EMR pipelines and ensure data reliability.
You design monitoring, alerts, and automation for data platforms.