Site Reliability Engineer (SRE)

Job

Robert Half

Novi, MI (In Person)

Full-Time

Posted 1 week ago (Updated 6 days ago) • Actively hiring

Expires 6/28/2026

Job Description

We are looking for a Site Reliability Engineer (SRE) to support reliable, high-performing production systems for automotive operations clients. This position focuses on strengthening service stability across edge and cloud environments through automation, observability, and disciplined operational practices. The role works closely with engineering and technical stakeholders to improve uptime, manage incidents, and deploy changes safely in real-time manufacturing settings.
Responsibilities:
  • Maintain dependable and secure production environments across plant-edge and cloud-based systems, with a focus on uptime, responsiveness, and operational stability.
  • Design, refine, and support monitoring dashboards, alerting frameworks, and operational runbooks using tools such as Prometheus, Grafana, and modern telemetry solutions.
  • Build and manage infrastructure through code using Terraform, applying version control standards, peer reviews, and controlled deployment processes.
  • Create automation scripts and lightweight tools in Bash and Python to streamline routine operations, recovery procedures, backup workflows, and environment setup.
  • Take part in incident response and on-call coverage, troubleshoot service disruptions, coordinate initial communication, and document follow-up actions through blameless reviews.
  • Establish and measure service reliability indicators and objectives, helping stakeholders balance system dependability with release speed and operational risk.
  • Support secure connectivity between factory networks and cloud resources by configuring and maintaining VPNs, routing, private networking, and access controls.
  • Administer and optimize relational or time-series databases, including backup planning, replication, performance tuning, and long-term storage health.
  • Contribute to CI/CD delivery practices by improving deployment pipelines, supporting controlled release strategies, and preparing rollback procedures when needed.
  • Partner with controls, software, and data teams to enable reliable data flow from industrial systems and ensure safe deployment to edge infrastructure.