Job Description
An Insight Global Fortune 500 client is seeking an experienced Site Reliability Engineer (SRE) to support enterprise-scale systems deployed across Google Cloud Platform (GCP) and on-premise/in‑store environments. This role is not focused on application development or coding, but instead centers on deployment support, observability, reliability, and operational excellence. The SRE will be embedded within the development lifecycle, partnering closely with engineering teams to ensure systems are resilient, reliable, and production‑ready. The ideal candidate is highly self-sufficient, leverages AI tools to accelerate troubleshooting and operational decision-making, and brings strong enterprise experience supporting complex, distributed environments.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com.
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
4-6 years of experience working as a Site Reliability Engineer
Hands-on experience supporting deployments to:
○ Google Cloud Platform (GCP)
○ On‑premise or in‑store server environments
No application coding responsibilities
○ Primary focus is on deployment support, configuration, validation, and building dashboards
Proven ability to:
○ Validate and test deployments to ensure production readiness
○ Confirm changes meet reliability and resiliency standards before release
Deep knowledge of:
○ Observability, telemetry, and monitoring
○ Resiliency, reliability, and system health validation
Experience with incident management, including detection, response, and resolution
Ability to assess and verify that infrastructure and deployment changes are stable and reliable
Comfortable being embedded within the development lifecycle, collaborating with engineering teams from pre‑deployment through post‑release
Demonstrated ability to leverage AI tools to solve traditional SRE/operational problems independently (high level of self‑sufficiency)
Experience operating in enterprise-scale environments with complex systems and multiple stakeholders
Retail industry experience
Leveraging AI in the SRE cycle