Onsite SRE Engineer
Litmus7
San Francisco, CA (In Person)
Full-Time
Job Description
Qualifications: Performance dashboard reports, Spring Boot, Dashboard development, Incident management, Continuous Delivery (CD) implementation, Automation, Procedural guides, Technical documentation, DevOps, IT system monitoring, High availability architecture, System design, Corrective and preventive actions (CAPA), Scalable systems, Improving operational efficiency, Compliance management implementation, Microservices, Incident response, Continuous improvement, SRE, Splunk, Mentoring, Incident investigation, Scalability, Systems & applications support, Root cause analysis, Distributed computing, Senior level, Log analysis, Communication skills
Key Responsibilities:
- Provide production support for retail applications and microservices built on Spring Boot.
- Ensure high availability, reliability, and performance of business-critical retail systems and services.
- Apply Site Reliability Engineering (SRE) principles to improve system stability, scalability, and operational efficiency.
- Define, implement, and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
- Perform real-time monitoring, troubleshooting, and incident resolution for microservices and retail applications.
- Use Splunk for log analysis, alerting, and operational intelligence to diagnose and resolve production issues.
- Use Dynatrace for end-to-end application performance monitoring, distributed tracing, and root cause analysis.
- Investigate performance bottlenecks, latency issues, and system anomalies across the microservices architecture.
- Build and maintain dashboards, alerts, and monitoring strategies for proactive issue detection.
- Participate in incident management processes, including on-call rotations, major incident response, and post-incident reviews.
- Conduct root cause analysis (RCA) and implement preventive measures to reduce the recurrence of incidents.
- Work closely with development, DevOps, and infrastructure teams to improve system reliability and observability.
- Provide technical troubleshooting support to retail store associates and operations teams through calls or remote sessions.
- Ensure effective communication and coordination during incidents involving multiple teams and stakeholders.
- Drive automation and operational improvements to reduce manual intervention and improve system resilience.
- Support CI/CD pipelines and deployment monitoring for microservices applications.
- Analyze system logs, metrics, traces, and events to identify trends and proactively prevent outages.
- Document runbooks, troubleshooting guides, and operational procedures for retail application support.
- Demonstrate proactive learning, continuous improvement, and knowledge sharing within the SRE team.
- Mentor team members and contribute to best practices for monitoring, observability, and reliability engineering.
- Collaborate with engineering teams to improve system design for reliability, fault tolerance, and scalability.
- Ensure compliance with operational standards, security guidelines, and change management processes.

Key Skills:
- Strong knowledge of SRE principles (SLIs, SLOs, SLAs, and error budgets).
- Hands-on expertise with the Splunk and Dynatrace monitoring tools.
- Experience supporting Spring Boot microservices and distributed systems.
- Strong production troubleshooting and incident management skills.
- Excellent communication and stakeholder interaction skills.
- Ability to work in high-pressure production environments and within on-call support models.