Site Reliability Engineer - W2 Role
Info Dinamica Inc
Palo Alto, CA (In Person)
Full-Time
Job Description
Role:
Site Reliability Engineer (SRE)
Location:
Palo Alto, CA (Onsite from Day 1)
Job Type:
Contract (W2)
Skill Matrix:
Name                    Required
Programming             Yes
SRE                     Yes
Grafana                 Yes
Prometheus              Yes
AWS                     Yes
Cloud Infrastructure    Yes
Linux                   Yes
UNIX                    Yes
Top skills required for this role:
Programming:
Proficiency in languages like Python, Java, or Go.
System Administration:
Strong understanding of Linux/Unix systems.
Cloud Infrastructure:
Experience with AWS.
Infrastructure as Code (IaC):
Knowledge of tools like Terraform or Ansible.
Monitoring Tools:
Proficiency with tools such as Prometheus, Grafana, or Datadog.
Job Description/Responsibilities:
Automation and Tooling:
SREs write code to automate operational tasks, such as provisioning, configuration changes, and system updates, to reduce manual work and human error.
System Monitoring and Alerting:
Developing and maintaining observability stacks (logs, metrics, tracing) to proactively detect issues before they impact users.
Incident Response and On-Call:
Managing 24/7 on-call rotation to respond to, troubleshoot, and resolve production incidents.
Post-Incident Reviews (Postmortems):
Conducting blameless, in-depth reviews of incidents to identify root causes and implement preventive measures.
Capacity Planning:
Analyzing system resource utilization to ensure infrastructure can scale to handle future load requirements.
Performance Optimization:
Identifying and fixing bottlenecks in software and infrastructure to improve system efficiency and responsiveness.
Error Budget Management:
Setting and managing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to determine if a service is reliable enough to allow new feature deployments.
Chaos Engineering:
Testing system resilience by intentionally introducing failures to ensure systems are fault-tolerant.
Years of Experience:
8+ Years of Experience