SRE Engineer Position Available in Grafton County, New Hampshire
Tallo's Job Summary: The SRE Engineer position in Hanover, NH focuses on designing, implementing, and running fault-tolerant systems. Requirements include deep expertise in AWS, Azure, and GCP, as well as experience implementing monitoring solutions and optimizing distributed systems. The role involves driving reliability best practices and collaborating with other teams to balance reliability and feature velocity. The required skill set includes extensive experience running IAM solutions, disaster recovery, and continuous delivery, plus hands-on coding in Python and Bash and configuration in JSON/YAML.
Job Description
SRE Engineer
Salary: Not Available
Location: Hanover, NH 03755
Positions available: 1
Job #: 25-31077
Source: Cynet Systems
Posted: 4/10/2025
Web Site: www.cynetsystems.com
Job Type: Full Time (30 Hours or More)
Job Requirements and Properties
- Designing, implementing, deploying, and running highly available, fault-tolerant, auto-scaling, and auto-healing systems.
- Deep expertise in AWS, Azure, and GCP, including Kubernetes and container platforms (EKS, GKE, ECS, Fargate) and serverless architectures.
- Implementing advanced monitoring (Prometheus, Grafana, Datadog, ELK), tracing, logging and automated alerting solutions.
- Scaling distributed systems, optimising compute/storage efficiency, and cost management.
- Designing failure simulations to improve system robustness and incident response.
- Expertise in AWS CLI, CloudFormation, Ansible, Helm, and GitOps for automated infrastructure provisioning.
- Driving reliability best practices across engineering teams, embedding SRE principles into the DevSecOps lifecycle.
- Partnering with engineering, security, and product teams to balance reliability and feature velocity.
- Expertise in CIAM and the ForgeRock stack (PingGateway, PingAM, PingIDM, PingDS), with certification or proof of completion of ForgeRock Deep-Dive 400 training.
- Building and mentoring high-performing SRE teams, fostering a culture of automation and innovation.
- Defining and enforcing reliability metrics to balance innovation with system stability.
- Optimising deployment pipelines for high-frequency, zero-downtime releases.
- Leveraging machine learning for anomaly detection, predictive scaling, and automated remediation.
Skillset Required:
- 5+ years of experience in hands-on configuration, deployment, and operation of ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with automated GitOps CI/CD pipelines using GitLab.
- Design and hands-on implementation of GitOps CI/CD pipelines, automated failover, data backup and restore solutions.
- Automating telemetry and dashboards.
- 10+ years of experience running disaster recovery and zero-downtime deployment solutions.
- Designing and implementing continuous delivery.
- Hands-on coding in Python and Bash, and configuration as code (CaC) in JSON/YAML.
- Supporting large-scale, distributed, cloud-based microservice and API service solutions with 99.9%+ uptime.