Cloud DevOps Engineer
Job
Compunnel, Inc.
Concord, CA (In Person)
Full-Time
Job Description
JOB SUMMARY
This role is responsible for leading the design and implementation of platform and infrastructure architecture for AIML and NLP within a modern hybrid cloud computing environment. The individual will actively participate in daily standups, provide architectural solutions for public and private cloud, and resolve technical blockers. They will collaborate closely with engineering strategy, platform engineering, and development teams to understand infrastructure requirements and drive AIML and GENAI solution delivery to cloud platforms. This position involves leading research and proposing highly available, resilient, and fault-tolerant infrastructure solutions for AIML and GENAI workloads, defining and driving infrastructure and platform roadmaps, and performing hardware and capacity planning. The role also includes providing SME guidance to product and business partners, researching industry best practices, evaluating new technologies, and developing standards to enhance automation and platform resiliency.
Key Responsibilities
- Lead and design the platform and infrastructure architecture for AIML and NLP in modern hybrid cloud computing.
- Participate in day-to-day standups for infrastructure and platform scrums.
- Provide architectural solutions for public and private cloud.
- Resolve technical blockers for the team.
- Collaborate with engineering strategy, platform engineering, and development teams to understand infrastructure requirements.
- Drive all aspects of AIML and GENAI solution delivery to cloud platforms.
- Lead teams in researching and proposing highly available, resilient, and fault-tolerant infrastructure solutions for AIML and GENAI workloads.
- Define and drive infrastructure and platform roadmaps that align with technology and business strategy.
- Perform hardware and capacity planning, analysis, and forecasts for applications with a focus on highest availability, scalability, performance, and timely delivery.
- Provide SME guidance around infrastructure and platform for product and business partners.
- Research industry best practices, evaluate new technologies, and develop standards and engineering best practices.
- Recommend innovative solutions that support automation and improve platform resiliency and fault tolerance of critical applications.
Required Qualifications
- 3+ years of experience leading the design and implementation of grid/cluster computing infrastructure with CPUs and GPUs supporting AIML and NLP workloads.
- 3+ years of experience with Azure and/or GCP/GKE.
- 3+ years of experience building complex infrastructure programmatically with IaC tools (Terraform/Ansible etc.).
- 1+ years of experience designing solutions and working with high-performance storage technologies including Object Storage.
- 2+ years of experience working with and supporting infrastructure for high-throughput, low-latency high-performance computing (HPC).
- 2+ years of experience with Elasticsearch.
- 1+ years of experience working with big data (BigQuery).
- Thrives in an independent work environment and drives progress proactively.
- Working knowledge and understanding of developing APIs using Python.
- Excellent understanding and working knowledge of cloud computing concepts like Virtual Private Cloud (VPC), landing zone, Identity and Access Management (IAM), App Service Environment, Blueprints, Control Plane etc.
- Excellent verbal, written, and interpersonal communication skills.
- Ability to articulate technical solutions to both technical and business audiences.
- Recent and demonstrated ability to influence management on technical or business solutions.
- Experience with CI/CD, DevOps concepts and SRE principles.
Preferred Qualifications
- 1+ years of experience in LLM, Generative AI (developing capabilities or dev/ops).
- Experience in developing APIs on GCP/Azure/API Gateways.
- Experience with data processing technology (Apache Spark etc.).
- Experience with data virtualization technology (Tibco DV, Dremio, etc.).
- Understanding of Agile practices and ability to work with Agile teams to define and track user stories.
- Experience with designing and implementing complex F5 or other Load Balancer Technologies.
- Knowledge and understanding of cloud computing, PaaS design principles, microservices, and Kubernetes (k8s) containers.
Certifications
- Cloud certifications (Kubernetes, GCP, and Azure) preferred.