Infrastructure Architect

Job

Amiseq Inc.

San Jose, CA (In Person)

Full-Time

Posted 3 days ago (Updated 11 hours ago) • Actively hiring

Expires 6/8/2026

Job Description

Role Overview

We are looking for a Principal Infrastructure Architect to join our IT PMO organization and lead the design, orchestration, and lifecycle management of our next-generation GPU Farm and AI Factory environments. This role is unique in its breadth, requiring a deep understanding of high-performance AI compute stacks alongside the disciplined management of physical data center assets and their long-term operational health. You will bridge the gap between R&D engineering requirements and the physical realities of global data center operations.

Key Responsibilities

AI & GPU Infrastructure Design (GPU Farm / AI Factory)

Lead the architectural design and refinement of the Nutanix GPU-as-a-Service (GPUaaS) platform, ensuring a seamless experience for internal R&D, QA, and Sales teams.
Provide technical leadership in key initiatives such as Nutanix Validated Designs (NVD) for the AI Factory, incorporating NVIDIA MGX/HGX architectures and high-density Cisco nodes (e.g., UCS 845A).
Architect the Management Cluster control plane (NKP, Prism Central, NuDeploy) to ensure it is decoupled from GPU compute nodes for maximum efficiency.
Implement policy-driven placement of workloads across on-prem and cloud-burst environments.

Data Center Asset & Lifecycle Management

Design a solution for a centralized Data Center Asset Inventory system, ensuring real-time visibility into all hardware assets, including CPUs, GPUs, virtual machines, and networking.
Develop a comprehensive Hardware Lifecycle Management strategy, including procurement forecasting, "rack and stack" operationalization, and decommissioning of legacy systems (G3/G4/G5).
Lead "Tiger Team" initiatives to navigate supply chain constraints, ensuring critical release milestones are not delayed by hardware shortages.
Enforce strict Security Standards for Data Center HW Provisioning.
Implement network segmentation for all critical applications.
Ensure all infrastructure meets SOC 2 and ISO 27001 compliance objectives while maintaining low-latency performance.

Special Projects

Provide required architecture and designs during the project intake process.
Review incoming demands and guide teams toward the right architecture before they become approved projects.
Partner with the security team and provide guidelines for upcoming projects.
Participate in and lead special projects as an architect.

Required Qualifications

Bachelor's degree in Information Technology, Business, or a related field.
5+ years of experience in Data Center projects in an enterprise environment.
Knowledge of Cisco, Dell, HPE, and Supermicro hardware.
Hardware Expertise: Deep knowledge of Cisco HW, NVIDIA GPU architectures (H100, B200, RTX 6000 Pro), and high-speed interconnects (RoCE v2, InfiniBand).
Infrastructure Mastery: Extensive knowledge of and experience with Data Center infrastructure.
Management Tools: Proficiency with asset management and automation tools (NetBox, ServiceNow, Terraform, or OpenTofu).
Lifecycle Mgmt & Capacity Planning: Experience in Data Center lifecycle management, DC HW capacity planning, decommissioning, defragmentation, and building complex financial showback models for shared infrastructure.
AI/ML Ops: Proven expertise in Kubernetes (NKP preferred) and NVIDIA AI Enterprise stacks (GPU Operator, DCGM, Triton, vLLM).

Preferred Qualifications

Experience managing (as an architect) massive-scale data center environments (1,000+ nodes).
Knowledge of Nutanix Cloud Infrastructure (NCI), AHV, and Prism Central.
Strong background in MLOps and automated pipeline integration (Kubeflow/MLflow).
