Infrastructure Technical Specialist

Acasia Operations, Inc.

Phoenix, AZ (In Person)

$112,500 Salary, Full-Time

Posted 3 days ago (Updated 15 hours ago) • Actively hiring

Expires 7/7/2026

Job Description

Infrastructure Technical Specialist

Acasia Operations, Inc.

Phoenix, AZ

Job Details

Full-time

$100,000 - $125,000 a year

4 hours ago

Benefits:

Health insurance, Dental insurance, 401(k), Vision insurance, 401(k) matching

Qualifications:

Customer communication, Data center experience, Safety protocol adherence, Attention to detail, Computer hardware, Network routing, Linux, Network infrastructure management

Full Job Description

Infrastructure Technical Specialist

Compensation:

Base:

$100k-$125k

Bonus:

15%

Benefits:

Health, Dental, Vision, 401k

Location:

Onsite. Phoenix, Arizona

About Acasia

Acasia builds and operates GPU infrastructure for enterprise AI workloads. We help customers access high-performance compute by deploying, managing, and supporting GPU clusters in data center environments. As demand for AI infrastructure grows, execution quality matters. Customers expect GPU environments that are installed correctly, configured cleanly, monitored actively, and maintained with urgency. This role sits on the front lines of that promise.

Role Summary

The Infrastructure Technical Specialist will help deploy, maintain, troubleshoot, and support Acasia's GPU infrastructure inside data center environments. This is a hands-on technical operations role responsible for physical installation, hardware maintenance, system configuration support, basic network troubleshooting, incident response, documentation, and customer SLA support. This person should be comfortable working in data centers, handling technical equipment, following strict procedures, and communicating clearly during high-pressure operational issues.

Key Responsibilities

GPU Infrastructure Deployment

Assist with the physical deployment of GPU servers, networking equipment, cabling, racks, PDUs, and related infrastructure.
Rack, stack, cable, label, and validate equipment according to Acasia standards.
Support equipment receiving, inventory tracking, staging, burn-in, and deployment readiness checks.
Coordinate with data center personnel, vendors, internal engineering teams, and logistics partners during installations.
Follow deployment runbooks and escalate deviations or blockers quickly.

Maintenance & Troubleshooting

Diagnose and troubleshoot server, GPU, power, cabling, storage, network, and connectivity issues.
Replace or coordinate replacement of failed components, including GPUs, NICs, drives, memory, power supplies, cables, fans, and other server parts.
Support firmware, BIOS, driver, and operating system configuration tasks under engineering guidance.
Assist with routine preventative maintenance and infrastructure health checks.
Document root causes, corrective actions, and recurring issues.

Customer SLA Support

Respond quickly to infrastructure incidents that may impact customer uptime, performance, or availability.
Support escalation workflows for customer-impacting issues.
Communicate status updates clearly to internal stakeholders.
Help maintain service reliability by following incident response procedures and documenting actions taken.
Participate in after-hours or on-call support as needed.

Configuration & Validation

Perform basic configuration and validation tasks for servers, GPUs, networking, storage, and monitoring tools.
Run diagnostic tests to confirm hardware and system readiness.
Validate that deployed infrastructure meets required standards before customer handoff.
Assist engineering teams with cluster bring-up, node validation, and performance checks.
Maintain accurate records of asset status, serial numbers, rack locations, configurations, and changes.

Process & Documentation

Follow standardized runbooks, checklists, and change control processes.
Create and update technical documentation, deployment notes, incident logs, and maintenance records.
Identify gaps in processes and recommend improvements.
Help build repeatable operational workflows as Acasia scales deployments across multiple data centers.

Required Qualifications

2+ years of experience in data center operations, IT infrastructure, hardware support, network operations, systems administration, or technical field support.
Hands-on experience with servers, racks, cabling, switches, power distribution, and hardware troubleshooting.
Basic understanding of Linux systems.
Basic understanding of networking concepts such as IP addressing, VLANs, DNS, DHCP, routing, switching, and firewalls.
Strong attention to detail and ability to follow technical procedures.
Ability to work in data center environments, including lifting equipment, standing for extended periods, and following safety/security procedures.
Strong written and verbal communication skills.
Ability to work under pressure during incidents or customer-impacting outages.
Willingness to travel to data center sites as needed.

Preferred Qualifications

Experience with GPU servers, AI/HPC infrastructure, or high-density compute environments.
Experience with NVIDIA GPUs, CUDA environments, NVIDIA drivers, InfiniBand, RoCE, or high-performance networking.
Experience with Supermicro, Dell, HPE, Lenovo, ASUS, Gigabyte, or other enterprise server platforms.
Familiarity with data center tools such as DCIM systems, IPMI/BMC, iDRAC, iLO, Redfish, or remote management platforms.
Experience with monitoring tools, ticketing systems, and incident management workflows.
Familiarity with Kubernetes, Slurm, Docker, virtualization, or cloud infrastructure.
Certifications such as CompTIA Network+, Server+, Linux+, CCNA, or equivalent practical experience.

Pay:

$100,000.00 - $125,000.00 per year

Benefits:

401(k), 401(k) matching, Dental insurance, Health insurance, Vision insurance

Work Location:

In person