Data Center Technician

SumasEdge Corporation

Dallas, TX (In Person)

Full-Time

Posted 4 days ago (Updated 1 day ago) • Actively hiring

Expires 7/26/2026

Job Description

Operations-focused Systems Administrator / Systems Engineer (more of a Data Center Technician, actually) supporting a large-scale bare-metal server environment (~17,000 servers) with a heavy emphasis on CPU and GPU compute availability. This role is centered on reliability, automation, and operational excellence: digging into systems and pipelines when things break, and improving them so they break less often. This is not hands-on data center work.

What you'll be doing:

Administer and support large-scale bare-metal server infrastructure, primarily HPE and Dell platforms
Perform server break/fix troubleshooting, including hardware faults, firmware/BIOS/BMC issues, POST failures, degraded components, and system instability
Manage server lifecycle operations: onboarding, provisioning, firmware updates, BIOS/BMC configuration, and hardware refresh kits
Own incident response and break/fix workflows while maintaining 98%+ compute availability SLAs
Work cross-functionally with Data Center and Networking teams during hardware incidents, including ticket creation, repair coordination, and log collection
Interface directly with HPE and Dell vendors: gathering diagnostics, sending logs, driving RMAs, and tracking issues through resolution
Support and troubleshoot CI/CD and automation pipelines used for server provisioning, configuration, and lifecycle management
Dig into automation code and workflows (Ansible, scripts, pipelines) when jobs fail, to understand root cause and unblock deployments
Identify recurring operational issues and contribute to process improvements, runbooks, and reliability enhancements
Help manage and reduce the operations backlog, prioritizing fixes, cleanup, and automation improvements

Must Have:

Hands-on experience supporting HPE and Dell servers in production, including break/fix and hardware incident troubleshooting
Experience with HPE iLO, Dell iDRAC, and related BMC environments
Strong understanding of server hardware components (CPU, GPU, memory, disks, NICs, power) and common failure modes
Experience troubleshooting automation and CI/CD pipelines that manage infrastructure (not just running them, but fixing them when they fail)
Operational mindset with experience owning incidents, SLAs, backlog items, and process improvements
Automation experience with Ansible, Bash, Jenkins, or similar tooling
Exposure to GPU-dense, HPC, or high-performance compute environments
Experience improving runbooks, reducing toil, and scaling operations through automation