
HPC Cluster & Scheduler Management consultant/Fremont, CA/ Tualatin, OR

Job

E-Solutions Inc.

Fremont, CA (In Person)

Full-Time

Posted 1 week ago (Updated 16 hours ago) • Actively hiring

Expires 7/4/2026

Job Description

HPC Cluster & Scheduler Management consultant / Fremont, CA / Tualatin, OR (Fremont, CA, 94555) | 05/12/26

Role: HPC consultant
Location: Fremont, CA / Tualatin, OR

HPC Cluster & Scheduler Management
- Design, configure, tune, and optimize SLURM partitions, queues, QoS, and scheduling policies to maximize cluster utilization and workload efficiency.
- Perform in-depth analysis of job scheduling behavior, bottlenecks, and resource contention.
- Troubleshoot job failures, performance degradation, and scheduler-related issues in production HPC environments.
- Implement fair-share, backfill, reservations, and policy-driven scheduling as required.

Storage Benchmarking & Procurement Support
- Lead HPC storage performance benchmarking using industry-standard tools (e.g., IOR, FIO, MDTest, IOzone).
- Analyze I/O patterns of HPC workloads and map them to appropriate storage architectures (parallel file systems, NVMe, Lustre, Spectrum Scale, etc.).
- Provide technical input for storage selection and procurement, including performance expectations, sizing, and cost-performance tradeoffs.
- Collaborate with vendors and internal teams during POCs and performance validation exercises.

HPC Application Build & Optimization
- Build, install, configure, and maintain HPC applications, compilers, libraries, and scientific software stacks.
- Optimize application performance using MPI, OpenMP, GPU acceleration (where applicable), and tuned math libraries.
- Support multiple compiler toolchains (GCC, Intel, LLVM, NVIDIA HPC SDK, etc.).
- Implement and manage environment modules (Lmod) or similar software management frameworks.

System Performance & Operations
- Conduct system-level performance tuning across compute, memory, network, and storage layers.
- Diagnose node-level issues involving CPU, GPU, interconnects (InfiniBand/Ethernet), and OS configurations.
- Create operational runbooks, performance baselines, and troubleshooting documentation.
- Support cluster upgrades, expansions, and hardware refresh activities.

Collaboration & Delivery
- Work closely with application owners, researchers, and infrastructure teams to meet aggressive delivery timelines.
- Translate workload requirements into practical HPC configurations and optimizations.
- Provide clear technical guidance and recommendations to leadership and stakeholders.

Required Skills & Experience

Core HPC Skills
- 8-12+ years of hands-on HPC engineering experience in production environments.
- Strong expertise with SLURM (configuration, tuning, troubleshooting).
- Solid understanding of Linux systems (RHEL/CentOS/Rocky/Alma preferred).
- Deep knowledge of HPC storage systems and I/O performance analysis.
- Proven experience building and optimizing HPC applications and libraries.

Technical Proficiency
- MPI implementations (Open MPI, MPICH), OpenMP
- Compilers and toolchains (GCC, Intel, NVIDIA HPC SDK)
- Performance tools (perf, VTune, nvprof/nsys, IB diagnostics)
- Environment modules (Lmod), package managers (Spack preferred)
- Bash/Python scripting for automation and diagnostics

Nice to Have
- Experience with GPU-based HPC workloads (NVIDIA CUDA, ROCm).
- Exposure to cloud-based HPC (Azure, AWS, GCP).
- Familiarity with parallel file systems such as Lustre or IBM Spectrum Scale.
- Vendor engagement experience for HPC hardware/storage evaluations.