Member of Technical Staff - Infrastructure Engineer

Job

Black Forest Labs

San Francisco, CA (In Person)

Full-Time

Posted 03/08/2026 (Updated 7 weeks ago) • Actively hiring

Expires 5/27/2026

Job Description

Member of Technical Staff - Infrastructure Engineer
Black Forest Labs
Other Engineering, IT
San Francisco, CA, USA
Posted on Sep 18, 2025

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently looking for a strong candidate to join us in developing and maintaining our large GPU training clusters.

Role & Responsibilities

Design, deploy, and maintain large-scale ML training clusters running SLURM for distributed workload orchestration
Implement comprehensive node health monitoring systems with automated failure detection and recovery workflows
Partner with cloud and colocation providers to ensure cluster availability and performance
Establish and enforce security best practices across the ML infrastructure stack (network, storage, compute)
Build and maintain developer-facing tools and APIs that streamline ML workflows and improve researcher productivity
Collaborate directly with ML research teams to translate computational requirements into infrastructure capabilities and capacity planning

Required Experience

Production experience managing SLURM clusters at scale, including job scheduling policies, resource allocation, and federation
Hands-on experience with Docker, Enroot/Pyxis, or similar container runtimes in HPC environments
Proven track record managing GPU clusters, including driver management and DCGM monitoring

Preferred Qualifications

Understanding of distributed training patterns, checkpointing strategies, and data pipeline optimization
Experience with Kubernetes for containerized workloads, particularly for inference or mixed compute environments
Experience with high-performance interconnects (InfiniBand, RoCE) and NCCL optimization for multi-node training
Track record of managing 1000+ GPU training runs, with deep understanding of failure modes and recovery patterns
Familiarity with high-performance storage solutions (VAST, blob storage) and their performance characteristics for ML workloads
Experience running hybrid training/inference infrastructure with appropriate resource isolation
Strong scripting skills (Python, Bash) and infrastructure-as-code experience
