RESEARCHER, POST-TRAINING

Job

MakerMaker

San Francisco, CA (In Person)

Full-Time

Posted 4 days ago (Updated 17 hours ago) • Actively hiring

Expires 6/20/2026

Job Description

MakerMaker • San Francisco, CA • Full-time • 11 hours ago
Qualifications: Statistical analysis, Machine learning
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement: multi-agent systems that propose, run, and analyze machine learning experiments. We're a small team working on-site in San Francisco.
ABOUT THE ROLE
You'll lead our work on model post-training: supervised fine-tuning, preference data, reinforcement learning from human and AI feedback, reward modeling, and the evaluation suites that tell us what's actually working. You'll own a research area that meaningfully shapes our model behavior and capability. This is a hands-on senior research role. You'll set direction, run experiments, and ship into production. You'll partner with the data, infrastructure, and engineering teams to make the post-training pipeline reliable and fast: improvements there compound into every model we ship.
WHAT YOU'LL DO
- Lead post-training research: SFT, RLHF/RLAIF, RLVR, DPO and successor methods, reward modeling, preference data design
- Design and curate the data that goes into post-training (from sourcing, to filtering, to quality assessment)
- Build and maintain the evaluation suites that measure what matters; resist Goodharting your own benchmarks
- Run rigorous experiments (controls, ablations, statistical significance) and write up internal findings clearly
- Scale data pipelines and work with the infrastructure team to scale training
- Identify and characterize failure modes (reward hacking, distribution drift, eval saturation) and design experiments to address them
- Stay current on the post-training literature; bring useful methods in, ignore the noise
WHAT WE'RE LOOKING FOR
- Strong track record of post-training research (SFT, RL, reward modeling) at a frontier-model lab or equivalent
- 5+ years of hands-on ML research experience
- Comfort with large-scale data curation and preference-data pipelines
- Experience designing evaluation suites for capabilities that aren't easily benchmarked
- Fluent in PyTorch or equivalent; comfortable at the scale of distributed training
- Strong statistical instincts: you'd notice a flawed comparison before someone else points it out
- Strong written communication
NICE TO HAVE
- PhD in ML, statistics, CS, or adjacent
- Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
- Experience with reward hacking detection, scaling reward models, or RLHF infrastructure
- Synthetic data generation experience
- Background in RL math (policy gradients, importance sampling, off-policy methods)
- Open-source contributions to post-training infrastructure
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You're primarily interested in pretraining (that's a different role)
- You'd rather invent novel methods in isolation than ship them into a model that real users run
- You prefer benchmarks that are stable to evaluation work where the right answer isn't yet defined