RESEARCHER, EFFICIENT INFERENCE

MakerMaker

San Francisco, CA (In Person)

Full-Time

Posted 5 days ago (Updated 2 days ago) • Actively hiring

Expires 6/21/2026

Job Description

Qualifications: Scientific publications, Statistical analysis, Machine learning libraries, Research & development, Machine learning frameworks, Prototypes

ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team working on-site in San Francisco.

ABOUT THE ROLE

You'll research how to make models more efficient: quantization, speculative decoding, sparse and structured attention, distillation, mixture-of-experts inference, and the training-time techniques that make those methods possible. The work spans algorithm design, careful evaluation, and pushing methods to where they actually run. This is a senior research role with a clear engineering edge. You'll work at the intersection of model architecture and inference performance, designing methods that shift the accuracy/latency/cost trade-off in our favor, then partnering with engineers to make those gains real in production.

WHAT YOU'LL DO

- Research and develop quantization methods: post-training quantization, quantization-aware training, mixed-precision regimes, and low-bit-width arithmetic
- Design and evaluate speculative decoding approaches: draft models, tree attention, parallel speculation, and lookahead decoding
- Investigate training-time efficiency methods that compose well with inference: distillation, sparse attention, mixture-of-experts, low-rank adaptation, and pruning
- Run controlled experiments at production scale; characterize what works on real workloads, not just toy benchmarks
- Co-design methods with the inference engineering team, pushing results to where they actually run instead of stopping at the paper
- Read deeply across the efficient ML and efficient inference literature, and translate the most useful ideas into our stack
- Publish when the work warrants it; share findings internally
- Partner with model and training researchers so efficiency choices align with model architecture and post-training decisions

WHAT WE'RE LOOKING FOR

- Strong track record of ML research on efficiency methods: quantization, speculative decoding, distillation, MoE, sparse attention, or adjacent areas
- 5+ years of hands-on research experience
- Deep familiarity with the performance characteristics of both training and inference
- Fluent in PyTorch, JAX, or equivalent; comfortable working at the kernel and serving-framework level when methods require it
- Track record of moving efficiency research from prototype to production
- Strong statistical expertise: you'd notice a flawed comparison before someone else points it out
- Strong written communication
- Published research at NeurIPS, ICML, ICLR, MLSys, or comparable venues

NICE TO HAVE

- PhD in ML, systems, or a related field
- Open-source contributions to quantization, speculative-decoding, or efficient-inference libraries
- Experience with hardware-aware optimization and accelerator-specific tooling
- Background in numerical methods, low-precision arithmetic, or approximate computation

THIS ROLE IS PROBABLY NOT FOR YOU IF

- You want to focus on pretraining large models from scratch (that's a different role)
- You prefer abstract algorithmic research without hands-on implementation
- You want a fixed benchmark with stable targets (our targets shift with what our models actually need to do)

Similar jobs in San Francisco, CA

- Medical Fellow • Doximity • San Francisco, CA • Posted 1 day ago (Updated 6 hours ago)
- Founding AE • CyberCoders • San Francisco, CA • Posted 1 day ago (Updated 6 hours ago)
- PM Dishwasher Part time • Bacchus Management Group • San Francisco, CA • Posted 1 day ago (Updated 6 hours ago)
- Sr. Principal Security Researcher- Advanced Threat Prevention (ATP) • Palo Alto Networks • San Francisco, CA • Posted 1 day ago (Updated 6 hours ago)
- Product Marketing Manager, Workspace Security • Check Point Software Technologies Ltd. • San Francisco, CA • Posted 1 day ago (Updated 6 hours ago)

Similar jobs in California

- Physical Therapist (PT) - URGENT NEED Home Health - Per Diem- High Rate • PLOV Solutions • Thousand Oaks, CA • Posted 1 day ago (Updated 6 hours ago)
- Guest Room Attendant • Crescent Hotels & Resorts LLC • San Jose, CA • Posted 1 day ago (Updated 6 hours ago)
- CNC Set Up/Operators - 2nd Shift • AtWork Group • Baldwin Park, CA • Posted 1 day ago (Updated 6 hours ago)
- Security Patrol Driver Shopping Mall • Allied Universal • Lake Forest, CA • Posted 1 day ago (Updated 6 hours ago)
- Memory Care Assistant • Front Porch • San Diego, CA • Posted 1 day ago (Updated 6 hours ago)