Machine Learning Engineer - AI & ML Evaluation Frameworks

Job

Apple Inc.

Cupertino, CA (In Person)

$209,750 Salary, Full-Time

Posted 2 days ago (Updated 14 hours ago) • Actively hiring

Expires 7/10/2026

Job Description

Machine Learning Engineer - AI & ML Evaluation Frameworks
Cupertino, California, United States
Hardware

Summary
Posted: Jun 08, 2026
Role Number: 200665512-0836

The Health Sensing Machine Learning Interpretability & Analytics (MLIA) team ensures clinical rigor and contextual trust are at the foundation of Apple's health sensing features. We are looking for an exceptional ML Engineer to help us build the next generation of scalable evaluation infrastructure and lead rigorous investigations into model performance. You will develop cutting-edge tools, synthetic data pipelines, and automated frameworks that ensure our health features are mathematically sound, demographically equitable, and clinically safe. If you are passionate about AI safety, robust software architecture, and pushing the boundaries of ML innovation, come join us!

Description
In this role, you will architect and build large-scale evaluation frameworks to interrogate unimodal ML systems and multi-modal foundation models. Beyond infrastructure, you will lead deep-dive ML evaluations, performing failure analysis to uncover performance gaps, reasoning flaws, and edge cases. You will translate findings into actionable insights and work directly with algorithm teams to improve the safety and reliability of our health features. Your work will empower teams across Apple to rapidly evaluate multi-modal sensor fusion while upholding Apple's privacy standards.

Responsibilities
- Design robust methodologies and scalable frameworks to assess the performance, reliability, and safety of both traditional ML and foundation models (e.g., LLMs, diffusion models).
- Drive failure analysis and build instrumentation to detect clinical hallucinations, reasoning flaws, and edge cases.
- Expand LLM/diffusion-based data generation pipelines that enable model training and evaluation without exposing real user data.
- Build data adaptors and visualizers to fuse asynchronous time-series signals (wearables, camera, behavioral metadata).
- Develop generalizable tools and metrics to discover biases and measure demographic equity across diverse populations.
- Translate evaluation results into actionable engineering insights for GenAI researchers, algorithm leads, and clinical experts.

Minimum Qualifications
- BS in Computer Science, Machine Learning, Statistics, or related field
- 3+ years of experience in ML Engineering or Applied ML
- Strong experience evaluating supervised, unsupervised, LLM, and deep learning models
- Proficiency in Python with the ability to write production-grade code (OOP, CI/CD, Git)
- Hands-on experience in failure analysis, evaluating LLMs, and driving subsequent model improvements
- Experience building data pipelines, inference frameworks, and automated evaluation systems
- Strong communication skills to articulate complex technical concepts to technical and non-technical audiences

Preferred Qualifications
- MS/PhD in Computer Science, Machine Learning, Statistics, or related field
- Experience evaluating LLMs or agentic systems (e.g., LLM-as-a-judge, RAG evaluation)
- Experience with synthetic data generation and prompt engineering
- Experience in parallel data processing (Spark, Kubernetes, Airflow) or privacy-preserving ML (Federated Learning)
- Background in AI Safety, model interpretability, or adversarial testing
- Interest in digital health and clinical rigor

Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs.
Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses — including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

At Apple, we believe accessibility is a fundamental human right. You'll find that idea reflected in everything here: in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.

Apple accepts applications to this posting on an ongoing basis.