Machine Learning Engineer, ML/GenAI Evaluation

Apple, Inc.

Austin, TX (In Person)

$236,900 Salary, Full-Time

Posted 2 days ago (Updated 14 hours ago) • Actively hiring

Expires 7/13/2026

Job Description

Summary
Would you like to contribute to Machine Learning and Generative AI technologies? Are you passionate about measuring what matters and ensuring AI systems work reliably for everyone? Do you believe that rigorous evaluation - including holding models accountable to fairness standards - is what separates great ML from good ML? We truly believe it is!

We are defining what exceptional looks like for machine learning across Wallet, Payments, and Commerce. As a Machine Learning Engineer specializing in Evaluation, you will establish the evaluation criteria, metrics frameworks, and quality standards that determine when models are ready to reach hundreds of millions of users. Your judgment shapes model quality and earns the confidence to ship. You'll work at the intersection of rigorous ML science and high-impact product decisions, collaborating closely with ML Engineering, Product, Privacy, and Legal teams. This unique opportunity puts you at the center of model quality - designing adversarial test strategies, surfacing failure modes before they reach users, and owning the sign-off process that ensures Apple's financial features meet the highest bar for accuracy, robustness, and reliability.

Description
The ideal candidate is a rigorous, curious ML practitioner who believes that how you measure a model is just as important as how you train it. You think critically about what metrics actually capture, know how models break in the real world, and hold quality standards others find uncomfortably high - including on dimensions like fairness. You will own the full evaluation lifecycle for ML models across Wallet features - designing test frameworks, adversarial corpora, and benchmarks that reflect the diversity of Apple's global user base, then making the final quality call before any model ships. Your findings directly shape model development priorities and product decisions at scale.

Minimum Qualifications
M.S. in Machine Learning, Computer Science, Statistics, Applied Mathematics, or a related technical field strongly preferred. Bachelor's degree with 7+ years hands-on experience in ML evaluation, model quality, or applied research will be considered.
5+ years of hands-on ML experience, with deep expertise in model evaluation, offline metrics design, and behavioral testing.
Strong track record designing evaluation frameworks for production ML systems - not just accuracy/F1, but precision-recall tradeoffs, calibration, fairness, and task-specific quality dimensions.
Creative mindset with the ability to translate standard ML evaluation metrics (F1, AUC, etc.) into utility and user trust measures.
Experience testing for distribution shift, out-of-distribution generalization, and temporal drift in real-world deployed models.
Proven ability to construct adversarial test suites, aggressor scenarios, and edge-case corpora that surface model failure modes before they reach users.
Experience with structured and semi-structured document understanding, OCR pipelines, or financial data extraction is a strong plus.
Strong programming skills in Python; fluency with evaluation tooling, data pipelines, and experiment tracking (e.g., MLflow, W&B, or equivalent).
Excellent communication skills - ability to translate metric results into product-quality narratives for engineering and executive audiences.
Experience owning model quality sign-off in a cross-functional launch process.

Preferred Qualifications
PhD in Computer Science, Data Science, Statistics, AI/ML, or a related field.
Experience with Bayesian or causal graph-based approaches to data generation.
Experience with causal approaches to fairness evaluation - counterfactual fairness, causal Shapley values, or structural causal model-based bias auditing.
Experience evaluating models under privacy constraints or on-device inference settings is a plus.
Familiarity with confidence calibration techniques and uncertainty quantification a plus.
Background in financial services, fintech, or consumer payment products.

Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.