Annotation Data Scientist, Evaluation Integrity (Siri)

Apple, Inc.

Cambridge, MA (In Person)

$214,750 Salary, Full-Time

Posted 1 day ago (Updated 50 minutes ago) • Actively hiring

Expires 6/20/2026

Job Description

Summary

Play a part in the ongoing revolution in human-computer interaction. Siri is evolving - and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release.

Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation - the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions.

Description

As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end-to-end, from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams.

Minimum Qualifications

Bachelor's or Master's degree in a quantitative or related field such as Data Science, Computer Science, Linguistics, Statistics, or Cognitive Science, or equivalent job-related experience.
3+ years of hands-on experience working with human-annotated datasets or human-in-the-loop evaluation methodologies for machine learning, natural language processing, or large language model systems.
3+ years of experience using Python for data processing, analysis, and prototyping, including experience with libraries such as pandas, Jupyter, and at least one data visualization library.
Experience designing, implementing, and communicating annotation schemas, rubrics, or ontologies for machine learning training or evaluation data.
Experience managing multiple concurrent dataset curation efforts, including scoping work, iterating on guidelines, coordinating with in-house or vendor annotators, and monitoring annotator performance metrics such as accuracy, throughput, and inter-annotator agreement.
Experience specifying or designing custom annotation tooling in collaboration with software engineers.

Preferred Qualifications

Experience evaluating LLM-powered or agentic systems, including familiarity with LLM-as-judge methodologies, rubric-based grading, or trajectory and tool-call evaluation.
Familiarity with statistical methods that address accuracy and variability in human annotation data, such as inter-annotator agreement, Cohen's or Fleiss' kappa, Krippendorff's alpha, or bootstrapping.
Data-querying experience with SQL, Spark, or similar, and comfort working with large, complex, real-world datasets.
Experience building pre-ship evaluation pipelines for conversational or assistant products.
Experience with prompt engineering, or with designing simulated user personae for agent evaluation.
Experience running annotation programs across multiple locales or at large scale.
Excellent written and verbal communication skills, with the ability to explain technical topics clearly to data scientists, engineers, annotators, and cross-functional partners.
Proven ability to collaborate effectively across functions and drive projects of varying sizes and scopes - knowing when to dive deep and when to delegate.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $154,600 and $274,900, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan.

You'll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses - including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
