Data Engineer Jobs in USA, CA, Menlo Park | Rose International Job

Rose International

Remote

Full-Time

Posted 3 days ago (Updated 10 hours ago) • Actively hiring

Expires 7/4/2026

Job Description

Client Location:

Onsite 5 days a week in Menlo Park, CA

REQUIRED

Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field
5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference
Demonstrated track record of building and operating production data pipelines that invoke ML models at scale
Strong software engineering fundamentals: Python, data structures, concurrency/async programming
Advanced SQL and data pipeline expertise: complex queries, query optimization, pipeline orchestration frameworks (Airflow, Dataswarm, or equivalent)
Experience integrating ML models into data pipelines: calling inference endpoints, managing model versions, batching requests, handling inference failures at scale
Proficiency with AI-assisted coding agents (e.g., Copilot, Cursor, Codex). Expected to leverage AI tools as a force multiplier for writing, debugging, and reviewing code, building pipelines faster, and accelerating day-to-day engineering workflows
Strong verbal and written communication skills, problem-solving ability, and cross-functional collaboration.

PREFERRED

Working knowledge of embeddings and vector representations, including generating, storing, indexing, and querying embeddings (FAISS, Milvus, or equivalent)
Familiarity with content-understanding models such as image classifiers, object detection, OCR, NSFW detection, and aesthetic scoring
Experience with LLMs for data tasks such as prompt engineering for annotation, data cleaning, or evaluation using LLM APIs
Knowledge of generative AI, such as diffusion models, image generation, and evaluation metrics (FID, CLIP score, etc.)

Job Description:

Generative AI models are only as good as the data they consume. Unlike traditional data engineering, building data pipelines for generative AI requires orchestrating ML model invocations (content understanding classifiers, embedding models, LLM-based cleaners) alongside standard SQL-based transformations, all at billion-row scale.

This role sits at the intersection of Data Engineering and ML Systems. The Senior AI Data Engineer will own end-to-end data pipelines that don't just move and transform data, but enrich it through remote model inference, managing the systems complexity that comes with it: async execution, capacity allocation, retry/fallback logic, and throughput optimization. This is not a pure ETL-with-SQL role; it demands hands-on systems experience with distributed inference infrastructure.

Our team develops comprehensive data curation and evaluation solutions for image generation models across quality dimensions including visual quality, prompt adherence, identity preservation, naturalness, and visual text generation.

Job Responsibilities

Main Responsibilities

AI-Augmented Data Pipelines:

Design and maintain AI-augmented, large-scale data pipelines (billions of images) integrating traditional transformations with ML models (classifiers, embeddings, LLMs) for cleaning and annotation.

Remote Inference Orchestration:

Own the systems for remote ML model inference orchestration within pipelines, managing batching, retries, async jobs, and ensuring graceful degradation.
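As a rough illustration of the kind of work this responsibility covers, the following is a minimal Python sketch of batching, retry with backoff, bounded async concurrency, and graceful degradation. It is not the team's actual system: the names `InferFn`, `make_batches`, `infer_with_retry`, and `run_inference`, along with every default value, are hypothetical and chosen only for this example.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence

# Hypothetical shape of a remote inference call (an assumption for this sketch):
# it takes a batch of inputs and returns one score per input.
InferFn = Callable[[Sequence[str]], Awaitable[List[float]]]


def make_batches(items: Sequence[str], batch_size: int) -> List[Sequence[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def infer_with_retry(
    infer: InferFn,
    batch: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> List[Optional[float]]:
    """Call infer on one batch, retrying with exponential backoff.

    If every attempt fails, degrade gracefully by returning None for each
    item so downstream steps can skip or re-queue those rows.
    """
    for attempt in range(max_attempts):
        try:
            return list(await infer(batch))
        except Exception:
            # Back off before the next attempt; no sleep after the last one.
            if attempt < max_attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return [None] * len(batch)


async def run_inference(
    infer: InferFn,
    items: Sequence[str],
    batch_size: int = 2,
    max_concurrency: int = 4,
) -> List[Optional[float]]:
    """Run batched inference with bounded concurrency, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(batch: Sequence[str]) -> List[Optional[float]]:
        async with semaphore:
            return await infer_with_retry(infer, batch)

    batches = make_batches(items, batch_size)
    results = await asyncio.gather(*(guarded(b) for b in batches))
    return [score for batch_result in results for score in batch_result]
```

In this sketch a batch that keeps failing yields `None` placeholders rather than aborting the whole run, which is one simple reading of "graceful degradation"; a production pipeline would likely also log failures and distinguish retryable from permanent errors.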

Feature Pipelines:

Build and maintain scalable pipelines for generating, storing, and serving vector embeddings, including nearest-neighbor index management and quality validation.

Data Curation at Scale:

Source, filter, and curate training datasets using a combination of SQL and model-derived signals (e.g., aesthetic scores, NSFW classifiers), owning the end-to-end data flow and maintaining governance, quality, and compliance.

Additional Responsibilities

LLM-Assisted Annotation:

Design and operate pipelines that use LLMs and vision models for automated annotation of training data, including auditing workflows to measure and improve annotation model performance.

Tooling & Frameworks:

Contribute to shared tooling and frameworks that make it easier for the broader team to build AI-augmented data pipelines, for example reusable operators for model invocation and standard patterns for async job management.

Education / Experience

Bachelor's degree or higher in Computer Science, Data Engineering, Machine Learning, or a related STEM field.
5+ years of industry experience in data engineering, ML engineering, or a hybrid role involving both data pipelines and model serving/inference.

Pursuant to the California Fair Chance Act, Los Angeles County Fair Chance Ordinance for Employers, Los Angeles Fair Chance Initiative for Hiring Ordinance, and San Francisco Fair Chance Ordinance, qualified applicants will be considered for assignment with arrest and conviction records. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness, meet client expectations, standards, and accompanying requirements, and safeguard business operations and company reputation.

Only those lawfully authorized to work in the designated country associated with the position will be considered.
Please note that all position start dates and durations are estimates and may be reduced or lengthened based upon a client's business needs and requirements.