Software Engineer (Model Evaluation & Benchmarking)

SpreeAI

San Francisco, CA (In Person)

Full-Time

Posted 2 days ago (Updated 21 hours ago) • Actively hiring

Expires 7/24/2026

Job Description

About the Role

We are hiring engineers focused on AI model evaluation to build the systems that ensure multimodal AI behaves reliably, consistently, and predictably as it moves from research into production. This position focuses on evaluating generative and vision-based models through automated benchmarking, dataset-driven testing, and performance-validation pipelines. You will work at the intersection of applied science, infrastructure, and product, helping define how we measure realism, consistency, and quality across image, video, and multimodal AI systems.

Why This Role Exists

Modern AI evaluation extends beyond pass/fail testing. Multimodal generative systems require:
- benchmarking across visual realism, pose consistency, and identity preservation;
- automated regression detection across model checkpoints;
- scalable evaluation pipelines integrated into continuous deployment workflows.

We are building evaluation systems in which research velocity and product reliability must coexist. This role is for engineers interested in defining how quality is measured in generative AI systems.

What You'll Do

- Build automated evaluation pipelines for multimodal AI models.
- Benchmark diffusion models, vision systems, and generative workflows.
- Validate model checkpoints and detect regressions across versions.
- Develop evaluation metrics for realism, consistency, and performance.
- Integrate evaluation tooling into CI/CD workflows.
- Collaborate with ML researchers and infrastructure teams to ensure production readiness.
- Analyze failure modes and propose evaluation strategies.

Core Areas & Tooling

Candidates should be familiar with, or interested in:
- LLM, VLM, or Stable Diffusion model evaluations
- Image and video benchmarking techniques
- Multimodal evaluation frameworks
- Dataset-driven testing workflows
- Research experiment validation pipelines

Qualifications

- Degree in Computer Science, AI, Engineering, or a comparable combination of education and practical experience.
- Strong programming skills in Python.
- Familiarity with object-oriented programming (C++, Java, Python, or similar).
- Strong fundamentals in data structures and algorithms.
- Understanding of machine learning experimentation workflows.

Preferred Qualifications

- Experience evaluating vision or generative models.
- Familiarity with the Hugging Face ecosystem or other open-source ML toolkits.
- Experience building automated test frameworks or benchmarking tools.
- Knowledge of diffusion models or multimodal architectures.
- Experience with data analysis tools (NumPy, pandas, visualization libraries).

SpreeAI is a fast-growing, innovative AI company at the forefront of fashion and e-commerce, changing how consumers engage with fashion through photorealistic try-on technology and hyper-personalized shopping experiences. Our mission is to redefine the retail landscape with cutting-edge AI solutions that blend high fashion and technology. We thrive in a dynamic, fast-paced environment where creativity meets technology to drive real impact. If you are passionate about innovation and shaping the future of fashion, SpreeAI offers a platform to make your mark.