AI Research Engineer / Scientist (Speech & Conversational Intelligence)

Insight Global

San Francisco, CA (In Person)

Full-Time

Posted 2 days ago (Updated 18 hours ago) • Actively hiring

Expires 7/24/2026

Job Description

We are seeking a high-caliber AI Research Engineer / Scientist to join our specialized Conversational Intelligence team. In this role, you will bridge the gap between advanced generative speech research and real-world production systems. You will focus on building Duplex/Real-time Speech-to-Speech pipelines and optimizing the post-training infrastructure for our next-generation Slackbot Advanced Voice Mode. The ideal candidate brings strong academic roots in generative speech modeling (TTS, Speech-to-Speech dialogue, Disentanglement, or Diffusion) combined with hands-on experience tuning Large Language Models (LLMs) for complex tool call execution and low-latency interactions.

Key Responsibilities

1. Post-Training & Data Pipeline Engineering

Pipeline Architecture:

Develop, scale, and maintain the supervised fine-tuning (SFT) and post-training pipelines supporting the advanced voice model.

Data Curation & Synthesis:

Clean, curate, and expand multi-modal datasets for enterprise voice interactions. Build automated engines that simulate human voices synthetically to generate high-fidelity, end-to-end evaluation and training data.

Enterprise Tool Integration:

Optimize datasets to handle complex, multi-turn, and multi-tool execution flows unique to collaboration environments.

2. Generative Model Training & Iteration

Fine-Tuning & Alignment:

Train and iteratively refine model checkpoints targeting rapid, accurate tool selection and API invocation (e.g., automated status updates, channel posting, and cross-functional reminders).

Hallucination & Noise Mitigation:

Engineer the pipeline for minimal hallucination rates during name/channel entity resolution, and implement audio robustness behaviors against ambient acoustic noise.

Multilingual Expansion:

Explore and implement cross-lingual transfer, accent resilience, and expressive, natural speech generation paradigms to ensure global user accessibility.

3. Evaluation & Quality Assurance

Advanced Speech Evaluation:

Define automated evaluation metrics, severity scoring matrices, and layer-wise distillation methodologies to benchmark voice models against strong baselines.

Trace Analysis:

Deeply analyze model traces and system failure modes to identify and fix systemic degradation in full-duplex/micro-turn voice architectures.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.

If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements

Education:

Enrolled in or graduated from a top-tier Master's or Ph.D. program in Computer Science, Creative Informatics, or Information and Communications Engineering with a strong research focus on Speech Processing.

Speech Mastery:

Proven track record in generative speech modeling, including experience with Expressive Text-to-Speech (TTS), Speech-to-Speech (S2S) dialogue cascades, and Speech Evaluation (e.g., MOS prediction).

Deep Learning Frameworks:

Deep proficiency in Python and PyTorch, with a solid grasp of foundational architectures (Diffusion models, Self-Supervised Learning, and LLM fine-tuning techniques).

Systems & Infrastructure:

Practical understanding of cloud architecture and machine learning engineering workflows (AWS environment experience or certifications are highly valued).

Publication Record:

Authorship in premier signal processing or speech communication conferences (e.g., ICASSP, INTERSPEECH, APSIPA).

Duplex Architectures:

Direct academic or project experience with Full-Duplex systems, VAD-free cascaded pipelines, or micro-turn conversational optimization.

Linguistic Versatility:

Multilingual fluency (e.g., native/fluent capabilities in English, Chinese, or Japanese) to drive global accent and multilingual modeling tasks.