Senior Data Scientist

InterVenn Biosciences

South San Francisco, CA (In Person)

$179,500 Salary, Full-Time

Posted 4 days ago (Updated 15 hours ago) • Actively hiring

Expires 7/13/2026

Job Description

Job Details: Full-time, $163,000 - $196,000 a year

Qualifications: AI models, Scientific publications, Generative models, AI platforms (beyond public GPTs), Machine learning frameworks, Generative AI

Full Job Description

At InterVenn, our technology enables and empowers the understanding of glycoproteomics, a new clinical layer of biology beyond the genome, using a simple blood draw. InterVenn's powerful solutions will broaden humankind's perception and interpretation of diseases like cancer. We look forward to having new members join our team who have diverse perspectives and backgrounds, challenge the status quo, and are solution-oriented.

We are seeking a creative, methodologically rigorous Senior Data Scientist to push the frontier of how we research and build classifiers from glycoproteomic data. This is a research-forward individual contributor role for someone who reaches across the full breadth of modern statistical and AI methods (classical ML, deep learning, foundation models for biology, generative approaches, and whatever the literature surfaces next) and is energized by open problems: new quantification and normalization schemes, novel feature engineering, multimodal model architectures, and the biological interpretation of model outputs.

RESPONSIBILITIES

Design, prototype, and rigorously evaluate novel classifier architectures for clinical diagnostics across oncology indications

Lead exploratory research into new quantification, normalization, and feature engineering methods for high-dimensional glycoproteomic data

Bring a diverse modeling toolkit (classical statistical methods, tree-based ensembles, deep learning, probabilistic and Bayesian approaches, foundation models, graph neural networks, and generative AI) and choose the right tool for the problem based on evidence rather than habit or hype

Develop cross-validation, calibration, and uncertainty-quantification strategies that hold up to the realities of small clinical cohorts and high feature counts

Investigate and mitigate batch, cohort, and site effects so that models generalize from discovery to bridging to locked panels

Drive cross-indication synthesis: separate shared disease biology from indication-conditioned signal, and from nonspecific inflammatory or acute-phase axes

Build multimodal models that combine glycan/motif information, proteomic grounding, and clinical covariates rather than relying on protein-quantity signal alone

Translate emerging techniques from the ML, AI, and computational-biology literature into production-ready methods

Mentor junior data scientists and raise the methodological bar across the team

QUALIFICATIONS

Ph.D. in Statistics, Computer Science, Computational Biology, Bioinformatics, or a related quantitative field, plus 6+ years of experience building predictive models on biological data in industry or academia; alternatively, an MS in a similar field with 8+ years of relevant experience

Demonstrated track record of methodological innovation: first-author publications, novel methods deployed in production, open-source contributions, or comparable evidence of original work

Deep proficiency in Python and/or R, including the modern ML stack (scikit-learn, PyTorch or TensorFlow, XGBoost/LightGBM, and similar)

Methodological breadth across paradigms: comfortable moving between classical statistics, tree-based ML, deep learning, and modern AI (transformers, graph neural networks, foundation models, generative methods), with the judgment to argue rigorously for one approach over another

Strong statistical foundation: cross-validation strategy, regularization, calibration, uncertainty quantification, and handling of confounders and class imbalance

Hands-on experience building and validating classifiers on high-dimensional, low-sample-size biological data (proteomics, glycoproteomics, transcriptomics, or genomics)

Experience with batch-effect correction and normalization techniques, and a healthy skepticism about how those choices propagate into downstream performance estimates

Preference will be given to candidates with experience in multimodal modeling, interpretability methods, or foundation/representation-learning approaches for biological data

Familiarity with clinical diagnostic development (analytical and clinical validation, locking classifiers, and bridging studies) is a strong plus

Excellent written and verbal communication: able to explain novel methods clearly to wet-lab scientists, clinicians, and fellow statisticians alike

A genuine desire to impact patient lives and contribute to the broader scientific community