Data Warehouse / Data Lake Engineer Position Available In Worcester, Massachusetts
Job Description
Data Warehouse / Data Lake Engineer
Location:
Worcester, MA
Salary:
$75.00 USD Hourly – $95.00 USD Hourly
Description:
Position Summary
We are seeking a highly skilled Data Warehouse/Data Lake Engineer to support enterprise-wide and scientific data initiatives across R&D, CMC, and operational teams. This hybrid role, which requires three on-site days per week in Worcester, MA, involves hands-on architecture, data pipeline development, and integration in a regulated biopharmaceutical environment. The ideal candidate brings expertise in both on-premises and AWS cloud data environments, with a strong foundation in machine learning and computational biology.
Key Responsibilities
Design and deploy ML pipelines to assess biologics developability, immunogenicity, and manufacturability.
Develop, tune, and implement models using protein sequences, structural data, and multi-modal omics datasets.
Partner with data engineers to define model-ready datasets and standardized data schemas.
Apply methods including deep learning, transformers, graph neural networks, and probabilistic models.
Automate workflows for model training, validation, and deployment using SageMaker, MLflow, or similar tools.
Promote reproducible science and maintain model traceability to ensure compliance in regulated environments.
Communicate findings and model limitations to multidisciplinary scientific teams.
Stay up to date with evolving trends in protein machine learning and bioinformatics.
Required Qualifications
Minimum 5 years of experience in applied machine learning or computational biology.
Advanced proficiency in Python and ML libraries such as scikit-learn, PyTorch, TensorFlow, and Hugging Face Transformers.
Proven experience building ML pipelines on cloud platforms, preferably AWS.
Strong understanding of protein structures, antibody sequences, or biologics development.
Familiarity with data versioning, experiment tracking, and ML operations (MLOps) best practices.
Ability to collaborate across scientific and engineering domains, translating complex research needs into scalable data solutions.
Preferred Qualifications
Experience with structure-based modeling or AlphaFold-derived data.
Knowledge of large language models (LLMs) and embedding techniques for protein or gene sequences.
Integration experience with public scientific databases such as IEDB, UniProt, SAbDab, and PDB.
Understanding of GxP compliance, FAIR data principles, and regulated model development practices.
Contact:
This job and many more are available through The Judge Group. Please apply with us today!
Dice Id:
cxjudgpa
Position Id:
1089121