Job Description
Experience:
7-10+ years in Data Engineering / Cloud Development
Proven experience designing and building scalable data pipelines (ingestion, transformation, validation) using AWS, Python, and Databricks (Spark/PySpark)
Experience working with structured, semi-structured, and unstructured data
Exposure to life sciences data formats such as DICOM, FASTQ/BAM/PLINK, or SAS7BDAT is strongly preferred
Hands-on experience with modern data platforms such as Databricks
Experience with data governance tools (e.g., Immuta) is preferred
Familiarity with analytics and statistical tools such as Tableau, R/Posit, or SAS is a plus
Deep expertise in AWS data services (e.g., S3, Lambda, RDS, FSx/EFS) and cloud-native architecture best practices
Strong programming skills in Python and SQL, with experience in PySpark; working knowledge of R is a plus
Experience with CI/CD and orchestration tools such as GitHub, Azure DevOps, and Apache Airflow
Working knowledge of data science and machine learning concepts (e.g., scikit-learn, TensorFlow, PyTorch)
Experience with Databricks, including cluster usage and performance optimization; exposure to Unity Catalog and platform administration is a plus
AWS and/or Databricks certifications are preferred
Experience with Kubernetes/EKS is a plus
Experience architecting technology solutions to meet business requirements
Experience managing technology projects end-to-end through planning, design, build, testing, and deployment phases
Understanding of Computer System Validation for GxP vs. Non-GxP technologies is preferred
Strong communication skills and ability to work collaboratively with internal IT partners, business partners, and external vendors