Design and implement scalable, distributed data pipelines using AWS services such as S3, Redshift, Glue, Lambda, EMR, Athena, and Kinesis, with transformation logic developed in PySpark, Python, and SQL.

Architect and lead the implementation of a comprehensive security framework for the Databricks platform, including identity and access management (IAM), data governance, network security, encryption, and audit controls. Define and enforce enterprise-grade security standards across Databricks workspaces, Unity Catalog, and associated data pipelines, ensuring alignment with organizational policies and industry best practices. Implement and manage user access provisioning and authentication for Databricks through Microsoft Entra ID (formerly Azure AD), including SCIM-based group provisioning, SSO integration, RBAC policies, and conditional access.

Apply deep domain expertise in molecule-level data to uncover strategic insights and identify business opportunities across the drug development lifecycle. This includes interpreting and linking molecular entities, manufacturers, regulatory events, and clinical-stage indicators to support asset evaluation and portfolio optimization.

Lead the design, development, and deployment of generative AI use cases, taking them from ideation through production implementation while ensuring long-term scalability and maintainability. Develop and fine-tune large language model (LLM) applications, including prompt-engineering strategies, reusable prompt templates, and context-augmentation techniques that improve response accuracy and relevance. Integrate generative AI systems with enterprise data ecosystems using REST APIs, vector databases, knowledge graphs, orchestration frameworks, and other scalable backend components. Establish robust LLM evaluation and monitoring frameworks, defining key metrics for measuring model accuracy, relevance, safety, and overall production performance.
Collaborate cross-functionally with engineering, data science, product, and business stakeholders to prioritize and deliver impactful, responsible AI solutions aligned with business goals. Conduct architecture reviews and optimize end-to-end data pipelines and GenAI workflows for cost efficiency, runtime performance, and scalability in multi-cloud environments. Implement CI/CD pipelines using GitHub and GitHub Actions, enabling modular, version-controlled deployment of infrastructure, data products, and AI applications. Develop intelligent knowledge workflows using Flowise, retrieval-augmented generation (RAG), function calling, SQL orchestration, and webhook integrations to support dynamic use cases. Design and build process mining dashboards using Celonis EMS, including KPI definitions, root cause analysis using Process Query Language (PQL), and operational insights. Automate enterprise workflows through Celonis Action Flows, integrating seamlessly with systems like SAP, Salesforce, and other business platforms to enable process optimization. Model enterprise process data within the Celonis Data Model (CDM) and configure scalable data pipelines using Celonis Data Integration for high-performance analytics. Can work remotely or telecommute.
REQUIREMENTS
MINIMUM Education Requirement:
Bachelor's degree in Computer Science, Computer Engineering, or related field of study.
MINIMUM Experience Requirement:
7 years of Software Engineering, Data Engineering, or related experience.
Alternative Education and Experience Requirement:
Master's degree in Computer Science, Computer Engineering, or related field of study plus 5 years of Software Engineering, Data Engineering, or related experience.
Required knowledge or experience with:
Proficiency in programming languages: Python, R, Scala, and Java. Designing and implementing scalable, distributed data pipelines using AWS services such as S3, Redshift, Glue, Lambda, EMR, Athena, and Kinesis, with transformation logic developed in PySpark, Python, and SQL. Linux/Unix environments, including shell scripting (Bash) for automation and system operations. Relational databases such as MySQL, PostgreSQL, and SQL Server, as well as cloud data warehouses such as Amazon Redshift. Architecting and leading the implementation of a comprehensive security framework for the Databricks platform, including identity and access management (IAM), data governance, network security, encryption, and audit controls. Defining and enforcing enterprise-grade security standards across Databricks workspaces, Unity Catalog, and associated data pipelines, ensuring alignment with organizational policies and industry best practices. Implementing and managing user access provisioning and authentication for Databricks through Microsoft Entra ID (formerly Azure AD), including SCIM-based group provisioning, SSO integration, RBAC policies, and conditional access. Applying deep domain expertise in molecule-level data to uncover strategic insights and identify business opportunities across the drug development lifecycle, including linking molecular entities, manufacturers, regulatory events, and clinical-stage indicators to support asset evaluation and portfolio optimization. Managing Databricks Unity Catalog using Terraform, including configuration of external locations, catalogs, schemas, and access controls. Automating data governance and access management through Terraform modules that provision Unity Catalog resources and integrate securely with cloud storage. Implementing automated CI/CD pipelines with GitHub, GitHub Actions, Jenkins, and Airflow to enable modular, version-controlled deployment of infrastructure.
Developing and deploying machine learning models using supervised learning (linear regression, logistic regression, decision trees, random forests), unsupervised learning (k-means clustering, PCA), and deep learning (neural networks, CNNs, RNNs) to generate actionable insights and improve metrics. Applying time series forecasting models such as ARIMA, Prophet, and LSTM for predictive analytics on temporal datasets. Applying NLP and text analytics techniques, including text preprocessing, TF-IDF, Word2Vec embeddings, and transformer-based models (BERT), for text classification and entity recognition. Creating interactive visualizations in Power BI on top of Databricks Delta tables for real-time analytics, and developing in-depth exploratory visualizations using Matplotlib, Seaborn, and Plotly. Developing and maintaining interactive dashboards and visualizations in Amazon QuickSight, leveraging data processed and stored in Delta Lake.
Salary:
$178,131 to $186,000 per year

Compensation and Benefits

The estimated salary range for this Sr Developer position, based in Massachusetts, is $178,131.00 to $186,000.00. This position may also be eligible to receive a variable annual bonus based on company, team, and/or individual performance results in accordance with company policy.

We offer a comprehensive Total Rewards package that our U.S. colleagues and their families can count on, which includes:
A choice of national medical and dental plans, and a national vision plan, including health incentive programs
Employee assistance and family support programs, including commuter benefits and tuition reimbursement
At least 120 hours of paid time off (PTO), 10 paid holidays annually, paid parental leave (3 weeks for bonding and 8 weeks for caregiver leave), accident and life insurance, and short- and long-term disability in accordance with company policy
Retirement and savings programs, such as our competitive 401(k) U.S. retirement savings plan
Employees' Stock Purchase Plan (ESPP), which offers eligible colleagues the opportunity to purchase company stock at a discount

For more information on our benefits, please visit: https://jobs.thermofisher.com/global/en/total-rewards

Thank you for your interest as you consider starting a new career journey with us. At Thermo Fisher Scientific, the world leader in serving science, our colleagues develop critical solutions through innovation and build rewarding careers. Discover their extraordinary stories and connection to our Mission to enable our customers to make the world healthier, cleaner and safer. Their work is a story of purpose. What story will you tell?

Thermo Fisher Scientific Inc. is the world leader in serving science, with annual revenue of more than $40 billion. Our Mission is to enable our customers to make the world healthier, cleaner and safer.
Whether our customers are accelerating life sciences research, solving complex analytical challenges, increasing productivity in their laboratories, improving patient health through diagnostics or the development and manufacture of life-changing therapies, we are here to support them. Our global team delivers an unrivaled combination of innovative technologies, purchasing convenience and pharmaceutical services through our industry-leading brands, including Thermo Scientific, Applied Biosystems, Invitrogen, Fisher Scientific, Unity Lab Services, Patheon and PPD.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Thermo Fisher Scientific is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, creed, religion, color, national or ethnic origin, citizenship, sex, sexual orientation, gender identity and expression, genetic information, veteran status, age or disability status.