Job Description
TECHNICAL ARCHITECT - MLOPS ENGINEER
Toronto
6 - 10 Years Experience
Job ID: 115756-39-1
Job description
POSITION / TITLE: MLOps Engineer
Location: Montreal, QC
Job Overview
We are looking for an experienced MLOps/LLMOps Engineer with a strong background in deploying and monitoring machine learning and large language model (LLM) pipelines. The ideal candidate will have 10+ years of experience in MLOps, with expertise in setting up end-to-end ML/LLM pipelines using open-source tools and cloud-native solutions on platforms like AWS, GCP, and Azure. This role requires hands-on knowledge of deploying, automating, and monitoring ML/LLM workflows, with a solid grounding in DevOps practices to ensure seamless CI/CD processes.
Responsibilities:
Pipeline Design & Implementation:
Design, build, and manage MLOps and LLMOps pipelines for data ingestion, model training, validation, deployment, and monitoring.
Use open-source tools such as MLflow, Kubeflow, DVC, and Airflow to automate and monitor machine learning workflows.
Implement scalable LLM-specific solutions for model training and inference, optimizing resource allocation and deployment efficiency.
Cloud-native MLOps Implementation:
Set up and manage MLOps pipelines primarily in AWS (SageMaker, EKS, Lambda, S3), or have similar experience with GCP (Vertex AI, AI Platform Pipelines) or Azure (Machine Learning, AKS, Azure Functions).
Manage model versioning, retraining, and deployment workflows on cloud platforms to ensure consistent performance and availability.
Execute CI/CD pipelines for ML models with GitHub Actions, Jenkins, or GitLab CI.
Model Monitoring & Performance Optimization:
Monitor models in production using Prometheus, Grafana, and TensorBoard, establishing observability metrics for model drift, accuracy, and latency.
Collaborate with Data Engineering and ML teams to implement scalable and efficient pipelines.
Use A/B testing and shadow deployment strategies to validate and optimize LLM performance in real time.
LLM-specific Model Operations:
Deploy and monitor LLMs for specific tasks, ensuring they adhere to performance SLAs and are optimized for cost.
Understand techniques for fine-tuning, optimizing inference, and managing infrastructure costs for LLMs.
Required Skills and Qualifications
Technical Skills - Good to have:
Proficiency with Kubernetes and Docker for container orchestration and model deployment.
Experience with open-source MLOps tools (MLflow, Kubeflow, DVC) and data versioning.
Hands-on experience with cloud-native ML tools in AWS, GCP, or Azure and associated ML services.
Knowledge of Python or Bash scripting for automating processes and custom integrations.
DevOps-Related Skills:
Solid understanding of CI/CD practices and tools like GitHub Actions, Jenkins, or GitLab CI/CD to build and deploy ML/LLM models.
Proficient in infrastructure-as-code tools, such as Terraform or Ansible, to enable automated provisioning and configuration management.
Programming & Scripting:
Python
SQL, NoSQL, PySpark (Optional)
AI/ML & Data Science - Good to have:
Supervised and unsupervised learning, and model evaluation metrics
NLP, RAG, GenAI, LLMs
Deep Learning (Sequential & Functional APIs) using PyTorch/TensorFlow
MLOps & MLflow Experiment Tracking
Explainable AI (XAI): LIME, SHAP (Optional)
Cloud Platforms:
(Primary: AWS) or hands-on expertise on any of these cloud platforms: Azure AI/ML, Google Vertex AI, Databricks Studio
Education:
Bachelor's/Master's/PhD degree in Mathematics, Statistics, Physics, Computer Science, Engineering, Data Science, or a related quantitative field.
Process Skills:
Understanding of Agile and Scrum methodologies.
Ability to follow SDLC processes and contribute to technical documentation.
Behavioral Skills:
Must have structured thinking and a goal-oriented approach to problem-solving.
Self-motivated and capable of working independently with minimal management supervision.
Well-developed design, analytical, and problem-solving skills.
Excellent communication and interpersonal skills.
Excellent team player, able to work with virtual teams in several time zones.