
On-prem Cloud Engineer


MASE Insights

Charlotte, NC (In Person)

$93,600 Salary, Full-Time

Posted 3 days ago (Updated 10 hours ago) • Actively hiring

Expires 6/11/2026



Job Description

Job Details: Contract, from $45 an hour (posted 21 hours ago)

Qualifications
  • GPU programming
  • IT system monitoring
  • Model deployment
  • Red Hat OpenShift
  • Benchmarking
  • AI
  • Batch data processing
  • MLOps
  • Generative AI
  • System performance monitoring

Full Job Description

Job Duties
  • Build, configure, and operate on‑prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
  • Design and optimize high‑performance inference stacks using vLLM, TensorRT‑LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).
  • Manage GPU orchestration and capacity using Run:ai, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.
  • Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.
  • Drive inference optimization and benchmarking, leveraging FP8, AWQ, GPTQ, and performance tools such as GuideLLM and Locust.
  • Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI, ensuring SLA/SLO compliance for GenAI services.
  • Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize.

Tech Skills Needed
  • vLLM
  • TensorRT‑LLM
  • Triton Inference Server
  • SGLang
  • Inference Optimization
  • Continuous Batching
  • Speculative Decoding
  • KV Cache / Prefix Caching
  • FP8 / AWQ / GPTQ
  • Tensor Parallelism
  • Kubernetes ML Serving
  • KServe
  • OpenShift AI
  • Helm / Operators
  • GPU Orchestration
  • Run:ai
  • Performance Benchmarking
  • CUDA / NCCL / MIG
  • Prometheus / Grafana
  • ML Observability
  • GuideLLM, Locust

Pay:
From $45.00 per hour
Work Location:
In person
