On-prem Cloud Engineer
MASE Insights
Charlotte, NC (In Person)
$93,600 Salary, Full-Time
Job Description
Job Details
Contract, from $45 an hour
Posted 21 hours ago
Qualifications
- GPU programming
- IT system monitoring
- Model deployment
- Red Hat OpenShift
- Benchmarking
- AI
- Batch data processing
- MLOps
- Generative AI
- System performance monitoring
Job Duties
- Build, configure, and operate on‑prem Kubernetes/OpenShift AI platforms for deploying and serving GenAI models and LLM inference workloads.
- Design and optimize high‑performance inference stacks using vLLM, TensorRT‑LLM, Triton Inference Server, SGLang, and advanced techniques (continuous batching, speculative decoding, KV caching).
- Manage GPU orchestration and capacity using Run:AI, MIG, CUDA/NCCL, and tensor parallelism to maximize utilization and throughput.
- Deploy and operate Kubernetes ML serving frameworks (KServe, Helm, Operators) for scalable, reliable model serving.
- Drive inference optimization and benchmarking, leveraging FP8, AWQ, and GPTQ quantization along with performance tools such as GuideLLM and Locust.
- Implement observability and ML monitoring using Prometheus, Grafana, and Arize AI to ensure SLA/SLO compliance for GenAI services.
- Collaborate with ML and research teams to onboard new models, tune inference performance, and productionize them.
Tech Skills Needed
- vLLM
- TensorRT‑LLM
- Triton Inference Server
- SGLang
- Inference Optimization
- Continuous Batching
- Speculative Decoding
- KV Cache / Prefix Caching
- FP8 / AWQ / GPTQ
- Tensor Parallelism
- Kubernetes ML Serving
- KServe
- OpenShift AI
- Helm / Operators
- GPU Orchestration
- Run:AI
- Performance Benchmarking
- CUDA / NCCL / MIG
- Prometheus / Grafana
- ML Observability
- GuideLLM, Locust
Pay: From $45.00 per hour
Work Location: In person
Similar remote jobs
LifeStance Health
New Hyde Park, NY
Posted 2 days ago
Updated 10 hours ago
Albemarle County Public Schools
Charlottesville, VA
Posted 2 days ago
Updated 10 hours ago
Intermountain Health
Frankfort, KY
Posted 2 days ago
Updated 10 hours ago
Similar jobs in Charlotte, NC
Harris Teeter, LLC
Charlotte, NC
Posted 2 days ago
Updated 10 hours ago
Tailormade Protective Services
Charlotte, NC
Posted 2 days ago
Updated 10 hours ago
Similar jobs in North Carolina
Powerback Rehabilitation
Pinehurst, NC
Posted 2 days ago
Updated 10 hours ago