On-prem Platform Engineer

Job

CogniSoft Technologies

Charlotte, NC (In Person)

Full-Time

Posted 3 days ago (Updated 15 hours ago) • Actively hiring

Expires 7/3/2026

Job Description

vLLM, TensorRT-LLM, Triton Inference Server, SGLang

Inference optimization techniques:
  Continuous batching
  Speculative decoding
  KV cache / Prefix caching

Model optimization:
  FP8, AWQ, GPTQ

Distributed & GPU Systems
  Tensor parallelism and large model scaling
  CUDA, NCCL, GPU architecture
  GPU partitioning & optimization (MIG)

Kubernetes & ML Serving
  Kubernetes-based ML serving platforms
  KServe, OpenShift AI
  Helm charts, Operators, platform automation

GPU Orchestration
  Run:AI or similar GPU scheduling/orchestration platforms
  Multi-tenant GPU workload management

Platform Engineering
  Experience building internal AI/ML platforms (on-prem or hybrid)
  Strong automation and system design mindset

Observability & Performance
  Prometheus, Grafana
  ML observability (model latency, throughput, drift, resource utilization)
  Performance benchmarking and tuning