On-prem Platform Engineer

CogniSoft Technologies

Charlotte, NC (In Person)

Full-Time

Posted 3 days ago (Updated 15 hours ago) • Actively hiring

Expires 7/3/2026

Job Description

vLLM, TensorRT-LLM, Triton Inference Server, SGLang

Inference optimization techniques: continuous batching, speculative decoding, KV cache / prefix caching

Model optimization: FP8, AWQ, GPTQ

Distributed & GPU Systems

Tensor parallelism and large model scaling
CUDA, NCCL, GPU architecture
GPU partitioning & optimization (MIG)

Kubernetes & ML Serving

Kubernetes-based ML serving platforms
KServe, OpenShift AI
Helm charts, Operators, platform automation

GPU Orchestration

Run:AI or similar GPU scheduling/orchestration platforms
Multi-tenant GPU workload management

Platform Engineering

Experience building internal AI/ML platforms (on-prem or hybrid)
Strong automation and system design mindset

Observability & Performance

Prometheus, Grafana
ML observability (model latency, throughput, drift, resource utilization)
Performance benchmarking and tuning