

Victoria Metrics Architect

Job

VDart, Inc.

Bellevue, WA (In Person)

Full-Time

Posted 5 days ago (Updated 1 day ago) • Actively hiring

Expires 7/22/2026

Job Description

Job Title: Victoria Metrics Architect
Location: Bellevue, WA / Overland Park, KS
Contract

Required Qualifications:

VictoriaMetrics: Expert Level
Significant hands-on production experience operating VictoriaMetrics at scale: VMCluster deployments handling sustained, high-cardinality workloads in live environments. This is the non-negotiable baseline for the role.
VMCluster internals at depth: the write path from VMInsert through VMStorage replication, the query fan-out and merge behavior of VMSelect, and the performance implications of topology decisions on ingestion throughput and query latency.
Active time series lifecycle management: how time series are created, sustained, and expired; the relationship between cardinality and memory pressure; and the ability to diagnose and remediate a cardinality explosion in a production environment.
MetricsQL fluency: advanced aggregation, rollup window semantics, subquery patterns, and query design that reduces load on VMStorage at scale.
VMAgent at depth: scrape configuration, stream aggregation for edge-side cardinality reduction, rate limiting, deduplication, and write buffering continuity during upstream unavailability.
VMAuth multi-tenancy: per-tenant routing via VMUser custom resources, token-based authentication, and read/write path segregation.
VMAlert and VMAnomaly: alerting and recording rule design, anomaly model selection, and integration with enterprise alert dispatch systems.
Federation design: global query layer architecture, cross-cluster deduplication, and remote_write performance tuning under high-cardinality ingestion at sustained scale.
Storage architecture: retention modelling, downsampling, backup and restore, and capacity planning for time-series workloads.
VictoriaMetrics Operator: lifecycle management of all VM custom resource definitions and upgrade strategy on OpenShift.

Red Hat OpenShift: Production Depth
Substantial Kubernetes experience, with a material portion on Red Hat OpenShift in bare-metal or on-premises enterprise environments, not exclusively managed cloud Kubernetes.
OpenShift security model: Security Context Constraints, Network Policy, namespace RBAC, and the constraints that apply to stateful, high-throughput workloads.
StatefulSet lifecycle, PersistentVolumeClaim management, and StorageClass selection for write-intensive time-series workloads.
OCP upgrade path management and the implications for Operator compatibility and cluster monitoring interactions.
Multi-cluster OpenShift topology: hub-and-spoke architectures, cross-cluster networking, and remote scrape or remote_write connectivity across cluster boundaries.
Comfort designing for IPv6 and dual-stack network environments, which are increasingly common in carrier-grade infrastructure deployments.

GitOps and CI/CD Delivery
GitOps-native delivery as a professional standard: all platform configuration managed in Git, no manual changes to production cluster state, and a clear promotion gate model from lab through to production.
ArgoCD at production scale: application hierarchy design, sync policy configuration, health checks for custom resources, and multi-cluster application deployment.
Kustomize overlay strategy for multi-cluster and multi-tenant deployments: base definitions with environment-specific patches.
GitLab CI/CD pipeline design: manifest validation, environment promotion gates, and automated operator upgrade pipelines.
Terraform or equivalent infrastructure-as-code for provisioning supporting platform resources.

Security and Identity
HashiCorp Vault at production depth: dynamic secrets, Vault Secrets Operator synchronisation, token lifecycle management, and PKI secrets engine integration for certificate issuance.
Enterprise PKI: TLS certificate lifecycle, automated renewal, and CA distribution to distributed cluster workloads.
OIDC and OAuth2 integration: platform service authentication via an enterprise identity provider, service account token federation, and the elimination of static credential patterns.
Zero Trust design as a default: every interface between platform components authenticated and encrypted; no implicit trust between tenants, ingestion sources, or query consumers.

Telecommunications and Network Observability
Proven experience designing or operating observability platforms for telecommunications infrastructure: 5G core, RAN, transport, or carrier-grade edge environments.
FCAPS framework alignment: mapping Fault, Configuration, Accounting, Performance, and Security monitoring requirements to metric taxonomies, alerting rules, and operational dashboards.
Heterogeneous vendor telemetry integration: Prometheus exporter compatibility assessment, OpenMetrics format validation, and labelling standardization across multi-vendor sources.
Multi-vendor, multi-tenant metrics ingestion design: label isolation strategy, per-vendor cardinality allocation, and data segregation enforcement at the proxy and routing layer.
Enterprise NOC integration: alert routing design from evaluation engine through to ticketing or event management platforms, deduplication, suppression, and severity mapping.