Monitoring & Telemetry Lead SME

Job

Electronic Consulting Services, Inc (ECS Federal)

Falls Church, VA (In Person)

Full-Time

Posted 1 day ago (Updated 7 hours ago) • Actively hiring

Expires 6/29/2026

Job Description

Everforth ECS is seeking a Monitoring & Telemetry Lead SME to work in the National Capital Region, covering the Pentagon, Falls Church, and Fairfax.
Please Note: This position is contingent upon contract award.

The War Data Platform (WDP) is a key initiative within the U.S. Department of War's (DoW) AI-First strategy introduced in early 2026. The WDP focuses on operational warfighting data and aims to accelerate the deployment of artificial intelligence (AI) on the battlefield. The WDP extends to Unclassified, Secret, and Top Secret environments, and supports collaboration between Combatant Commands, Joint Staff directorates, Senior Executive Service leaders, and operational analysts.

This role defines, architects, and governs telemetry, observability, and service-level indicator frameworks supporting AI and machine learning model-serving operations across all WDP classification enclaves, ensuring enterprise-wide operational visibility, mission assurance alignment, and resilient monitoring of AI/ML-serving infrastructure and API ecosystems.

Defines, architects, and governs telemetry, observability, and service-level indicator frameworks supporting AI and machine learning model-serving operations across Unclassified, Secret, and Top Secret enclaves within the War Data Platform (WDP) Core Integration enterprise.
Establishes monitoring standards, performance indicators, traceability conventions, and health signal baselines for serving application programming interfaces, reverse proxies, model zoo interfaces, and external provider integrations supporting Combatant Commands, Joint Staff elements, and Senior Executive Service decision makers.
Designs and integrates instrumentation patterns within deployment pipelines, runtime environments, service meshes, and logging and auditing frameworks to provide immediate operational visibility following production releases.
Implements structured metrics pipelines using platforms such as Prometheus, Grafana, OpenTelemetry, Elastic, Splunk, and DoW-approved monitoring suites to capture latency, throughput, error rates, dependency bottlenecks, cross-domain access behavior, and cyber-relevant anomalies.
Conducts telemetry readiness reviews, evaluates instrumentation completeness, and validates monitoring coverage for emerging serving endpoints, model artifacts, and external model provider interfaces.
Coordinates with model-serving engineers, API engineers, DevSecOps teams, pipeline operators, cybersecurity personnel, and platform architects to integrate observability with test and evaluation gates, reliability assessments, and mission assurance workflows.
Produces operational dashboards, service-level definitions, instrumentation standards, alerting policies, runbooks, and observability assessment reports that strengthen reliability, accelerate incident triage, and elevate mission readiness across all enclaves.
Advances War Data Platform (WDP) Core Integration program value by delivering resilient, measurable, and domain-compliant monitoring capabilities for enterprise AI/ML model access.
Performs other duties as assigned.

Required Skills

Current Secret security clearance with the ability to obtain and maintain a Top Secret (TS) security clearance with Sensitive Compartmented Information (SCI).
CompTIA A+ certification.
Minimum 12 years of experience designing, implementing, and governing enterprise monitoring, telemetry, and observability frameworks across multi-domain or classified environments.
Demonstrated hands-on expertise with monitoring and observability platforms, including Prometheus, Grafana, OpenTelemetry, Elastic, and Splunk, with proven ability to architect structured metrics pipelines and operational dashboards in production environments.
Experience integrating observability frameworks with DevSecOps pipelines, service meshes, and AI/ML model-serving infrastructure to ensure real-time operational visibility and mission-assurance alignment.
Strong problem-solving and decision-making capabilities, with a proven ability to weigh the relative costs and benefits of potential actions and identify the most appropriate solution.
Highly developed interpersonal and oral/written communication skills, with the ability to interact effectively and professionally with a diverse set of stakeholders (from peers to end users to executive management).

Desired Skills

Active Top Secret (TS) security clearance with Sensitive Compartmented Information (SCI) eligibility.
Experience applying AI-powered anomaly detection and predictive analytics to monitoring operations, including automated incident response and proactive risk identification within enterprise telemetry frameworks.
Familiarity with Service Level Objective (SLO) and Quality Management Plan (QMP) performance frameworks as applied to AI/ML platform operations, including experience defining and iteratively refining SLOs in collaboration with mission partners and engineering teams.
Background in cybersecurity-integrated monitoring, including cross-domain access behavior analysis, DLP/ABAC telemetry, and audit-artifact generation using platforms such as Splunk and Tenable in support of compliance with DoDI 8510.01, CNSSI 1253, and FISMA.
Prior experience developing and delivering observability runbooks, instrumentation standards, and training materials that elevate monitoring proficiency across multidisciplinary platform engineering teams.
Experience in developing synthetic monitoring scripts.
Experience defining and calculating MTTD and MTTR.

ECS Federal LLC is an equal opportunity employer and does not discriminate or allow discrimination on the basis of any characteristic protected by law. All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran or any other status protected by applicable federal, state, or local jurisdiction law.

Everforth ECS is the federal segment of Everforth, a $4B global organization with over 10,000 employees. Our nearly 3,500 professionals deliver advanced technology solutions in data and AI, cybersecurity, and enterprise transformation, serving defense, intelligence, and federal civilian agencies. Our work powers mission-critical outcomes, strengthens technology partnerships, and creates meaningful opportunities for our people. We are defined by a commitment to excellence in delivery, a culture of innovation, and an environment where talent can thrive and grow.

We value:
Attracting and developing top talent and high-performing teams
Fostering a culture that is engaging, accountable, and mission-driven