Observability Engineer

Job

Bayforce

Remote

Full-Time

Posted 1 week ago (Updated 6 days ago) • Actively hiring

Expires 6/28/2026

Job Description

Role Title: Observability Engineer

Employment Type: Contract

Duration: 6 Months (Potential Extensions)

Location: Cleveland, OH area - Hybrid (4 days onsite / 1 day remote)

About the Role

We are seeking an experienced Observability Engineer to support and expand a centralized enterprise observability platform. This initiative is focused on building a true single pane of glass monitoring environment using modern telemetry and monitoring technologies including Prometheus, Grafana, and Loki.

The current environment captures approximately 50% of server telemetry and is now evolving to include cross-domain observability across infrastructure, applications, databases, storage, and business transaction data. Long-term goals include enabling AI/ML-driven anomaly detection and intelligent root-cause analysis.

This is an opportunity to play a key role in building an enterprise-wide operational intelligence platform.

Responsibilities

- Expand telemetry ingestion across infrastructure, databases, storage platforms, applications, and network environments
- Assist with onboarding remaining systems and extending monitoring beyond traditional OS metrics
- Build and enhance Grafana dashboards that correlate infrastructure health with application performance and business transaction metrics
- Develop and maintain synthetic monitoring scripts using Playwright or similar tools to simulate critical user journeys
- Configure and optimize alerting workflows using Alertmanager and Loki
- Improve signal-to-noise ratio and reduce alert fatigue through better event management practices
- Establish and maintain telemetry labeling standards and data quality practices
- Support troubleshooting, root-cause analysis, and operational documentation efforts
- Partner with engineering and infrastructure teams to drive observability best practices across the enterprise

Required Qualifications

- Hands-on experience with:
  - Prometheus
  - Grafana
  - Loki
  - Alertmanager
- Strong experience writing PromQL queries and building Grafana dashboards
- Experience designing or supporting enterprise observability and monitoring platforms
- Ability to collect and normalize telemetry across:
  - Servers
  - Databases
  - Storage environments
  - Networks
  - Applications
- Experience with synthetic monitoring tools such as Playwright or Selenium
- Strong Linux command-line experience
- Experience editing and managing YAML and JSON configuration files
- Knowledge of alert routing, escalation workflows, and reducing alert fatigue
- Understanding of telemetry standards, labeling strategy, and data hygiene practices
- Strong troubleshooting and analytical skills

Preferred Qualifications

- Oracle and SQL database experience
- Experience with SNMP, network flow data, or infrastructure performance monitoring
- Exposure to AI/ML-based observability or anomaly detection initiatives

This role offers the opportunity to help shape the future of enterprise monitoring and observability while working on high-impact initiatives supporting large-scale infrastructure and application environments.