L1 SRE Operations Engineer

Job

Intone Networks, Inc

DFW Airport, TX (In Person)

Full-Time

Posted 6 weeks ago (Updated 1 week ago) • Actively hiring

Expires 7/13/2026

Job Description

REQ-00050385

L1 SRE Operations Engineer

Dallas-Fort Worth

Contract

Duration: 12 Months

The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms like Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/GCP). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.

Skills

Mandatory Skills (Must-Have)

1. System & Infrastructure Monitoring

Expectation:

Ability to use monitoring dashboards (e.g., Grafana, Datadog, Splunk, Argos, AIOps) to identify anomalies, follow alert workflows, and escalate when thresholds are breached.

Example:

When a Kubernetes pod crash-loop is flagged in Prometheus, L1 should validate it against runbooks, check pod logs, and escalate if restart attempts fail.

2. Runbook Execution

Expectation:

Strictly follow documented steps to resolve standard incidents, and escalate when the steps do not apply or fail.

Example:

Use a provided runbook to restart a failed API proxy service; if the error persists beyond the documented steps, escalate to L2.

3. Incident Triage & Communication

Expectation:

Perform first-line triage of alerts, gather logs/metrics, categorize severity, and notify stakeholders in clear, concise language.

Example:

For a database connection timeout, collect error logs, verify service reachability, and provide a detailed incident note to L2 before escalation.

4. Kubernetes (Cloud or On-Prem) Operations Knowledge

Expectation:

Ability to check pod status, understand logs, and verify service endpoints using kubectl and monitoring tools.

Example:

Run kubectl get pods -n to verify if deployments are healthy.

5. Scripting (Python, Bash, PowerShell)

Expectation:

Able to read and make small edits to scripts to automate repetitive checks.

Example:

Modify a Bash script to include an additional log path in a health check.

6. Networking & Security Awareness

Expectation:

Understand basic network troubleshooting tools (ping, netstat, curl, traceroute) and know when issues may be related to a firewall, WAF, or proxy.
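
As a rough illustration of the kind of check this expectation describes, the sketch below resolves a hostname and then attempts a TCP connection, reporting which step failed. The function name `check_service` and its status strings are hypothetical, chosen for this example only; they do not come from any runbook or tool named in this posting.

```python
import socket


def check_service(host: str, port: int, timeout: float = 3.0) -> str:
    """Return a short status string describing DNS and TCP reachability.

    Hypothetical helper for illustration; the status names are assumptions.
    """
    # Step 1: DNS resolution. A failure here points at naming, not the service.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except socket.timeout:
        # No answer at all; this can (but does not always) indicate
        # a firewall or similar device silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Any other network-level error (no route, network down, ...).
        return "unreachable"
    return "reachable"
```

A call such as `check_service("internal-api.example", 443)` (hostname made up) returns one of the five strings. Recording which one came back gives L2 a more useful starting point than "service unreachable".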

Example:

For an unreachable service, confirm DNS resolution and connectivity before escalating to L2.

7. Documentation & Knowledge Capture

Expectation:

Accurately record steps taken during incidents, and suggest runbook updates where gaps exist.

Example:

After handling an alert for disk usage, note missing cleanup steps in the runbook and flag for update.

Preferred Skills (Nice-to-Have)

1. Cloud Platform Familiarity (AWS, Azure, GCP)

Expectation:

Understand basics of cloud services (VMs, load balancers, storage) and how to navigate a cloud console.

Example:

Use AWS Console to check EC2 instance health status when a service alert is triggered.

2. Database Basics (SQL/NoSQL)

Expectation:

Run simple queries to validate DB connectivity and health.
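
A minimal sketch of such a connectivity check, assuming a Python DB-API style connection. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for whatever database and driver the team actually runs; the function name `db_is_reachable` is hypothetical.

```python
import sqlite3


def db_is_reachable(conn: sqlite3.Connection) -> bool:
    """Run a trivial query and report whether the database answered.

    Hypothetical helper; sqlite3 stands in for the real database driver.
    """
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
    except sqlite3.Error:
        # Any driver-level error is treated as "not reachable" here.
        return False
    # A healthy connection returns a single row containing the value 1.
    return row is not None and row[0] == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real connection
    print(db_is_reachable(conn))
    conn.close()
    print(db_is_reachable(conn))        # a closed connection fails the check
```

With a different driver, the exception class to catch would be that driver's own error type rather than `sqlite3.Error`.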

Example:

Execute SELECT 1; to verify a database is reachable.

3. Automation & Self-Service Mindset

Expectation:

Identify repetitive manual steps and propose candidates for automation.

Example:

Flag that manual log collection during outages could be replaced with a script.

4. Exposure to Incident Management Tools (xMatters, ServiceNow, Jira, etc.)

Expectation:

Comfortable working within ITSM/incident workflows.

Example:

Log incident details in ServiceNow with accurate categorization and timestamps.

5. AI/Chatbot-Assisted Ops (emerging skill)

Expectation:

Use AI assistants to search runbooks or suggest remediation steps.

Example:

Ask an AI ops assistant to summarize logs before escalation.

Posted on: 05-07-2026

Posted by: Kapil Dhar