Reliability & Observability Analyst I

Job

IREN

Fort Worth, TX (In Person)

Full-Time

Posted 3 days ago (Updated 10 hours ago) • Actively hiring

Expires 6/9/2026

Job Description

Reliability & Observability Analyst I
IREN
Dallas-Fort Worth, TX

Job Details
Full-time, 7 hours ago

Benefits
Health savings account, Paid holidays, Disability insurance, Health insurance, Dental insurance, Financial planning services, Paid time off, Employee assistance program, Vision insurance, 401(k) matching, Professional development assistance, Life insurance

Qualifications
Performance dashboard reports, Data Center Operations, Computer science, Data center experience, Cloud infrastructure, Statistics, Automation, Bachelor's degree in statistics, Technical documentation, IT system monitoring, Dashboard creation, Data analysis skills, Data reporting, AIOps, Technical support, Bash, Analysis skills, Outlier detection, Bachelor's degree, SRE, Data quality monitoring, Splunk, Computer networking, Linux, Prometheus, Data validation, 1 year, Cross-functional collaboration, Log analysis

Full Job Description

Job Type: Full-Time
Location: Dallas / Fort Worth, TX
Department: Operations
Reporting to: Data Center Manager
Work Location Type: Onsite

IREN is a leading next-generation data center business powering the future with 100% renewable energy. We build, own and operate our data centers and take pride in being at the forefront of sustainable solutions for the ever-evolving applications of high-performance computers. We believe that human progress is invaluable, but it should be done in the right way: responsibly, sustainably and having a positive impact on the communities we operate in.

We are seeking an IOC Reliability & Observability Analyst I with a strong reliability, observability, and automation mindset to support our 24/7 HPC Data Center Operations. The role focuses on analyzing operational signals, improving incident quality, and supporting AIOps-enabled automation and tooling. It is designed for candidates early in their careers who want to grow into Site Reliability, Infrastructure Operations, or Platform Engineering paths.

This is an entry-level (Level 1) IOC role focused on operational analysis, data quality, and reliability signal validation rather than system design or engineering ownership. You will support IOC, engineering, and operations teams by analyzing incidents, validating operational signals, and identifying opportunities to improve detection quality and operational reliability under established processes and guidance.

Qualifications

1-3 years of experience in IOC, NOC, SRE-adjacent operations, systems analysis, or technical support roles
Bachelor's degree in Computer Science, Data Science, Statistics, or equivalent hands-on experience
Exposure to 24/7 production environments supporting infrastructure, cloud, or data center operations
Foundational awareness of SRE concepts such as service health, MTTR/MTTD, and the incident lifecycle, with the ability to apply these concepts in operational analysis
Working knowledge of Linux-based systems, basic networking concepts, and infrastructure dependencies
Experience working with metrics, logs, and alerting systems across infrastructure or application environments
Familiarity with observability platforms (e.g., Splunk, Datadog, Prometheus-style metrics)
Ability to assess alert quality, identify noise, and recognize monitoring gaps
Awareness of AIOps concepts such as anomaly detection, event correlation, and alert noise reduction, primarily for the purpose of reviewing and validating automated insights
Experience validating automated insights and supporting alerting or observability automation
Ability to read automation artifacts (Python, Bash, or configuration-based workflows) and assist with minor updates under documented procedures and guidance
Ability to analyze incident trends and system behaviors with strong attention to data accuracy and signal integrity, and to identify recurring issues or improvement opportunities
Clear communication skills and comfort working cross-functionally with operations and engineering teams

Other important requirements

This role operates in a 24×7 IOC/NOC environment and works 12-hour rotating shifts, alternating between a 4-days-on / 3-days-off schedule and a 3-days-on / 4-days-off schedule
Pre-employment screening, including background check and substance testing, may be required according to company policies

Responsibilities

Analyze incident data, system behaviors, and operational signals across GPU clusters, networks, and facilities to identify risks and trends
Identify detection gaps, alert delays, false positives, and under-monitored systems, and document findings for review by IOC leadership or engineering teams
Validate ticketing and incident data for accuracy, completeness, and reporting integrity
Support continuous improvement of observability by evaluating metrics, logs, alerts, and dashboards
Assist in refining operational views focused on service health, reliability, and signal quality
Generate post-incident insights highlighting trends, risks, and improvement opportunities
Support AIOps-enabled capabilities by reviewing outputs from anomaly detection, alert correlation, and event clustering, and flagging accuracy or data-quality issues
Validate automated insights and escalate tuning or accuracy issues to IOC and engineering teams
Assist with testing automation related to alert routing, enrichment, and suppression, and submit recommended changes through established change and review processes
Produce and maintain SLA/KPI dashboards and reliability reports using established templates, definitions, and data sources
Provide data-driven insights and recommendations to inform preventive measures, workflow improvements, and monitoring enhancements
Contribute to runbook updates, operational documentation, and reliability initiatives in partnership with IOC and engineering teams
Develop foundational SRE skills in preparation for expanded operational responsibility
This role operates under defined IOC processes and supervision, with increasing responsibility as skills and experience develop

At IREN, we offer a comprehensive, market-competitive total rewards package designed to support employees' well-being, career advancement, and financial wealth. Our offerings reflect our commitment to Proceed with Purpose while rewarding high performance and long-term growth.

Compensation

Actual compensation will be determined based on factors such as experience and qualifications
Overtime compensation for non-exempt workers for hours worked over 40 per week

Health & Wellness

100% company-paid health insurance premiums (medical, dental, and vision) for employees, 75% company-paid coverage for dependents
Company-paid short-term and long-term disability insurance
Voluntary life, critical illness, and accident coverage available
Health Savings Accounts (HSA), when combined with the High-Deductible Health Plan
Employee Assistance Program and wellness resources

Retirement & Financial Wealth

401(k) retirement plan with company match
Paid professional development and access to financial planning and legal services

Time Off & Leave Programs

Paid Time Off (PTO) and paid holidays

Growth & Development

Professional development to support certifications, continuing education, or role-related training

Community & Culture

Company events and team-building activities

We value diverse perspectives and believe that skills can be developed. If you're passionate about this role, we want to hear from you, whether you meet every criterion or not. Your unique experiences might be exactly what we need!

IE US Operations Inc., the employing entity and proud member of the IREN group, is an equal opportunity employer that is committed to creating an inclusive workplace. We are committed to evaluating qualified applicants and do not discriminate on the basis of protected characteristics under applicable legislation.

We participate in E-Verify and will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the U.S. E-Verify Participation Notice.

By applying for this position and submitting your resume and application materials, you consent to the processing of your personal information in accordance with our Job Applicant Privacy Statement, available on our website at www.iren.com.
