System Debug Engineer Manager, Cloud AI Infrastructure

Google

Kirkland, WA (In Person)

$235,000 Salary, Full-Time

Posted 5 days ago (Updated 2 days ago) • Actively hiring

Expires 6/6/2026

Job Description

Kirkland, WA, USA; Austin, TX, USA
Advanced: experience owning outcomes and decision making, solving ambiguous problems and influencing stakeholders; deep expertise in domain.
In accordance with Washington state law, we are highlighting our comprehensive benefits package, which is available to all eligible US-based employees.
Benefits for this role include:
Health, dental, vision, life, disability insurance
Retirement Benefits: 401(k) with company match
Paid Time Off: 20 days of vacation per year, accruing at a rate of 6.15 hours per pay period for the first five years of employment
Sick Time: 40 hours/year (increased to 69 hours/year for Seattle), including 5 discretionary sick days per instance
Maternity Leave (Short-Term Disability + Baby Bonding): 28-30 weeks
Baby Bonding Leave: 18 weeks
Holidays: 13 paid days per year
Note: By applying to this position, you will have an opportunity to share your preferred working location from the following: Kirkland, WA, USA; Austin, TX, USA.
Minimum qualifications:
Bachelor's degree in Computer Science or IT-related field, or equivalent practical experience.
8 years of experience with system design.
5 years of experience managing or leading a team.
5 years of experience with managing technical work, engineering strategy, and roadmaps.
5 years of experience with hardware debug (silicon debug, platform debug, IO interface, memory analysis).
3 years of experience with organizational design.
Preferred qualifications:
5 years of experience working with vendors or customers.
3 years of experience with leadership development and career growth of employees.
3 years of experience in analyzing and troubleshooting distributed systems.
2 years of CPU, dGPU, or TPU debug or validation experience.
Understanding of memory and high-speed IO technologies.
About the job
The US base salary range for this full-time position is $192,000-$278,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about .
Responsibilities
Drive technical team performance across on-call activities and system management by delivering leadership, mentorship, and career development while collaborating with primary responders to address system issues.
Debug platform hardware, silicon, and AI/ML workloads to drive root-cause resolution, develop permanent infrastructure improvements, and build tools for faster diagnosis through troubleshooting and reproduction.
Collaborate cross-functionally with Product, Quality, and Engineering teams to enhance product outcomes, and engage with Site Reliability Engineering (SRE) teams to ensure high-quality production and reliability.
Resolve customer challenges on AI/ML infrastructure through effective diagnosis, resolution, and the implementation of investigation tools to increase productivity for critical reported issues.
Serve as a consultant and subject matter expert for internal stakeholders to resolve deployment and operational obstacles across AI infrastructure environments daily.