System Debug Engineer Manager, Cloud AI Infrastructure
Kirkland, WA (In Person)
$235,000 Salary, Full-Time
Job Description
Benefits for this role include:
Health, dental, vision, life, disability insurance
Retirement Benefits: 401(k) with company match
Paid Time Off: 20 days of vacation per year, accruing at a rate of 6.15 hours per pay period for the first five years of employment
Sick Time: 40 hours/year (increased to 69 hours/year for Seattle) including 5 discretionary sick days per instance
Maternity Leave (Short-Term Disability + Baby Bonding): 28-30 weeks
Baby Bonding Leave: 18 weeks
Holidays: 13 paid days per year
Note: By applying to this position you will have an opportunity to share your preferred working location from the following: Kirkland, WA, USA; Austin, TX, USA.
Minimum qualifications:
Bachelor's degree in Computer Science or IT-related field, or equivalent practical experience.
8 years of experience with system design.
5 years of experience managing or leading a team.
5 years of experience with managing technical work, engineering strategy, and roadmaps.
5 years of experience with hardware debug (silicon debug, platform debug, IO interface, memory analysis).
3 years of experience with organizational design.
Preferred qualifications:
5 years of experience working with vendors or customers.
3 years of experience with leadership development and career growth of employees.
3 years of experience in analyzing and troubleshooting distributed systems.
2 years of CPU, dGPU, or TPU debug or validation experience.
Understanding of memory and high-speed IO technologies.
About the job
Systems Development Engineering (SDE) at Google is a role where you manage services and systems at scale. SDEs creatively put their engineering discipline to use automating the mundane and reducing toil. We don't just write code to fix bugs, but emphasize the development of tools and solutions that fix classes of problems. We know it's hard to control what you can't measure, so we focus on observability: instrumenting first, then turning data into knowledge, and finally knowledge into action. We know that the operational efficiency of Google systems, services, virtual compute environments and the operating systems that power them impacts the environment, not just the bottom line. We know that working together we can do more, and that community matters.
Google brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow. Together we engineer and build the infrastructure, tools, access and telemetry for systems that enable orchestration of Google-scale services. Come build things that matter.
As a part of the Google Cloud Support team, you will ensure customers maximize their investment. As a Systems Debug Engineer, you will be a trusted advisor driving hardware understanding and issue resolution. You will troubleshoot platform challenges, providing expert solutions that enable innovation. You will represent the customer and collaborate with engineering and product teams to drive continuous improvement across global cloud products and services.
Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
The US base salary range for this full-time position is $192,000-$278,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
Drive technical team performance across on-call activities and system management by delivering leadership, mentorship, and career development while collaborating with primary responders to address system issues.
Debug platform hardware, silicon, and AI/ML workloads to drive root-cause resolution, develop permanent infrastructure improvements, and build tools for faster diagnosis through troubleshooting and reproduction.
Collaborate cross-functionally with Product, Quality, and Engineering teams to enhance product outcomes, and engage with Site Reliability Engineering (SRE) teams to ensure high-quality production and reliability.
Resolve customer challenges on AI/ML infrastructure through effective diagnosis, resolution, and the implementation of investigation tools to increase productivity for critical reported issues.
Serve as a consultant and subject matter expert for internal stakeholders to resolve deployment and operational obstacles across AI infrastructure environments daily.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.