IoT Cloud Application Architect (Onsite)
New York Technology Partners
Roswell, GA (In Person)
Full-Time
Job Description
This is not a feature development role. The EMS IoT Lead will own system-level triage, debugging, RCA, and resolution across the IoT ecosystem (device, firmware, cloud, and mobile), with a strong focus on US-time incident response and platform stability.

We are specifically looking for candidates who:
- Can own end-to-end system behavior across device, firmware, cloud, and mobile
- Have strong experience in debugging production systems, performing deep RCA, and driving fixes to closure
- Are comfortable identifying the right resolution strategy (hotfix, config change, rollback, or escalation)
- Can work closely with offshore engineering teams to drive execution
- Can communicate clearly and consistently with Customer Support, NOC, and Marketing teams during incidents
- Have hands-on exposure to IoT and cloud ecosystems (ClearBlade, AWS, MQTT, Datadog, MongoDB, etc.)

Ideal candidate profile
Please prioritize candidates who have grown from Senior Engineer / Tech Lead into Architect or System Owner roles, with real-world experience handling production incidents, platform stability, and cross-functional coordination.

About the Role
We are seeking an experienced EMS IoT Lead / Architect to provide system-level technical ownership for a large-scale connected IoT ecosystem. This role is responsible for platform stability, production issue resolution, and continuous improvement across devices, firmware, cloud services, mobile applications, and integrations.

The EMS IoT Lead operates as the primary engineering authority during production incidents, leading triage, debugging, root cause analysis (RCA), fix strategy selection, and closure coordination. The role requires close collaboration with engineering, operations, customer support, and customer-facing teams to ensure reliable and consistent user experiences.

Key Responsibilities

System-Level IoT Ownership
- Own end-to-end technical accountability across the full IoT stack:
  - Device connectivity and telemetry
  - Firmware behavior and state management
  - Cloud services and data pipelines
  - Mobile applications and APIs
- Understand and debug end-to-end connectivity flows from device and firmware through cloud platforms to mobile applications
- Diagnose issues related to connectivity failures, message loss, latency, retries, state synchronization, and data inconsistencies
- Prioritize issues based on customer impact, severity, and recurrence, not component boundaries

Incident Management & US-Time Response
- Act as the primary engineering escalation point during US business hours
- Lead real-time investigation and response for:
  - Production incidents
  - NOC escalations
  - Customer-facing issues
- Evaluate and select the most appropriate resolution strategy, including:
  - Hotfixes
  - Configuration changes
  - Rollbacks
  - Permanent code fixes
- Drive rapid mitigation to stabilize incidents while minimizing customer impact

Debugging, RCA & Resolution Leadership
- Lead deep debugging and root cause analysis across distributed systems
- Analyze logs, telemetry, metrics, and traces across device, cloud, and application layers
- Determine whether issues can be resolved via:
  - Tactical fixes
  - Operational or configuration changes
  - Architectural or design changes
- Drive fixes to completion, coordinating development, validation, deployment, and verification until issues are fully resolved in production
- Ensure all resolved issues include clear RCA documentation and corrective actions

Cross-Functional & Offshore Team Collaboration
- Work closely with:
  - Cloud engineering teams
  - Mobile engineering teams
  - Firmware and platform teams
- Collaborate with offshore engineering teams, providing:
  - Clear RCA context
  - Technical direction
  - Execution priorities
- Enable effective follow-the-sun execution while maintaining ownership and continuity

Customer Support & Stakeholder Communication
- Partner closely with Customer Support and NOC teams during incidents and escalations
- Communicate issue status, impact, and resolution progress clearly and consistently
- Coordinate with Marketing and customer-facing teams to support accurate and aligned customer messaging during incidents or service degradations
- Ensure timely and transparent communication throughout the issue lifecycle

Escalation & Governance
- Escalate issues to core engineering or product teams only when they cannot be resolved through EMS
- Prepare high-quality escalation packages, including:
  - Completed RCA
  - Reproduction steps
  - Impact assessment
  - Design or architectural considerations
- Maintain tracking and visibility of escalated issues through closure

Process Improvement & Platform Stability
- Establish and enforce standards for:
  - Issue intake quality
  - Triage consistency
  - RCA documentation
  - Closure and communication
- Analyze trends and recurring issues to identify systemic risks
- Drive continuous improvements to reduce incident frequency and improve platform reliability

Technology Environment
The role requires hands-on familiarity with modern IoT and cloud platforms, including:

Cloud & Platform
- AWS (compute, networking, deployments, monitoring)
- ClearBlade IoT Platform
- Datadog (logs, metrics, tracing, incident analysis)
- MongoDB or similar NoSQL databases

IoT & Messaging
- MQTT-based device communication
- Device telemetry and command/control patterns
- Understanding of firmware interaction and device lifecycle concepts

Applications & APIs
- RESTful and event-driven APIs
- Mobile application interaction with cloud platforms
- Familiarity with iOS/Android release lifecycles and crash analysis concepts

Qualifications

Required
- Bachelor's degree in Computer Science, Engineering, or related field
- 10+ years of experience in IoT platforms, distributed systems, or cloud-native architectures
- Proven experience in:
  - Production incident management
  - Debugging complex system issues
  - Root cause analysis
- Strong communication and decision-making skills under pressure

Preferred
- Experience with large-scale IoT or connected device platforms
- Background in telecom, industrial IoT, or consumer electronics ecosystems
- Experience working with NOC, customer support, and customer-facing teams

Success Metrics
Success in this role will be measured by:
- Reduction in incident resolution time (MTTR)
- Improved platform stability and reliability
- Reduction in recurring and systemic issues
- Quality of RCA and fix execution
- Effective collaboration across engineering, operations, and support teams

Why This Role Matters
This role plays a critical part in ensuring:
- Reliable and consistent customer experiences
- Stable and scalable IoT platform operations
- Engineering teams can focus on innovation while production stability is maintained

This position is ideal for a senior technologist who thrives in system ownership, problem-solving, and operational leadership within complex IoT environments.