Production Support/Incident Management Position Available In Essex, New Jersey
Job Description
Production Support/Incident Management JSR Tech Consulting
Newark, NJ
Qualifications: CI/CD, Management, Software troubleshooting, 5 years, Mid-level, AWS, Release management, Splunk, Scripting, Software development, ServiceNow, Financial services, Leadership, Communication skills
Newark-based, hybrid, RTH. Pay rate: $70-$80 per hour.
Job Description:
We are seeking an experienced Technical Business Analyst to manage and enhance the stability, performance, and availability of our client-facing applications. This role requires a proactive leader who can guide a dedicated support team, collaborate with engineering teams, manage incidents effectively to minimize downtime and improve user experience, and communicate with stakeholders.
Key Responsibilities:
Incident Management and Resolution:
- Oversee the triage, investigation, and resolution of production issues, ensuring timely communication and status updates
- Manage incident response efforts, including documentation, root cause analysis, and post-incident reviews to identify preventative actions
- Establish clear escalation protocols and ensure adherence to service level agreements (SLAs)
- Coordinate resolution and follow-ups with dependencies outside the immediate team
- Coordinate knowledge transfers (KTs) between development teams and L1/L2 triage to establish runbooks and a knowledge base
Team Leadership and Coordination:
- Coordinate with development, QA, and infrastructure teams to ensure seamless issue resolution and knowledge sharing
- Foster a strong ownership mindset within the team, ensuring accountability for system health and stability
Monitoring and Alerting:
- Define and maintain effective monitoring solutions in partnership with development teams to proactively identify and address potential issues
- Continuously improve observability by implementing dashboards, alerts, and automated health checks in partnership with development teams
Process and Documentation:
- Develop and maintain detailed runbooks, SOPs, and knowledge base articles to ensure consistent response procedures
- Establish best practices for incident response, including communication templates and decision frameworks
Stakeholder Communication:
- Serve as the primary point of contact for production issues affecting client experiences
- Provide clear, concise updates to leadership, internal teams, and clients during incidents and post-incident reviews
Continuous Improvement:
- Identify patterns in recurring incidents and partner with development teams to implement permanent fixes
- Drive initiatives to enhance system reliability, scalability, and performance
Qualifications and Skills:
- Proven experience in a production support leadership role for client-facing applications
- Strong understanding of incident management frameworks
- Proficiency in troubleshooting application, database, and infrastructure issues
- Familiarity with monitoring tools such as Dynatrace, Datadog, and Splunk
- Familiarity with incident management platforms such as ServiceNow
- Ability to prioritize tasks effectively and communicate technical concepts to non-technical stakeholders
- Excellent problem-solving skills and a calm, solution-focused approach under pressure
- Experience working in AWS
- Familiarity with CI/CD pipelines and release management processes
Preferred:
- Background in software development or scripting for automation
- Previous experience in the financial services industry
Success Metrics:
- MTTA: mean time to acknowledge
- MTTR: mean time to resolve
- Stakeholder satisfaction with incident communication
- Knowledge base usage rate and coverage
- Number of issues handed over to L1/L2 and EMKT teams
- Number of system-identified vs. user-reported alerts, and trends over time
- Enhancements and alerts requested
- Minimal number of user-reported incidents
- Number of incidents resolved by L1/L2 without the app support team
- Reduction in resolution times due to documented processes