- Serve as the single source of status information, delivering clear, timely, and accurate business and technical communications to technology and business stakeholders across the organization.
- Confidently lead the recovery of high-profile, major, and crisis technology incidents within vast, complex environments, including disaster recovery scenarios, focusing on service restoration and minimal disruption using various methodologies and management techniques.
- Take responsibility for governing the incident management, problem management, and post-incident review processes end-to-end, collaborating with Service Management peers, Site Reliability Engineering (SRE), business and technology teams, and external vendors to ensure all KPIs are met and a high standard of management and reporting is consistently achieved.
- Manage post-incident reviews to maximize learnings from major incidents.
- Collaborate with the Site Reliability Engineering (SRE) team to enable effective engineering analysis and to evolve the incident management and post-incident review processes, ensuring ongoing refinement and valuable outputs.
- Provide comprehensive incident, problem, and post-incident review reports to all required audiences.
- Manage major incidents and problems with focus and urgency to mitigate contributing factors and pursue the completion of relevant improvement actions.
- Build and maintain customer and stakeholder relationships.
- Support the Service Management team’s obligations for process ownership, operations, and governance.