Expert Site Reliability Engineer at Confluent

IBM
Full-time Toronto, ON Engineering
Posted:
June 16, 2026
Location:
Toronto, ON, Canada

Job Description

Make a significant impact at Confluent as an Expert Site Reliability Engineer focused on incident management and reliability enhancements. You'll work within a multi-cloud architecture to optimize performance and reliability.
This expert role blends 75% technical engineering with 25% strategy: analyzing systemic failure patterns, designing reliability frameworks, and teaching best practices. You'll be instrumental in developing the incident response processes the organization relies on. Join a global team dedicated to improving cloud-based reliability.
Key Responsibilities:
• Analyze and remediate systemic failure patterns
• Own configuration and workflows for incident management tools
• Define SLO/SLA frameworks to guide reliability investments
• Edit incident documents for customer clarity
• Lead training programs and coach teams through post-mortems
Requirements:
• 10+ years of experience in SRE or incident manageme...

Apply for this Job

Submit your application for the Expert Site Reliability Engineer at Confluent position at IBM.


Job Overview

Job Type: Full-time
Location: Toronto, Canada
Posted: June 16, 2026
Deadline: July 26, 2026