Senior SRE: AI/ML HPC Infra & GPU Cluster

Boson AI

Full-time Toronto, ON Other-General

Posted:

May 20, 2026

Location:

Toronto, ON, Canada

Job Description

A technology company in Toronto seeks a Senior Site Reliability Engineer to manage and optimize its HPC infrastructure. In this role, you'll ensure smooth operations of a powerful GPU cluster, deploy infrastructure-as-code solutions, and support ML teams. Candidates should have extensive SRE experience, proficiency in Linux, and familiarity with Kubernetes and Ceph storage. This position offers the chance to work with cutting-edge technology in a collaborative environment, perfect for problem-solvers who love learning.

Job Overview

Job Type: Full-time

Location: Toronto, Canada

Posted: May 20, 2026

Deadline: June 29, 2026