LLM Inference Performance & Evals Engineer

Cerebras Systems

Full-time | Toronto, ON | Other-General

Posted:

June 15, 2026

Location:

Toronto, ON, Canada

Job Description

About The Role

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Key Responsibilities

Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and other innovations as they emerge.

Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests.

Work closely with the compiler, runtime, and silicon teams, a unique opportunity to experience the full stack of software and hardware innovation.

Keep pace with the latest open- and closed-source models, and run them first on wafer-scale hardware to expose new optimization opportunities.

Skills And Qualifications

3+ years building high-perform...


Job Overview

Job Type: Full-time

Location: Toronto, Canada

Posted: June 15, 2026

Deadline: July 25, 2026