Senior Software Engineer, AI Inference Systems

NVIDIA
Full-time, Toronto, Canada
Posted: February 27, 2026
Location: Toronto, Canada

Job Description

We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You’ll collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.


What you’ll be doing:
+ Contribute features to vLLM that let the newest models take advantage of the latest NVIDIA GPU hardware features; profile and optimize vLLM with methods such as speculative decoding, data/tensor/expert/pipeline parallelism, and prefill-decode disaggregation.
+ Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization; build and extend high...

Job Overview

Deadline: March 04, 2026