Senior Performance Engineer - LLM Inference Frameworks

NVIDIA
Full-time
Posted: June 06, 2026
Location: Yokneam, Israel

Job Description

NVIDIA is hiring exceptional software engineers to build and optimize the core inference infrastructure for large language models. Join the TensorRT‑LLM team, the group defining how generative AI performs at global scale on NVIDIA GPUs. We’re looking for engineers who love squeezing every drop of throughput, memory efficiency, and scalability out of modern model runtimes. Your work will directly shape the frameworks behind state‑of‑the‑art LLM inference used across NVIDIA and the AI community. Join us to redefine what “fast” means for LLM inference by building the frameworks that power the next generation of generative AI at scale.


What you'll be doing:
+ Design, implement, and optimize high‑performance inference pipelines for large language models running on GPUs
+ Profile and tune model execution across the stack, from scheduler design to kernel fusions and everything in between
+ Design and experiment with memory management strategies for improved memory ba...

Job Overview

Deadline: June 11, 2026