Forward Deployed Engineer, AI Inference (vLLM and Kubernetes)
Red Hat
Full-time
Sacramento, CA
Posted:
June 10, 2026
Location:
Sacramento, CA, United States
Job Description
The vLLM and LLM-D Engineering team at Red Hat is looking for a customer-obsessed developer to join our team as a Forward Deployed Engineer. In this role, you will not just build software; you will be the bridge between our cutting-edge inference platform (LLM-D (https://llm-d.ai/) and vLLM (https://github.com/vllm-project/vllm)) and our customers' most critical production environments.
You will interface directly with our customers' engineering teams to deploy, optimize, and scale distributed Large Language Model (LLM) inference systems. You will solve last-mile infrastructure challenges that defy off-the-shelf solutions, ensuring that massive models run with low latency and high throughput on complex Kubernetes clusters. This is not a sales engineering role; you will be part of the core vLLM and LLM-D engineering team.
**What You Will Do**
+ Orchestrate Distributed Inference: Deploy and configure LLM-D and vLLM on Kubernetes clusters. You will set u...
Job Overview
Deadline:
June 15, 2026