Job Description
We are looking for exceptional AI Engineers to build the next generation of AI infrastructure and Machine Learning Systems (MLSys).
This role focuses on large-scale system infrastructure rather than model research. You will work on the core foundations that power large-scale AI training and inference systems, including Kubernetes cluster management, RDMA networking, unified KV Cache architecture, observability platforms, distributed systems, GPU orchestration, and CUDA kernel optimization.
You will collaborate closely with AI researchers, infrastructure architects, networking engineers, and platform teams to maximize the efficiency, scalability, and reliability of AI systems.
Key Responsibilities
- Design, deploy, and operate large-scale Kubernetes-based AI infrastructure.
- Develop cluster governance frameworks, scheduling policies, resource isolation, and multi-tenancy capabilities.
- Build and optimize GPU orchestration platform...
Apply for this Job
Submit your application for the AI Engineer (ML Systems & Infrastructure) position at SwapeTech.