Member of Technical Staff - ML Infrastructure & Performance

Moonlake AI

Full-time | San Mateo, Rizal | IT & Technology

Posted: March 03, 2026

Location: San Mateo, Rizal, Philippines

Job Description

Introducing Moonlake: AI for creating real-time interactive content.
Mission: Improve throughput, latency, and cost by deploying our models 2–10× faster and cheaper without quality regressions.
Scope of Work:
GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on‑GPU KV reuse; speculative decoding/Medusa; mixture‑of‑agents routing.
Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.
Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.
Systems: Ray/k8s/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.
Tech signals: Previous experience at infra-heavy startups such as Databricks or Roblox.
We are committed to being an on-site, in-person team, currently based in San Mateo.

Job Overview

Deadline: April 12, 2026