Site Reliability Engineer - Machine Learning Systems (Singapore) Technology - Backend Singapore[...]
ByteDance
Full-time
Singapore, Singapore
Electrical & Energy Engineering
Posted:
March 02, 2026
Location:
Singapore, Singapore
Job Description
Site Reliability Engineer - Machine Learning Systems (Singapore)
Job Code: A A
Responsibilities
- Ensure our ML systems operate efficiently for large model deployment, training, evaluation, and inference.
- Maintain stability of offline tasks/services across multi‑data center, multi‑region, and multi‑cloud scenarios.
- Manage resource planning, cost, and budget, including computing and storage resources.
- Implement global system disaster recovery and cluster machine governance, and improve business service stability, resource utilization, and operational efficiency.
- Build software tools, products, and systems to monitor and manage ML infrastructure and services efficiently.
- Participate in the global on‑call roster that provides system and business support.
Minimum Qualifications
- Bachelor’s degree or above in Computer Science, Computer Engineering, or related fields. ...
Job Overview
Deadline:
April 11, 2026