Job Description
Key Responsibilities
Cluster Operations & Management:
Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
Ensure optimal performance, scalability, and reliability of distributed systems
Infrastructure Platform Development:
Design, build, and enhance infrastructure operation platforms
Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
Drive platform standardization and automation initiatives
High Availability & Reliability:
Ensure maximum uptime for production services through proactive monitoring and incident response
Continuously optimize service architecture, deployment strategies, and operational processes
Implement and maintain SLA/SLO frameworks and reliability...
Apply for this Job
Submit your application for the DevOps & SRE Engineer position at Manus AI.