Senior Data Platform Reliability Engineer

OpsWerks
Full-time, Philippines, Engineering
Posted: February 22, 2026
Location: Philippines

Job Description

Responsibilities

  • Run managed services, not just systems. Operate multi‑tenant data/AI platforms (Spark, Airflow, Flink, Jupyter) with clear SLAs/SLIs/SLOs, cost guardrails, and capacity plans across AWS/GCP + Kubernetes.
  • Be the face of reliability. Lead incidents end‑to‑end, own customer comms and post‑incident reviews (RCA with actions customers can see and feel).
  • Design for customer experience. Help data scientists and customers reduce failed/slow jobs, improve time‑to‑data, and optimize costs, so customers notice faster pipelines and fewer surprises.
  • Standardize & scale. Build service runbooks, golden paths, and automation that make onboarding and daily ops predictable across customers.
  • Automate the toil away. Ship tooling (Bash/Python, GitOps, CI/CD) for backups, DR drills, upgrades, access, and environment bootstrapping.
  • Make signals meaningful. Instrument platforms with metrics/logs/traces; tune alerting to cu...

Job Overview

Job Type: Full-time
Location: Philippines
Posted: February 22, 2026
Deadline: April 03, 2026