
Senior Database Reliability Engineer

Remote. Posted 4 days ago.

CloudLinux / TuxCare is hiring a Senior Database Reliability Engineer to maintain and improve the reliability of critical database services, primarily PostgreSQL, and to support other databases such as ClickHouse, MongoDB, and Redis in a remote-first environment.

Location: Remote

Responsibilities

  • Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
  • Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
  • Support the wider database estate: ClickHouse, MongoDB, and Redis. Troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn production ClickHouse patterns.
  • Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
  • Help build DBaaS-style self-service capabilities for engineering teams to request databases, access, credentials, and operational checks with less manual DBA intervention.
  • Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.

Requirements

  • Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
  • Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
  • Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
  • Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
  • Automation skills with Ansible and scripting. Terraform/OpenTofu, GitLab CI/CD, and merge-request-based delivery are strong advantages.
  • Ability to support more than one database engine. You do not need to be a ClickHouse expert on day one, but you must be ready to learn it quickly and take responsibility for it.
  • Practical use of AI engineering assistants such as Claude and Codex. Use them to improve speed and quality, while personally verifying generated SQL, commands, scripts, and operational conclusions.
  • Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.

Benefits

  • A focus on professional development.
  • Interesting and challenging projects.
  • Fully remote work with flexible working hours.
  • 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education.
  • Opportunity to receive a reward for the most innovative idea that the company can patent.

Location: Remote
Category: Engineering
Company: CloudLinux
Source: remoteok
Posted: 4 days ago
