CloudLinux
Engineering
Senior Database Reliability Engineer
Remote
Posted 4 days ago
CloudLinux / TuxCare is hiring a Senior Database Reliability Engineer to maintain and improve the reliability of critical database services, primarily PostgreSQL, and support other databases like ClickHouse, MongoDB, and Redis in a remote-first environment.
Location: Remote
Responsibilities
- Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
- Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
- Support the wider database estate: ClickHouse, MongoDB, and Redis. Troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn production ClickHouse patterns.
- Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
- Help build DBaaS-style self-service capabilities for engineering teams to request databases, access, credentials, and operational checks with less manual DBA intervention.
- Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.
Requirements
- Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
- Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
- Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
- Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
- Automation skills with Ansible and scripting. Experience with Terraform/OpenTofu, GitLab CI/CD, and merge-request-based delivery is a strong advantage.
- Ability to support more than one database engine. You do not need to be a ClickHouse expert on day one, but you must be ready to learn it quickly and take responsibility for it.
- Practical use of AI engineering assistants such as Claude and Codex. Use them to improve speed and quality, while personally verifying generated SQL, commands, scripts, and operational conclusions.
- Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.
Benefits
- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- Opportunity to receive a reward for the most innovative idea that the company can patent.