Runware (DevOps)
Staff DevOps Engineer
Remote (UK), posted today
Runware is building the API layer for the next generation of AI products, providing real-time inference access across models through a flexible API. The company focuses on performance, reliability, and operational simplicity for AI media generation products.
Responsibilities
- Build and scale the infrastructure that powers real-time AI inference across GPU fleets, bare-metal servers, serverless and containerised production systems
- Help evolve Runware’s platform toward more elastic, on-demand infrastructure that can scale quickly with customer traffic and model demand
- Make Runware faster, more reliable and more resilient by improving the critical paths behind our request entrypoints, inference services, queues, storage, load balancers and networking layer
- Automate the hard parts of infrastructure operations, from provisioning and configuration through to CI/CD, deployment safety, progressive rollouts and rapid rollback
- Build the observability backbone for a high-performance AI platform, with the signals needed to spot issues early, understand capacity and fix problems before customers feel them
- Play a leading role in production operations, incident response, debugging and post-incident improvements, helping us turn operational challenges into a stronger platform
- Strengthen the security and compliance foundations of our infrastructure through patching, secrets management, access controls, hardening, auditability, documentation and repeatable operational processes
Requirements
- Strong experience as a DevOps Engineer, SRE, Infrastructure Engineer, Platform Engineer or similar, with a track record of running production systems at scale
- Deep Linux knowledge and confidence debugging real production issues across networking, storage, performance, services and system behaviour
- Hands-on experience building automation, Infrastructure-as-Code, CI/CD pipelines and deployment workflows that make infrastructure safer and easier to operate
- Experience operating high-availability, low-latency or high-throughput platforms where reliability and performance directly affect customers
- Strong networking fundamentals across TCP/IP, DNS, load balancing, routing, firewalls, proxies, TLS and HTTP
- A calm and pragmatic approach under pressure, with strong communication, good judgement and a bias toward automation over manual toil
Benefits
- Generous paid time off – vacation, sick days, public holidays
- Meaningful stock options – share in the upside you create
- Remote-first setup – work from home anywhere we can employ you
- Flexible hours – own your schedule outside core collaboration blocks
- Family leave – paid maternity, paternity, and caregiver time
- Company retreats – twice-yearly gatherings in inspiring locations