Voxel51 | DevOps
Principal Infrastructure Engineer
Remote (US) | $250K–$280K/yr | Posted today
Voxel51 is hiring a Principal Infrastructure Engineer to design and maintain scalable, reliable infrastructure for its AI data platform.
Location: Remote (US)
Salary: $250K–$280K/yr
Responsibilities
- Shape the architecture and evolution of Voxel51’s infrastructure to support deployments ranging from individual researchers to Fortune 500 enterprises
- Design, build, and scale deployment systems across cloud (GCP, AWS, Azure) and on-premises environments, ensuring reliability, security, and repeatability
- Partner with enterprise customers to deliver and support production-grade deployments, guiding installation, troubleshooting, and scaling
- Lead infrastructure initiatives across engineering teams, enabling faster development with robust tooling and automation
- Drive best practices in CI/CD, evolving pipelines and introducing new approaches
- Develop and maintain deployment solutions for Voxel51-hosted environments (GKE) and customer on-prem installations (K8s or Docker Compose)
- Champion developer productivity, improving workflows for development and automated cloud deployments
- Troubleshoot and resolve complex infrastructure issues, spanning build failures, runtime failures, and customer deployment challenges
- Anticipate and prevent failures by designing monitoring, alerting, and predictive solutions
- Mentor engineers and set technical direction to keep infrastructure ahead of customer needs and industry trends
Requirements
- Deep experience with containerized environments (building, packaging, debugging container images, Kubernetes, Docker Compose, Helm charts)
- Infrastructure as Code expertise (Terraform, Ansible, or equivalent)
- Scripting and automation skills (Bash or similar)
- Python expertise (build and environment management, packaging/distribution, release management, dependency debugging)
- CI/CD systems experience (GitHub Actions)
- Cloud infrastructure knowledge (GCP, IAM, VPC, load balancing, ingress/egress routing, proxies, firewall rules)
- Database fundamentals (MongoDB or similar NoSQL systems)
- Observability skills (monitoring, logging, tracing, alerting)
- Security best practices (certificates, service accounts, least privilege, role assumptions)
- Troubleshooting ability in complex, distributed systems
- Testing mindset for validating functionality
- Strong communication skills for working with enterprise customers and remote teams
- Adaptability and curiosity for learning new concepts and technologies
Benefits
- Equity in the form of options
- A variety of benefits
- Opportunity to grow in an exciting and collaborative environment
Additional Information
- Originally posted on Himalayas