Bright Vision Technologies, Engineering
Model Serving Engineer
Remote (Contiguous United States), posted today
Bright Vision Technologies is seeking a Model Serving Engineer to design, build, and operate high-performance, reliable inference platforms for large machine learning models, focusing on systems engineering aspects of AI deployment.
Location: Remote (Contiguous United States)
Responsibilities
- Design and operate model serving platforms supporting diverse workloads including LLMs, vision models, and recommendation systems.
- Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing.
- Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints.
- Build autoscaling and capacity management systems that balance latency, throughput, and cost.
- Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads.
- Integrate model serving with API gateways, identity systems, and observability platforms.
- Implement caching, prompt deduplication, and response reuse strategies where appropriate.
- Drive end-to-end observability including latency histograms, queue dynamics, GPU utilization, and error tracking.
- Develop deployment workflows including canary releases, shadow testing, and automated rollback.
- Run incident response for high-availability AI services and drive durable reliability improvements.
- Collaborate with ML and product teams to support new model releases and capability rollouts.
- Implement security controls including request signing, content filtering, and abuse detection at the serving layer.
- Document operational procedures, performance characteristics, and tuning guidance for internal teams.
- Stay current with AI serving research and translate advances into production capabilities.
Requirements
- Open-source contributions to model serving infrastructure.
- Experience with multi-region or globally distributed AI serving.
- Familiarity with model quantization, distillation, and compression techniques.
- Exposure to FinOps for AI workloads and cost-efficient serving design.
- Experience supporting external-facing AI APIs at scale.
Additional Information
- Candidates must be willing to work directly as a full-time W-2 employee of Bright Vision Technologies.
- New H-1B sponsorship is not available, but H-1B transfers are supported for qualified candidates.
- A technical coding assessment is mandatory for applicants.
- The role is a full-time, remote, in-house position with no third-party client or vendor involvement.
Location: Remote (Contiguous United States)
Category: Engineering
Company: Bright Vision Technologies
Source: himalayas
Posted: today