Sony · Data
Research Intern – Multimodal Foundation Model for Vision
Posted 28 days ago
Sony AI is seeking research interns to develop next-generation foundation models for vision, focusing on innovative methodologies in vision-language models, model compression, and deployment on cloud and edge devices. Interns will work with a team of scientists and engineers on challenging problems in generative AI, with opportunities for publication and impact on billions of users.
Location: Flexible (Tokyo, Europe, US)
Salary: $50.00/hour
Responsibilities
- Conduct fundamental and innovative development of low-cost yet powerful vision-language models and unified models, and of automatic model compression, optimization, and deployment on cloud and edge devices.
- Design or implement state-of-the-art techniques for model compression, inference speedup, hardware deployment, and tool automation.
- Build proofs of concept (PoCs) for vision+text generation tasks (VQA, captioning, understanding, etc.) and for hardware.
- Contribute to library and tool development to support business needs; publish influential research at top-tier conferences and in journals.
Requirements
- Currently holds, or is in the process of obtaining, a master's or PhD degree in computer science or a related field.
- Self-motivated with the ability to propose and implement innovative ideas.
- Strong presentation and communication skills.
- Publications or expertise in compact foundation model development and deployment, with influential open-source projects or papers at top conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ACL).