SonyData

Research Intern – Multimodal Foundation Model for Vision

Flexible (Tokyo, Europe, US), $50.00/hour, posted 28 days ago

Sony AI is seeking research interns to develop next-generation foundation models for vision, focusing on innovative methodologies in vision-language models, model compression, and deployment on cloud and edge devices. Interns will work with a team of scientists and engineers on challenging problems in generative AI, with opportunities for publication and impact on billions of users.

Location: Flexible (Tokyo, Europe, US)

Salary: $50.00/hour

Responsibilities

Conduct fundamental and innovative development in low-cost yet powerful vision-language models, unified models, automatic model compression, optimization, and deployment on cloud and edge.
Design or implement state-of-the-art techniques for model compression, inference speedup, hardware deployment, and tool automation.
Build proofs of concept (PoC) for vision+text generation tasks (VQA, captioning, understanding, etc.) and for target hardware.
Contribute to library and tool development that supports the business; publish influential research in top-tier conferences and journals.

Requirements

Holds, or is in the process of obtaining, a master's or PhD degree in computer science or a related field.
Self-motivated with the ability to propose and implement innovative ideas.
Strong presentation and communication skills.
Publications or expertise in compact foundation model development and deployment, with influential open-source projects or papers at top conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ACL).
