Freelance Agent Evaluation Engineer
Mindrift connects specialists with project-based AI work focused on testing, evaluating, and improving AI systems. In this role, you will create and evaluate tasks for AI coding agents in simulated environments, working alongside AI tools to develop challenging scenarios.
Location: United Kingdom
Salary: up to $50 per hour
Responsibilities
- Build virtual companies from a high-level plan, including the codebase, infrastructure, and context that together form a realistic development environment.
- Assemble and calibrate tasks from intermediate states of virtual companies, craft prompts, define evaluation criteria, and ensure tasks are solvable and fair.
- Design tasks in isolated environments that emulate developer workstations, with Linux, development tools, servers, and web application codebases.
- Write tests that accept all correct solutions and reject incorrect ones, balancing strictness and leniency.
- Run tests against AI agents iteratively to verify that the tests are effective and robust.
- Review code written by AI agents, analyze success and failure cases, and design edge cases and adversarial scenarios.
- Iterate on your work based on feedback from QA reviewers, who score it against quality criteria.
Requirements
- Degree in Computer Science, Software Engineering, or related fields.
- 5+ years of software development experience, primarily in Python (FastAPI, pytest, async/await, subprocess, file operations).
- Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and back-end systems.
- Experience writing functional and integration tests.
- Experience with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
- Understanding of CI/CD processes, especially GitHub Actions.
- English proficiency at B2 level.
Additional Information
- This is a project-based, part-time opportunity, not permanent employment.
- The work involves significant collaboration with AI systems, and creating tasks that genuinely challenge frontier models is demanding.
- Estimated effort is around 20 hours per task, with flexible scheduling.
Category: Engineering
Company: Mindrift
Source: himalayas
Posted: 20 days ago