Mindrift, Engineering

Freelance Agent Evaluation Engineer

United Kingdom, up to $50 per hour, posted 20 days ago

Mindrift is connecting specialists with project-based AI opportunities focused on testing, evaluating, and improving AI systems. The role involves creating and evaluating tasks for AI coding agents in simulated environments, working collaboratively with AI to develop challenging scenarios.

Responsibilities

  • Build virtual companies following a high-level plan, including codebase, infrastructure, and context that form realistic development environments.
  • Assemble and calibrate tasks from intermediate states of virtual companies, craft prompts, define evaluation criteria, and ensure tasks are solvable and fair.
  • Design tasks in isolated environments emulating developer workstations with Linux, development tools, servers, and web application codebases.
  • Write tests that accept all correct solutions and reject incorrect ones, balancing strictness and leniency.
  • Iterate with AI agents on tests to verify their effectiveness and robustness.
  • Review code written by AI agents, analyze success and failure cases, and design edge cases and adversarial scenarios.
  • Iterate based on feedback from QA reviewers who score work on quality criteria.

Requirements

  • Degree in Computer Science, Software Engineering, or related fields.
  • 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations).
  • Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and back-end systems.
  • Experience writing functional and integration tests.
  • Experience with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
  • Understanding of CI/CD processes, especially GitHub Actions.
  • English proficiency at B2 level.

Additional Information

  • This is a project-based, part-time opportunity, not permanent employment.
  • The work involves significant collaboration with AI systems; the main challenge is creating tasks that truly test frontier models.
  • Estimated effort is around 20 hours per task, with flexible scheduling.

Source: himalayas
