Full Stack AI/ML Engineer

Key
On-site
Palo Alto, California, United States

About Us

This role is with one of our clients, one of the world's fastest-growing AI companies, pushing the boundaries of AI-assisted software development. Their mission is to empower the next generation of AI systems to reason about and work with real-world software repositories. You'll be working at the intersection of software engineering, open-source ecosystems, and frontier AI.

Project Overview

Our client is building high-quality evaluation and training datasets to improve how Large Language Models (LLMs) interact with realistic software engineering tasks. A key focus of this project is curating verifiable software engineering challenges from public GitHub repository histories using a human-in-the-loop process.

Why This Role Is Unique

  • Collaborate directly with AI researchers shaping the future of AI-powered software development. 
  • Work with high-impact open-source projects and evaluate how LLMs perform on real bugs, issues, and developer tasks.
  • Influence dataset design that will train and benchmark next-gen LLMs.

Role Overview: What Does a Typical Day Look Like?

  • Review and compare 3-4 model-generated code responses per task using a structured ranking system.
  • Evaluate code diffs for correctness, code quality, style, and efficiency.
  • Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
  • Maintain high consistency and objectivity across evaluations.
  • Collaborate with the team to identify edge cases and ambiguities in model behavior.

Required Skills & Experience

  • 5+ years of software engineering experience, including 2+ continuous years at a top-tier product company (e.g., Stripe, Netflix, Datadog, Dropbox, Shopify, PayPal, IBM Research).
  • Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
  • Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
  • Proven ability to review code diffs and evaluate correctness, maintainability, and efficiency.
  • Excellent oral and written communication skills for clear, structured evaluation rationales.

Bonus Points

  • Experience in LLM research, developer agents, or AI evaluation projects.
  • Background in building or scaling developer tools or automation systems.

Engagement Details

  • Commitment: ~10-20 hours/week (partial PST overlap required)
  • Type: Contractor (no medical or paid leave benefits)
  • Duration: 1 month, starting next week; potential extensions based on performance and fit.