How to Apply Harness Engineering: Rules, Skills, and Evals

A week-one harness checklist for Cursor-style agents: project contract, rules, skills, hooks, planning habits, and lightweight evals your team can actually maintain.

Concepts do not ship interfaces—a maintained harness does. This is the operational follow-up to What Is Harness Engineering: how to build layers your team will still trust in six weeks, not just after a demo. The patterns apply to Cursor-class IDEs and compatible setups (Claude Code, Codex, Kiro, or synced .cursor/.agents trees). I use the same stack when Hong Kong startups ask me to tighten design–dev handoff while agents sit in the middle of both sides.

Start with a project contract

Put AGENTS.md at the repo root and treat it like a README for machines. Cap it at one screen: how to build, how to test, where routes and components live, and one or two canonical examples, not a Figma dump. Use real paths (src/components/ui/button.tsx) so the agent can open the file directly, and update them in the same PR as any refactor that moves them.

Exact commands for build, typecheck, and test—and when the agent must run them.
Token file and component entry points the team actually imports from.
Non-negotiable UX states on new flows: empty, loading, error, success.
Hard stops: no force-push, no --no-verify, no credentials in diffs.
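
The contract only helps while it stays short and its paths stay real, so it is worth linting. Below is a minimal sketch of such a check, not an official tool: the 60-line cap is my assumption for "one screen", and the regex is a rough heuristic for repo-relative paths. The names check_contract, MAX_LINES, and PATH_PATTERN are hypothetical.

```python
import re
from pathlib import Path

MAX_LINES = 60  # assumption: roughly "one screen"; tune for your team

# Heuristic for repo-relative paths such as src/components/ui/button.tsx.
# The lookbehind skips URLs and avoids matching the tail of a longer path.
PATH_PATTERN = re.compile(r"(?<![\w./-])(?:[\w.-]+/)+[\w.-]+\.\w+")


def check_contract(text: str, repo_root: Path) -> list[str]:
    """Return human-readable problems found in an AGENTS.md body."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (cap {MAX_LINES})")
    for path in sorted(set(PATH_PATTERN.findall(text))):
        if not (repo_root / path).exists():
            problems.append(f"missing path: {path}")
    return problems


if __name__ == "__main__":
    root = Path.cwd()
    contract = root / "AGENTS.md"
    if contract.exists():
        for problem in check_contract(contract.read_text(encoding="utf-8"), root):
            print(problem)
    else:
        print("AGENTS.md not found at repo root")
```

Run it from the repo root in CI or a pre-commit step; an empty output means the contract still fits the cap and every path it names exists.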

Layer 1: Rules (always-on context)

Files in .cursor/rules/ that are set to always apply load every session, so each one costs context. Add a rule only after the same mistake happens twice; bloated rules get ignored. Good rules read like runbooks: import aliases, spacing scale, a pointer to src/components/ui/button.tsx. Bad rules read like brand PDFs the model cannot act on.

Layer 2: Skills (on demand)

Skills (SKILL.md, agentskills.io-compatible) load when the task matches—deploy checklist, SEO pass, research synthesis—so you are not pasting playbooks into every chat. Turn recurring team rituals into skills: open a PR, run design QA on a branch, refresh the sitemap. Designers can own copy-review or research skills without touching application code.

Layer 3: Hooks and guardrails

Hooks fire scripts around agent actions: block commits until tests pass, warn on edits under /api/payments, run the formatter after each edit. Community templates combine a JSON map of risky paths with memory files and CI gates. Calibrate strictness to blast radius: tight where money and auth live, lighter on marketing pages.
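
Path-based risk is easy to prototype before wiring it into any particular hook system. The sketch below shows only the classification step: the patterns other than api/payments, the three action names, and the functions classify and strictest are all hypothetical. How a hook receives the list of touched files depends on your tool and is left out.

```python
from fnmatch import fnmatchcase

# Higher number = stricter response.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}

# Glob pattern -> action; first match wins.
RISK_RULES = [
    (".env*", "block"),          # hypothetical: keep credentials out of diffs
    ("api/payments/*", "warn"),  # from the text: warn on edits under /api/payments
    ("api/auth/*", "warn"),      # hypothetical auth path
]
DEFAULT_ACTION = "allow"         # e.g. marketing pages stay light


def classify(path: str) -> str:
    """Return the action for a single repo-relative path."""
    normalized = path.lstrip("/")
    for pattern, action in RISK_RULES:
        if fnmatchcase(normalized, pattern):
            return action
    return DEFAULT_ACTION


def strictest(paths: list[str]) -> str:
    """Return the strictest action across every path in a change set."""
    actions = (classify(p) for p in paths)
    return max(actions, key=SEVERITY.__getitem__, default=DEFAULT_ACTION)
```

Keeping the map as plain data means designers and engineers can review a change to it like any other diff.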

Layer 4: Tools and MCP

Tools are the agent’s hands: patch edits, ripgrep, terminal, browser, MCP bridges to Figma or Sentry. Match edit format to the model (unified diff vs search-replace). Each MCP server costs context and can fail silently—add integrations that shorten handoff, not ones you will forget to maintain.
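
To make the two edit formats concrete, here is a small illustration using only the Python standard library. It is not any vendor's actual edit protocol: apply_search_replace and as_unified_diff are hypothetical helpers, and the exactly-one-match rule is my assumption about what a safe search-replace edit should require.

```python
import difflib


def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search-replace edit, refusing missing or ambiguous matches."""
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for search block, found {count}")
    return source.replace(search, replace, 1)


def as_unified_diff(before: str, after: str, path: str) -> str:
    """Render the same change as a unified diff for review or patching."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


before = "const gap = 12;\nconst pad = 8;\n"
after = apply_search_replace(before, "const gap = 12;", "const gap = 16;")
print(as_unified_diff(before, after, "tokens.ts"))
```

Search-replace fails loudly when the target text is missing or repeated, while a unified diff depends on line context lining up; which one a given model produces more reliably is something to test on your own repo.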

Planning and context hygiene

Use plan mode before multi-file work; store plans under .cursor/plans/ for the team.
One task per conversation; @-reference prior threads instead of dumping history.
Attach files you are sure about; let codebase search find the rest.
Define done with checks: tests green, typecheck clean, named states in the UI.
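
"Done" is easier to enforce when it is a script and not a sentence. A minimal sketch, assuming the repo's checks are plain shell commands that exit non-zero on failure: the commands in CHECKS are placeholders for whatever your AGENTS.md lists, and run_checks and is_done are hypothetical names. Named UI states still need a human or a visual test; this covers only the command-line checks.

```python
import subprocess

# Placeholder commands; replace with the exact ones from your AGENTS.md.
CHECKS = {
    "tests": ["npm", "test"],
    "typecheck": ["npx", "tsc", "--noEmit"],
}


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each command and record whether it exited with status 0."""
    results = {}
    for name, command in checks.items():
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failed check, not a crash.
            results[name] = False
    return results


def is_done(results: dict[str, bool]) -> bool:
    """Done means at least one check ran and every check passed."""
    return bool(results) and all(results.values())
```

Calling is_done(run_checks(CHECKS)) at the end of a task gives the agent and the reviewer the same definition of finished.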

Evals: know the harness is working

Vendors ship harness changes against benchmark suites. You can run a micro-version: five fixed tasks on your repo—add a validated field, fix a flaky test, implement a card from Figma—graded pass/fail after each rules edit. When something breaks, tag the failure (wrong file, skipped test, rogue hex) and fix the layer that failed, not a one-off prompt.
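
The bookkeeping for a micro-eval fits in a few lines. This is a sketch under stated assumptions: grading each task is still done by you (or by your test suite), the mapping from failure tag to harness layer is my own guess and should be adjusted, and Result, TAG_TO_LAYER, and summarize are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from failure tag to the harness layer to fix.
TAG_TO_LAYER = {
    "wrong file": "project contract",
    "skipped test": "hooks",
    "rogue hex": "rules",
}


@dataclass
class Result:
    task: str
    passed: bool
    tag: Optional[str] = None  # failure tag; only meaningful when passed is False


def summarize(results: list[Result]) -> dict:
    """Return the pass rate and a count of failures per harness layer."""
    if not results:
        raise ValueError("no results to summarize")
    passed = sum(r.passed for r in results)
    layers = Counter(
        TAG_TO_LAYER.get(r.tag, "untriaged") for r in results if not r.passed
    )
    return {"pass_rate": passed / len(results), "layers_to_fix": dict(layers)}


# Three tasks from the text plus two placeholders for your own.
run = [
    Result("add a validated field", True),
    Result("fix a flaky test", False, "skipped test"),
    Result("implement a card from Figma", False, "rogue hex"),
    Result("placeholder task 4", True),
    Result("placeholder task 5", False, "timeout"),
]
print(summarize(run))
```

Store one summary per rules edit and compare them over time; a tag that lands in "untriaged" is a prompt to extend the mapping, not to patch a single chat.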

Designer–engineer collaboration checklist

Figma owns visuals; rules point at token files, not PNGs alone.
Design QA on agent PRs focuses on breakpoints and edge states, not only the happy path.
Shared slash commands or skills for review so PRs look like your team wrote them.
Monthly harness hygiene: delete stale rules; promote repeat fixes into skills.

Treat the harness like a design system for agents: versioned, intentional, and owned.

Sources

Cursor’s agent best practices (instructions, skills, hooks, plan mode), their harness improvement write-up, and community rule compilers that sync across IDEs are the right starting points. Run this checklist against your repo—what works on a Next.js marketing site may need edits for a native app.
