AI News · May 22, 2026 · 9 min read

By Johnny Chan · UI/UX Designer, Hong Kong

How to Apply Harness Engineering: Rules, Skills, and Evals

A week-one harness checklist for Cursor-style agents: project contract, rules, skills, hooks, planning habits, and lightweight evals your team can actually maintain.

Concepts do not ship interfaces; a maintained harness does. This is the operational follow-up to What Is Harness Engineering: how to build layers your team will still trust in six weeks, not just after a demo. The patterns apply to Cursor-class IDEs and compatible setups (Claude Code, Codex, Kiro, or synced .cursor/.agents trees). I use the same stack when Hong Kong startups ask me to tighten design–dev handoff with agents sitting between the two teams.

Start with a project contract

Put AGENTS.md at the repo root and treat it like a README for machines. Cap it at one screen: how to build, how to test, where routes and components live, and one or two canonical examples, not a Figma dump. Use real paths (src/components/ui/button.tsx) so the agent can open the file directly and a stale reference is easy to spot after a refactor.

  • Exact commands for build, typecheck, and test—and when the agent must run them.
  • Token file and component entry points the team actually imports from.
  • Non-negotiable UX states on new flows: empty, loading, error, success.
  • Hard stops: no force-push, no --no-verify, no credentials in diffs.
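
One way to keep the contract honest is a small check that fails when AGENTS.md outgrows one screen or names a path that no longer exists. The sketch below is a minimal Python version under stated assumptions: the 60-line cap, the path regex, and the function name check_contract are illustrative choices, not part of any tool.

```python
import re
from pathlib import Path

# Assumption: "one screen" is roughly 60 lines; tune this for your team.
MAX_LINES = 60

# Assumption: a path worth checking has at least one slash and a file extension.
# The lookbehind skips URLs such as https://example.com/page.html.
PATH_PATTERN = re.compile(r"(?<![\w./-])(?:[\w.-]+/)+[\w.-]+\.\w+")


def check_contract(
    repo_root: Path,
    contract_name: str = "AGENTS.md",
    max_lines: int = MAX_LINES,
) -> list[str]:
    """Return human-readable problems; an empty list means the contract passes."""
    contract = repo_root / contract_name
    if not contract.is_file():
        return [f"{contract_name} is missing from the repo root"]

    text = contract.read_text(encoding="utf-8")
    problems = []

    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"{contract_name} has {line_count} lines (cap is {max_lines})")

    # Flag every referenced path that does not exist on disk.
    for path in sorted(set(PATH_PATTERN.findall(text))):
        if not (repo_root / path).exists():
            problems.append(f"stale path: {path}")

    return problems
```

Calling check_contract(Path(".")) from CI or a pre-commit script and failing on a non-empty list is one possible wiring; the regex will miss bare filenames without a directory, which is a deliberate trade-off to avoid false positives.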

Layer 1: Rules (always-on context)

Files in .cursor/rules/ load every session. Add a rule only after the same mistake happens twice; bloated rules get ignored. Good rules read like runbooks: import aliases, spacing scale, links to Button.tsx. Bad rules read like brand PDFs the model cannot act on.
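
To notice bloat before the model starts ignoring it, a rough word-count audit over the rules directory can help. This is a sketch only: the 300-word budget and the name audit_rules are assumptions, and word count is a crude stand-in for how much context a rule actually consumes.

```python
from pathlib import Path

# Assumption: a rough per-rule budget; pick a number that suits your setup.
MAX_WORDS_PER_RULE = 300


def audit_rules(rules_dir: Path, max_words: int = MAX_WORDS_PER_RULE) -> dict[str, int]:
    """Map each oversized rule file (path relative to rules_dir) to its word count."""
    oversized = {}
    for rule_file in sorted(rules_dir.rglob("*")):
        if not rule_file.is_file():
            continue
        words = len(rule_file.read_text(encoding="utf-8").split())
        if words > max_words:
            oversized[rule_file.relative_to(rules_dir).as_posix()] = words
    return oversized
```

Running audit_rules(Path(".cursor/rules")) before a rules edit gives a short list of candidates to trim or move into a skill.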

Layer 2: Skills (on demand)

Skills (SKILL.md, agentskills.io-compatible) load when the task matches—deploy checklist, SEO pass, research synthesis—so you are not pasting playbooks into every chat. Turn recurring team rituals into skills: open a PR, run design QA on a branch, refresh the sitemap. Designers can own copy-review or research skills without touching application code.

Layer 3: Hooks and guardrails

Hooks fire scripts around agent actions: block commits until tests pass, warn on edits under /api/payments, format on save. Community templates mix path-based risk JSON with memory files and CI gates. Calibrate strictness to blast radius—tight where money and auth live, lighter on marketing pages.
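
Calibrating strictness to blast radius can be sketched as a path-to-action lookup that a hook script consults before letting an edit through. Everything here is hypothetical: the glob patterns, the three action names, and the functions risk_for and gate are illustrations, not the format of any particular hook system.

```python
from fnmatch import fnmatchcase

# Hypothetical risk map: glob pattern -> action. The first match wins, so the
# strictest patterns are listed first. Paths and actions are illustrative.
RISK_RULES = [
    ("api/payments/*", "block"),
    ("api/auth/*", "block"),
    ("api/*", "warn"),
    ("marketing/*", "allow"),
]

# Higher number means stricter.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}


def risk_for(path: str, rules=RISK_RULES, default: str = "warn") -> str:
    """Return the action for one edited path; unknown paths get the default."""
    normalized = path.lstrip("/")
    for pattern, action in rules:
        # fnmatchcase's "*" also matches "/" so nested files are covered.
        if fnmatchcase(normalized, pattern):
            return action
    return default


def gate(paths: list[str]) -> str:
    """Return the strictest action across all edited paths."""
    return max((risk_for(p) for p in paths), key=SEVERITY.__getitem__, default="allow")
```

A hook could call gate on the list of changed files and refuse the commit on "block", print a notice on "warn", and stay silent on "allow".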

Layer 4: Tools and MCP

Tools are the agent’s hands: patch edits, ripgrep, terminal, browser, MCP bridges to Figma or Sentry. Match edit format to the model (unified diff vs search-replace). Each MCP server costs context and can fail silently—add integrations that shorten handoff, not ones you will forget to maintain.

Planning and context hygiene

  • Use plan mode before multi-file work; store plans under .cursor/plans/ for the team.
  • One task per conversation; @-reference prior threads instead of dumping history.
  • Attach files you are sure about; let codebase search find the rest.
  • Define done with checks: tests green, typecheck clean, named states in the UI.
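
The machine-checkable part of "done" can be expressed as a tiny runner that executes each check and reports pass or fail by exit code. This is a minimal sketch: the npm commands in EXAMPLE_CHECKS are placeholders for whatever your project contract lists, and named UI states still need a human or a visual test.

```python
import subprocess

# Placeholder commands; substitute the exact ones from your project contract.
EXAMPLE_CHECKS = {
    "typecheck": ["npm", "run", "typecheck"],
    "tests": ["npm", "test"],
}


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each command and record True when it exits with status 0."""
    results = {}
    for name, argv in checks.items():
        try:
            completed = subprocess.run(argv, capture_output=True, text=True)
            results[name] = completed.returncode == 0
        except OSError:
            # A missing or non-executable command counts as a failed check.
            results[name] = False
    return results


def is_done(results: dict[str, bool]) -> bool:
    """Done means every check passed."""
    return all(results.values())
```

An agent instruction such as "do not report the task complete until is_done(run_checks(EXAMPLE_CHECKS)) is true" keeps the definition in one place instead of in each prompt.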

Evals: know the harness is working

Vendors ship harness changes against benchmark suites. You can run a micro-version: five fixed tasks on your repo—add a validated field, fix a flaky test, implement a card from Figma—graded pass/fail after each rules edit. When something breaks, tag the failure (wrong file, skipped test, rogue hex) and fix the layer that failed, not a one-off prompt.
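
A micro-eval log needs very little structure: one record per task with a pass/fail flag and an optional failure tag, plus a tally that points at the layer to fix. The sketch below assumes a tag-to-layer mapping of my own (TAG_TO_LAYER); your team may assign the same tags to different layers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumption: which harness layer owns each failure tag. Adjust to taste.
TAG_TO_LAYER = {
    "wrong file": "project contract",
    "skipped test": "hooks",
    "rogue hex": "rules",
}


@dataclass(frozen=True)
class TaskResult:
    task: str
    passed: bool
    failure_tag: Optional[str] = None


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def failures_by_layer(results: list[TaskResult]) -> Counter:
    """Count failed tasks per harness layer; unknown tags land in 'untriaged'."""
    tally = Counter()
    for r in results:
        if not r.passed:
            tally[TAG_TO_LAYER.get(r.failure_tag, "untriaged")] += 1
    return tally
```

Recording one run per rules edit and comparing pass_rate across runs shows whether a change helped, and failures_by_layer suggests where the next fix belongs.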

Designer–engineer collaboration checklist

  • Figma owns visuals; rules point at token files, not PNGs alone.
  • Design QA on agent PRs focuses on breakpoints and edge states, not only the happy path.
  • Shared slash commands or skills for review so PRs look like your team wrote them.
  • Monthly harness hygiene: delete stale rules; promote repeat fixes into skills.

Treat the harness like a design system for agents: versioned, intentional, and owned.

Sources

Cursor’s agent best practices (instructions, skills, hooks, plan mode), their harness improvement write-up, and community rule compilers that sync across IDEs are the right starting points. Run this checklist against your repo—what works on a Next.js marketing site may need edits for a native app.