Agent Harness Field Guide: 50 Loops, Tool Systems, and Lessons for LingTai
Living field guide
This post condenses a source-grounded study of 50 current agent harnesses. It is intentionally a living blog entry: the ecosystem moves fast, and this page should be updated as harnesses change, disappear, or teach LingTai new lessons.
Most agent discussions talk about “the model.” This guide is about the thing around the model: the harness.
A harness decides how context is assembled, how tools are declared, how tool calls are approved, how side effects are committed, how traces are recorded, how work resumes after interruption, and how a human can tell whether the agent is thinking, stuck, or acting. The model matters, but the harness decides whether the model can do reliable work.
For LingTai, the important comparison is not “which project has the cleverest ReAct loop.” LingTai is already a different shape: an always-on agent network with durable memory, mail/chat wakeup, avatars, daemons, MCP/addon ownership, and lifecycle control. The right question is: what should such a network borrow from the best single-agent harnesses, framework harnesses, and sandbox substrates?
Bottom line
The ecosystem clusters around five dominant ideas:
- Coding workbenches make tool use visible: shell, file edits, patches, approvals, and resumable sessions.
- IDE agents win by living next to code and keeping context/approval friction low.
- Graph and workflow frameworks make long plans deterministic through typed state, checkpoints, and edges.
- SDK/framework harnesses are converging on strict tools, typed outputs, tracing, evals, and handoffs.
- Sandbox substrates remind us that execution policy is not an implementation detail; it is part of the harness.
LingTai’s differentiation is still strong: it is not just a loop. It is a network runtime. But the study suggests several concrete improvements.
Recommended improvements for LingTai
P0 — Tool-result commit ledger
Make each tool call explicitly move through states: proposed → approved → executing → side-effect committed → model-visible → durable-log-visible. This would make LingTai stronger than typical SDKs and reduce ambiguity around orphaned, retried, or healed tool calls.
P0 — Daemon/process reattachment
Adopt a run-artifact contract for every daemon/backend: parent PID, child PID, workspace, transcript, report path, last heartbeat, and recovery action. On restart, LingTai should be able to reattach, finalize, or explain instead of leaving a task in an unknown state.
P1 — Span-style observability
Borrow the tracing shape now common in modern agent SDKs: turn → model call → tool calls → MCP calls → daemon tasks. Render it in the portal/TUI so humans can see why an agent is slow or stuck.
P1 — Graph/checkpoint option
Keep LingTai’s always-on loop, but offer a graph/checkpoint primitive for workflows that need atomic multi-step state. LangGraph-style checkpointing is not a replacement for LingTai; it is a useful mode inside it.
P1 — Stricter tool schema ergonomics
Expose typed tool metadata: argument schema, side-effect class, timeout, approval policy, retry policy, and error formatter. The more tools LingTai owns, the more tool contracts should be visible as data.
P1 — Sandbox policy objects
Make sandbox/approval policy first-class per tool and backend. Claude Code, Codex, SWE-agent, and E2B/Daytona all show that filesystem, shell, network, and approval policy shape the agent’s behavior.
P1 — Cheaper handoff primitive
LingTai avatars are durable and powerful. Sometimes we also need a cheap in-process handoff/router primitive for specialist routing when persistence is unnecessary.
Taxonomy: how to read the field
- Agent frameworks (10): Frameworks that expose agents as programmable objects with tools, memory, callbacks, or typed outputs.
- Code workbench agents (7): Terminal agents optimized for repository editing, patches, shell commands, and approval loops.
- IDE-native agents (6): Agents embedded in editors, where context and approval sit next to code.
- Autonomous SWE platforms (5): Long-running systems that own a workspace and attempt end-to-end software tasks.
- Multi-agent frameworks (5): Role/team based systems where coordination is the product surface.
- Workflow runtimes (3): Graph/event runtimes that make agent steps durable and composable.
- Benchmark coding harnesses (2): Small, reproducible loops built to evaluate coding agents.
- Commercial closed agents (2): Closed products whose public materials still reveal product patterns.
- RAG/tool frameworks (2): Retrieval and pipeline stacks growing agentic tool-routing layers.
- Review/issue-to-PR agents (2): Narrow repository agents for reviews, issue triage, and PR generation.
- Agent lineage / primitives (1): Minimal task-loop ancestors and primitives.
- Local runtimes (1): Local extension/tool runtimes for personal workstations.
- Memory-first runtimes (1): Systems where explicit memory is the core runtime object.
- Prompt/programming frameworks (1): Systems that optimize prompts/agent programs as software.
- Sandbox substrates (1): Execution substrates that make tool use safe and reproducible.
- Uncertain/small harnesses (1): Small or lower-confidence packages kept to map the boundary of the term.
50-harness matrix
| # | Harness | Shape | Evidence | Lesson for LingTai |
|---|---|---|---|---|
| 1 | Claude Code | Coding CLI / closed agent | Closed/public evidence | Treat the agent loop as a product surface: approvals, compaction, resume, and tool semantics are visible, not hidden. |
| 2 | OpenAI Codex CLI | Coding CLI | Public/source-grounded | Sandbox and approval modes should be first-class runtime policy, not prompt folklore. |
| 3 | OpenCode | Coding CLI | Public/source-grounded | Provider-agnostic terminal agents need strict session state and model/tool abstraction boundaries. |
| 4 | OpenHands | Autonomous SWE platform | Public/source-grounded | A durable event stream plus workspace sandbox makes long-running SWE work inspectable and recoverable. |
| 5 | Aider | Coding CLI | Public/source-grounded | Git-native editing keeps coding agents honest: every change is a diff with context. |
| 6 | Continue | IDE/code assistant platform | Public/source-grounded | IDE-native agents win when context assembly is explicit and user-editable. |
| 7 | Cline | IDE coding agent | Public/source-grounded | A simple plan-act-observe loop becomes powerful when every tool call is user-visible. |
| 8 | Roo Code | IDE coding agent | Public/source-grounded | Modes are a cheap way to express specialist behavior without spawning durable agents. |
| 9 | Goose | Local agent runtime | Public/source-grounded | Extension-based local runtimes make tools composable while keeping execution near the user. |
| 10 | OpenClaw | Automation/agent-loop framework | Public/source-grounded | Explicit loop documentation is itself a product feature; users need to know what repeats. |
| 11 | OpenHarness | Long-running autonomous harness | Public/source-grounded | Long-running autonomy needs a run artifact, not only a transcript. |
| 12 | Hermes Agent | Self-improving agent | Public/source-grounded | Self-improvement requires memory and skill boundaries that prevent accidental drift. |
| 13 | Pi | Minimal coding harness | Public/source-grounded | Minimal harnesses reveal the irreducible loop: assemble context, call model, apply tools, repeat. |
| 14 | Oh My Pi | Terminal coding harness | Public/source-grounded | Persistent execution kernels are useful, but must be fenced by clear turn/tool budgets. |
| 15 | harness-agent | Small/uncertain harness package | Public/source uncertain | Small packages are useful negative space: naming a harness is not the same as owning a loop. |
| 16 | LangGraph | Graph agent framework | Public/source-grounded | Checkpointed graphs are the strongest pattern for deterministic multi-step agent workflows. |
| 17 | LangChain Agents | Agent framework | Public/source-grounded | Tool schemas, callbacks, and intermediate steps should be inspectable from the framework boundary. |
| 18 | CrewAI | Multi-agent framework | Public/source-grounded | Role-based teams make delegation legible, but they need durable accountability to avoid theater. |
| 19 | AutoGen | Multi-agent framework | Public/source-grounded | Conversation-as-orchestration is flexible; termination and handoff rules are the hard part. |
| 20 | Semantic Kernel Agents | Enterprise agent framework | Public/source-grounded | Enterprise harnesses need typed functions, planners, and policy surfaces that non-research users can trust. |
| 21 | LlamaIndex Agents | RAG/tool agent framework | Public/source-grounded | RAG-centric agents prove that retrieval and tool use should share one traceable context contract. |
| 22 | PydanticAI | Typed agent framework | Public/source-grounded | Typed outputs and dependencies reduce ambiguity at the model/framework boundary. |
| 23 | Agno | Agent/team framework | Public/source-grounded | Teams, memory, and tools should be configured as data, then traced as execution. |
| 24 | smolagents | Lightweight code/tool agents | Public/source-grounded | Code-as-action is powerful when the sandbox and imports are constrained by design. |
| 25 | DSPy agents | Prompt/programming framework | Public/source-grounded | Agent behavior can be optimized as a program, not only hand-written as a prompt. |
| 26 | AutoGPT Forge | Autonomous agent platform | Public/source-grounded | Autonomy platforms need capability registries and budgets before they need more prompts. |
| 27 | MetaGPT | Software-company multi-agent | Public/source-grounded | Structured artifacts can make multi-agent collaboration less chatty and more reviewable. |
| 28 | CAMEL-AI | Communicative multi-agent framework | Public/source-grounded | Society-style simulation is useful for research, but production needs ownership and state boundaries. |
| 29 | Letta / MemGPT | Stateful memory agent server | Public/source-grounded | Memory must be an explicit runtime object with edit, recall, and persistence semantics. |
| 30 | Mastra | TypeScript agent framework | Public/source-grounded | Modern app-agent frameworks treat agents, workflows, evals, and observability as one developer stack. |
| 31 | VoltAgent | TypeScript agent framework | Public/source-grounded | Developer-friendly dashboards matter because agent failure is usually a trace-reading problem. |
| 32 | Motia | Event-driven workflow framework | Public/source-grounded | Event-driven workflows are a good substrate for agent steps that must outlive one request. |
| 33 | Haystack Agents | Pipeline/RAG agent framework | Public/source-grounded | Pipelines and agents should converge when retrieval, routing, and tool use interact. |
| 34 | SWE-agent | SWE-bench coding harness | Public/source-grounded | Bench harnesses show the value of reproducible run directories and environment specs. |
| 35 | mini-SWE-agent | Lightweight SWE harness | Public/source-grounded | A small, explicit loop is easier to benchmark than a giant framework. |
| 36 | Devin | Commercial SWE agent | Closed/public evidence | Closed agents still teach product lessons: persistent workspace, async work, and human handoff. |
| 37 | Factory Droid | Commercial SWE agent | Closed/public evidence | Commercial SWE agents emphasize end-to-end job ownership rather than framework APIs. |
| 38 | Qodo PR-Agent | Code review/change agent | Public/source-grounded | Narrow review agents win by constraining context, outputs, and repository side effects. |
| 39 | Sweep AI | Issue-to-PR agent | Public/source-grounded | Issue-to-PR agents need clear escalation when repository reality diverges from the issue text. |
| 40 | Mentat | Command-line coding agent | Public/source-grounded | Conversation plus patching remains a durable baseline for local coding agents. |
| 41 | Cursor Agent | IDE-native commercial agent | Closed/public evidence | IDE-native commercial agents win through frictionless context and editor-integrated approval. |
| 42 | Windsurf / Cascade | IDE-native commercial agent | Closed/public evidence | Cascade-style products show the value of continuous project context, not one-off prompts. |
| 43 | GitHub Copilot Agent | IDE/GitHub coding agent | Closed/public evidence | GitHub-native agents benefit from living where issues, branches, and PRs already live. |
| 44 | OpenAI Agents SDK | SDK / AgentKit | Public/source-grounded | Tracing, handoffs, and typed tools are becoming the expected SDK contract. |
| 45 | BeeAI Framework | Agent framework | Public/source-grounded | Frameworks increasingly bundle memory, tools, and observability instead of treating them as add-ons. |
| 46 | ControlFlow | Workflow/agent framework | Public/source-grounded | Task graphs with typed results make agent work composable in ordinary software systems. |
| 47 | PocketFlow | Minimal workflow framework | Public/source-grounded | Minimal node/action abstractions are useful when the goal is teachability and portability. |
| 48 | E2B / Daytona | Sandbox substrate | Public/source-grounded | The sandbox is part of the harness: file system, network, process, and snapshot policy shape behavior. |
| 49 | SuperAGI | Autonomous agent platform | Public/source-grounded | Older autonomous platforms remind us that more tools without tighter state semantics becomes chaos. |
| 50 | BabyAGI / functionz | Task-loop lineage | Public/source-grounded | The original task loop is still visible under modern agents: create tasks, execute, reprioritize, remember. |
What this means for LingTai
LingTai should not copy a single harness wholesale. The interesting direction is synthesis:
- from Claude Code / Codex / Aider / Cline, take product-grade approval loops, resumable sessions, sandbox modes, and patch discipline;
- from OpenHands / SWE-agent, take reproducible run directories, event streams, and workspace recovery;
- from LangGraph / ControlFlow / PocketFlow / Motia, take graph/checkpoint modes for deterministic workflows;
- from OpenAI Agents SDK / PydanticAI / Mastra / VoltAgent, take strict typed tools, tracing, evals, and developer dashboards;
- from Letta, take memory as a real runtime object;
- from E2B / Daytona, take sandbox policy as a product-level contract.
The field is moving toward stricter contracts around tools and traces. LingTai already has the rarer piece: agents that can live, sleep, wake, remember, spawn durable peers, and coordinate through channels. The next step is to make every part of that life cycle as inspectable and replayable as the best coding harnesses make a single patch.
Method note
The underlying study inspected 50 systems with source-first evidence where available. Open-source projects were checked against public repositories. Closed commercial systems such as Claude Code, Devin, Cursor, Windsurf/Cascade, and GitHub Copilot Agent are marked lower-confidence because their internal loops are not fully public; they are included for product and interface lessons, not as source-level claims.