<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Agent Framework on Steve Sun</title><link>https://sund.site/en/tags/agent-framework/</link><description>Recent content in Agent Framework on Steve Sun</description><generator>Hugo</generator><language>en</language><copyright>© 2013-2026, Steve Sun</copyright><lastBuildDate>Tue, 26 May 2026 16:11:00 +0800</lastBuildDate><follow_challenge><feedId>41397727810093074</feedId><userId>56666701051455488</userId></follow_challenge><atom:link href="https://sund.site/en/tags/agent-framework/index.xml" rel="self" type="application/rss+xml"/><item><title>How to Design an AI Agent</title><link>https://sund.site/en/posts/2026/how-to-make-an-agent/</link><pubDate>Tue, 26 May 2026 16:11:00 +0800</pubDate><guid>https://sund.site/en/posts/2026/how-to-make-an-agent/</guid><description>&lt;p&gt;&lt;figure
 class="image-caption"
&gt;
 
 &lt;img src="https://raw.githubusercontent.com/stevedsun/blog-img/main/ai-agent-dot-matrix-header-900x383.png" alt="" loading="lazy" /&gt;
 
 &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;AI Agents will most likely be the paradigm for future AI software design, so for most developers and non-technical people just getting into vibe coding, understanding how they&amp;rsquo;re designed and the principles behind them will help you design next-generation application software more effectively.&lt;/p&gt;
&lt;p&gt;This post tries to use plain language to help you understand what an AI Agent is, what problems it solves, and which protocols and tools will come into play as part of that infrastructure.&lt;/p&gt;
&lt;p&gt;Target audience:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vibe coders (rapid prototyping, build and iterate)&lt;/li&gt;
&lt;li&gt;Programmers&lt;/li&gt;
&lt;li&gt;Non-technical users just starting to code&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="first-principles-what-problems-does-an-agent-framework-actually-solve"&gt;First Principles: What Problems Does an Agent Framework Actually Solve?&lt;/h2&gt;
&lt;h3 id="models-are-powerful-but-unreliable"&gt;Models are powerful, but unreliable&lt;/h3&gt;
&lt;p&gt;Large Language Models (LLMs) &amp;ldquo;guess&amp;rdquo;; they don&amp;rsquo;t &amp;ldquo;guarantee.&amp;rdquo;
So you can&amp;rsquo;t treat them as deterministic programs (same input always produces same output).&lt;/p&gt;
&lt;p&gt;Problems to solve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How to bring unstable output into a controllable flow&lt;/li&gt;
&lt;li&gt;How to know where the failure is when something breaks&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="real-world-task-results-are-usually-not-simple-answers-but-complete-workflow-outputs"&gt;Real-world task results are usually not simple answers, but complete workflow outputs&lt;/h3&gt;
&lt;p&gt;Real tasks typically involve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read information&lt;/li&gt;
&lt;li&gt;Make decisions&lt;/li&gt;
&lt;li&gt;Call tools&lt;/li&gt;
&lt;li&gt;Continue deciding based on tool results&lt;/li&gt;
&lt;li&gt;Ultimately produce documents, code, or other artifacts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means an Agent&amp;rsquo;s design goal isn&amp;rsquo;t limited to &amp;ldquo;question and answer,&amp;rdquo; but is a &amp;ldquo;cyclic decision system.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="users-dont-want-to-wait-until-the-end-to-see-results"&gt;Users don&amp;rsquo;t want to wait until the end to see results&lt;/h3&gt;
&lt;p&gt;When interacting with AI, users typically want:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Visible process (streaming)&lt;/li&gt;
&lt;li&gt;Ability to interrupt (abort the current task)&lt;/li&gt;
&lt;li&gt;Ability to add instructions mid-task (steer, guide while executing)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the system must natively support real-time interaction, not one-shot black-box execution.&lt;/p&gt;
&lt;h3 id="context-keeps-growing-costs-keep-growing"&gt;Context keeps growing, costs keep growing&lt;/h3&gt;
&lt;p&gt;The longer the conversation, the larger the input, the slower the speed, the higher the cost, and it may even exceed limits.
There must be a mechanism to &amp;ldquo;compress history while preserving key information.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="one-core-serving-multiple-interaction-modes"&gt;One core serving multiple interaction modes&lt;/h3&gt;
&lt;p&gt;The same Agent must run on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Terminal UI (TUI)&lt;/li&gt;
&lt;li&gt;Remote Procedure Call (RPC)&lt;/li&gt;
&lt;li&gt;Future Web or App interfaces&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the &amp;ldquo;intelligent core&amp;rdquo; and &amp;ldquo;presentation layer&amp;rdquo; must be decoupled (independent, not bound together).&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="from-problems-to-requirements-then-to-design"&gt;From Problems to Requirements, Then to Design&lt;/h2&gt;
&lt;h3 id="requirements-checklist"&gt;Requirements Checklist&lt;/h3&gt;
&lt;p&gt;A usable Agent framework must at minimum satisfy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Looping: supports &amp;ldquo;think → call tool → think again&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Observable: every step is visible to UI or logging&lt;/li&gt;
&lt;li&gt;Controllable: can pause, cancel, interrupt, resume&lt;/li&gt;
&lt;li&gt;Recoverable: retry on failure, can continue from the last session&lt;/li&gt;
&lt;li&gt;Extensible: add new tools, new models, new frontends&lt;/li&gt;
&lt;li&gt;Governable: clear boundaries on cost, context, and permissions&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="end-to-end-flowchart"&gt;End-to-End Flowchart&lt;/h3&gt;
&lt;p&gt;Going from problems to requirements, and requirements to design, we get the following flowchart:&lt;/p&gt;
&lt;div class="mermaid"&gt;flowchart TD
 A[User states goal] --&gt; B[Agent understands current task]
 B --&gt; C{Need a tool?}
 C -- No --&gt; D[Give answer directly]
 C -- Yes --&gt; E[Generate tool call request]
 E --&gt; F[Execute tool]
 F --&gt; G[Get tool result]
 G --&gt; H{Result sufficient?}
 H -- No --&gt; B
 H -- Yes --&gt; D

 D --&gt; I[Stream back to user]
 I --&gt; J[User can interrupt/add requirements]
 J --&gt; B&lt;/div&gt;
&lt;p&gt;This diagram expresses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An Agent is a closed-loop system, not a single function call.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Tools&amp;rdquo; are capability amplifiers, not accessories.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The user is in the loop, not outside it.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="overall-architecture-diagram"&gt;Overall Architecture Diagram&lt;/h3&gt;
&lt;div class="mermaid"&gt;graph LR
 subgraph Interaction Layer
 UI1[TUI/CLI]
 UI2[RPC/API]
 UI3[Web/App]
 end

 subgraph Runtime Layer
 SESSION[Session Orchestrator]
 POLICY[Policy Center: Retry/Compression/Budget]
 end

 subgraph Core Layer
 LOOP[Agent Decision Loop]
 STATE[State Management]
 EVENTS[Event Bus]
 end

 subgraph Capability Layer
 TOOLS[Tool System]
 MODEL[Model Adapter]
 MEMORY[Memory and Context Management]
 end

 UI1 --&gt; SESSION
 UI2 --&gt; SESSION
 UI3 --&gt; SESSION
 SESSION --&gt; LOOP
 SESSION --&gt; POLICY
 LOOP &lt;--&gt; STATE
 LOOP --&gt; EVENTS
 LOOP --&gt; TOOLS
 LOOP --&gt; MODEL
 POLICY &lt;--&gt; MEMORY
 MODEL --&gt; LLM[External Model Service]&lt;/div&gt;
&lt;h3 id="component-diagram-understanding-who-owns-what"&gt;Component Diagram (Understanding &amp;ldquo;Who Owns What&amp;rdquo;)&lt;/h3&gt;
&lt;div class="mermaid"&gt;flowchart LR
 USER[User]
 ORCH[Session Orchestrator]
 CORE[Agent Core]
 ADAPTER[Model Adapter]
 TOOLRUN[Tool Executor]
 OBS[Observation and Events]

 USER &lt;--&gt; ORCH
 ORCH &lt;--&gt; CORE
 CORE &lt;--&gt; ADAPTER
 CORE &lt;--&gt; TOOLRUN
 CORE --&gt; OBS
 OBS --&gt; ORCH&lt;/div&gt;
&lt;p&gt;Responsibility split:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Session Orchestrator: handles user input, session state, retry and compression policies.&lt;/li&gt;
&lt;li&gt;Agent Core: only does the &amp;ldquo;thinking loop&amp;rdquo; and &amp;ldquo;state advancement.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Model Adapter: shields differences between model providers.&lt;/li&gt;
&lt;li&gt;Tool Executor: uniformly executes local or remote tools.&lt;/li&gt;
&lt;li&gt;Observation and Events: turns the process into visible signals for UI/log systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="to-land-these-designs-what-protocols-and-foundational-patterns-are-required"&gt;To Land These Designs, What Protocols and Foundational Patterns Are Required?&lt;/h2&gt;
&lt;p&gt;This section is the &amp;ldquo;minimum necessities&amp;rdquo; to complete the design above. We need to consider which engineering practices to introduce from a protocols and design-patterns standpoint. (Like building a skyscraper, you need to define the materials, the common engineering designs you can reuse, and how to make the structure mechanically stand the test of time.)&lt;/p&gt;
&lt;p&gt;Most of these protocols are currently designed and implemented by developers on demand, but standards will likely emerge in the near future.&lt;/p&gt;
&lt;h3 id="required-protocols-skipping-any-causes-loss-of-control"&gt;Required Protocols (Skipping Any Causes Loss of Control)&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Message Protocol&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Unifies how user messages, assistant messages, and tool results are described.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Event Protocol&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Unifies how start, update, end, error, and tool execution status are described.&lt;/li&gt;
&lt;li&gt;Purpose: lets UI and logs see the &amp;ldquo;process,&amp;rdquo; not just the &amp;ldquo;outcome.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Tool Contract&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Tool name, parameter structure (Schema), and execution return format must be fixed.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Streaming Contract&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Supports incremental output (delta) to guarantee real-time user feedback.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Cancellation Contract&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Any link in the chain should respond to abort signals, avoiding &amp;ldquo;can&amp;rsquo;t stop.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Error Contract&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Failures must be structured (machine-processable), not just string error messages.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="foundational-design-patterns-to-understand"&gt;Foundational Design Patterns to Understand&lt;/h3&gt;
&lt;p&gt;For readers without programming experience, you&amp;rsquo;ll need to learn about these basic programming design patterns from other sources first.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;State Machine&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;An Agent has state transitions at every step (e.g., waiting for input → generating output → tool execution → back to generating).&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Publish/Subscribe (Pub/Sub)&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Core emits events, UI/logs subscribe to events.&lt;/li&gt;
&lt;li&gt;Benefit: core logic doesn&amp;rsquo;t depend on specific interfaces.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Adapter&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Wraps different model interfaces into a unified calling convention.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Strategy&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Retry strategies, tool concurrency strategies, compression strategies are interchangeable.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Pipeline&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Input preprocessing → model call → tool execution → post-processing is a pluggable chain.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Idempotency and Recoverability&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Repeating the same operation should not produce catastrophic side effects; failure should be recoverable.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="case-study-pi-agents-design-philosophy-and-architecture"&gt;Case Study: PI Agent&amp;rsquo;s Design Philosophy and Architecture&lt;/h2&gt;
&lt;p&gt;The above covers &amp;ldquo;general Agent framework design.&amp;rdquo; Now let&amp;rsquo;s ground it in the recently popular minimalist framework &lt;a href="https://github.com/earendil-works/pi"&gt;PI Agent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at how this framework designs an Agent.&lt;/p&gt;
&lt;h3 id="design-philosophy"&gt;Design Philosophy&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Minimal Core&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Core only handles the loop, state, events, and tool orchestration.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Pluggable Periphery&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Models, tools, retries, and context handling are all replaceable.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Process Over Outcome&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;First ensure the process is visible and controllable, then pursue &amp;ldquo;smart output.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Session Over Request&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Treat the Agent as a long-term session system, not a single API call.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="agent-core-logic-flowchart"&gt;Agent Core Logic Flowchart&lt;/h3&gt;
&lt;div class="mermaid"&gt;flowchart TD
 START[Start a session turn] --&gt; TURN[Open turn]
 TURN --&gt; CALL[Call model and stream output]
 CALL --&gt; CHECK{Tool call in output?}

 CHECK -- No --&gt; STOPCHECK{Stop?}
 CHECK -- Yes --&gt; TOOL[Execute tool batch]
 TOOL --&gt; MERGE[Write tool results back to context]
 MERGE --&gt; STOPCHECK

 STOPCHECK -- Stop --&gt; END[End and emit end event]
 STOPCHECK -- Continue --&gt; NEXT[Enter next turn]
 NEXT --&gt; TURN&lt;/div&gt;
&lt;h3 id="agent-core-component-diagram"&gt;Agent Core Component Diagram&lt;/h3&gt;
&lt;div class="mermaid"&gt;graph TD
 CORE[Agent Core]
 S[State Storage]
 L[Loop Turn Cycle]
 E[Events Emission]
 T[Tool Executor]
 M[Model Stream Call]
 Q[Queue: steer/followUp]

 CORE --&gt; S
 CORE --&gt; L
 L --&gt; M
 L --&gt; T
 L --&gt; E
 L --&gt; Q
 T --&gt; E
 M --&gt; E
 E --&gt; S&lt;/div&gt;
&lt;p&gt;The value of this structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The interaction layer only sees events, doesn&amp;rsquo;t touch core state.&lt;/li&gt;
&lt;li&gt;Model replacement doesn&amp;rsquo;t change the loop skeleton.&lt;/li&gt;
&lt;li&gt;Tool extension doesn&amp;rsquo;t break the core control flow.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Agent architecture isn&amp;rsquo;t &amp;ldquo;making the model smarter&amp;rdquo;—it&amp;rsquo;s &amp;ldquo;making an uncertain model work reliably inside a controllable system.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You can remember it as this formula:&lt;/p&gt;
&lt;p&gt;$$
\text{Usable Agent} = \text{Model Capability} \times \text{Engineering Control Capability}
$$&lt;/p&gt;
&lt;p&gt;Where engineering control capability mainly comes from:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Loop design&lt;/li&gt;
&lt;li&gt;Protocol design&lt;/li&gt;
&lt;li&gt;Event observability&lt;/li&gt;
&lt;li&gt;Recoverability and extensibility&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Judging by current trends, this will very likely be the foundational paradigm for the next generation of application software.&lt;/p&gt;</description></item></channel></rss>