OpenManus 架构设计

深入理解 OpenManus 的分层架构、模块划分与设计理念

本章概览

Agent 继承层次：从 BaseAgent 到 Manus
工具系统架构：BaseTool 与 ToolCollection
流程编排：BaseFlow 与 PlanningFlow
LLM 抽象层：统一的模型调用接口
配置管理：灵活的多环境配置系统

1. 整体架构

1.1 分层设计理念

OpenManus 采用清晰的分层架构，每一层都有明确的职责：

┌─────────────────────────────────────────────────────────────┐
│                      用户接口层                              │
│         main.py  │  run_flow.py  │  run_mcp.py             │
├─────────────────────────────────────────────────────────────┤
│                      流程编排层                              │
│              BaseFlow  →  PlanningFlow                      │
├─────────────────────────────────────────────────────────────┤
│                      Agent 层                               │
│    BaseAgent → ReActAgent → ToolCallAgent → Manus          │
├─────────────────────────────────────────────────────────────┤
│                      工具层                                  │
│   BaseTool │ ToolCollection │ 各种具体工具实现               │
├─────────────────────────────────────────────────────────────┤
│                      基础设施层                              │
│       LLM  │  Memory  │  Config  │  Logger                 │
└─────────────────────────────────────────────────────────────┘

通俗比喻：想象一个公司的组织架构：

用户接口层 = 前台接待（接收客户需求）
流程编排层 = 项目经理（规划任务、分配工作）
Agent 层 = 执行团队（思考和执行具体任务）
工具层 = 各种专业工具和设备
基础设施层 = IT 系统、通讯设备（支撑一切运转）

1.2 模块依赖关系

mermaid

graph TD
    subgraph "入口层"
        M[main.py]
        RF[run_flow.py]
        RM[run_mcp.py]
    end

    subgraph "Agent 层"
        BA[BaseAgent]
        RA[ReActAgent]
        TCA[ToolCallAgent]
        MAN[Manus]
        DA[DataAnalysis]
    end

    subgraph "Flow 层"
        BF[BaseFlow]
        PF[PlanningFlow]
    end

    subgraph "Tool 层"
        BT[BaseTool]
        TC[ToolCollection]
        PE[PythonExecute]
        BU[BrowserUseTool]
        SE[StrReplaceEditor]
        MCP[MCPClients]
    end

    subgraph "基础设施"
        LLM[LLM]
        MEM[Memory]
        CFG[Config]
        SCH[Schema]
    end

    M --> MAN
    RF --> PF
    RM --> MAN

    BA --> RA
    RA --> TCA
    TCA --> MAN
    TCA --> DA

    BF --> PF
    PF --> MAN

    MAN --> TC
    TC --> BT
    BT --> PE
    BT --> BU
    BT --> SE
    TC --> MCP

    BA --> LLM
    BA --> MEM
    LLM --> CFG
    BA --> SCH

2. Agent 继承体系

2.1 BaseAgent：抽象基类

BaseAgent 是所有 Agent 的基础，定义了 Agent 的核心属性和生命周期：

python

# app/agent/base.py
class BaseAgent(BaseModel, ABC):
    """Agent 基类：管理状态和执行循环"""

    # 基本属性
    name: str                           # Agent 名称
    description: Optional[str]           # 描述信息

    # 提示词
    system_prompt: Optional[str]         # 系统提示词
    next_step_prompt: Optional[str]      # 下一步提示词

    # 核心依赖
    llm: LLM                            # 语言模型实例
    memory: Memory                       # 记忆存储
    state: AgentState                    # 当前状态

    # 执行控制
    max_steps: int = 10                  # 最大执行步数
    current_step: int = 0                # 当前步数

关键设计点：

状态机模式：Agent 有明确的状态转换

python

class AgentState(Enum):
    IDLE = "IDLE"           # 空闲，可接受新任务
    RUNNING = "RUNNING"     # 运行中
    FINISHED = "FINISHED"   # 任务完成
    ERROR = "ERROR"         # 发生错误

执行循环：run() 方法实现了核心的执行逻辑

python

async def run(self, request: Optional[str] = None) -> str:
    """执行主循环"""
    if request:
        self.update_memory("user", request)

    results = []
    async with self.state_context(AgentState.RUNNING):
        while self.current_step < self.max_steps and self.state != AgentState.FINISHED:
            self.current_step += 1
            step_result = await self.step()  # 子类实现

            # 检测是否陷入循环
            if self.is_stuck():
                self.handle_stuck_state()

            results.append(step_result)

    return "\n".join(results)

防循环机制：检测并处理重复响应

python

def is_stuck(self) -> bool:
    """检测 Agent 是否陷入循环"""
    if len(self.memory.messages) < 2:
        return False

    last_message = self.memory.messages[-1]
    # 统计相同内容出现的次数
    duplicate_count = sum(
        1 for msg in reversed(self.memory.messages[:-1])
        if msg.role == "assistant" and msg.content == last_message.content
    )

    return duplicate_count >= self.duplicate_threshold

2.2 ReActAgent：思考-行动模式

ReActAgent 实现了经典的 ReAct（Reasoning + Acting）模式：

python

# app/agent/react.py
class ReActAgent(BaseAgent, ABC):
    """ReAct 模式 Agent"""

    @abstractmethod
    async def think(self) -> bool:
        """思考：分析当前状态，决定下一步行动"""
        pass

    @abstractmethod
    async def act(self) -> str:
        """行动：执行决定的动作"""
        pass

    async def step(self) -> str:
        """单步执行：先思考，再行动"""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()

ReAct 循环示意：

┌─────────────────────────────────────────────────┐
│                  ReAct 循环                      │
│                                                 │
│    ┌──────────┐      ┌──────────┐              │
│    │  Think   │ ───▶ │   Act    │              │
│    │  思考    │      │   行动   │              │
│    └──────────┘      └──────────┘              │
│         ▲                  │                    │
│         │                  ▼                    │
│         │            ┌──────────┐              │
│         └─────────── │ Observe  │              │
│                      │  观察    │              │
│                      └──────────┘              │
└─────────────────────────────────────────────────┘

2.3 ToolCallAgent：工具调用 Agent

ToolCallAgent 扩展了 ReActAgent，增加了工具调用能力：

python

# app/agent/toolcall.py
class ToolCallAgent(ReActAgent):
    """支持工具调用的 Agent"""

    # 可用工具集合
    available_tools: ToolCollection

    # 工具选择策略
    tool_choices: ToolChoice = ToolChoice.AUTO

    # 特殊工具（如 Terminate）
    special_tool_names: List[str]

    # 当前工具调用列表
    tool_calls: List[ToolCall]

think() 方法实现：

python

async def think(self) -> bool:
    """思考并决定使用哪些工具"""
    # 添加下一步提示
    if self.next_step_prompt:
        self.messages += [Message.user_message(self.next_step_prompt)]

    # 调用 LLM，获取工具选择
    response = await self.llm.ask_tool(
        messages=self.messages,
        system_msgs=[Message.system_message(self.system_prompt)],
        tools=self.available_tools.to_params(),
        tool_choice=self.tool_choices,
    )

    # 解析工具调用
    self.tool_calls = response.tool_calls or []

    # 记录到内存
    assistant_msg = Message.from_tool_calls(
        content=response.content,
        tool_calls=self.tool_calls
    )
    self.memory.add_message(assistant_msg)

    return bool(self.tool_calls)

act() 方法实现：

python

async def act(self) -> str:
    """执行工具调用"""
    results = []

    for command in self.tool_calls:
        result = await self.execute_tool(command)

        # 截断过长结果
        if self.max_observe:
            result = result[:self.max_observe]

        # 添加工具响应到内存
        tool_msg = Message.tool_message(
            content=result,
            tool_call_id=command.id,
            name=command.function.name,
        )
        self.memory.add_message(tool_msg)
        results.append(result)

    return "\n\n".join(results)

2.4 Manus：主 Agent 实现

Manus 是 OpenManus 的核心 Agent，整合了所有功能：

python

# app/agent/manus.py
class Manus(ToolCallAgent):
    """通用多功能 Agent"""

    name: str = "Manus"
    description: str = "能使用多种工具解决各类任务的通用 Agent"

    # 系统提示词
    system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)

    # 内置工具集
    available_tools: ToolCollection = ToolCollection(
        PythonExecute(),       # Python 代码执行
        BrowserUseTool(),      # 浏览器自动化
        StrReplaceEditor(),    # 文件编辑
        AskHuman(),            # 询问用户
        Terminate(),           # 终止任务
    )

    # MCP 客户端
    mcp_clients: MCPClients

    # 浏览器上下文助手
    browser_context_helper: BrowserContextHelper

Manus 的特色功能：

MCP 服务器动态连接：

python

async def connect_mcp_server(self, server_url: str, server_id: str = ""):
    """连接 MCP 服务器并添加其工具"""
    if use_stdio:
        await self.mcp_clients.connect_stdio(server_url, args, server_id)
    else:
        await self.mcp_clients.connect_sse(server_url, server_id)

    # 将新工具添加到可用工具集
    new_tools = [tool for tool in self.mcp_clients.tools
                 if tool.server_id == server_id]
    self.available_tools.add_tools(*new_tools)

浏览器上下文感知：

python

async def think(self) -> bool:
    """增强的思考方法，包含浏览器状态"""
    # 检测是否正在使用浏览器
    recent_messages = self.memory.messages[-3:]
    browser_in_use = any(
        tc.function.name == BrowserUseTool().name
        for msg in recent_messages if msg.tool_calls
        for tc in msg.tool_calls
    )

    # 如果使用浏览器，添加浏览器状态到提示词
    if browser_in_use:
        self.next_step_prompt = await self.browser_context_helper.format_next_step_prompt()

    return await super().think()

3. 工具系统架构

3.1 BaseTool：工具基类

所有工具都继承自 BaseTool：

python

# app/tool/base.py
class BaseTool(ABC, BaseModel):
    """工具基类"""

    name: str                          # 工具名称
    description: str                   # 工具描述
    parameters: Optional[dict] = None  # 参数 JSON Schema

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """执行工具"""
        pass

    def to_param(self) -> Dict:
        """转换为 OpenAI function calling 格式"""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

ToolResult：统一的执行结果：

python

class ToolResult(BaseModel):
    """工具执行结果"""

    output: Any = None              # 正常输出
    error: Optional[str] = None     # 错误信息
    base64_image: Optional[str] = None  # 图片（如截图）
    system: Optional[str] = None    # 系统消息

    def __str__(self):
        return f"Error: {self.error}" if self.error else self.output

3.2 ToolCollection：工具集合管理

python

# app/tool/tool_collection.py
class ToolCollection:
    """工具集合管理器"""

    def __init__(self, *tools: BaseTool):
        self.tools = tools
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict]:
        """转换所有工具为参数格式"""
        return [tool.to_param() for tool in self.tools]

    async def execute(self, *, name: str, tool_input: Dict) -> ToolResult:
        """执行指定工具"""
        tool = self.tool_map.get(name)
        if not tool:
            return ToolFailure(error=f"Tool {name} is invalid")

        try:
            result = await tool(**tool_input)
            return result
        except ToolError as e:
            return ToolFailure(error=e.message)

    def add_tools(self, *tools: BaseTool):
        """动态添加工具"""
        for tool in tools:
            if tool.name not in self.tool_map:
                self.tools += (tool,)
                self.tool_map[tool.name] = tool

3.3 具体工具实现示例

PythonExecute：代码执行工具

python

# app/tool/python_execute.py
class PythonExecute(BaseTool):
    name: str = "python_execute"
    description: str = "执行 Python 代码，只有 print 输出可见"

    parameters: dict = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "要执行的 Python 代码",
            },
        },
        "required": ["code"],
    }

    async def execute(self, code: str, timeout: int = 5) -> Dict:
        """在子进程中安全执行代码"""
        with multiprocessing.Manager() as manager:
            result = manager.dict({"observation": "", "success": False})

            # 在独立进程中执行
            proc = multiprocessing.Process(
                target=self._run_code,
                args=(code, result, safe_globals)
            )
            proc.start()
            proc.join(timeout)

            # 超时处理
            if proc.is_alive():
                proc.terminate()
                return {"observation": "执行超时", "success": False}

            return dict(result)

BrowserUseTool：浏览器自动化工具

python

# app/tool/browser_use_tool.py
class BrowserUseTool(BaseTool):
    name: str = "browser_use"
    description: str = "强大的浏览器自动化工具"

    # 支持的操作类型
    parameters: dict = {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": [
                    "go_to_url",        # 访问 URL
                    "click_element",    # 点击元素
                    "input_text",       # 输入文本
                    "scroll_down",      # 向下滚动
                    "web_search",       # 网页搜索
                    "extract_content",  # 提取内容
                    # ... 更多操作
                ],
            },
            "url": {"type": "string"},
            "index": {"type": "integer"},
            "text": {"type": "string"},
            # ... 更多参数
        },
        "required": ["action"],
    }

    # 浏览器实例（懒加载）
    browser: Optional[BrowserUseBrowser] = None
    context: Optional[BrowserContext] = None

4. 流程编排层

4.1 BaseFlow：流程基类

python

# app/flow/base.py
class BaseFlow(BaseModel, ABC):
    """执行流程基类，支持多 Agent"""

    agents: Dict[str, BaseAgent]      # Agent 字典
    tools: Optional[List] = None       # 共享工具
    primary_agent_key: Optional[str]   # 主 Agent 键

    @property
    def primary_agent(self) -> Optional[BaseAgent]:
        """获取主 Agent"""
        return self.agents.get(self.primary_agent_key)

    @abstractmethod
    async def execute(self, input_text: str) -> str:
        """执行流程"""
        pass

4.2 PlanningFlow：规划执行流程

PlanningFlow 实现了 "先规划后执行" 的模式：

python

# app/flow/planning.py
class PlanningFlow(BaseFlow):
    """规划执行流程"""

    llm: LLM                           # 用于创建计划的 LLM
    planning_tool: PlanningTool        # 规划工具
    active_plan_id: str                # 当前计划 ID
    current_step_index: Optional[int]  # 当前步骤索引

    async def execute(self, input_text: str) -> str:
        """执行规划流程"""
        # 1. 创建初始计划
        if input_text:
            await self._create_initial_plan(input_text)

        result = ""
        while True:
            # 2. 获取当前待执行步骤
            step_index, step_info = await self._get_current_step_info()

            # 3. 无更多步骤则完成
            if step_index is None:
                result += await self._finalize_plan()
                break

            # 4. 选择合适的 Agent 执行步骤
            executor = self.get_executor(step_info.get("type"))
            step_result = await self._execute_step(executor, step_info)
            result += step_result

        return result

计划创建流程：

mermaid

sequenceDiagram
    participant User as 用户
    participant PF as PlanningFlow
    participant LLM as LLM
    participant PT as PlanningTool
    participant Agent as Agent

    User->>PF: 输入任务
    PF->>LLM: 请求创建计划
    LLM->>PT: 调用 planning 工具
    PT-->>PF: 返回计划步骤

    loop 每个步骤
        PF->>Agent: 执行步骤
        Agent-->>PF: 返回结果
        PF->>PT: 更新步骤状态
    end

    PF->>LLM: 生成总结
    LLM-->>User: 返回完成报告

5. LLM 抽象层

5.1 统一的 LLM 接口

python

# app/llm.py
class LLM:
    """LLM 封装层"""

    _instances: Dict[str, "LLM"] = {}  # 单例缓存

    def __new__(cls, config_name: str = "default"):
        """单例模式，相同配置复用实例"""
        if config_name not in cls._instances:
            instance = super().__new__(cls)
            instance.__init__(config_name)
            cls._instances[config_name] = instance
        return cls._instances[config_name]

    def __init__(self, config_name: str = "default"):
        """初始化 LLM 客户端"""
        if hasattr(self, "client"):
            return  # 避免重复初始化

        # 加载配置
        llm_config = config.llm.get(config_name, config.llm["default"])
        self.model = llm_config.model
        self.max_tokens = llm_config.max_tokens

        # 创建客户端
        if self.api_type == "azure":
            self.client = AsyncAzureOpenAI(...)
        elif self.api_type == "aws":
            self.client = BedrockClient()
        else:
            self.client = AsyncOpenAI(...)

5.2 核心方法

python

class LLM:
    async def ask(self, messages, system_msgs=None, stream=True) -> str:
        """普通对话请求"""
        ...

    async def ask_with_images(self, messages, images, ...) -> str:
        """带图片的多模态请求"""
        ...

    async def ask_tool(self, messages, tools, tool_choice, ...) -> ChatCompletionMessage:
        """工具调用请求"""
        ...

5.3 Token 管理

python

class TokenCounter:
    """Token 计数器"""

    def count_message_tokens(self, messages: List[dict]) -> int:
        """计算消息列表的 token 数"""
        total = self.FORMAT_TOKENS

        for message in messages:
            tokens = self.BASE_MESSAGE_TOKENS
            tokens += self.count_text(message.get("role", ""))
            tokens += self.count_content(message.get("content"))
            tokens += self.count_tool_calls(message.get("tool_calls", []))
            total += tokens

        return total

6. 配置管理系统

6.1 配置结构

python

# app/config.py
class AppConfig(BaseModel):
    """应用配置"""
    llm: Dict[str, LLMSettings]                    # LLM 配置
    sandbox: Optional[SandboxSettings]             # 沙箱配置
    browser_config: Optional[BrowserSettings]      # 浏览器配置
    search_config: Optional[SearchSettings]        # 搜索配置
    mcp_config: Optional[MCPSettings]              # MCP 配置
    run_flow_config: Optional[RunflowSettings]     # 流程配置

6.2 配置文件示例

toml

# config/config.toml

# 全局 LLM 配置
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..."
max_tokens = 4096
temperature = 0.0

# 视觉模型配置（覆盖默认值）
[llm.vision]
model = "gpt-4o"
max_tokens = 8192

# 浏览器配置
[browser]
headless = false
disable_security = true
max_content_length = 2000

# 搜索配置
[search]
engine = "Google"
fallback_engines = ["DuckDuckGo", "Bing"]
lang = "zh"
country = "cn"

# 多 Agent 流程配置
[runflow]
use_data_analysis_agent = true

7. 架构设计总结

7.1 设计原则

原则	体现
单一职责	每个类只负责一个功能（Agent 负责执行，Tool 负责具体操作）
开闭原则	通过继承扩展（新 Agent 继承 ToolCallAgent）
依赖倒置	依赖抽象而非实现（Agent 依赖 BaseTool 接口）
组合优于继承	ToolCollection 组合多个 Tool
配置外置	所有配置通过 config.toml 管理

7.2 架构优势

模块化：各层独立，易于测试和维护
可扩展：新增工具只需实现 BaseTool
灵活配置：支持多环境、多模型配置
渐进增强：从简单对话到复杂任务编排

7.3 架构图总览

┌────────────────────────────────────────────────────────────────┐
│                         OpenManus                              │
├────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                     入口层                                │  │
│  │    main.py ←→ run_flow.py ←→ run_mcp.py                 │  │
│  └────────────────────────┬─────────────────────────────────┘  │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   流程编排层                              │  │
│  │         BaseFlow ──▶ PlanningFlow                        │  │
│  │              │            │                               │  │
│  │         multi-agent   step-by-step                       │  │
│  └────────────────────────┬─────────────────────────────────┘  │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    Agent 层                               │  │
│  │    BaseAgent ──▶ ReActAgent ──▶ ToolCallAgent ──▶ Manus  │  │
│  │         │            │              │                     │  │
│  │      状态管理    think/act      工具调用                   │  │
│  └────────────────────────┬─────────────────────────────────┘  │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    工具层                                 │  │
│  │    BaseTool ←── ToolCollection                           │  │
│  │        ↑                                                  │  │
│  │   ┌────┴────┬─────────┬───────────┬──────────┐           │  │
│  │   │         │         │           │          │           │  │
│  │ Python  Browser   Editor    WebSearch    MCP             │  │
│  │ Execute   Use    Replace                 Tools           │  │
│  └────────────────────────┬─────────────────────────────────┘  │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   基础设施层                              │  │
│  │      LLM    Memory    Schema    Config    Logger         │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

下一章：22.3 核心概念 - 深入理解 Agent、Tool、Memory 等核心抽象

OpenManus 架构设计 ​

本章概览 ​

1. 整体架构 ​

1.1 分层设计理念 ​

1.2 模块依赖关系 ​

2. Agent 继承体系 ​

2.1 BaseAgent：抽象基类 ​

2.2 ReActAgent：思考-行动模式 ​

2.3 ToolCallAgent：工具调用 Agent ​

2.4 Manus：主 Agent 实现 ​

3. 工具系统架构 ​

3.1 BaseTool：工具基类 ​

3.2 ToolCollection：工具集合管理 ​

3.3 具体工具实现示例 ​

4. 流程编排层 ​

4.1 BaseFlow：流程基类 ​

4.2 PlanningFlow：规划执行流程 ​

5. LLM 抽象层 ​

5.1 统一的 LLM 接口 ​

5.2 核心方法 ​

5.3 Token 管理 ​

6. 配置管理系统 ​

6.1 配置结构 ​

6.2 配置文件示例 ​

7. 架构设计总结 ​

7.1 设计原则 ​

7.2 架构优势 ​

7.3 架构图总览 ​

OpenManus 架构设计

本章概览

1. 整体架构

1.1 分层设计理念

1.2 模块依赖关系

2. Agent 继承体系

2.1 BaseAgent：抽象基类

2.2 ReActAgent：思考-行动模式

2.3 ToolCallAgent：工具调用 Agent

2.4 Manus：主 Agent 实现

3. 工具系统架构

3.1 BaseTool：工具基类

3.2 ToolCollection：工具集合管理

3.3 具体工具实现示例

4. 流程编排层

4.1 BaseFlow：流程基类

4.2 PlanningFlow：规划执行流程

5. LLM 抽象层

5.1 统一的 LLM 接口

5.2 核心方法

5.3 Token 管理

6. 配置管理系统

6.1 配置结构

6.2 配置文件示例

7. 架构设计总结

7.1 设计原则

7.2 架构优势

7.3 架构图总览