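18.5 Autonomous Coding: An Autonomous Coding Agent

Project positioning: an autonomous coding demo built on the Claude Agent SDK. It shows how to build a coding agent that keeps working across multiple sessions and persists its progress through git.

1. Project Overview

1.1 Core Innovation

Autonomous Coding addresses one key question: how can an AI agent finish a large coding task when the work spans many sessions?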

Traditional single conversation:
┌──────────────────────────────────────────────────────┐
│  Session 1                                           │
│  User: Build a complete e-commerce site              │
│  Claude: (context window exhausted, task cut off)    │
└──────────────────────────────────────────────────────┘

The Autonomous Coding approach:
┌──────────────────────────────────────────────────────┐
│  Session 1 (Initializer)                             │
│  • Analyze spec, write feature_list.json (200 items) │
│  • Initialize the project structure                  │
│  • git init                                          │
└───────────────────────┬──────────────────────────────┘
                        │ progress persisted
                        ▼
┌──────────────────────────────────────────────────────┐
│  Session 2 (Coding)                                  │
│  • Read feature_list.json                            │
│  • Implement features 1-10                           │
│  • git commit                                        │
└───────────────────────┬──────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────┐
│  Session 3, 4, 5... (Coding)                         │
│  • Keep implementing the remaining features          │
│  • Commit after each batch of features               │
└──────────────────────────────────────────────────────┘

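1.2 Two-Phase Agent Architecture

Phase	Agent type	Responsibilities
Phase 1	Initializer Agent	Requirements analysis, feature breakdown, project initialization
Phase 2+	Coding Agent	Implements features one at a time, runs tests, commits to git

1.3 Tech Stack

Core SDK:      Claude Agent SDK (Python package claude-code-sdk, which drives the Claude Code CLI from the npm package @anthropic-ai/claude-code)
Language:      Python 3.10+ (the minimum supported by claude-code-sdk)
Version control: Git
Model:         Claude Sonnet 4.5 (default)
Security:      Command allowlist + filesystem isolation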

2. Quick Start

2.1 Environment Setup

bash

# Install the Claude Code CLI (the latest version is required)
npm install -g @anthropic-ai/claude-code

# Verify the installation
claude --version

# Install the Python dependencies
pip install -r requirements.txt

# Verify the SDK
pip show claude-code-sdk

# Set the API key
export ANTHROPIC_API_KEY='your-api-key'

2.2 Running the Demo

bash

# Basic run
python autonomous_agent_demo.py --project-dir ./my_project

# Limit the number of iterations (useful for testing)
python autonomous_agent_demo.py --project-dir ./my_project --max-iterations 3

# Choose a specific model
python autonomous_agent_demo.py --project-dir ./my_project --model claude-opus-4-5-20251101

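2.3 Run Flow

First run:
1. Create the project directory
2. Copy app_spec.txt into the project
3. Start the Initializer Agent
4. Generate feature_list.json (about 5-10 minutes)
5. Initialize the git repository
6. The agent finishes and the next phase starts automatically

Subsequent runs (automatic, or after a manual restart):
1. Read feature_list.json
2. Start the Coding Agent
3. Implement the unfinished features
4. Update feature_list.json
5. git commit
6. The next session starts automatically after 3 seconds

Pausing with Ctrl+C:
• Progress is already saved in feature_list.json and git
• Rerun the same command to resume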
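
The split between the first run and later runs can be sketched as a check on whether feature_list.json already exists. This is a minimal sketch under that assumption: the text only shows that `is_first_session` is imported from progress.py, so the function body and the `choose_prompt` helper here are illustrative, not the demo's actual code.

```python
from pathlib import Path

FEATURE_FILE = "feature_list.json"  # progress file named in the text


def is_first_session(project_dir: str) -> bool:
    """Assumption: a run counts as the first one if no feature list exists yet."""
    return not (Path(project_dir) / FEATURE_FILE).exists()


def choose_prompt(project_dir: str) -> str:
    """Hypothetical helper: initializer prompt on the first run, coding prompt afterwards."""
    if is_first_session(project_dir):
        return "prompts/initializer_prompt.md"
    return "prompts/coding_prompt.md"
```

Because the decision depends only on files in the project directory, interrupting the process and rerunning the same command lands in the right phase without any extra state.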

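3. Project Structure

3.1 Source Layout

autonomous-coding/
├── autonomous_agent_demo.py   # Main entry point
├── agent.py                   # Agent session logic
├── client.py                  # Claude SDK client configuration
├── security.py                # Security policy (command allowlist)
├── progress.py                # Progress tracking utilities
├── prompts.py                 # Prompt loading utilities
├── prompts/
│   ├── app_spec.txt          # Application specification
│   ├── initializer_prompt.md # Prompt for the initialization phase
│   └── coding_prompt.md      # Prompt for the coding phase
└── requirements.txt

3.2 Generated Project Layout

my_project/
├── feature_list.json         # Feature list (source of truth for progress)
├── app_spec.txt              # Copy of the application specification
├── init.sh                   # Environment setup script
├── claude-progress.txt       # Session progress notes
├── .claude_settings.json     # Security settings
├── .git/                     # Git repository
└── [application code files]  # Code generated by the agent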

3.3 feature_list.json Format

Each feature's status is one of pending, in_progress, passing, or failing. The features array below is truncated to two entries; the metadata block reflects the full list of 200.

json

{
  "features": [
    {
      "id": 1,
      "name": "User registration",
      "description": "Implement the registration form and validation logic",
      "status": "passing",
      "test_command": "npm test -- --grep 'user registration'"
    },
    {
      "id": 2,
      "name": "User login",
      "description": "Implement login authentication and JWT generation",
      "status": "pending",
      "test_command": "npm test -- --grep 'user login'"
    }
  ],
  "metadata": {
    "total": 200,
    "completed": 15,
    "in_progress": 1,
    "pending": 184
  }
}
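
The metadata block duplicates information that is already in the per-feature statuses, so it can drift out of date. One way to guard against that is to recompute it. The following is a sketch, assuming that `completed` means "status is passing"; the `summarize` function and the extra `failing` count are illustrative additions, not part of the demo.

```python
import json
from collections import Counter


def summarize(feature_list: dict) -> dict:
    """Recompute the metadata counts from the per-feature statuses."""
    features = feature_list["features"]
    counts = Counter(f["status"] for f in features)
    return {
        "total": len(features),
        "completed": counts["passing"],  # assumption: completed == passing
        "in_progress": counts["in_progress"],
        "pending": counts["pending"],
        "failing": counts["failing"],  # not in the sample metadata; added here
    }


sample = json.loads("""
{
  "features": [
    {"id": 1, "name": "User registration", "status": "passing"},
    {"id": 2, "name": "User login", "status": "pending"}
  ]
}
""")
print(summarize(sample))
```

Treating the features array as the single source of truth and deriving the counts from it avoids having two places to update after every feature.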

4. Core Implementation

4.1 Main Entry Point

python

# autonomous_agent_demo.py

import argparse
import asyncio
import time  # required by time.sleep() at the end of the loop

from agent import run_agent_session
from progress import load_progress, is_first_session

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--project-dir', default='./autonomous_demo_project')
    parser.add_argument('--max-iterations', type=int, default=None)
    parser.add_argument('--model', default='claude-sonnet-4-5-20250929')
    args = parser.parse_args()

    while True:
        # Decide between the initialization phase and the coding phase
        if is_first_session(args.project_dir):
            prompt_file = 'prompts/initializer_prompt.md'
        else:
            prompt_file = 'prompts/coding_prompt.md'

        # Run one agent session
        result = asyncio.run(run_agent_session(
            project_dir=args.project_dir,
            prompt_file=prompt_file,
            model=args.model,
            max_iterations=args.max_iterations
        ))

        # Stop once no feature is left pending
        progress = load_progress(args.project_dir)
        if progress['metadata']['pending'] == 0:
            print("🎉 All features are complete!")
            break

        # Otherwise continue with the next session automatically
        print("⏳ Starting the next session in 3 seconds...")
        time.sleep(3)

if __name__ == '__main__':
    main()

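4.2 Agent Session Logic

python

# agent.py
# Note: the Session/event interface below is a simplified illustration.
# The claude-code-sdk package itself exposes query(), ClaudeSDKClient and
# ClaudeCodeOptions; check the installed version for the exact names.

from typing import Optional

from claude_code_sdk import Session
from client import create_client
from security import create_security_hook

async def run_agent_session(
    project_dir: str,
    prompt_file: str,
    model: str,
    max_iterations: Optional[int] = None
):
    # Load the prompt
    with open(prompt_file, encoding='utf-8') as f:
        system_prompt = f.read()

    # Create the client
    client = create_client(model=model)

    # Create the security hook
    security_hook = create_security_hook()

    # Create the agent session
    session = Session(
        client=client,
        system_prompt=system_prompt,
        working_directory=project_dir,
        hooks=[security_hook],
    )

    # Run the agent and stream its events
    iteration = 0
    async for event in session.run():
        iteration += 1

        if event.type == 'tool_use':
            # event.input is a dict (see security.py), so stringify before truncating
            print(f"[Tool: {event.tool_name}] {str(event.input)[:100]}...")

        elif event.type == 'text':
            print(event.text)

        elif event.type == 'error':
            print(f"Error: {event.error}")

        # Enforce the iteration limit, if one was given
        if max_iterations and iteration >= max_iterations:
            print(f"Reached the maximum number of iterations: {max_iterations}")
            break

    return session.result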

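4.3 Security Policy

python

# security.py

# Allowlist of commands the agent may execute
ALLOWED_COMMANDS = {
    # File inspection
    'ls', 'cat', 'head', 'tail', 'wc', 'grep',

    # Node.js
    'npm', 'node', 'npx',

    # Version control
    'git',

    # Process management (development processes only)
    'ps', 'lsof', 'sleep',
    'pkill',  # only for stopping the development server
}

def create_security_hook():
    """Create a security hook that blocks commands outside the allowlist."""

    async def security_hook(event):
        if event.type != 'tool_use':
            return True

        if event.tool_name != 'bash':
            return True

        command = event.input.get('command', '')
        # split() returns [] for empty or whitespace-only input, so guard the index
        parts = command.split()
        first_word = parts[0] if parts else ''

        # Only the first word is checked; chained commands (&&, ;, |) are not inspected
        if first_word not in ALLOWED_COMMANDS:
            print(f"🚫 Command blocked: {command}")
            return {
                'blocked': True,
                'message': f"Command '{first_word}' is not in the allowed list"
            }

        return True

    return security_hook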
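
Checking only the first word lets a chained command such as `ls && rm -rf /` through. One possible tightening, sketched here with the standard-library `shlex` module, is to split the command on shell operators and require every segment to start with an allowed name. The helper names and the shortened allowlist are assumptions for the example. The approach is deliberately conservative: anything it cannot parse, including a separator inside quotes or a redirection such as `2>&1`, is rejected rather than guessed at.

```python
import re
import shlex

ALLOWED = {"ls", "cat", "grep", "git", "npm", "node"}  # shortened list for the example

# Operators that start a new command: && || ; | & and newlines
_SEPARATORS = re.compile(r"&&|\|\||[;|&\n]")


def command_names(command: str) -> list[str]:
    """Return the first word of every segment in a possibly chained command."""
    names = []
    for segment in _SEPARATORS.split(command):
        try:
            tokens = shlex.split(segment)
        except ValueError:
            # Unbalanced quotes (for example a separator inside a quoted string)
            return ["<unparseable>"]
        if tokens:
            names.append(tokens[0])
    return names


def is_allowed(command: str) -> bool:
    """Fail closed: reject substitutions, unparseable input, and unknown names."""
    if "$(" in command or "`" in command:
        return False
    names = command_names(command)
    return bool(names) and all(name in ALLOWED for name in names)
```

For instance, `is_allowed("git status && npm test")` accepts the command, while `is_allowed("ls && rm -rf /")` rejects it because `rm` is not on the list.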

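4.4 Client Configuration

python

# client.py
# Note: Client and its keyword arguments are shown as a simplified illustration
# of the intended restrictions; check the installed SDK for the actual option names.

from claude_code_sdk import Client

def create_client(model: str = 'claude-sonnet-4-5-20250929'):
    """Create a configured Claude SDK client."""

    return Client(
        model=model,
        # Enable sandbox mode
        sandbox=True,
        # Filesystem restriction
        allowed_paths=[
            './my_project',  # project directory only; should match --project-dir
        ],
        # Network restriction
        network_policy='deny_all',
    )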
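
To make "project directory only" concrete, here is a sketch of the kind of containment check a filesystem restriction implies, written with `pathlib`. It is not the SDK's implementation; `is_within_project` is a hypothetical helper. Resolving both paths first is what defeats `..` segments and absolute paths.

```python
from pathlib import Path


def is_within_project(project_dir: str, candidate: str) -> bool:
    """Return True if candidate, taken relative to project_dir, stays inside it."""
    root = Path(project_dir).resolve()
    # Joining with an absolute candidate yields the candidate itself,
    # so absolute paths outside the project are rejected as well.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A call such as `is_within_project("./my_project", "../secrets.txt")` returns False, while `is_within_project("./my_project", "src/app.js")` returns True.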

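5. Prompt Design

5.1 Initializer Prompt

markdown

# prompts/initializer_prompt.md

You are a professional software architect and project initialization expert.

## Your task

1. Read `app_spec.txt` and understand the application requirements
2. Create `feature_list.json` with 200 testable features
3. Make each feature atomic and independently testable
4. Initialize the basic project structure
5. Create the `init.sh` environment setup script
6. Initialize a git repository and make the first commit

## feature_list.json format

```json
{
  "features": [
    {
      "id": 1,
      "name": "Feature name",
      "description": "Detailed description",
      "status": "pending",
      "test_command": "Test command",
      "dependencies": []
    }
  ]
}
```

`dependencies` lists the IDs of the other features this one depends on.

## Key principles

- Break the work down finely; each feature should take 5-15 minutes to complete
- Give features clear dependency relationships
- Make every test command executable
- Put infrastructure features first

Get to work!
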
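5.2 Coding Prompt

markdown

# prompts/coding_prompt.md

You are an efficient software developer who focuses on incremental feature implementation.

## Your task

1. Read `feature_list.json`
2. Find the next feature to implement (status: "pending")
3. Implement the feature
4. Run its tests to verify it
5. If the tests pass, set the status to "passing"
6. If the tests fail, set the status to "failing" and record the reason
7. git commit the current progress

## Workflow

Read feature_list.json
  ↓
Pick the highest-priority pending feature
  ↓
Set its status to "in_progress"
  ↓
Implement the code
  ↓
Run the tests
  ↓
Update the status (passing/failing)
  ↓
git commit
  ↓
Move on to the next feature

## Key principles

- Work on one feature at a time
- Commit immediately after each feature
- Record blockers as soon as you hit them
- Keep the code simple
- Reuse existing components

Continue the previous work!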
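
The "pick the highest-priority pending feature" step can be sketched as a small selection function. In the demo the agent makes this choice itself while following the prompt; the code below only illustrates the rule, assuming that a lower `id` means higher priority (consistent with putting infrastructure first) and that a feature is eligible only once all of its `dependencies` are passing. `next_feature` is a hypothetical name.

```python
from typing import Optional


def next_feature(features: list[dict]) -> Optional[dict]:
    """Return the lowest-id pending feature whose dependencies all pass, if any."""
    done = {f["id"] for f in features if f["status"] == "passing"}
    for feature in sorted(features, key=lambda f: f["id"]):
        if feature["status"] != "pending":
            continue
        if all(dep in done for dep in feature.get("dependencies", [])):
            return feature
    return None  # nothing is ready: everything is done, failing, or blocked


features = [
    {"id": 1, "status": "passing", "dependencies": []},
    {"id": 2, "status": "pending", "dependencies": [3]},
    {"id": 3, "status": "pending", "dependencies": [1]},
]
print(next_feature(features)["id"])  # 3, because feature 2 is still waiting on 3
```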

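6. Expected Run Time

⚠️ Warning: this demo takes a long time to run!

Phase	Estimated time	Notes
Initializer	5-15 minutes	Generates 200 features
Each coding session	5-15 minutes	Depends on feature complexity
Full application	Several hours or more	Requires many sessions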
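
As a rough cross-check on the last row, the initializer prompt asks for features that each take 5-15 minutes. Assuming features are implemented strictly one after another at that pace (an assumption; real runs may be faster or slower), the arithmetic looks like this. `total_hours` is an illustrative helper.

```python
def total_hours(features: int, minutes_per_feature: tuple[int, int]) -> tuple[float, float]:
    """Return the (low, high) total time in hours for a sequential run."""
    low, high = minutes_per_feature
    return (features * low / 60, features * high / 60)


low, high = total_hours(200, (5, 15))
print(f"200 features: {low:.1f} to {high:.1f} hours")  # 16.7 to 50.0 hours

low, high = total_hours(20, (5, 15))
print(f"20 features: {low:.1f} to {high:.1f} hours")   # 1.7 to 5.0 hours
```

Under these assumptions a full 200-feature run is closer to a day or more than to an afternoon, which is why cutting the feature count is the most effective way to shorten a test run.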

Tips for a quick test

bash

# 1. Reduce the number of features
# Change "200" to "20" in prompts/initializer_prompt.md

# 2. Limit the number of iterations
python autonomous_agent_demo.py --project-dir ./test_project --max-iterations 5

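7. Customization and Extension

7.1 Changing the Application Specification

Edit prompts/app_spec.txt:

markdown

# My application specification

## Overview
Build a task management API service

## Functional requirements
1. User authentication (registration, login, JWT)
2. Task CRUD
3. Task categories and tags
4. Task search and filtering

## Tech stack
- Node.js + Express
- MongoDB
- Jest test framework

## API conventions
RESTful style, JSON format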

7.2 Extending the Command Allowlist

python

# security.py

ALLOWED_COMMANDS = {
    # Existing commands...

    # Add Python support
    'python', 'python3', 'pip', 'pip3', 'pytest',

    # Add Docker support
    'docker', 'docker-compose',
}

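7.3 Custom Progress Callback

python

# Add to agent.py

async def on_feature_complete(feature_id: int, status: str):
    """Callback invoked when a feature finishes."""
    print(f"✅ Feature #{feature_id}: {status}")

    # Notifications, logging, and so on can be added here.
    # send_alert is a placeholder for a helper you supply yourself.
    if status == 'failing':
        await send_alert(f"Feature {feature_id} failed!")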
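
The callback still needs something to trigger it. One option, sketched below, is to compare two snapshots of feature_list.json (before and after a session) and fire the callback for every feature that newly reached passing or failing. The `changed_features` and `notify` helpers are assumptions for illustration; the demo does not necessarily work this way.

```python
import asyncio
from typing import Awaitable, Callable


def changed_features(before: dict, after: dict) -> list[tuple[int, str]]:
    """Return (id, status) for features that newly reached passing or failing."""
    old = {f["id"]: f["status"] for f in before["features"]}
    return [
        (f["id"], f["status"])
        for f in after["features"]
        if f["status"] in ("passing", "failing") and old.get(f["id"]) != f["status"]
    ]


async def notify(
    before: dict,
    after: dict,
    callback: Callable[[int, str], Awaitable[None]],
) -> None:
    """Invoke the async callback once per finished feature."""
    for feature_id, status in changed_features(before, after):
        await callback(feature_id, status)


async def print_result(feature_id: int, status: str) -> None:
    print(f"Feature #{feature_id}: {status}")


before = {"features": [{"id": 1, "status": "pending"}, {"id": 2, "status": "pending"}]}
after = {"features": [{"id": 1, "status": "passing"}, {"id": 2, "status": "pending"}]}
asyncio.run(notify(before, after, print_result))  # prints "Feature #1: passing"
```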

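8. Running the Generated Application

bash

# Enter the generated project directory
cd my_project

# Run the setup script created by the agent
./init.sh

# Or do it by hand (typical Node.js project)
npm install
npm run dev

The application usually runs at http://localhost:3000.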

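9. Architecture Highlights

9.1 Design Decisions

Decision	Rationale
Two-phase agent	Separation of concerns: planning vs. execution
Git persistence	Builds on a mature version control system
JSON progress file	Simple, readable, easy to parse
Command allowlist	Guards against accidental or malicious operations
Automatic continuation	Reduces manual intervention

9.2 Security Layers

┌─────────────────────────────────────────────────────────────────┐
│                      Security architecture                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: OS-level sandbox                                      │
│  ├─ Bash commands run in an isolated environment                │
│  └─ Process resource limits                                     │
│                                                                 │
│  Layer 2: Filesystem isolation                                  │
│  ├─ Only the project directory may be touched                   │
│  └─ Sensitive system paths are off limits                       │
│                                                                 │
│  Layer 3: Command allowlist                                     │
│  ├─ Only predefined safe commands are allowed                   │
│  └─ Dangerous operations are blocked (rm -rf, sudo, etc.)       │
│                                                                 │
│  Layer 4: Network isolation                                     │
│  ├─ Network access is denied by default                         │
│  └─ Specific endpoints can be opened as needed                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘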

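10. Comparison with Other Approaches

10.1 vs GitHub Copilot Workspace

Aspect	Autonomous Coding	Copilot Workspace
How it runs	Local CLI	Cloud-based web app
Autonomy	Fully autonomous	Human confirmation
Duration	No fixed limit (multi-session)	Single session
Customizability	Fully under your control	Limited
Model	Claude	GPT-4

10.2 vs Devin

Aspect	Autonomous Coding	Devin
Open source	✅ Fully open source	❌ Commercial product
Model choice	Switchable	Fixed
Security control	Controlled locally	Cloud-hosted
Cost	API usage fees	Subscription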

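11. Summary

Autonomous Coding demonstrates a pragmatic way to build a long-running coding agent:

Aspect	Assessment
Innovation	⭐⭐⭐⭐⭐ Multi-session persistence scheme
Practicality	⭐⭐⭐⭐ Usable on real projects
Security	⭐⭐⭐⭐ Multiple layers of protection
Extensibility	⭐⭐⭐⭐ Easy to customize

Where it fits:

• Generating a complete project from a specification document
• Large-scale code migration
• Automated refactoring
• Bulk code changes

Key takeaways:

• Using the Claude Agent SDK
• Managing state across sessions
• Designing a security sandbox
• Git-driven progress tracking

In the next section we turn to Agents and look at the most basic implementation pattern for an agent loop.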
