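5.3 Memory Schema - Profile - Detailed Walkthrough

I. Overview

1.1 About This Section

This section is the third part of LangChain Academy Module 5. It upgrades unstructured memory to a structured Profile Schema (a user profile schema) and uses the Trustcall library to create and update that structured memory efficiently.

1.2 From 5.2 to 5.3

Limitations of Section 5.2: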

python

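# Memory format in 5.2 (free text)
{
    "memory": "- User's name is Lance\n- Likes biking in San Francisco"
}

Problems:

❌ No structure, so the memory is hard to query and process
❌ Every update regenerates the entire text
❌ Information is easily lost during regeneration
❌ Data integrity is hard to verify

The improvement in 5.3 (structured Schema):

python

# Memory format in 5.3 (structured)
{
    "user_name": "Lance",
    "user_location": "San Francisco",
    "interests": ["biking", "bakeries"]
}

Advantages:

✅ A clear data structure
✅ Incremental updates (only the changed parts are updated)
✅ Data validation and type checking
✅ Easy to query and process

1.3 Learning Objectives

After working through this section, you will be able to: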

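Define structured data:
- TypedDict
- Pydantic BaseModel
- Field types and validation
Produce structured output:
- The with_structured_output() method
- Forcing the model to follow a Schema
- The challenges of complex Schemas
Use the Trustcall library:
- Creating and updating structured memory
- The JSON Patch mechanism
- Handling complex nested Schemas
Apply it in practice:
- Building a chatbot with a structured Profile
- Efficient memory update strategies

1.4 Core Value of This Section

Unstructured memory (5.2)    Structured memory (5.3)
     ↓                           ↓
Free text                    Defined Schema
     ↓                           ↓
Regenerated on update        Incremental updates
     ↓                           ↓
Information easily lost      Precisely controlled updates
     ↓                           ↓
Hard to query                Easy to process
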
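II. Structured Data Basics

2.1 What Is Structured Data?

Structured data: data with an explicitly defined format and types, which programs can parse and process easily.

Comparison:

Property	Unstructured	Structured
Format	Free text	Fixed format
Types	No type constraints	Strict types
Validation	Hard to validate	Validated automatically
Querying	Requires parsing	Direct access
Example	"Lance likes biking"	`{"name": "Lance", "hobby": "biking"}`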
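
To make the "Querying" row concrete, the sketch below reads the same fact from both formats. The regular expression and variable names are illustrative assumptions, not part of the course code.

```python
import re

# Unstructured memory (the 5.2 style): one free-text blob.
unstructured = {"memory": "- User's name is Lance\n- Likes biking in San Francisco"}

# Structured memory (the 5.3 style): one field per fact.
structured = {"user_name": "Lance", "interests": ["biking"]}

# Free text has to be parsed, and the pattern breaks if the wording changes.
match = re.search(r"name is (\w+)", unstructured["memory"])
name_from_text = match.group(1) if match else None

# Structured data is read directly by key.
name_from_schema = structured["user_name"]

print(name_from_text, name_from_schema)
```

Both lookups find the same name here, but only the second one keeps working when the memory text is rephrased.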

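2.2 Structured Data Types in Python

Python offers several ways to represent structured data:

2.2.1 Plain Dictionary (dict)

python

# Plain dictionary
user = {
    "name": "Lance",
    "age": 30,
    "interests": ["biking", "coffee"]
}

# Pros: simple and flexible
# Cons: no type checking, error-prone

2.2.2 TypedDict

python

from typing import TypedDict, List

class UserProfile(TypedDict):
    """Typed dictionary for a user profile"""
    user_name: str                # a string
    interests: List[str]          # a list of strings

# Create an instance
user: UserProfile = {
    "user_name": "Lance",
    "interests": ["biking", "coffee"]
}

Characteristics of TypedDict:

✅ Provides type hints
✅ IDE support (code completion)
✅ Static type checking (with tools such as mypy)
❌ No validation at runtime
❌ No support for default values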
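
The "no validation at runtime" point is easy to check. In this minimal sketch (the bad values are chosen purely for illustration), Python accepts a UserProfile whose fields have the wrong types, because a TypedDict instance is an ordinary dict at runtime:

```python
from typing import List, TypedDict

class UserProfile(TypedDict):
    """Typed dictionary for a user profile."""
    user_name: str
    interests: List[str]

# Deliberately wrong types: mypy or an IDE would flag this line,
# but Python itself stores the values without complaint.
bad_profile: UserProfile = {"user_name": 123, "interests": "biking"}  # type: ignore

print(type(bad_profile) is dict)
print(bad_profile["user_name"])
```

Catching this kind of mistake while the program runs is what Pydantic adds.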

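2.2.3 Pydantic BaseModel

python

from pydantic import BaseModel, Field, ValidationError
from typing import List

class UserProfile(BaseModel):
    """Pydantic model for a user profile"""
    user_name: str = Field(description="The user's preferred name")
    interests: List[str] = Field(
        description="A list of the user's interests",
        default_factory=list
    )

# Create an instance
user = UserProfile(
    user_name="Lance",
    interests=["biking"]
)

# Automatic validation
try:
    invalid_user = UserProfile(
        user_name=123,  # Error: should be a string
        interests="biking"  # Error: should be a list
    )
except ValidationError as e:
    print(f"Validation failed: {e}")

Characteristics of Pydantic:

✅ Data validation at runtime
✅ Automatic type coercion where it is safe (for example, the string "30" for an int field)
✅ Support for default values
✅ Detailed error messages
✅ JSON serialization and deserialization
✅ Schema and documentation generation

Python notes:

Type hints:

python

from typing import List, Optional

user_name: str              # a string
age: int                    # an integer
interests: List[str]        # a list of strings
location: Optional[str]     # an optional string (may be None)

The Field function:

python

Field(
    description="Field description",  # used for documentation and LLM prompts
    default="default value",          # default value (cannot be combined with default_factory)
    # default_factory=list,           # factory for the default; use instead of default
    ge=0,                             # greater than or equal to 0 (numeric fields)
    max_length=100                    # maximum length (string fields)
)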

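2.3 Why Use Pydantic?

In the LangChain ecosystem, Pydantic is the preferred way to define structured data:

LLM integration:
- Generates a JSON Schema automatically for the LLM to read
- Supported natively by with_structured_output()
Data validation:
- Keeps invalid data out of the system
- Produces clear error messages
Documentation:
- The description of each field helps the LLM understand it
- API documentation can be generated automatically
Serialization:
- model_dump(): converts to a dictionary
- model_dump_json(): converts to a JSON string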

III. Defining the Profile Schema

3.1 A Simple UserProfile (TypedDict)

Let's start with a simple TypedDict:

python

from typing import TypedDict, List

class UserProfile(TypedDict):
    """Typed dictionary for a user profile"""
    user_name: str                # the user's preferred name
    interests: List[str]          # a list of the user's interests

Fields:

user_name: stores the user's name (for example "Lance")
interests: stores the user's interests (for example ["biking", "coffee"])

3.2 Creating a TypedDict Instance

python

# Create a user profile instance
user_profile: UserProfile = {
    "user_name": "Lance",
    "interests": ["biking", "technology", "coffee"]
}

print(user_profile)
# Output: {'user_name': 'Lance', 'interests': ['biking', 'technology', 'coffee']}

Python notes:

What the type annotation does:

python

user_profile: UserProfile = {...}
# This tells the IDE and type checkers that:
# - user_profile should be a UserProfile
# - code completion can be offered for its keys
# - it can be checked statically
3.3 Saving the Schema to the Store

python

import uuid
from langgraph.store.memory import InMemoryStore

# Initialize the Store
in_memory_store = InMemoryStore()

# Define the namespace
user_id = "1"
namespace_for_memory = (user_id, "memory")

# Save the Profile
key = "user_profile"
value = user_profile  # a TypedDict instance (a plain dict at runtime)
in_memory_store.put(namespace_for_memory, key, value)

Important:

The Store accepts a Python dictionary as the value
A TypedDict is a plain dictionary at runtime
So it can be saved to the Store directly
3.4 Retrieving the Schema from the Store

python

# Search for everything stored in the namespace
for m in in_memory_store.search(namespace_for_memory):
    print(m.dict())

Output:

python

{
    'value': {
        'user_name': 'Lance',
        'interests': ['biking', 'technology', 'coffee']
    },
    'key': 'user_profile',
    'namespace': ['1', 'memory'],
    'created_at': '2024-11-04T23:37:34.871675+00:00',
    'updated_at': '2024-11-04T23:37:34.871680+00:00'
}

Getting a specific object:

python

# Fetch by namespace and key
profile = in_memory_store.get(namespace_for_memory, "user_profile")
print(profile.value)
# Output: {'user_name': 'Lance', 'interests': ['biking', 'technology', 'coffee']}
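
The namespace-plus-key addressing used above can be pictured with a small stand-in. TinyStore below is a hypothetical class written for this explanation, not LangGraph's implementation; in particular its search() matches a namespace exactly, which is a simplification.

```python
from typing import Any, Dict, List, Optional, Tuple

Namespace = Tuple[str, ...]

class TinyStore:
    """Hypothetical stand-in for a namespaced key-value store (not LangGraph code)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[Namespace, str], Dict[str, Any]] = {}

    def put(self, namespace: Namespace, key: str, value: Dict[str, Any]) -> None:
        # Writing to an existing (namespace, key) pair overwrites the old value.
        self._data[(namespace, key)] = value

    def get(self, namespace: Namespace, key: str) -> Optional[Dict[str, Any]]:
        # Returns None when nothing is stored under that pair.
        return self._data.get((namespace, key))

    def search(self, namespace: Namespace) -> List[Dict[str, Any]]:
        # Return every value stored under exactly this namespace.
        return [v for (ns, _), v in self._data.items() if ns == namespace]

store = TinyStore()
store.put(("1", "memory"), "user_profile", {"user_name": "Lance", "interests": ["biking"]})
store.put(("2", "memory"), "user_profile", {"user_name": "Ada", "interests": ["chess"]})

print(store.get(("1", "memory"), "user_profile"))
print(len(store.search(("1", "memory"))))
```

Because the user ID is part of the namespace, two users can share the key "user_profile" without overwriting each other.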

IV. Structured Output

4.1 What Is Structured Output?

Structured output: forcing the LLM's output to conform to a predefined Schema instead of free text.

Why is structured output needed?

User input: "My name is Lance, I like to bike."

Free-text output:
"The user's name is Lance and he enjoys biking."

Structured output:
{
    "user_name": "Lance",
    "interests": ["biking"]
}

Advantages:

✅ Predictable output format
✅ Easy to parse and process
✅ Output is checked against the Schema
✅ Usable directly in downstream tasks

4.2 Using with_structured_output()

LangChain chat models provide the with_structured_output() method:

python

from pydantic import BaseModel, Field
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Bind the Schema to the model
model_with_structure = model.with_structured_output(UserProfile)

# Invoke the model
structured_output = model_with_structure.invoke([
    HumanMessage("My name is Lance, I like to bike.")
])

print(structured_output)
# Output: {'user_name': 'Lance', 'interests': ['biking']}

How it works:

1. User message
   ↓
2. with_structured_output() converts the Schema into a tool definition
   ↓
3. The model produces structured output through tool calling
   ↓
4. The result is parsed into a Python object (a dict or a Pydantic model)
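
Steps 2 and 4 can be sketched without calling a model. The code below is a simplified illustration only: to_tool_definition and JSON_TYPES are hypothetical helpers, and the payload LangChain actually sends to a provider has more detail. It shows the general idea of turning annotations into a JSON-Schema-like tool description and parsing the arguments that come back.

```python
import json
from typing import List, TypedDict, get_type_hints

class UserProfile(TypedDict):
    """Profile of a user."""
    user_name: str
    interests: List[str]

# Hand-written mapping for the two annotation types used here (an assumption;
# real libraries derive full JSON Schema for many more types).
JSON_TYPES = {
    str: {"type": "string"},
    List[str]: {"type": "array", "items": {"type": "string"}},
}

def to_tool_definition(schema) -> dict:
    """Build a simplified tool definition from a TypedDict's annotations."""
    hints = get_type_hints(schema)
    return {
        "name": schema.__name__,
        "description": schema.__doc__,
        "parameters": {
            "type": "object",
            "properties": {field: JSON_TYPES[tp] for field, tp in hints.items()},
            "required": list(hints),
        },
    }

tool = to_tool_definition(UserProfile)

# Simulated tool-call arguments, as a model would return them: a JSON string.
raw_arguments = '{"user_name": "Lance", "interests": ["biking"]}'
parsed: UserProfile = json.loads(raw_arguments)

print(tool["name"], parsed)
```

The model never sees the Python class itself, only a description like tool, which is why clear field names and descriptions matter.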

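Python notes:

Method chaining:

python

model                          # a ChatOpenAI instance
  .with_structured_output(...)  # returns a new Runnable
  .invoke(...)                  # runs it and returns the result

4.3 Using Structured Output in the Chatbot

Let's integrate it into the chatbot from the previous section: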

python

def write_memory(state: MessagesState, config: RunnableConfig, store: BaseStore):
    """Reflect on the chat history and save a memory to the store"""

    # Get the user ID
    user_id = config["configurable"]["user_id"]

    # Retrieve the existing memory
    namespace = ("memory", user_id)
    existing_memory = store.get(namespace, "user_memory")

    # Format the existing memory
    if existing_memory and existing_memory.value:
        memory_dict = existing_memory.value
        formatted_memory = (
            f"Name: {memory_dict.get('user_name', 'Unknown')}\n"
            f"Interests: {', '.join(memory_dict.get('interests', []))}"
        )
    else:
        formatted_memory = None

    # Build the instruction (CREATE_MEMORY_INSTRUCTION is the prompt from section 5.2)
    system_msg = CREATE_MEMORY_INSTRUCTION.format(memory=formatted_memory)

    # Generate the structured memory with with_structured_output
    new_memory = model_with_structure.invoke([
        SystemMessage(content=system_msg)
    ] + state['messages'])

    # Save to the store
    key = "user_memory"
    store.put(namespace, key, new_memory)

Key changes:

model_with_structure is used instead of the plain model
new_memory is a dictionary that already conforms to the Schema
It is saved to the Store directly, with no extra processing

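V. The Challenge of Complex Schemas

5.1 Simple Schema vs. Complex Schema

A simple Schema (easy to extract):

python

class UserProfile(BaseModel):
    user_name: str
    interests: List[str]

A complex nested Schema (hard to extract):

python

from typing import List, Optional
from pydantic import BaseModel

class OutputFormat(BaseModel):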
    preference: str
    sentence_preference_revealed: str

class TelegramPreferences(BaseModel):
    preferred_encoding: Optional[List[OutputFormat]] = None
    favorite_telegram_operators: Optional[List[OutputFormat]] = None
    preferred_telegram_paper: Optional[List[OutputFormat]] = None

class MorseCode(BaseModel):
    preferred_key_type: Optional[List[OutputFormat]] = None
    favorite_morse_abbreviations: Optional[List[OutputFormat]] = None

class Semaphore(BaseModel):
    preferred_flag_color: Optional[List[OutputFormat]] = None
    semaphore_skill_level: Optional[List[OutputFormat]] = None

class TrustFallPreferences(BaseModel):
    preferred_fall_height: Optional[List[OutputFormat]] = None
    trust_level: Optional[List[OutputFormat]] = None
    preferred_catching_technique: Optional[List[OutputFormat]] = None

class CommunicationPreferences(BaseModel):
    telegram: TelegramPreferences
    morse_code: MorseCode
    semaphore: Semaphore

class UserPreferences(BaseModel):
    communication_preferences: CommunicationPreferences
    trust_fall_preferences: TrustFallPreferences

class TelegramAndTrustFallPreferences(BaseModel):
    pertinent_user_preferences: UserPreferences

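Complexity analysis:

5 levels of nested models (TelegramAndTrustFallPreferences → UserPreferences → CommunicationPreferences → TelegramPreferences → OutputFormat)
8 class definitions
Many optional fields
Complex dependencies between the classes

5.2 A Failure Case with with_structured_output()

Let's try to extract this complex Schema:

python

from pydantic import ValidationError

# Bind the complex Schema
model_with_structure = model.with_structured_output(TelegramAndTrustFallPreferences)

# Conversation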
conversation = """Operator: How may I assist with your telegram, sir?
Customer: I need to send a message about our trust fall exercise.
Operator: Certainly. Morse code or standard encoding?
Customer: Morse, please. I love using a straight key.
Operator: Excellent. What's your message?
Customer: Tell him I'm ready for a higher fall, and I prefer the diamond formation for catching.
Operator: Done. Shall I use our "Daredevil" paper for this daring message?
Customer: Perfect! Send it by your fastest carrier pigeon.
Operator: It'll be there within the hour, sir."""

# Attempt the extraction
try:
    result = model_with_structure.invoke(
        f"""Extract the preferences from the following conversation:
        <convo>
        {conversation}
        </convo>"""
    )
except ValidationError as e:
    print(e)

Output (error):

1 validation error for TelegramAndTrustFallPreferences
pertinent_user_preferences.communication_preferences.semaphore
  Input should be a valid dictionary or instance of Semaphore [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type

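Why it failed:

❌ The Schema is too complex for the model to fill in reliably
❌ The nesting is too deep
❌ A required nested field was mishandled: the model returned None for semaphore, which must be a Semaphore object
❌ Even an advanced model such as GPT-4o can fail here

5.3 Why Trustcall?

Summary of the problems:

Issue	with_structured_output	Trustcall
Complex Schema	❌ Fails easily	✅ Designed to handle it
Update efficiency	❌ Regenerates everything	✅ Incremental updates
Information loss	❌ May lose data	✅ Keeps existing information
Token usage	❌ Generates the full document every time	✅ Generates only the changes
Error recovery	❌ Requires a retry	✅ Corrects automatically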

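VI. Trustcall in Depth

6.1 What Is Trustcall?

Trustcall is an open-source library for creating and updating JSON documents that conform to a Schema. It was developed by Will Fu-Hinthorn of the LangChain team.

Core ideas:

Incremental updates: uses JSON Patch to update only what changed
Smarter extraction: handles complex nested Schemas better
Self-correction: handles validation errors automatically
Information retention: existing data is not lost

6.2 A Short Introduction to JSON Patch

JSON Patch is a standard (RFC 6902) for describing changes to a JSON document.

Example:

Original data:

json

{
    "user_name": "Lance",
    "interests": ["biking"]
}

JSON Patch operation:

json

[
    {
        "op": "add",
        "path": "/interests/-",
        "value": "bakeries"
    }
]

Result:

json

{
    "user_name": "Lance",
    "interests": ["biking", "bakeries"]
}

JSON Patch operation types:

Operation	Description	Example
add	Add a field or an array element	`{"op": "add", "path": "/email", "value": "..."}`
remove	Remove a field or an array element	`{"op": "remove", "path": "/interests/0"}`
replace	Replace a value	`{"op": "replace", "path": "/name", "value": "..."}`
move	Move a value	`{"op": "move", "from": "/a", "path": "/b"}`
copy	Copy a value	`{"op": "copy", "from": "/a", "path": "/b"}`
test	Test that a value matches	`{"op": "test", "path": "/name", "value": "Lance"}`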
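
To show how small a patch is compared with a regenerated document, here is a minimal sketch of a patch applier. It is written for this explanation under stated assumptions: it supports only add, remove and replace, it does not handle the empty path that targets the whole document, and it is not the code Trustcall uses.

```python
import copy
from typing import Any, Dict, List

def _split_pointer(path: str) -> List[str]:
    """Split a JSON Pointer such as '/interests/-' into unescaped tokens."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported path: {path!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]

def apply_patch(doc: Dict[str, Any], patch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply add/remove/replace operations and return a patched copy."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for op in patch:
        *parents, last = _split_pointer(op["path"])
        target: Any = result
        for token in parents:  # walk down to the container holding the target
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            index = len(target) if last == "-" else int(last)  # "-" means end of array
            if kind == "add":
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[index]
            elif kind == "replace":
                target[index] = op["value"]
            else:
                raise NotImplementedError(kind)
        else:
            if kind == "add":
                target[last] = op["value"]
            elif kind == "replace":
                if last not in target:
                    raise KeyError(last)  # replace requires an existing member
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise NotImplementedError(kind)
    return result

profile = {"user_name": "Lance", "interests": ["biking"]}
patched = apply_patch(profile, [{"op": "add", "path": "/interests/-", "value": "bakeries"}])
print(patched)
```

The patch mentions only the interests array, so user_name cannot be altered or dropped by accident.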

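6.3 How Trustcall Works

┌───────────────────────────────────────────────────┐
│                Trustcall Workflow                 │
│                                                   │
│  1. Receive input                                 │
│     - New conversation messages                   │
│     - The existing document (if any)              │
│                                                   │
│  2. Analyze changes                               │
│     - Identify new information                    │
│     - Identify fields that need updating          │
│                                                   │
│  3. Generate a JSON Patch                         │
│     - Create precise update operations            │
│     - Modify only what changed                    │
│                                                   │
│  4. Apply the patch                               │
│     - Update the existing document                │
│     - Keep unchanged information                  │
│                                                   │
│  5. Validate the result                           │
│     - Ensure it conforms to the Schema            │
│     - On failure, correct and retry               │
└───────────────────────────────────────────────────┘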

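6.4 Creating a Trustcall Extractor

python

from typing import List

from langchain_openai import ChatOpenAI
from trustcall import create_extractor
from pydantic import BaseModel, Field

# Define the Schema
class UserProfile(BaseModel):
    """User profile"""
    user_name: str = Field(description="The user's preferred name")
    interests: List[str] = Field(description="A list of the user's interests")

# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Create the Trustcall extractor
trustcall_extractor = create_extractor(
    model,
    tools=[UserProfile],        # several Schemas can be passed
    tool_choice="UserProfile"   # force the UserProfile tool
)

Parameters:

model: the language model to use
tools: a list of Schemas (for example Pydantic models, functions, or JSON Schema dicts)
tool_choice: which tool to use
- If omitted, the model is free to choose among the tools
- If given, the model is forced to use that tool

Python notes:

A list as an argument:

python

tools=[UserProfile]
# equivalent to:
tools=[UserProfile,]  # single-element list; the trailing comma is optional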

6.5 Basic Usage: Extracting Information

python

from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

# Conversation
conversation = [
    HumanMessage(content="Hi, I'm Lance."),
    AIMessage(content="Nice to meet you, Lance."),
    HumanMessage(content="I really like biking around San Francisco.")
]

# Extraction instruction
system_msg = "Extract the user profile from the following conversation"

# Invoke the extractor
result = trustcall_extractor.invoke({
    "messages": [SystemMessage(content=system_msg)] + conversation
})

Structure of the returned result:

python

result = {
    "messages": [...],           # the AI message containing the tool calls
    "responses": [...],          # the parsed Pydantic objects
    "response_metadata": [...]   # metadata (IDs and so on)
}

Inspecting the messages:

python

for m in result["messages"]:
    m.pretty_print()

Output:

================================== Ai Message ==================================
Tool Calls:
  UserProfile (call_spGGUsoaUFXU7oOrUNCASzfL)
 Call ID: call_spGGUsoaUFXU7oOrUNCASzfL
  Args:
    user_name: Lance
    interests: ['biking around San Francisco']

Inspecting the extracted Schema:

python

schema = result["responses"]
print(schema)
# Output: [UserProfile(user_name='Lance', interests=['biking around San Francisco'])]

# Convert to a dictionary
print(schema[0].model_dump())
# Output: {'user_name': 'Lance', 'interests': ['biking around San Francisco']}

Inspecting the metadata:

python

print(result["response_metadata"])
# Output: [{'id': 'call_spGGUsoaUFXU7oOrUNCASzfL'}]

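6.6 Advanced Usage: Updating an Existing Schema

The real strength of Trustcall is updating an existing Schema instance.

6.6.1 Preparing the Existing Schema

python

# Save the existing Schema instance as a dictionary
existing_schema_dict = schema[0].model_dump()
# {'user_name': 'Lance', 'interests': ['biking around San Francisco']}

Python notes:

model_dump() vs dict():

python

# Pydantic v2 (schema is the list from result["responses"], so index it first)
schema[0].model_dump()   # recommended

# Pydantic v1 style
schema[0].dict()         # deprecated in Pydantic v2

# For plain objects
vars(obj)                # returns the object's __dict__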
6.6.2 The Update Operation

python

# Updated conversation
updated_conversation = [
    HumanMessage(content="Hi, I'm Lance."),
    AIMessage(content="Nice to meet you, Lance."),
    HumanMessage(content="I really like biking around San Francisco."),
    AIMessage(content="San Francisco is a great city! Where do you go after biking?"),
    HumanMessage(content="I really like to go to a bakery after biking."),
]

# Update instruction
system_msg = "Update the memory (JSON doc) to incorporate new information from the following conversation"

# Invoke the extractor, passing the existing Schema inside the input dict
result = trustcall_extractor.invoke({
    "messages": [SystemMessage(content=system_msg)] + updated_conversation,
    "existing": {"UserProfile": existing_schema_dict}
})

Key parameter:

python

"existing": {"UserProfile": existing_schema_dict}

"existing": tells Trustcall that this is an update; it must be a key of the input dict, because a second positional argument to invoke() is treated as the run config rather than as input
"UserProfile": the name of the Schema (the tool name)
existing_schema_dict: the existing Schema data (a dictionary)

Inspecting the update result:

python

for m in result["messages"]:
    m.pretty_print()

Output:

================================== Ai Message ==================================
Tool Calls:
  UserProfile (call_WeZl0ACfQStxblim0ps8LNKT)
 Call ID: call_WeZl0ACfQStxblim0ps8LNKT
  Args:
    user_name: Lance
    interests: ['biking', 'visiting bakeries']

Note the change:

Before: ['biking around San Francisco']
After: ['biking', 'visiting bakeries']
The model merged the new information into the existing profile

Getting the updated Schema:

python

updated_schema = result["responses"][0]
print(updated_schema.model_dump())
# Output: {'user_name': 'Lance', 'interests': ['biking', 'visiting bakeries']}

6.7 Handling the Complex Schema

Now let's use Trustcall on the complex Schema that failed earlier:

python

# Create an extractor for the complex Schema
bound = create_extractor(
    model,
    tools=[TelegramAndTrustFallPreferences],
    tool_choice="TelegramAndTrustFallPreferences",
)

# Conversation (same as before)
conversation = """Operator: How may I assist with your telegram, sir?
Customer: I need to send a message about our trust fall exercise.
Operator: Certainly. Morse code or standard encoding?
Customer: Morse, please. I love using a straight key.
Operator: Excellent. What's your message?
Customer: Tell him I'm ready for a higher fall, and I prefer the diamond formation for catching.
Operator: Done. Shall I use our "Daredevil" paper for this daring message?
Customer: Perfect! Send it by your fastest carrier pigeon.
Operator: It'll be there within the hour, sir."""

# Extract
result = bound.invoke(
    f"""Extract the preferences from the following conversation:
    <convo>
    {conversation}
    </convo>"""
)

# Extraction succeeds
preferences = result["responses"][0]
print(preferences)

Successful output:

python

TelegramAndTrustFallPreferences(
    pertinent_user_preferences=UserPreferences(
        communication_preferences=CommunicationPreferences(
            telegram=TelegramPreferences(
                preferred_encoding=[OutputFormat(
                    preference='standard encoding',
                    sentence_preference_revealed='standard encoding'
                )],
                favorite_telegram_operators=None,
                preferred_telegram_paper=[OutputFormat(
                    preference='Daredevil',
                    sentence_preference_revealed='Daredevil'
                )]
            ),
            morse_code=MorseCode(
                preferred_key_type=[OutputFormat(
                    preference='straight key',
                    sentence_preference_revealed='straight key'
                )],
                favorite_morse_abbreviations=None
            ),
            semaphore=Semaphore(
                preferred_flag_color=None,
                semaphore_skill_level=None
            )
        ),
        trust_fall_preferences=TrustFallPreferences(
            preferred_fall_height=[OutputFormat(
                preference='higher',
                sentence_preference_revealed='higher'
            )],
            trust_level=None,
            preferred_catching_technique=[OutputFormat(
                preference='diamond formation',
                sentence_preference_revealed='diamond formation'
            )]
        )
    )
)

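Why it succeeded:

✅ Trustcall handled the nested structure correctly
✅ Every required field was initialized correctly (including Semaphore)
✅ All relevant information was extracted
✅ Even 5 levels of nested models were handled correctly

6.8 Trustcall vs. with_structured_output

Feature	with_structured_output	Trustcall
Simple Schema	✅ Works well	✅ Works well
Complex nested Schema	❌ Fails easily	✅ Designed for it
Update strategy	❌ Full regeneration	✅ Incremental updates (JSON Patch)
Token efficiency	❌ Generates the full document every time	✅ Generates only the changes
Information retention	⚠️ May lose data	✅ Keeps unchanged information
Error recovery	❌ Must be handled manually	✅ Corrects automatically
Setup complexity	✅ Simple	⚠️ Slightly more involved
Best suited for	Simple Schemas, create operations	Complex Schemas, update operations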

VII. Building a Chatbot with a Profile Schema

7.1 The Complete Schema Definition

python

from pydantic import BaseModel, Field
from typing import List

class UserProfile(BaseModel):
    """User profile"""
    user_name: str = Field(description="The user's preferred name")
    user_location: str = Field(description="The user's location")
    interests: List[str] = Field(description="A list of the user's interests")

Why add user_location?

It makes the user profile more complete
It helps the chatbot give localized suggestions
It shows how several fields are handled together

7.2 Creating the Trustcall Extractor

python

from langchain_openai import ChatOpenAI
from trustcall import create_extractor

# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Create the extractor
trustcall_extractor = create_extractor(
    model,
    tools=[UserProfile],
    tool_choice="UserProfile",  # force the UserProfile tool
)

7.3 Defining the System Prompts

python

# Chatbot prompt
MODEL_SYSTEM_MESSAGE = """You are a helpful assistant with memory that provides information about the user.
If you have memory for this user, use it to personalize your responses.
Here is the memory (it may be empty): {memory}"""

# Trustcall extraction instruction
TRUSTCALL_INSTRUCTION = """Create or update the memory (JSON doc) to incorporate information from the following conversation:"""

Prompt design notes:

MODEL_SYSTEM_MESSAGE:
- Short and clear
- Emphasizes personalization
- Provides a placeholder for the memory
TRUSTCALL_INSTRUCTION:
- States the task explicitly (create or update)
- Stays short (Trustcall takes care of the details)
7.4 The call_model Node

python

from langgraph.graph import MessagesState
from langchain_core.runnables.config import RunnableConfig
from langgraph.store.base import BaseStore
from langchain_core.messages import SystemMessage

def call_model(state: MessagesState, config: RunnableConfig, store: BaseStore):
    """Load memory from the store and use it to personalize the chatbot's response"""

    # Get the user ID
    user_id = config["configurable"]["user_id"]

    # Retrieve the memory from the store
    namespace = ("memory", user_id)
    existing_memory = store.get(namespace, "user_memory")

    # Format the memory for the system prompt
    if existing_memory and existing_memory.value:
        memory_dict = existing_memory.value
        formatted_memory = (
            f"Name: {memory_dict.get('user_name', 'Unknown')}\n"
            f"Location: {memory_dict.get('user_location', 'Unknown')}\n"
            f"Interests: {', '.join(memory_dict.get('interests', []))}"
        )
    else:
        formatted_memory = None

    # Insert the memory into the system prompt
    system_msg = MODEL_SYSTEM_MESSAGE.format(memory=formatted_memory)

    # Generate a response from the memory and the chat history
    response = model.invoke([
        SystemMessage(content=system_msg)
    ] + state["messages"])

    return {"messages": response}

Walking through the code:

Lines 4-9 of the function: get the user ID and retrieve the memory

python

user_id = config["configurable"]["user_id"]
namespace = ("memory", user_id)
existing_memory = store.get(namespace, "user_memory")

Lines 11-20 of the function: format the memory

python

if existing_memory and existing_memory.value:
    memory_dict = existing_memory.value
    formatted_memory = (
        f"Name: {memory_dict.get('user_name', 'Unknown')}\n"
        f"Location: {memory_dict.get('user_location', 'Unknown')}\n"
        f"Interests: {', '.join(memory_dict.get('interests', []))}"
    )
else:
    formatted_memory = None

Why use the .get() method?

python

memory_dict.get('user_name', 'Unknown')
# If the 'user_name' key exists, its value is returned
# If it does not exist, the default 'Unknown' is returned
# This avoids a KeyError

Formatting example:

python

# memory_dict = {
#     'user_name': 'Lance',
#     'user_location': 'San Francisco',
#     'interests': ['biking', 'bakeries']
# }

formatted_memory = """Name: Lance
Location: San Francisco
Interests: biking, bakeries"""
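
The formatting logic can be pulled out and tried on its own. format_memory is a hypothetical helper name introduced here; its body mirrors the branch inside call_model, so it also shows what .get() returns for a partially filled profile.

```python
from typing import Any, Dict, Optional

def format_memory(memory_dict: Optional[Dict[str, Any]]) -> Optional[str]:
    """Render a stored profile as prompt text, or None when nothing is stored."""
    if not memory_dict:
        return None
    return (
        f"Name: {memory_dict.get('user_name', 'Unknown')}\n"
        f"Location: {memory_dict.get('user_location', 'Unknown')}\n"
        f"Interests: {', '.join(memory_dict.get('interests', []))}"
    )

full = format_memory({
    "user_name": "Lance",
    "user_location": "San Francisco",
    "interests": ["biking", "bakeries"],
})
partial = format_memory({"user_name": "Lance"})  # missing keys fall back to defaults

print(full)
print(partial)
```

A profile with only a name still formats cleanly, with "Unknown" for the location and an empty interests list.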

Lines 22-30 of the function: generate the response

python

system_msg = MODEL_SYSTEM_MESSAGE.format(memory=formatted_memory)
response = model.invoke([
    SystemMessage(content=system_msg)
] + state["messages"])
return {"messages": response}

7.5 The write_memory Node

python

def write_memory(state: MessagesState, config: RunnableConfig, store: BaseStore):
    """Reflect on the chat history and save a memory to the store"""

    # Get the user ID
    user_id = config["configurable"]["user_id"]

    # Retrieve the existing memory from the store
    namespace = ("memory", user_id)
    existing_memory = store.get(namespace, "user_memory")

    # Take the Profile value and wrap it in the format Trustcall expects
    existing_profile = {"UserProfile": existing_memory.value} if existing_memory else None

    # Invoke the Trustcall extractor
    result = trustcall_extractor.invoke({
        "messages": [SystemMessage(content=TRUSTCALL_INSTRUCTION)] + state["messages"],
        "existing": existing_profile
    })

    # Get the updated Profile as a JSON-compatible dict
    updated_profile = result["responses"][0].model_dump()

    # Save the updated Profile
    key = "user_memory"
    store.put(namespace, key, updated_profile)

Walking through the code:

Lines 11-12 of the function: prepare the existing Profile

python

existing_profile = {"UserProfile": existing_memory.value} if existing_memory else None

Why this format?

python

# The format Trustcall expects:
{
    "tool name": {actual data}
}

# Example:
{
    "UserProfile": {
        "user_name": "Lance",
        "user_location": "San Francisco",
        "interests": ["biking"]
    }
}

Lines 14-18 of the function: invoke Trustcall

python

result = trustcall_extractor.invoke({
    "messages": [SystemMessage(content=TRUSTCALL_INSTRUCTION)] + state["messages"],
    "existing": existing_profile
})

Parameters:

messages: the system instruction plus the conversation history
existing: the existing Profile (if there is one)

Lines 20-21 of the function: get the updated Profile

python

updated_profile = result["responses"][0].model_dump()

result["responses"] is a list of Pydantic models
[0] takes the first (and only) model
.model_dump() converts it to a dictionary

Lines 23-25 of the function: save to the Store

python

key = "user_memory"
store.put(namespace, key, updated_profile)
7.6 Building the Complete Graph

python

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.store.memory import InMemoryStore

# Define the graph
builder = StateGraph(MessagesState)
builder.add_node("call_model", call_model)
builder.add_node("write_memory", write_memory)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", "write_memory")
builder.add_edge("write_memory", END)

# Long-term memory (across threads)
across_thread_memory = InMemoryStore()

# Short-term memory (within a thread)
within_thread_memory = MemorySaver()

# Compile the graph
graph = builder.compile(
    checkpointer=within_thread_memory,
    store=across_thread_memory
)

Graph structure:

START → call_model → write_memory → END
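
The split between per-thread history and a cross-thread store can be imitated with two dictionaries. This is a toy sketch under loud assumptions: no LLM and no LangGraph are involved, the string-prefix rule in write_memory is a hypothetical stand-in for the Trustcall extraction, and the function names are reused only to mirror the graph.

```python
from typing import Dict, List, Tuple

store: Dict[Tuple[str, str], Dict[str, str]] = {}  # long-term, shared across threads
threads: Dict[str, List[str]] = {}                 # short-term, one history per thread

def call_model(user_id: str, messages: List[str]) -> str:
    """Read the profile (if any) and personalize the reply."""
    memory = store.get(("memory", user_id))
    name = memory["user_name"] if memory else "there"
    return f"Hello, {name}!"

def write_memory(user_id: str, messages: List[str]) -> None:
    """Toy extraction rule standing in for the LLM: look for 'My name is ...'."""
    prefix = "My name is "
    for text in messages:
        if text.startswith(prefix):
            store[("memory", user_id)] = {"user_name": text[len(prefix):]}

def run(thread_id: str, user_id: str, text: str) -> str:
    """START -> call_model -> write_memory -> END for one user message."""
    history = threads.setdefault(thread_id, [])
    history.append(text)
    reply = call_model(user_id, history)
    write_memory(user_id, history)
    history.append(reply)
    return reply

first = run("1", "1", "My name is Lance")  # no memory yet when the reply is made
second = run("2", "1", "Hi again")         # new thread, same user
print(first)
print(second)
```

The second thread starts with an empty history but still greets the user by name, because the profile lives in the store rather than in the thread.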

VIII. Hands-On Demo

8.1 First Interaction: Creating the Profile

python

# Configuration
config = {
    "configurable": {
        "thread_id": "1",
        "user_id": "1"
    }
}

# User input
input_messages = [HumanMessage(content="Hi, my name is Lance")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

Output:

================================ Human Message =================================
Hi, my name is Lance

================================== Ai Message ==================================
Hello, Lance! It's nice to meet you. How can I assist you today?

What happened behind the scenes:

call_model: no memory was found, so the model answered without personalization
write_memory: Trustcall extracted the name and created the Profile

8.2 Second Interaction: Updating the Profile

python

# User input
input_messages = [HumanMessage(content="I like to bike around San Francisco")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

Output:

================================ Human Message =================================
I like to bike around San Francisco

================================== Ai Message ==================================
That sounds like a great way to explore the city! San Francisco has some
beautiful routes and views. Do you have any favorite trails or spots you like
to visit while biking?

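Analysis:

The AI did not mention the user's name explicitly (the context did not call for it)
The memory was updated nonetheless

8.3 Inspecting the Profile

python

# Get the memory
user_id = "1"
namespace = ("memory", user_id)
existing_memory = across_thread_memory.get(namespace, "user_memory")

print(existing_memory.dict())

Output: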

python

{
    'value': {
        'user_name': 'Lance',
        'user_location': 'San Francisco',
        'interests': ['biking']
    },
    'key': 'user_memory',
    'namespace': ['memory', '1'],
    'created_at': '2024-11-04T23:51:17.662428+00:00',
    'updated_at': '2024-11-04T23:51:41.697652+00:00'
}

Key observations:

user_name: extracted from the first exchange
user_location: inferred from the second exchange ("around San Francisco")
interests: extracted from the second exchange
updated_at: the timestamp was refreshed

8.4 Updating the Profile Again

python

# User input
input_messages = [HumanMessage(content="I also enjoy going to bakeries")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

Output:

================================ Human Message =================================
I also enjoy going to bakeries

================================== Ai Message ==================================
Biking and visiting bakeries sounds like a delightful combination! San Francisco
has some fantastic bakeries. Do you have any favorites, or are you looking for
new recommendations to try out?

Inspecting the updated Profile:

python

existing_memory = across_thread_memory.get(namespace, "user_memory")
print(existing_memory.value)

Output:

python

{
    'user_name': 'Lance',
    'user_location': 'San Francisco',
    'interests': ['biking', 'bakeries']  # 'bakeries' was added
}

What the Trustcall update did:

✅ Kept user_name and user_location
✅ Updated only the interests list
✅ Avoided regenerating the whole Profile

8.5 Testing Across Sessions

python

# A new thread, the same user
config = {
    "configurable": {
        "thread_id": "2",  # new session
        "user_id": "1"     # same user
    }
}

# User input
input_messages = [HumanMessage(content="What bakeries do you recommend for me?")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

Output:

================================ Human Message =================================
What bakeries do you recommend for me?

================================== Ai Message ==================================
Since you're in San Francisco and enjoy going to bakeries, here are a few
recommendations you might like:

1. **Tartine Bakery** - Known for its delicious bread and pastries, it's a
   must-visit for any bakery enthusiast.
2. **B. Patisserie** - Offers a delightful selection of French pastries,
   including their famous kouign-amann.
3. **Arsicault Bakery** - Renowned for its croissants, which have been praised
   as some of the best in the country.
4. **Craftsman and Wolves** - Known for their inventive pastries and the "Rebel
   Within," a savory muffin with a soft-cooked egg inside.
5. **Mr. Holmes Bakehouse** - Famous for their cruffins and other creative pastries.

These spots should offer a great variety of treats for you to enjoy. Happy bakery hopping!

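The power of long-term memory:

✅ A new session, yet the user's information is remembered
✅ "Since you're in San Francisco" (the location is remembered)
✅ "enjoy going to bakeries" (the interest is remembered)
✅ The suggestions are personalized and local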

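IX. Advanced Trustcall Features

9.1 Multiple Schemas

Trustcall can work with several Schemas at once:

python

class UserProfile(BaseModel):
    user_name: str
    interests: List[str]

class UserPreferences(BaseModel):
    language: str
    notification_enabled: bool

# Create an extractor that supports several Schemas
multi_extractor = create_extractor(
    model,
    tools=[UserProfile, UserPreferences],  # several Schemas
    # tool_choice is omitted so the model can choose
)

9.2 Automatic Error Correction

Trustcall has a built-in error-correction mechanism. When the model's output does not conform to the Schema, it:

Catches the validation error
Analyzes the cause of the error
Builds a correction prompt
Calls the model again
Applies the correction

The LangSmith trace for a run shows this process step by step.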
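
The general validate-and-retry pattern behind those steps can be sketched in plain Python. This is not Trustcall's implementation: validate_profile, extract_with_retries and fake_model are hypothetical names, and fake_model simulates a model that gets the type of interests wrong on its first attempt.

```python
from typing import Any, Callable, Dict, List, Optional

def validate_profile(candidate: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not isinstance(candidate.get("user_name"), str):
        errors.append("user_name must be a string")
    interests = candidate.get("interests")
    if not (isinstance(interests, list) and all(isinstance(i, str) for i in interests)):
        errors.append("interests must be a list of strings")
    return errors

def extract_with_retries(
    generate: Callable[[Optional[List[str]]], Dict[str, Any]],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call generate, validate its output, and feed errors back until it passes."""
    feedback: Optional[List[str]] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = validate_profile(candidate)
        if not feedback:
            return candidate
    raise ValueError(f"still invalid after {max_attempts} attempts: {feedback}")

def fake_model(feedback: Optional[List[str]]) -> Dict[str, Any]:
    """Simulated model: wrong on the first call, corrected once it sees feedback."""
    if feedback is None:
        return {"user_name": "Lance", "interests": "biking"}
    return {"user_name": "Lance", "interests": ["biking"]}

result = extract_with_retries(fake_model)
print(result)
```

The validation errors themselves become the correction prompt, so the second attempt knows exactly what to fix.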

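9.3 The enable_inserts Parameter

python

trustcall_extractor = create_extractor(
    model,
    tools=[ToDo],
    tool_choice="ToDo",
    enable_inserts=True  # allow new items to be inserted
)

Purpose:

Use it when the Schema describes items in a collection (such as a ToDo list)
It allows new, independent items to be added
Instead of only updating a single object

Example:

python

# Without enable_inserts: the single existing Profile is updated
# With enable_inserts: several ToDo items can be added

9.4 Parallel Tool Calls

Trustcall supports calling several tools in parallel:

python

# Ask for parallel handling in the system instruction
instruction = """Use parallel tool calling to handle updates and insertions simultaneously."""

# Trustcall will then:
# 1. Identify existing items that need updating
# 2. Identify new items that need to be created
# 3. Carry out these operations in parallel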

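X. Design Patterns and Best Practices

10.1 When to Use with_structured_output

Good fit:

✅ Simple Schemas (2-3 levels of nesting)
✅ Creating new objects (no updates needed)
✅ Small amounts of data per generation
✅ Information loss is not a concern

Example:

python

class SimpleProfile(BaseModel):
    name: str
    age: int

model_with_structure = model.with_structured_output(SimpleProfile)

10.2 When to Use Trustcall

Good fit:

✅ Complex nested Schemas
✅ Existing data needs to be updated
✅ Token efficiency matters
✅ Existing information must be kept
✅ Automatic error correction is needed

Example:

python

trustcall_extractor = create_extractor(
    model,
    tools=[ComplexProfile],
    tool_choice="ComplexProfile"
)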

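10.3 Profile Schema Design Principles

1. Field naming:

python

# Good names
user_name: str          # clear and specific
user_location: str      # descriptive
interests: List[str]    # plural form signals a list

# Poor names
name: str               # too generic
loc: str                # unclear abbreviation
interest: str           # singular, although it really holds a list

2. Field descriptions:

python

# A good description
user_name: str = Field(description="The user's preferred name or nickname")

# A poor description
user_name: str = Field(description="name")  # adds no information

3. Default values:

python

# Sensible defaults
interests: List[str] = Field(default_factory=list)  # empty list
language: str = Field(default="en")                # default language

# Avoid
required_field: str = Field(default="")  # a required field should not have an empty default

4. Optional fields:

python

from typing import Optional

# Fields that really are optional
middle_name: Optional[str] = None
bio: Optional[str] = None

# Avoid overusing Optional
# If a field usually has a value, do not make it Optional

10.4 Choosing an Update Strategy

Scenario	Recommended strategy	Reason
Single Profile	Trustcall incremental update	Keeps information, efficient
Collection data (many items)	Trustcall with enable_inserts	New items can be added
Simple, temporary data	Full regeneration	Simple, small data volume
Frequent updates	Trustcall	Saves tokens
Occasional updates	Either approach	Little difference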

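10.5 Error Handling

python

# Assumes trustcall_extractor and TRUSTCALL_INSTRUCTION are defined as in Appendix A.1
def write_memory_with_error_handling(state, config, store):
    # Use the same namespace and key as the write_memory node
    user_id = config["configurable"]["user_id"]
    namespace = ("memory", user_id)
    key = "user_memory"

    existing_memory = store.get(namespace, key)
    existing_profile = {"UserProfile": existing_memory.value} if existing_memory else None

    try:
        # Try calling Trustcall
        result = trustcall_extractor.invoke({
            "messages": [SystemMessage(content=TRUSTCALL_INSTRUCTION)] + state["messages"],
            "existing": existing_profile
        })

        # Save the result
        updated_profile = result["responses"][0].model_dump()
        store.put(namespace, key, updated_profile)

    except Exception as e:
        # Log the error
        print(f"Error updating profile: {e}")

        # Fallback: leave the existing memory unchanged
        # (alternatively, regenerate it with a simpler approach)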
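
The fallback idea above can be exercised without calling a model. The following is a minimal sketch, assuming a hypothetical `safe_update` helper and two stand-in extractor functions (none of these names come from Trustcall or LangGraph). It returns the existing profile unchanged whenever extraction raises.

```python
from typing import Any, Callable, Dict, Optional

Profile = Dict[str, Any]


def safe_update(
    extract: Callable[[Optional[Profile]], Profile],
    existing: Optional[Profile],
) -> Optional[Profile]:
    """Return the extractor's result, or the existing profile if it raises."""
    try:
        return extract(existing)
    except Exception as e:
        # Log and fall back: the stored memory stays as it was
        print(f"Error updating profile: {e}")
        return existing


def failing_extractor(existing: Optional[Profile]) -> Profile:
    """Stand-in for an extractor call that fails (e.g. a network error)."""
    raise RuntimeError("simulated model failure")


def working_extractor(existing: Optional[Profile]) -> Profile:
    """Stand-in for a successful extraction that adds one field."""
    merged = dict(existing or {})
    merged["user_location"] = "San Francisco"
    return merged


current = {"user_name": "Lance"}
after_failure = safe_update(failing_extractor, current)
after_success = safe_update(working_extractor, current)
```

Passing the extractor in as a callable keeps the fallback logic testable on its own; in the real node the callable would wrap `trustcall_extractor.invoke`.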

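11. Python Concepts in Depth

11.1 Advanced Use of Pydantic Field

python

# Pydantic v2 syntax (consistent with model_dump() used elsewhere in this document)
from pydantic import BaseModel, Field, field_validator
from typing import List

class UserProfile(BaseModel):
    user_name: str = Field(
        description="User name",
        min_length=1,           # Minimum length
        max_length=50,          # Maximum length
        pattern=r"^[a-zA-Z\s]+$"  # Regular-expression validation
    )

    age: int = Field(
        description="Age",
        ge=0,                   # Greater than or equal to 0
        le=150                  # Less than or equal to 150
    )

    interests: List[str] = Field(
        description="Interests",
        min_length=0,           # Minimum number of items (min_items in Pydantic v1)
        max_length=20,          # Maximum number of items (max_items in Pydantic v1)
        default_factory=list
    )

    # Custom validator (replaces the v1 @validator decorator)
    @field_validator('user_name')
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError('Name must not be empty')
        return v.title()  # Capitalize the first letter of each word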

11.2 Type Annotation Syntax

python

from typing import (
    List,           # List
    Dict,           # Dictionary
    Set,            # Set
    Tuple,          # Tuple
    Optional,       # Optional (may be None)
    Union,          # Union type (one of several)
    Literal,        # Literal (a fixed set of values)
    Any,            # Any type
)

# Basic types
name: str
age: int
height: float
is_active: bool

# Container types
interests: List[str]                    # List of strings
scores: Dict[str, int]                  # Dictionary from string to integer
tags: Set[str]                          # Set of strings
coordinates: Tuple[float, float]        # Tuple of two floats

# Optional types
middle_name: Optional[str]              # str or None
# Equivalent to
middle_name: Union[str, None]

# Union types
id: Union[int, str]                     # int or str

# Literal types
status: Literal["active", "inactive"]   # Must be one of these two values

# Nested types
matrix: List[List[int]]                 # Matrix of integers
user_data: Dict[str, Union[str, int]]   # Dictionary with mixed value types

11.3 Model Serialization and Deserialization

python

from pydantic import BaseModel
from typing import List

class UserProfile(BaseModel):
    user_name: str
    interests: List[str]

# Create an instance
profile = UserProfile(user_name="Lance", interests=["biking"])

# Serialize to a dictionary
dict_data = profile.model_dump()
# {'user_name': 'Lance', 'interests': ['biking']}

# Serialize to a JSON string
json_str = profile.model_dump_json()
# '{"user_name":"Lance","interests":["biking"]}'

# Deserialize from a dictionary
profile2 = UserProfile(**dict_data)

# Deserialize from a JSON string
profile3 = UserProfile.model_validate_json(json_str)

Python notes:

The ** operator (dictionary unpacking):

python

dict_data = {'user_name': 'Lance', 'interests': ['biking']}

# Using **
UserProfile(**dict_data)
# Equivalent to
UserProfile(user_name='Lance', interests=['biking'])

11.4 List and Dictionary Comprehensions

python

# List comprehension
interests = ['biking', 'coffee', 'coding']
uppercase = [item.upper() for item in interests]
# ['BIKING', 'COFFEE', 'CODING']

# List comprehension with a condition
# (every item here has 6 characters, so all of them pass the filter)
long_interests = [item for item in interests if len(item) > 5]
# ['biking', 'coffee', 'coding']

# Dictionary comprehension
interest_lengths = {item: len(item) for item in interests}
# {'biking': 6, 'coffee': 6, 'coding': 6}

# Nested comprehension
matrix = [[i*j for j in range(3)] for i in range(3)]
# [[0, 0, 0], [0, 1, 2], [0, 2, 4]]

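12. Comparison with Earlier Sections

12.1 Evolution of the Memory Format

Section	Memory format	Pros	Cons
5.2	Free text	Flexible, simple	Hard to query, prone to losing information
5.3 (TypedDict)	Structured dictionary	Type hints	No runtime validation
5.3 (Pydantic)	Strict Schema	Full validation, self-documenting	Slightly more setup

12.2 Evolution of the Update Strategy

Section	Update method	Token efficiency	Information retention
5.2	LLM regenerates the full text	❌ Low	⚠️ May be lost
5.3 (basic)	with_structured_output	⚠️ Medium	⚠️ May be lost
5.3 (Trustcall)	JSON Patch incremental update	✅ High	✅ Preserved (only patched fields change)

12.3 Complexity Comparison

5.2: Store → free text
     Simple ★☆☆☆☆

5.3 (basic): Store → with_structured_output → Schema
             Moderate ★★★☆☆

5.3 (Trustcall): Store → Trustcall → complex nested Schema
                 Complex ★★★★☆

12.4 Suggested Learning Path

1. Master 5.2 (the basics)
   ↓
   Understand the basic concepts of the Store
   ↓
2. Study the fundamentals of 5.3
   ↓
   Understand the advantages of structured data
   ↓
3. Master with_structured_output
   ↓
   Handle simple Schemas
   ↓
4. Learn Trustcall
   ↓
   Handle complex Schemas and updates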

13. Further Thoughts and Future Directions

13.1 Schema Versioning

When the Schema needs to evolve:

python

from datetime import datetime
from typing import List

from pydantic import BaseModel, Field

# V1
class UserProfileV1(BaseModel):
    user_name: str
    interests: List[str]

# V2 - adds new fields
class UserProfileV2(BaseModel):
    user_name: str
    interests: List[str]
    user_location: str = Field(default="Unknown")  # New field with a default value
    created_at: datetime = Field(default_factory=datetime.now)

# Migration function
def migrate_v1_to_v2(v1_data: dict) -> dict:
    return {
        **v1_data,
        "user_location": "Unknown",
        "created_at": datetime.now().isoformat()
    }
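
One way to keep such migrations manageable as versions accumulate is to store a version number with the data and apply migrations in order. This is a minimal sketch under assumptions: the `schema_version` key, the `MIGRATIONS` registry, and the `upgrade` function are hypothetical names that do not appear in the original lesson, and data without a version key is treated as V1.

```python
from datetime import datetime
from typing import Any, Callable, Dict


def migrate_v1_to_v2(data: Dict[str, Any]) -> Dict[str, Any]:
    """Add the fields introduced in V2, keeping any values that already exist."""
    return {
        "user_location": "Unknown",
        "created_at": datetime.now().isoformat(),
        **data,                # existing values win over the defaults above
        "schema_version": 2,   # always stamp the new version last
    }


# Maps a stored version to the function that upgrades it by one version
MIGRATIONS: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    1: migrate_v1_to_v2,
}
LATEST_VERSION = 2


def upgrade(data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply migrations one step at a time until data reaches LATEST_VERSION."""
    version = data.get("schema_version", 1)  # assumption: unversioned data is V1
    while version < LATEST_VERSION:
        data = MIGRATIONS[version](data)
        version = data["schema_version"]
    return data


old = {"user_name": "Lance", "interests": ["biking"]}
new = upgrade(old)
print(new["schema_version"], new["user_location"])
```

Running `upgrade` when reading from the Store means old records are migrated lazily, and adding a V3 later only requires registering one more function.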

13.2 Multi-User Isolation

Make sure each user's data is isolated:

python

def get_user_namespace(user_id: str, data_type: str) -> tuple:
    """Build a user-specific namespace"""
    return ("user", user_id, data_type)

# Usage
profile_ns = get_user_namespace("user_123", "profile")
# ("user", "user_123", "profile")

todos_ns = get_user_namespace("user_123", "todos")
# ("user", "user_123", "todos")

13.3 Data Export and Backup

python

from langgraph.store.base import BaseStore

def export_user_data(store: BaseStore, user_id: str) -> dict:
    """Export all of a user's data"""
    data = {}

    # Export the Profile
    profile = store.get(("memory", user_id), "user_memory")
    if profile:
        data["profile"] = profile.value

    # Export other data...

    return data

def import_user_data(store: BaseStore, user_id: str, data: dict):
    """Import a user's data"""
    if "profile" in data:
        store.put(("memory", user_id), "user_memory", data["profile"])

    # Import other data...

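13.4 Search and Query Optimization

python

from typing import List

from langgraph.store.base import BaseStore

def search_users_by_interest(
    store: BaseStore, all_user_ids: List[str], interest: str
) -> List[str]:
    """Find users who have a given interest"""
    matching_users = []

    # This scans every user, which is inefficient
    # In production, consider a dedicated search engine

    # all_user_ids is assumed to be supplied by the caller
    # (the original was pseudocode and left it undefined)
    for user_id in all_user_ids:
        profile = store.get(("memory", user_id), "user_memory")
        if profile and interest in profile.value.get("interests", []):
            matching_users.append(user_id)

    return matching_users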
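
Short of a dedicated search engine, a common way to avoid the full scan is an inverted index from interest to user IDs, updated whenever a profile is written. Here is a minimal sketch using only the standard library; `InterestIndex` and its methods are hypothetical names, not part of LangGraph's store API, and the index lives only in memory.

```python
from collections import defaultdict
from typing import Dict, List, Set


class InterestIndex:
    """In-memory inverted index: interest -> set of user IDs."""

    def __init__(self) -> None:
        self._users_by_interest: Dict[str, Set[str]] = defaultdict(set)
        self._interests_by_user: Dict[str, Set[str]] = {}

    def update(self, user_id: str, interests: List[str]) -> None:
        """Re-index a user after their profile has been written."""
        new = {interest.lower() for interest in interests}
        old = self._interests_by_user.get(user_id, set())
        # Remove the user from interests they no longer have
        for interest in old - new:
            self._users_by_interest[interest].discard(user_id)
        # Add the user under interests that are new for them
        for interest in new - old:
            self._users_by_interest[interest].add(user_id)
        self._interests_by_user[user_id] = new

    def search(self, interest: str) -> List[str]:
        """Return matching user IDs in sorted order, without scanning profiles."""
        return sorted(self._users_by_interest.get(interest.lower(), set()))


index = InterestIndex()
index.update("user_1", ["biking", "bakeries"])
index.update("user_2", ["Biking", "coding"])
index.update("user_1", ["bakeries"])  # user_1 no longer lists biking
print(index.search("biking"))
```

A lookup now costs one dictionary access instead of one `store.get` per user. The trade-off is that every profile write must also call `update`, otherwise the index drifts out of sync with the Store.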

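14. Summary

14.1 Key Takeaways

Structured memory beats free text
- Explicit data structure
- Type validation
- Easy to query and process
Pydantic is the preferred way to define a Schema
- Runtime validation
- Rich feature set
- Integrates with the LangChain ecosystem
with_structured_output suits simple cases
- Creating a new Schema instance
- Simple data structures
- Quick to get started
Trustcall is the tool for complex cases
- Complex, nested Schemas
- Incremental updates
- Automatic error correction
Incremental updates beat full regeneration
- Saves tokens
- Preserves information
- More precise control

14.2 Skills Learned

Python skills:

Using TypedDict and Pydantic
Type annotation syntax
Field validation and custom validators
Model serialization and deserialization
Dictionary unpacking and list comprehensions

LangChain skills:

The with_structured_output() method
Generating structured output
Integrating a Schema with an LLM

Trustcall skills:

Using create_extractor
Understanding the JSON Patch mechanism
Handling complex Schemas
Incremental update strategies

System design:

Profile Schema design principles
Choosing an update strategy
Error handling and fallback plans
Data versioning

14.3 Practical Advice

Start with a small project:

Begin with a simple TypedDict
Migrate to Pydantic
Integrate with_structured_output
Introduce Trustcall when it is needed

What to watch:

Data validation
Error handling
Token efficiency
User experience

Testing:

Test Schema validation
Test update logic
Test edge cases
Performance testing

14.4 Next Steps

Continue with the rest of Module-5:

5.4 Memory Schema - Collection:
- Managing multiple memory items
- Creating, reading, updating, and deleting items in a Collection
- How it differs from a Profile
Advanced topics:
- Priority and importance of memories
- Retrieving and ranking memories
- Managing memory at scale
Production practice:
- Persistent storage (databases)
- Caching strategies
- Monitoring and logging
- Security considerations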

15. Appendix

A.1 Complete Code Listing

python

# Import dependencies
from IPython.display import Image, display
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables.config import RunnableConfig
from langchain_openai import ChatOpenAI
from trustcall import create_extractor
from pydantic import BaseModel, Field
from typing import List

# Define the Schema
class UserProfile(BaseModel):
    """User profile"""
    user_name: str = Field(description="The user's preferred name")
    user_location: str = Field(description="The user's location")
    interests: List[str] = Field(description="A list of the user's interests")

# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Create the Trustcall extractor
trustcall_extractor = create_extractor(
    model,
    tools=[UserProfile],
    tool_choice="UserProfile",
)

# System prompts
MODEL_SYSTEM_MESSAGE = """You are a helpful assistant with memory that provides information about the user.
If you have memory for this user, use it to personalize your responses.
Here is the memory (it may be empty): {memory}"""

TRUSTCALL_INSTRUCTION = """Create or update the memory (JSON doc) to incorporate information from the following conversation:"""

# Define the nodes
def call_model(state: MessagesState, config: RunnableConfig, store: BaseStore):
    user_id = config["configurable"]["user_id"]
    namespace = ("memory", user_id)
    existing_memory = store.get(namespace, "user_memory")

    if existing_memory and existing_memory.value:
        memory_dict = existing_memory.value
        formatted_memory = (
            f"Name: {memory_dict.get('user_name', 'Unknown')}\n"
            f"Location: {memory_dict.get('user_location', 'Unknown')}\n"
            f"Interests: {', '.join(memory_dict.get('interests', []))}"
        )
    else:
        formatted_memory = None

    system_msg = MODEL_SYSTEM_MESSAGE.format(memory=formatted_memory)
    response = model.invoke([SystemMessage(content=system_msg)] + state["messages"])
    return {"messages": response}

def write_memory(state: MessagesState, config: RunnableConfig, store: BaseStore):
    user_id = config["configurable"]["user_id"]
    namespace = ("memory", user_id)
    existing_memory = store.get(namespace, "user_memory")

    existing_profile = {"UserProfile": existing_memory.value} if existing_memory else None

    result = trustcall_extractor.invoke({
        "messages": [SystemMessage(content=TRUSTCALL_INSTRUCTION)] + state["messages"],
        "existing": existing_profile
    })

    updated_profile = result["responses"][0].model_dump()
    key = "user_memory"
    store.put(namespace, key, updated_profile)

# Build the graph
builder = StateGraph(MessagesState)
builder.add_node("call_model", call_model)
builder.add_node("write_memory", write_memory)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", "write_memory")
builder.add_edge("write_memory", END)

# Compile the graph
across_thread_memory = InMemoryStore()
within_thread_memory = MemorySaver()
graph = builder.compile(
    checkpointer=within_thread_memory,
    store=across_thread_memory
)

# Usage example
config = {"configurable": {"thread_id": "1", "user_id": "1"}}
input_messages = [HumanMessage(content="Hi, my name is Lance")]

for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()

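A.2 Schema Design Template

python

# Pydantic v2 syntax (consistent with model_dump() used elsewhere in this document)
from pydantic import BaseModel, ConfigDict, Field, field_validator
from typing import List, Optional
from datetime import datetime

class UserProfile(BaseModel):
    """User profile Schema template"""

    # Required field
    user_name: str = Field(
        description="The user's preferred name",
        min_length=1,
        max_length=100
    )

    # Optional field
    user_location: Optional[str] = Field(
        default=None,
        description="The user's location"
    )

    # List field
    interests: List[str] = Field(
        default_factory=list,
        description="A list of the user's interests",
        max_length=50  # Maximum number of items (max_items in Pydantic v1)
    )

    # Timestamp
    created_at: datetime = Field(
        default_factory=datetime.now,
        description="Creation time"
    )

    # Custom validation (replaces the v1 @validator decorator)
    @field_validator('user_name')
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError('Name must not be blank')
        return v.strip()

    # Configuration (replaces the v1 inner Config class)
    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "user_name": "Lance",
                "user_location": "San Francisco",
                "interests": ["biking", "bakeries"]
            }
        }
    )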

A.3 Quick Reference

python

# Create a Trustcall extractor
from trustcall import create_extractor

extractor = create_extractor(
    model,
    tools=[Schema],
    tool_choice="Schema"
)

# Extract (create)
result = extractor.invoke({
    "messages": [...]
})

# Extract (update): "existing" goes in the same input dict as "messages"
result = extractor.invoke({
    "messages": [...],
    "existing": {"Schema": existing_data}
})

# Get the result
pydantic_obj = result["responses"][0]
dict_data = pydantic_obj.model_dump()

# Save to the Store
store.put(namespace, key, dict_data)

# Read from the Store
item = store.get(namespace, key)
data = item.value if item else None

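Document version: 1.0 Last updated: 2024-11-05 Author: AI Assistant Based on: LangChain Academy Module-5 Lesson 5.3

I hope this detailed walkthrough helps you build a deeper understanding of structured memory and how to use Trustcall!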
