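LangChain Documentation Study Notes No. 5 - Memory

Getting Started

The underlying LLMs and chat models are stateless, so LangChain chains and agents are stateless by default as well: each call is handled independently.

In some applications, chatbots for example, remembering previous interactions is essential. LangChain provides modular helper utilities for managing and manipulating previous chat messages, along with simple ways to plug those utilities into chains.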

ChatMessageHistory

A lightweight wrapper that provides convenience methods for saving human messages and AI messages and then fetching them all.

from langchain.memory import ChatMessageHistory
from langchain.memory import ConversationBufferMemory
history = ChatMessageHistory()
history.add_user_message("hi!")
history.add_ai_message("whats up?")

history.messages
# >> [HumanMessage(content='hi!', additional_kwargs={}),AIMessage(content='whats up?', additional_kwargs={})]

ConversationBufferMemory

ConversationBufferMemory is a wrapper around ChatMessageHistory that exposes the stored messages as a memory variable.

The history can first be extracted as a string:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("whats up?")

memory.load_memory_variables({})
# >> {'history': 'Human: hi!\nAI: whats up?'}

The history can also be returned as a list of messages:

memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("whats up?")

memory.load_memory_variables({})

# >> {'history': [HumanMessage(content='hi!', additional_kwargs={}),AIMessage(content='whats up?', additional_kwargs={})]}

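Using it in a chain

Finally, use the memory in a chain (with verbose=True so that we can see the prompt).

verbose

Whether to run in verbose mode. In verbose mode, some intermediate logs are printed to the console.

The default comes from the global verbose setting, which can be read with langchain.globals.get_verbose().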

from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory(),
)

fir = conversation.predict(input="Hello World!")
sec = conversation.predict(input="How to evaluate the world?")
print(fir)
# >> Hello! How can I assist you today?
print(sec)
# >> Evaluating the world can be a complex task as it involves considering multiple factors and perspectives. Some common approaches to evaluating the world include assessing the state of the economy, analyzing social and political systems, examining environmental conditions, and evaluating the well-being of individuals and communities. It can also involve considering ethical and moral values, cultural differences, and historical contexts. Ultimately, the process of evaluating the world is subjective and can vary depending on individual beliefs, values, and priorities. Is there any specific aspect of the world you would like to evaluate?

Because verbose is enabled, a log like the following is printed along the way (shown here for the second call):

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello World!
AI: Hello! How can I assist you today?
Human: How to evaluate the world?
AI:

> Finished chain.

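Saving message history

In practice we often need to save the message history and load it again later in the workflow.

This is easy to do: convert the messages to plain Python dicts, save those dicts (as JSON or in another format), and load them back when needed.

history = ChatMessageHistory()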
history.add_user_message("hi!")
history.add_ai_message("whats up?")

Convert to dicts

from langchain.schema import messages_from_dict, messages_to_dict

dicts = messages_to_dict(history.messages)
print(dicts)

# >> [{'type': 'human', 'data': {'content': 'hi!', 'additional_kwargs': {}, 'type': 'human', 'example': False}}, {'type': 'ai', 'data': {'content': 'whats up?', 'additional_kwargs': {}, 'type': 'ai', 'example': False}}]

Convert the dicts back to messages

new_messages = messages_from_dict(dicts)
print(new_messages)

# >> [HumanMessage(content='hi!', additional_kwargs={}),AIMessage(content='whats up?', additional_kwargs={})]
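
The "save as JSON, then load" step is not shown above. Below is a minimal sketch of that round trip using only the standard library. It assumes message dicts shaped like the messages_to_dict output shown earlier; the file name chat_history.json and its location are hypothetical choices for this sketch.

```python
import json
import os
import tempfile

# Plain dicts shaped like the messages_to_dict output shown above.
dicts = [
    {"type": "human", "data": {"content": "hi!", "additional_kwargs": {}}},
    {"type": "ai", "data": {"content": "whats up?", "additional_kwargs": {}}},
]

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.gettempdir(), "chat_history.json")

# Save the history to disk as JSON.
with open(path, "w", encoding="utf-8") as f:
    json.dump(dicts, f, ensure_ascii=False)

# Load it back later in the workflow.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == dicts)  # True: the round trip preserves the dicts
```

With LangChain installed, the loaded list could then be handed to messages_from_dict to rebuild the message objects.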

Memory Types

Buffer

ConversationBufferMemory stores the conversation history and extracts it on request.

The examples in Getting Started are all implemented with ConversationBufferMemory.

Buffer Window

ConversationBufferWindowMemory keeps a list of the interactions in the conversation over time, but only uses the last K of them, which keeps the buffer from growing too large.

from langchain.memory import ConversationBufferWindowMemory

# k=1: only the most recent interaction is used
memory = ConversationBufferWindowMemory(k=1)

memory.save_context({"input": "hi"}, {"output": "whats up"})
print(memory.load_memory_variables({}))
# >> {'history': 'Human: hi\nAI: whats up'}

memory.save_context({"input": "not much you"}, {"output": "not much"})
print(memory.load_memory_variables({}))
# >> {'history': 'Human: not much you\nAI: not much'}
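
To make the "last K interactions" idea concrete, here is a minimal standalone sketch of a sliding window. WindowBuffer is a hypothetical class written for illustration, not LangChain's implementation; it only mirrors the behavior seen in the example above.

```python
from collections import deque


class WindowBuffer:
    """Hypothetical stand-in: keeps only the last k (input, output) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def load(self) -> str:
        # Render the remaining exchanges with Human/AI prefixes.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


window = WindowBuffer(k=1)
window.save_context("hi", "whats up")
window.save_context("not much you", "not much")
print(window.load())
# Human: not much you
# AI: not much
```

Raising k keeps more exchanges in the prompt at the cost of more tokens per call.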

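Summary

ConversationSummaryMemory builds a summary of the conversation over time, effectively condensing the information in it.

The summary is stored in memory and can then be injected into a prompt or chain as the summary of the conversation so far. This is most useful for long conversations, where keeping every past message in the prompt would take too many tokens.

from langchain.memory import ConversationSummaryMemory, ChatMessageHistory

# chat_model is a chat model instance created elsewhere (not shown in these notes)
memory = ConversationSummaryMemory(
    llm=chat_model,
    return_messages=True,
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
print(memory.load_memory_variables({}))
# The recorded output below appears to come from a run with a different exchange than the one saved above
# >> {'history': [SystemMessage(content='The human greets the AI with "Hello World!" and the AI responds with "Yes!"')]}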

You can also call predict_new_summary directly, passing in a list of messages and the previous summary, to get the summary text:

messages = memory.chat_memory.messages
previous_summary = ""
print(memory.predict_new_summary(messages, previous_summary))
# >> The human greets the AI with "Hello World!" and the AI responds with "Yes!"

If you already have a summary from an earlier run, you can pass it in at construction time, which speeds up initialization and avoids regenerating the summary:

memory = ConversationSummaryMemory(
    llm=chat_model,
    # buffer seeds the memory with an existing summary
    buffer="The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.",
    chat_memory=history,
    return_messages=True,
)

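Hybrid (Summary Buffer)

ConversationSummaryBufferMemory combines the buffer and summary approaches: it keeps a buffer of recent interactions, but instead of simply discarding old interactions it compiles them into a summary and uses both.

This is controlled by the max_token_limit parameter. While the buffered conversation fits within the limit, LangChain keeps the original messages; once it exceeds the limit, the oldest messages are moved out of the buffer and folded into the summary to save tokens.

from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)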
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

print(memory.load_memory_variables({}))

This call did not succeed for me, because get_num_tokens_from_messages is not implemented for Azure's gpt-35-turbo-16k model (see langchain.chat_models.openai.ChatOpenAI.get_num_tokens_from_messages):

else:
    raise NotImplementedError(
        f"get_num_tokens_from_messages() is not presently implemented "
        f"for model {model}."
        "See https://github.com/openai/openai-python/blob/main/chatml.md for "
        "information on how messages are converted to tokens."
    )

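Here is an example excerpted from a related Geek Time (极客时间) course, which shows the buffer and the summary working together (outputs translated from the original Chinese).

Output of round 1

{'input': 'My sister has her birthday tomorrow, and I need a birthday bouquet.',
'history': '',
'response': ' Wow, your sister has a birthday coming up! That is wonderful! I suggest buying a brightly colored bouquet, because it can express your blessings and good wishes for her. You can go to a florist near your home or order online, and you can look for special bouquets, such as colorful roses or lilies, which are more distinctive.'}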

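Output of round 2

{'input': 'She likes pink roses, the color is pink.',
'history': 'Human: My sister has her birthday tomorrow, and I need a birthday bouquet.\nAI:  Wow, your sister has a birthday coming up! That is wonderful! I suggest buying a brightly colored bouquet, because it can express your blessings and good wishes for her. You can go to a florist near your home or order online, and you can look for special bouquets, such as colorful roses or lilies, which are more distinctive.',
'response': ' OK, then pink roses are a great choice! You can buy a bouquet of pink roses, and your sister will be very happy! You can find pink roses at a florist or order them online, and you can choose a suitable quantity based on your budget. In addition, you can consider adding some decorations, such as string, ribbons, or small gifts'}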

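Output of round 3

{'input': 'I am back again. Do you remember why I came to buy flowers yesterday?',
'history': "System: \nThe human asked the AI for advice on buying a bouquet for their sister's birthday. The AI suggested buying a vibrant bouquet as a representation of their wishes and blessings, and recommended looking for special bouquets like colorful roses or lilies for something more unique.\nHuman: She likes pink roses, the color is pink.\nAI:  OK, then pink roses are a great choice! You can buy a bouquet of pink roses, and your sister will be very happy! You can find pink roses at a florist or order them online, and you can choose a suitable quantity based on your budget. In addition, you can consider adding some decorations, such as string, ribbons, or small gifts",
'response': ' Yes, I remember that you came to buy flowers yesterday for the birthday of your sister. You wanted to buy a bouquet of pink roses to express your blessings and good wishes. You can find pink roses at a florist or order them online, and you can choose a suitable quantity based on your budget. In addition, you can consider adding some decorations, such as string, ribbons, or small gifts'}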

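As the outputs show, in round 2 the memory recorded the round 1 exchange in full, but in round 3 it detected that the first two rounds exceeded the configured token limit (300 in this example), so it summarized the earlier exchange to save tokens.

The advantage of ConversationSummaryBufferMemory is that the summary lets it recall earlier interactions while the buffer ensures that the most recent ones are not lost. For short conversations, however, ConversationSummaryBufferMemory may also increase the token count.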
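
The interplay of buffer and summary can be sketched without any LLM. The code below is a hypothetical illustration, not LangChain's implementation: SummaryBuffer, count_tokens (a crude whitespace word count), and fake_summarize (a stand-in for the LLM summarization call) are all assumptions made for this sketch.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


class SummaryBuffer:
    """Hypothetical sketch: recent lines kept verbatim, older lines summarized."""

    def __init__(self, summarize: Callable[[str, List[str]], str],
                 max_token_limit: int) -> None:
        self.summarize = summarize
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer: List[str] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.buffer += [f"Human: {user_input}", f"AI: {ai_output}"]
        pruned: List[str] = []
        # Move the oldest lines out of the buffer until it fits the limit.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            # Fold the pruned lines into the running summary.
            self.summary = self.summarize(self.summary, pruned)

    def load(self) -> str:
        head = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(head + self.buffer)


def fake_summarize(previous: str, lines: List[str]) -> str:
    """Stand-in for the LLM call: only records how many lines were folded in."""
    already = int(previous.split()[0]) if previous else 0
    return f"{already + len(lines)} earlier lines summarized"


memory = SummaryBuffer(fake_summarize, max_token_limit=8)
memory.save_context("hi", "whats up")
memory.save_context("not much you", "not much")
print(memory.load())
# System: 2 earlier lines summarized
# Human: not much you
# AI: not much
```

In a real setup the summarizer would be an LLM call and the token count would come from the model's tokenizer, which is exactly the piece that was unavailable in the failed run above.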

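Token Buffer

ConversationTokenBufferMemory keeps a buffer of recent interactions in memory and uses token length, rather than the number of interactions, to decide when to flush interactions.

In my view it is similar to the buffer window, except that what gets discarded is decided by token length; in other words, the window slides by token count.

from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)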
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

print(memory.load_memory_variables({}))

This also depends on get_num_tokens_from_messages, so for now I could not run it.
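
Since the real call could not be run, here is a minimal standalone sketch of the idea of a window that slides by token count. TokenBuffer and count_tokens are hypothetical; the whitespace word count stands in for a real tokenizer, and evicting one message at a time is a choice made for this sketch.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


class TokenBuffer:
    """Hypothetical sketch: drops the oldest lines once a token budget is exceeded."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.buffer: List[str] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.buffer += [f"Human: {user_input}", f"AI: {ai_output}"]
        # Evict from the front until the buffer fits; evicted lines are
        # simply dropped, not summarized.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            self.buffer.pop(0)

    def load(self) -> str:
        return "\n".join(self.buffer)


token_memory = TokenBuffer(max_token_limit=10)
token_memory.save_context("hi", "whats up")
token_memory.save_context("not much you", "not much")
print(token_memory.load())
# AI: whats up
# Human: not much you
# AI: not much
```

Unlike the buffer window, the cut does not have to fall on an exchange boundary: here only the first human line is dropped because the remaining three lines fit the budget.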

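Entity

ConversationEntityMemory remembers given facts about specific entities in a conversation.

It extracts information about entities (using an LLM) and builds up knowledge about each entity over time (also using an LLM).

from langchain.memory import ConversationEntityMemory

# Azure.chat_model is not defined in these notes; it appears to be the author's own helper that returns a chat model
memory = ConversationEntityMemory(llm=Azure.chat_model)
_input = {"input": "Deven & Sam are working on a hackathon project"}
# This call generates the new entity keys
memory.load_memory_variables(_input)
# This call summarizes the entities
memory.save_context(
    _input,
    {"output": " That sounds like a great project! What kind of project are they working on?"},
)

print(memory.load_memory_variables({"input": 'The relation between Deven and Sam?'}))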
# >> {'history': 'Human: Deven & Sam are working on a hackathon project\nAI: That sounds like a great project! What kind of project are they working on?', 'entities': {'Deven': 'Deven is currently working on a hackathon project with Sam.', 'Sam': 'Sam is working on a hackathon project with Deven.'}}

print(memory.load_memory_variables({"input": 'What is Sam doing?'}))
# >> {'history': 'Human: Deven & Sam are working on a hackathon project\nAI: That sounds like a great project! What kind of project are they working on?', 'entities': {'Sam': 'Sam is working on a hackathon project with Deven.'}}

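The example above shows that when the question involves both people (The relation between Deven and Sam?), the memory returns the summarized information for both Deven and Sam; when it involves only Sam (What is Sam doing?), only the entity information for Sam is returned.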

Knowledge Graph

ConversationKGMemory uses a knowledge graph to recreate memory.

from langchain.memory import ConversationKGMemory

llm = Azure.chat_model
memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to Sam"}, {"output": "who is Sam"})
memory.save_context({"input": "Sam is a friend"}, {"output": "okay"})
memory.save_context({"input": "Sam and Deven are my teachers"}, {"output": "okay,I known"})
memory.save_context({"input": "Deven has a white watch"}, {"output": "okay"})

print(memory.load_memory_variables({"input": "who is sam"}))
# >> {'history': 'On Sam: Sam is a person. Sam is a friend. Sam is my teacher.\nOn Deven: Deven is my teacher. Deven has white watch.'}

print(memory.load_memory_variables({"input": "who is Deven"}))
# >> {'history': 'On Deven: Deven is my teacher. Deven has white watch.'}

As with entity memory, an LLM is used to summarize and match the content.

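Vector Store

VectorStoreRetrieverMemory stores memories in a vector store and queries the top K most relevant documents every time it is called.

Unlike most other memory classes, it does not explicitly track the order of interactions. The "documents" here are snippets of earlier conversation, which help the AI recall what was said previously.

Initialize the vector store

The details depend on which vector store you use.

import faiss

from langchain.docstore import InMemoryDocstore
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS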

embedding_size = 1536 # OpenAIEmbeddings 的维度
index = faiss.IndexFlatL2(embedding_size)
embedding_fn = OpenAIEmbeddings().embed_query
vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})

Create the VectorStoreRetrieverMemory

from langchain.memory import VectorStoreRetrieverMemory

# In actual usage you would set k higher; k=1 is used here to show that the vector lookup still returns semantically relevant information
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)

# When added to an agent, the memory object can save pertinent information from conversations or tool use
memory.save_context({"input": "My favorite food is pizza"}, {"output": "that's good to know"})
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "..."})
memory.save_context({"input": "I don't like the Celtics"}, {"output": "ok"})

Usage

print(memory.load_memory_variables({"prompt": "what sport should i watch?"})["history"])
# >> input: My favorite sport is soccer\noutput: ...
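
The "store snippets, return the top K most similar" flow can be sketched without FAISS or an embedding model. Everything below is a hypothetical stand-in: embed uses plain word counts instead of a real embedding, so it only matches on shared words, whereas a real embedding model matches on meaning.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorMemory:
    """Hypothetical sketch: stores snippets, returns the k most similar ones."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.docs: List[str] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Each exchange becomes one "document"; order is not tracked.
        self.docs.append(f"input: {user_input}\noutput: {ai_output}")

    def load(self, query: str) -> List[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, embed(d)), reverse=True)
        return ranked[: self.k]


toy_memory = ToyVectorMemory(k=1)
toy_memory.save_context("My favorite food is pizza", "that's good to know")
toy_memory.save_context("My favorite sport is soccer", "...")
toy_memory.save_context("I don't like the Celtics", "ok")
print(toy_memory.load("what sport should i watch?")[0])
# input: My favorite sport is soccer
# output: ...
```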

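Recap

Because the entity, vector store, and similar memory types were added in later versions, this recap covers only the four basic memory types.

Type | Advantages | Disadvantages
ConversationBufferMemory (buffer) | Provides the complete history; simple and intuitive | Uses more tokens, which increases response time and cost; the conversation may exceed the token limit
ConversationBufferWindowMemory (buffer window) | Lower token usage; adjustable window size | Cannot recall early interactions; the window size may be poorly chosen
ConversationSummaryMemory (summary) | Reduces token usage for long conversations; allows longer conversations | Increases token usage for short conversations; recall depends on the summarization ability of the LLM; summarization itself consumes extra tokens
ConversationSummaryBufferMemory (hybrid) | Can recall early interactions; does not miss recent messages; highly flexible | Increases token usage for short conversations; storing the raw recent interactions also adds tokens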

[Figure: growth in token count versus number of interactions for the different memory types]

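Memory in LLMChain

The key to using a memory class with LLMChain is setting up the prompt correctly.

The prompt below has two input keys: one for the actual input and one for the input from the memory class. Make sure that the key used in the PromptTemplate matches the memory_key of the ConversationBufferMemory (chat_history here).

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

template = """You are a chatbot having a conversation with a human.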

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

llm = OpenAI()
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory,
)

llm_chain.predict(human_input="Hi there my friend")
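
A mismatch between the template variables and the memory key is easy to miss. The sketch below, using only the standard library, lists the variables a template needs that neither the memory nor the caller supplies. The helper names template_variables and missing_keys are hypothetical and not part of LangChain.

```python
from string import Formatter
from typing import List, Set


def template_variables(template: str) -> Set[str]:
    """Collect the {placeholder} names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_keys(template: str, memory_key: str, input_keys: List[str]) -> Set[str]:
    """Variables the template needs that neither memory nor the caller provides."""
    return template_variables(template) - ({memory_key} | set(input_keys))


template = """You are a chatbot having a conversation with a human.

{chat_history}
Human: {human_input}
Chatbot:"""

# Matching keys: nothing is missing.
print(missing_keys(template, "chat_history", ["human_input"]))  # set()
# Memory left at a different key: the template variable is never filled.
print(missing_keys(template, "history", ["human_input"]))  # {'chat_history'}
```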

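The same applies to chat models.

ConversationChain

ConversationChain is a subclass of LLMChain. Its main feature is a built-in conversation prompt format with an AI prefix and a human prefix, and this format is tightly coupled to the memory mechanism.

Here is the template built into ConversationChain:

# Initialize the conversation chain
conv_chain = ConversationChain(llm=llm)

# Print the conversation template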
print(conv_chain.prompt.template)

Output

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:

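This prompt tries to reduce hallucination, that is, to keep the model from making things up, by stating the following:

  • If the AI does not know the answer to a question, it truthfully says it does not know
  • Two parameters are passed to the LLM through the prompt template, and the output we want back is simply the continuation of the conversation
    • {history} holds the conversation memory
    • {input} holds the new input

With the {history} parameter and the Human and AI prefixes, the earlier conversation can be placed in the prompt template and passed to the LLM as part of the new prompt on every turn. That is how the memory mechanism works.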
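
The mechanism described above can be reproduced with plain string formatting. The following is a minimal sketch, not LangChain's code: TEMPLATE is an abbreviated copy of the built-in template, and format_history and build_prompt are hypothetical helpers.

```python
from typing import List, Tuple

# Abbreviated version of the ConversationChain template shown above.
TEMPLATE = """Current conversation:
{history}
Human: {input}
AI:"""


def format_history(turns: List[Tuple[str, str]],
                   human_prefix: str = "Human", ai_prefix: str = "AI") -> str:
    """Render past (input, output) pairs as prefixed dialogue lines."""
    return "\n".join(f"{human_prefix}: {u}\n{ai_prefix}: {a}" for u, a in turns)


def build_prompt(turns: List[Tuple[str, str]], new_input: str) -> str:
    """Fill the template with the stored history and the new user input."""
    return TEMPLATE.format(history=format_history(turns), input=new_input)


turns = [("Hello World!", "Hello! How can I assist you today?")]
prompt = build_prompt(turns, "How to evaluate the world?")
print(prompt)
# Current conversation:
# Human: Hello World!
# AI: Hello! How can I assist you today?
# Human: How to evaluate the world?
# AI:
```

The printed prompt has the same shape as the "Prompt after formatting" log shown in Getting Started: the model itself stays stateless and simply receives the history as text on every call.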

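Customizing conversational memory

As described above, ConversationChain formats the history as a dialogue and builds the prompt from it. The prefixes come from the memory class in use: the default AI prefix is AI and the default human prefix is Human.

class ConversationBufferMemory(BaseChatMemory):
    """Buffer for storing conversation memory."""

    human_prefix: str = "Human"
    ai_prefix: str = "AI"
    ...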

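The prefixes can be customized:

# Change the AI prefix to AI Assistant and the human prefix to Friend.
# Both are set in one call; assigning memory twice would discard the first setting.
memory = ConversationBufferMemory(ai_prefix="AI Assistant", human_prefix="Friend")

# PROMPT is not defined in these notes; it should be a PromptTemplate whose text uses the same prefixes
conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=True,
    memory=memory,
)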

Custom memory

Besides the predefined memory types, you can implement your own memory class.

This is done by subclassing BaseMemory:

import re
from typing import Any, Dict, List

from langchain.schema import BaseMemory


class FavoriteFoodMemory(BaseMemory):
    """Memory class for storing information about entities."""

    # Define dictionary to store information about entities.
    entities: dict = {}
    # Define key to pass information about entities into prompt.
    memory_key: str = "entities"

    def clear(self):
        self.entities = {}

    @property
    def memory_variables(self) -> List[str]:
        """Define the variables we are providing to the prompt."""
        return [self.memory_key]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        """Load the memory variables, in this case the entity key."""
        s = ""
        for key, value in self.entities.items():
            s += (value + "\n")
        return {self.memory_key: s}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        pattern = r"My favorite food is (\w+)"
        for key, value in inputs.items():
            # Store the first food name found in this input, if any.
            match = re.search(pattern, value)
            if match:
                self.entities[key] = match.group(1)

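This is a simple class that captures the user's favorite food:

  • The memory variable name (memory_key) is defined as entities
  • load_memory_variables reads the preferences saved in entities and joins them into a string
  • save_context uses the regular expression r"My favorite food is (\w+)" to match the user's food preference and save it

from langchain.prompts import PromptTemplate

template = """You are a chef, directly output the dish name you want to make based on user preferences.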
Don't continue asking questions.
Do not reply to questions.

User Preferences:
{entities}

Conversation:
Human: {input}
AI:"""
# Prompt that has the AI play a chef
prompt = PromptTemplate(input_variables=["entities", "input"], template=template)
# The custom memory class
custom_memory = FavoriteFoodMemory()
# Simulate an earlier interaction from which the user's favorite food was extracted
custom_memory.save_context({'input': 'My favorite food is banana.'}, {'output': 'Great!'})

# Create the ConversationChain
conversation = ConversationChain(
    llm=Azure.chat_model, prompt=prompt, verbose=True, memory=custom_memory
)

Usage

print(conversation.run("I'm hungry."))
# >> Banana Split

The log shows that the memory was selected correctly and formatted into the prompt:

> Entering new ConversationChain chain...
Prompt after formatting:
You are a chef, directly output the dish name you want to make based on user preferences.
Don't continue asking questions.
Do not reply to questions.

User Preferences:
banana


Conversation:
Human: I'm hungry.
AI:

> Finished chain.

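Combining memories

Several memory classes can be used in the same chain. To combine them, initialize a CombinedMemory and use it as the chain's memory.

from langchain.chains import ConversationChain
from langchain.memory import CombinedMemory, ConversationBufferMemory, ConversationSummaryMemory
from langchain.prompts import PromptTemplate

# Buffer memory
conv_memory = ConversationBufferMemory(
    memory_key="chat_history_lines", input_key="input"
)
# Summary memory
summary_memory = ConversationSummaryMemory(llm=Azure.chat_model, input_key="input")
# Combine them with CombinedMemory
memory = CombinedMemory(memories=[conv_memory, summary_memory])

_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI.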
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.

Summary of conversation:
{history}
Current conversation:
{chat_history_lines}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
    input_variables=["history", "input", "chat_history_lines"],
    template=_DEFAULT_TEMPLATE,
)
# Build the ConversationChain with the combined memory
conversation = ConversationChain(llm=Azure.chat_model, verbose=True, memory=memory, prompt=PROMPT)

Run it and inspect the results:

from pprint import pprint

conversation.run("Hi")

pprint(conversation.memory.memories[0].load_memory_variables({}))
# This is the ConversationBufferMemory inside the combination
# >> {'chat_history_lines': 'Human: Hi\nAI: Hello! How can I assist you today?'}

pprint(conversation.memory.memories[1].load_memory_variables({}))
# This is the ConversationSummaryMemory inside the combination
# >> {'history': 'The human greets the AI and the AI asks how it can assist the human.'}
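
Conceptually, combining memories means merging the variables each one provides into a single dict for the prompt, which is why the two memories above use different keys (chat_history_lines and history). The sketch below illustrates that merge with plain functions; combine, buffer_memory, and summary_memory are hypothetical stand-ins, and rejecting duplicate keys is a design choice of this sketch.

```python
from typing import Callable, Dict, List

# A "memory" here is just a function returning its prompt variables.
MemoryLoader = Callable[[], Dict[str, str]]


def combine(loaders: List[MemoryLoader]) -> Dict[str, str]:
    """Merge the variables of several memories, rejecting duplicate keys."""
    merged: Dict[str, str] = {}
    for load in loaders:
        for key, value in load().items():
            if key in merged:
                raise ValueError(f"duplicate memory key: {key}")
            merged[key] = value
    return merged


def buffer_memory() -> Dict[str, str]:
    """Stand-in for the buffer memory: raw recent dialogue lines."""
    return {"chat_history_lines": "Human: Hi\nAI: Hello! How can I assist you today?"}


def summary_memory() -> Dict[str, str]:
    """Stand-in for the summary memory: condensed earlier conversation."""
    return {"history": "The human greets the AI and the AI asks how it can assist the human."}


variables = combine([buffer_memory, summary_memory])
print(sorted(variables))  # ['chat_history_lines', 'history']
```

If both memories used the same key, one value would silently overwrite the other in the prompt, so failing loudly is the safer behavior.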

References

Backed by a Vector Store | 🦜️🔗 Langchain

Backed by a Vector Store (Chinese translation) | 🦜️🔗 Langchain

LangChain 实战课 (LangChain in Action course, geekbang.org)