
LangChain Docs Study No.2 - Prompt Templates

Template Format

By default, PromptTemplate treats the supplied template as a Python f-string.

You can specify a different template format via the template_format parameter, e.g. template_format="jinja2" below:

# make sure jinja2 is installed
from langchain.prompts import PromptTemplate

jinja2_template = "告诉我一个关于{{ content }}的笑话"
prompt_template = PromptTemplate.from_template(template=jinja2_template, template_format="jinja2")

prompt_template.format(content="熊猫")
# -> 告诉我一个关于熊猫的笑话

Args:

    template: The template string.

    template_format: The template format. Should be one of "f-string" or "jinja2".


f-string, short for formatted string literal, is a string-formatting method introduced in Python 3.6: The new f-strings in Python 3.6 | Seasoned & Agile (cito.github.io)
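To see what that means in plain Python, compare an f-string with the equivalent str.format call (a standalone sketch, unrelated to LangChain itself):

```python
tool = "LangChain"

# f-string: the expression is evaluated inline when the literal runs
via_fstring = f"I am learning {tool}"

# str.format: the placeholder is filled when format() is called
via_format = "I am learning {tool}".format(tool=tool)

print(via_fstring == via_format)
# -> True
```

PromptTemplate relies on the str.format style, which is why the variables can be supplied later rather than at definition time.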

Template Validation

By default, PromptTemplate validates the template string by checking whether input_variables matches the variables defined in the template.

Setting validate_template to False disables this behavior:

template = "我学习{tool}是因为{reason}"
prompt_template = PromptTemplate(template=template, input_variables=["reason"], validate_template=True)

# >> pydantic.v1.error_wrappers.ValidationError: 1 validation error for PromptTemplate\n__root__\nInvalid prompt schema; check for mismatched or missing input parameters. 'tool' (type=value_error)
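The check itself is conceptually simple: extract the placeholder names from the template and compare them to input_variables. A rough stdlib sketch of that idea (not LangChain's actual implementation):

```python
from string import Formatter

def check_template(template: str, input_variables: list) -> None:
    # collect every {placeholder} name that appears in the template
    found = {name for _, name, _, _ in Formatter().parse(template) if name}
    if found != set(input_variables):
        raise ValueError(f"mismatched or missing input parameters: {found ^ set(input_variables)}")

check_template("我学习{tool}是因为{reason}", ["tool", "reason"])   # passes
# check_template("我学习{tool}是因为{reason}", ["reason"])        # raises ValueError mentioning 'tool'
```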

This looks like a docs-version mismatch. The LangChain version I am using is 0.0.340 (the latest at the time of writing):

Name: langchain
Version: 0.0.340
Summary: Building applications with LLMs through composability

In this version the default value of validate_template is False:

validate_template: bool = False
"""Whether or not to try validating the template."""


If you use from_template directly, the input variables are detected automatically:

template = "我学习{tool}是因为{reason}."
prompt_template = PromptTemplate.from_template(template)
print(prompt_template)

# >> input_variables=['reason', 'tool'] template='我学习{tool}是因为{reason}.'

Formatting Template Output

The output of the format methods can be used as a string, a list of messages, or a ChatPromptValue.

Take the following template as an example:

from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# system message
template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)

# human message
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

As a string

# format
result = chat_prompt.format(input_language="English", output_language="French", text="I love programming.")

# or format_prompt followed by to_string
result = chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_string()

# >> System: You are a helpful assistant that translates English to French.\nHuman: I love programming.

As a list of messages

# format_messages
result = chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

# or format_prompt followed by to_messages
result = chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_messages()

# >> [SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}),\nHumanMessage(content='I love programming.', additional_kwargs={})]

As a ChatPromptValue

result = chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.")

# >> ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}), HumanMessage(content='I love programming.', additional_kwargs={})])

Partial Formatting

A template's input_variables do not all have to be formatted at once: you can pass in a subset of the required values to create a new prompt template that expects only the remaining ones.

LangChain supports this in two ways:

  • partial formatting with string values
  • partial formatting with functions that return string values


With string values

Suppose a template has two input variables, a and b. If a chain produces the value of a early on but b only later, it can be awkward to wait until both values are available in the same place before passing them to the template:

prompt = PromptTemplate(template="{a}{b}", input_variables=["a", "b"])
partial_prompt = prompt.partial(a="[param-a]")
print(partial_prompt)
print(partial_prompt.format(b="[param-b]"))

# >> input_variables=['b'] partial_variables={'a': '[param-a]'} template='{a}{b}'
# >> [param-a][param-b]

You can also initialize the prompt directly with partial_variables:

prompt = PromptTemplate(template="{a}{b}", partial_variables={"a": "[param-a]"}, input_variables=["b"])


With functions

The classic example is a date or time: partially filling the prompt with a function that always returns the current date is very convenient:

from datetime import datetime

def _get_datetime():
    now = datetime.now()
    return now.strftime("【%Y/%m/%d】")

prompt = PromptTemplate(
    template="告诉我一个{adjective}关于{date}日期的笑话",
    input_variables=["adjective", "date"]
)
partial_prompt = prompt.partial(date=_get_datetime)
print(partial_prompt.format(adjective="有趣的"))

# >> 告诉我一个有趣的关于【2023/11/28】日期的笑话

Few-Shot Prompt Templates

Create an example set

Each example should be a dictionary whose keys are input variables and whose values are the values for those input variables:

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
"""
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
"""
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
"""
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
"""
    }
]

Create a formatter for the few-shot examples

Configure a formatter that formats the examples into strings; the formatter is a PromptTemplate object:

example_prompt = PromptTemplate(input_variables=["question", "answer"], template="Question: {question}\nAnswer: {answer}")

print(example_prompt.format(**examples[0]))
Question: Who lived longer, Muhammad Ali or Alan Turing?
Answer:
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

Feed the examples and the formatter to FewShotPromptTemplate

Create a FewShotPromptTemplate object, which takes the few-shot examples and the formatter for them:

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"]
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))
Question: Who lived longer, Muhammad Ali or Alan Turing?
Answer:
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali


Question: When was the founder of craigslist born?
Answer:
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952


Question: Who was the maternal grandfather of George Washington?
Answer:
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball


Question: Are both the directors of Jaws and Casino Royale from the same country?
Answer:
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No


Question: Who was the father of Mary Ball Washington?

Few-Shot Chat Prompt Templates

There is no firm consensus yet on how best to do few-shot prompting with chat models, and no abstraction for it has been settled on (my reading: there is no established pattern yet for presenting examples in a conversational setting).

The docs offer two approaches:

  • alternating AI and human messages
  • system messages

Alternating AI and human messages

# system message
template = "你是一个乐于助人的助手,能把汉语翻译成猫语"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)

# example exchange
example_human = HumanMessagePromptTemplate.from_template("你好")
example_ai = AIMessagePromptTemplate.from_template("喵喵~")

# the real human input
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

# combine into a ChatPromptTemplate
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, example_human, example_ai, human_message_prompt]
)

# format
chat_prompt.format(text="晚上好")
print(chat_prompt)

# >> input_variables=['text'] messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='你是一个乐于助人的助手,能把汉语翻译成猫语')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='你好')), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='喵喵~')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], template='{text}'))]


System messages

OpenAI provides an optional name parameter, which they recommend using together with system messages for few-shot prompting:

template = "你是一个乐于助人的助手,能把汉语翻译成猫语"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)

# the difference is here:
# the system message's name is used to mark out the example content
example_human = SystemMessagePromptTemplate.from_template(
    "你好", additional_kwargs={"name": "示例-用户"}
)
example_ai = SystemMessagePromptTemplate.from_template(
    "喵喵~", additional_kwargs={"name": "示例-AI"}
)

human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, example_human, example_ai, human_message_prompt]
)
chat_prompt.format(text="晚上好")
print(chat_prompt)

# >> input_variables=['text'] messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='你是一个乐于助人的助手,能把汉语翻译成猫语')), SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='你好'), additional_kwargs={'name': '示例-用户'}), SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='喵喵~'), additional_kwargs={'name': '示例-AI'}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], template='{text}'))]

Message Prompt Template Types

LangChain provides different types of MessagePromptTemplate.

The most commonly used:

  • AIMessagePromptTemplate: AI messages
  • SystemMessagePromptTemplate: system messages
  • HumanMessagePromptTemplate: human messages
  • ChatMessagePromptTemplate: messages with a user-specified role name

ChatMessagePromptTemplate

When the chat model supports arbitrary roles, you can use ChatMessagePromptTemplate, which lets the user specify the role name:

from langchain.prompts import ChatMessagePromptTemplate

prompt = "感受这被囚禁了一万年的{subject}"

chat_message_prompt = ChatMessagePromptTemplate.from_template(role="伊利丹", template=prompt)
print(chat_message_prompt.format(subject="怒火"))

# >> content='感受这被囚禁了一万年的怒火' role='伊利丹'

MessagesPlaceholder

MessagesPlaceholder gives you full control over which messages are rendered during formatting.

It is useful when you need to insert a list of messages during formatting (think of it as a placeholder for a group of messages):

human_prompt = "用{count}个词语概括我们迄今为止的对话质量"
human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

# placeholder message, variable_name = conversation
chat_prompt = ChatPromptTemplate.from_messages([MessagesPlaceholder(variable_name="conversation"), human_message_template])
print(chat_prompt)

# >> input_variables=['conversation', 'count'] input_types={'conversation': typing.List[typing.Union[langchain.schema.messages.AIMessage, langchain.schema.messages.HumanMessage, langchain.schema.messages.ChatMessage, langchain.schema.messages.SystemMessage, langchain.schema.messages.FunctionMessage, langchain.schema.messages.ToolMessage]]} messages=[MessagesPlaceholder(variable_name='conversation'), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['count'], template='用{count}个词语概括我们迄今为止的对话质量'))]

Fill the conversation placeholder with the actual messages:

human_message = HumanMessage(content="学习编程最好的方式是什么")
ai_message = AIMessage(content=
"""
1. 选择一种编程语言:决定你想学习的编程语言
2. 从基础知识开始:熟悉变量、数据类型和控制结构等基本编程概念
3. 实践,实践,实践:学习编程的最佳方式是亲身体验
""")

result = chat_prompt.format_prompt(conversation=[human_message, ai_message], count="3").to_messages()
print(result)

# >> [HumanMessage(content='学习编程最好的方式是什么'), AIMessage(content='\n1. 选择一种编程语言:决定你想学习的编程语言\n2. 从基础知识开始:熟悉变量、数据类型和控制结构等基本编程概念\n3. 实践,实践,实践:学习编程的最佳方式是亲身体验\n'), HumanMessage(content='用3个词语概括我们迄今为止的对话质量')]
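Conceptually, what happened above is just splicing a message list into a named slot at format time. A minimal stdlib sketch of that mechanic (Placeholder and render here are made up for illustration, not LangChain's API):

```python
class Placeholder:
    def __init__(self, variable_name):
        self.variable_name = variable_name

def render(template_messages, **values):
    out = []
    for item in template_messages:
        if isinstance(item, Placeholder):
            # splice in the whole message list bound to this name
            out.extend(values[item.variable_name])
        else:
            role, text = item
            out.append((role, text.format(**values)))
    return out

msgs = render(
    [Placeholder("conversation"), ("human", "用{count}个词语概括我们迄今为止的对话质量")],
    conversation=[("human", "学习编程最好的方式是什么"), ("ai", "实践")],
    count="3",
)
print(msgs)
# -> [('human', '学习编程最好的方式是什么'), ('ai', '实践'), ('human', '用3个词语概括我们迄今为止的对话质量')]
```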

Serialization

Serializing prompts makes them easy to share, store, and version.

At a high level, serialization follows these design principles:

  • Both JSON and YAML are supported, the goal being serialization formats that are human-readable on disk
  • Everything can be stored in a single file, or different components (templates, examples, etc.) can be stored in separate files and referenced; this helps with splitting things up, e.g. long templates, large example sets, reusable components

Loading goes through a single entry point:

# all prompts are loaded through the `load_prompt` function
from langchain.prompts import load_prompt

For example, given a simple_prompt.yaml:

_type: prompt
input_variables:
    ["adjective", "content"]
template:
    Tell me a {adjective} joke about {content}.
prompt = load_prompt("simple_prompt.yaml")
print(prompt.format(adjective="funny", content="chickens"))
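Since JSON is supported as well, the same prompt could plausibly be stored in JSON form (the file name simple_prompt.json is an assumption) and loaded through the same load_prompt call:

```json
{
    "_type": "prompt",
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}."
}
```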

More usage examples are in the docs: 序列化 (Serialization) | 🦜️🔗 Langchain

Composing Prompts with Pipelines

When you want to reuse parts of a prompt, PipelinePrompt can help.

A PipelinePrompt consists of two main parts:

  • Final prompt: the prompt that is ultimately returned
  • Pipeline prompts: a list of tuples, each made of a string name and a prompt template; each prompt template is formatted and then passed to later prompt templates as a variable with the same name

Define the final prompt:

full_template = """{introduction}

{example}

{start}"""
full_prompt = PromptTemplate.from_template(full_template)

The introduction prompt:

introduction_template = """你在模仿{person}"""
introduction_prompt = PromptTemplate.from_template(introduction_template)

The example prompt:

example_template = """下面是一个交互的示例:
Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)

The start prompt:

start_template = """现在开始这么做!

Q: {question}
A:"""
start_prompt = PromptTemplate.from_template(start_template)

Assemble them with PipelinePrompt:

input_prompts = [
    ("introduction", introduction_prompt),
    ("example", example_prompt),
    ("start", start_prompt)
]
pipeline_prompt = PipelinePromptTemplate(final_prompt=full_prompt, pipeline_prompts=input_prompts)

Format the PipelinePrompt:

print(pipeline_prompt.format(
    person="雷军",
    example_q="你最喜欢的手机品牌是什么?",
    example_a="小米",
    question="你最喜欢的杀毒软件是什么?"
))

你在模仿雷军

下面是一个交互的示例:
Q: 你最喜欢的手机品牌是什么?
A: 小米

现在开始这么做!

Q: 你最喜欢的杀毒软件是什么?
A:
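The pipeline mechanics above can be mimicked with plain str.format: format each sub-template first, then pass the results into the final template under the same names (a sketch of the idea, not PipelinePromptTemplate's actual code):

```python
full_template = "{introduction}\n\n{example}\n\n{start}"

sub_templates = {
    "introduction": "你在模仿{person}",
    "example": "下面是一个交互的示例:\nQ: {example_q}\nA: {example_a}",
    "start": "现在开始这么做!\n\nQ: {question}\nA:",
}

def pipeline_format(final, subs, **kwargs):
    # format every sub-template, then feed the results into the final template
    rendered = {name: tmpl.format(**kwargs) for name, tmpl in subs.items()}
    return final.format(**rendered)

print(pipeline_format(full_template, sub_templates, person="雷军",
                      example_q="你最喜欢的手机品牌是什么?", example_a="小米",
                      question="你最喜欢的杀毒软件是什么?"))
```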

Custom Prompt Templates

Custom templates also come in two flavors:

  • string prompt templates
  • chat prompt templates

Here we create a custom prompt using a string prompt template.

A custom string prompt template has two requirements:

  • it has an input_variables attribute exposing the input variables the template expects
  • it exposes a format method that accepts keyword arguments corresponding to the expected input_variables and returns the formatted prompt

The template below builds a prompt that asks for an explanation of a function's source code:

from langchain.prompts import StringPromptTemplate
from pydantic import v1 as pydantic_v1
import inspect


class FunctionExplainerPromptTemplate(StringPromptTemplate, pydantic_v1.BaseModel):
    """A custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function."""

    # input_variables
    @pydantic_v1.validator("input_variables")
    def validate_input_variables(cls, v):
        """Validate that the input variables are correct."""
        if len(v) != 1 or "function_name" not in v:
            raise ValueError("function_name must be the only input_variable.")
        return v

    # format
    def format(self, **kwargs) -> str:
        # Get the source code of the function
        source_code = get_source_code(kwargs["function_name"])

        # Generate the prompt to be sent to the language model
        prompt = f"""
给定函数名称和源码,生成函数的英文解释
函数名称: {kwargs["function_name"].__name__}
源码:
{source_code}
解释:
"""
        return prompt

    def _prompt_type(self):
        return "function-explainer"


# get the source code of a function as a string
def get_source_code(function_name):
    return inspect.getsource(function_name)

Usage:

fn_explainer = FunctionExplainerPromptTemplate(input_variables=["function_name"])
# Generate a prompt for the function "get_source_code"
prompt = fn_explainer.format(function_name=get_source_code)
print(prompt)

Note that the example in the official docs appears to be broken; the error message:

class FunctionExplainerPromptTemplate(StringPromptTemplate, BaseModel):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

This is tied to the pydantic version: under v2 the official code hits an incompatibility. The code block above uses from pydantic import v1 as pydantic_v1 to pin the v1 API.

Related GitHub issue: DOC: Custom Templates issue with Pydantic v2 · Issue #9702 · langchain-ai/langchain (github.com)

Feature Stores

Feature stores are a concept from traditional machine learning for ensuring that the data fed into models is fresh and relevant; see What Is a Feature Store? | Tecton for more.

LangChain offers a simple way to combine this data with LLMs.

The docs show how to connect prompt templates to a feature store: the basic idea is to call the feature store inside the prompt template to retrieve values and format them into the prompt.

The LangChain docs cover the following feature-store tools:

  • Feast: a popular open-source feature store framework
  • Tecton: a fully managed feature platform built to orchestrate the complete ML feature lifecycle, from transformation to online serving, with enterprise-grade SLAs
  • Featureform: an open-source, enterprise-grade feature store
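The "call the feature store inside the template" idea can be sketched without any particular provider. Below, an in-memory dict (FAKE_STORE) stands in for a real feature-store client, and everything except the overall pattern (DriverPromptTemplate, the feature names, the template text) is made up for illustration:

```python
# an in-memory stand-in for a real feature-store client
FAKE_STORE = {
    "driver_42": {"conv_rate": 0.87, "acc_rate": 0.95},
}

class DriverPromptTemplate:
    """Looks up fresh feature values at format time and injects them into the prompt."""
    input_variables = ["driver_id"]
    template = "司机 {driver_id} 的接单率是 {conv_rate},好评率是 {acc_rate},请给他写一句鼓励的话"

    def format(self, **kwargs):
        # retrieve the latest features for this driver, then format the prompt
        features = FAKE_STORE[f"driver_{kwargs['driver_id']}"]
        return self.template.format(**kwargs, **features)

print(DriverPromptTemplate().format(driver_id=42))
```

The point is that the lookup happens every time format is called, so the prompt always reflects the latest feature values in the store.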

References

提示(Prompts) | 🦜️🔗 Langchain