如何構建通用 LLM Agent

1. LLM Agent 是什麼

大語言模型智能體是一種程序，其執行邏輯由其底層模型控制。

大語言模型智能體與少樣本提示或固定工作流程等方法的區別在於，它能夠定義和調整執行用戶查詢所需的步驟。通過使用一組工具（如代碼執行或網絡搜索），智能體可以決定使用哪個工具、如何使用它，並根據輸出對結果進行迭代。這種適應性使系統能夠以最小的配置處理各種用例。

智能體架構存在於一個範圍內，從固定工作流程的可靠性到自主智能體的靈活性。例如，像檢索增強生成（RAG）這樣的固定流程可以通過自我反思循環進行增強，使程序在初始響應不足時能夠進行迭代。或者，一個反應式（ReAct）智能體可以配備固定流程作爲工具，提供一種靈活而結構化的方法。架構的選擇最終取決於用例以及在可靠性和靈活性之間的期望權衡。

2. 構建通用型大語言模型智能體的步驟

2.1 選擇合適的大語言模型

選擇合適的模型對於實現期望的性能至關重要。需要考慮多個因素，如許可證、成本和語言支持。構建大語言模型智能體時，最重要的考慮因素是模型在關鍵任務（如編碼、工具調用和推理）上的性能。評估基準包括：

大規模多任務語言理解（MMLU）（推理）
伯克利函數調用排行榜（工具選擇與工具調用）
HumanEval 和 BigCodeBench（編碼）
模型的上下文窗口也是一個關鍵因素。智能體工作流程可能會消耗大量 token，有時甚至超過 10 萬個，較大的上下文窗口非常有幫助。
可供考慮的模型：
前沿模型（GPT4 - o、Claude 3.5）；
開源模型（Llama3.2、Qwen2.5）。

一般來說，較大的模型往往性能更好，但能夠在本地運行的較小模型仍然是一個不錯的選擇。使用較小的模型時，你將侷限於更簡單的用例，並且可能只能將智能體連接到一兩個基本工具。

2.2 定義智能體的控制邏輯（即通信結構）

LLM 和智能體之間的主要區別在於系統提示。在 LLM 的背景下，系統提示是在模型處理用戶查詢之前提供給它的一組指令和上下文信息。大語言模型的智能體行爲可以在系統提示中進行編碼。以下是一些常見的智能體模式，可以根據需要進行定製：

工具使用：智能體決定何時將查詢路由到適當的工具或依賴自身知識。
反思：智能體在回答用戶之前審查並糾正自己的答案。大多數大語言模型系統也可以添加反思步驟。
推理 - 行動（ReAct）：智能體迭代地推理如何解決查詢，執行一個動作，觀察結果，並決定是否採取另一個動作或提供響應。
計劃 - 執行：智能體預先計劃，將任務分解爲子步驟（如果需要），然後執行每個步驟。

最後兩種模式（ReAct 和計劃 - 執行）通常是構建通用型單智能體的最佳起點。

爲了有效地實現這些行爲，你需要進行一些提示詞工程。你可能還想使用結構化生成技術。這基本上意味着塑造大語言模型的輸出以匹配特定的格式或模式，以便智能體的響應與你所期望的通信風格保持一致。

示例：以下是來自 Bee Agent 框架的反應式（ReAct）風格智能體的提示 (https://github.com/i-am-bee/bee-agent-framework/blob/main/src/agents/bee/prompts.ts)。

# Communication structure
You communicate only in instruction lines. The format is: "Instruction: expected output\n". You must only use these instruction lines and must not enter empty lines between them. Each instruction must start on a new line.
{{#tools.length}}
You must skip the instruction lines Function Name, Function Input and Function Output if no function calling is required.
{{/tools.length}}
Message: User's message. You never use this instruction line.
{{^tools.length}}
Thought: A single-line plan of how to answer the user's message, including an explanation of the reasoning behind it. It must be immediately followed by Final Answer.
{{/tools.length}}
{{#tools.length}}
Thought: A single-line step-by-step plan of how to answer the user's message, including an explanation of the reasoning behind it. You can use the available functions defined above. This instruction line must be immediately followed by Function Name if one of the available functions defined above needs to be called, or by Final Answer. Do not provide the answer here.
Function Name: Name of the function. This instruction line must be immediately followed by Function Input.
Function Input: Function parameters. Empty object is a valid parameter.
Function Output: Output of the function in JSON format.
Thought: Continue your thinking process.
{{/tools.length}}
Final Answer: Answer the user or ask for more information or clarification. It must always be preceded by Thought.
## Examples
Message: Can you translate "How are you" into French?
Thought: The user wants to translate a text into French. I can do that.
Final Answer: Comment vas-tu?

2.3 定義智能體的核心指令

我們往往認爲大語言模型開箱即用就帶有許多功能。其中一些功能很棒，但其他功能可能不完全符合你的需求。爲了獲得你期望的性能，在系統提示中明確列出你想要和不想要的所有功能非常重要。這可能包括以下指令：

智能體名稱和角色：智能體的名稱以及它的用途。
語氣和簡潔性：它應該聽起來多麼正式或隨意，以及應該多麼簡短。
何時使用工具：決定何時依賴外部工具而不是模型自身的知識。
處理錯誤：當工具或過程出現問題時，智能體應該怎麼做。

示例：以下是 Bee Agent 框架中指令部分的一個片段。

# Instructions
User can only see the Final Answer, all answers must be provided there.
{{^tools.length}}
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by Final Answer.
{{/tools.length}}
{{#tools.length}}
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by either Function Name or Final Answer.
You must use Functions to retrieve factual or historical information to answer the message.
{{/tools.length}}
If the user suggests using a function that is not available, answer that the function is not available. You can suggest alternatives if appropriate.
When the message is unclear or you need more information from the user, ask in Final Answer.
# Your capabilities
Prefer to use these capabilities over functions.
- You understand these languages: English, Spanish, French.
- You can translate, analyze and summarize, even long documents.
# Notes
- If you don't know the answer, say that you don't know.
- The current time and date in ISO format can be found in the last message.
- When answering the user, use friendly formats for time and date.
- Use markdown syntax for formatting code snippets, links, JSON, tables, images, files.
- Sometimes, things don't go as planned. Functions may not provide useful information on the first few tries. You should always try a few different approaches before declaring the problem unsolvable.
- When the function doesn't give you what you were asking for, you must either use another function or a different function input.
  - When using search engines, you try different formulations of the query, possibly even in a different language.
- You cannot do complex calculations, computations, or data manipulations without using functions.

2.4 定義和優化核心工具

工具賦予智能體超能力。通過一組定義明確的有限工具，你可以實現廣泛的功能。關鍵工具包括代碼執行、網絡搜索、文件讀取和數據分析。對於每個工具，你需要定義以下內容並將其作爲系統提示的一部分：

工具名稱：功能的唯一、描述性名稱。
工具描述：清晰解釋工具的作用以及何時使用它。這有助於智能體確定何時選擇正確的工具。
工具輸入模式：一個模式，概述必需和可選參數、它們的類型以及任何約束。智能體根據用戶查詢使用此模式填寫所需的輸入。
指向運行工具的位置 / 方式。

示例：以下是來自 Langchain (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/tools/arxiv/tool.py) 社區的 Arxiv 工具實現的摘錄。此實現需要一個 ArxivAPIWrapper 實現 (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/arxiv.py)。

"""Tool for the Arxiv API."""
from typing import Optional, Type
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from langchain_community.utilities.arxiv import ArxivAPIWrapper
class ArxivInput(BaseModel):
    """Input for the Arxiv tool."""
    query: str = Field(description="search query to look up")
class ArxivQueryRun(BaseTool):  # type: ignore[override, override]
    """Tool that searches the Arxiv API."""
    name: str = "arxiv"
    description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )
    api_wrapper: ArxivAPIWrapper = Field(default_factory=ArxivAPIWrapper)  # type: ignore[arg-type]
    args_schema: Type[BaseModel] = ArxivInput
    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the Arxiv tool."""
        return self.api_wrapper.run(query)

在某些情況下，仍需要優化工具以獲得期望的性能。這可能涉及通過一些提示工程調整工具名稱或描述，設置高級配置來處理常見錯誤，或者過濾工具的輸出。

2.5 決定內存處理策略

大語言模型受其上下文窗口的限制，即它們一次可以 “記住” 的令牌數量。在多輪對話中的過去交互、冗長的工具輸出或智能體所基於的額外上下文等情況下，內存可能會很快填滿。這就是爲什麼擁有可靠的內存處理策略至關重要。

在智能體中，內存是指系統存儲、記憶和利用過去交互信息的能力。這使智能體能夠隨着時間的推移保持上下文，根據先前的交流改進其響應，並提供更個性化的體驗。

常見內存處理策略：

滑動內存：在內存中保留最後 k 輪對話，並丟棄更早的對話。
token 內存：保留最後 n 個 token，丟棄其餘的。
總結式內存：使用大語言模型在每一輪對話時總結對話內容，並丟棄單個消息

此外，你還可以讓大語言模型檢測關鍵時刻並存儲在長期記憶中。這使得智能體能夠 “記住” 關於用戶的重要事實，從而使體驗更加個性化。

到目前爲止我們所涵蓋的五個步驟爲設置智能體奠定了基礎。但是，如果在這個階段我們通過大語言模型運行用戶查詢會發生什麼呢？

以下是可能出現的情況的一個示例：

User Message: Extract key insighs from this dataset
Files: bill-of-materials.csv
Thought: First, I need to inspect the columns of the dataset and provide basic data statistics.
Function Name: Python
Function Input: {"language":"python","code":"import pandas as pd\n\ndataset = pd.read_csv('bill-of-materials.csv')\n\nprint(dataset.columns)\nprint(dataset.describe())","inputFiles":["bill-of-materials.csv"]}
Function Output:

在這一點上，智能體會產生原始文本輸出。那麼我們如何讓它實際執行下一步呢？這就是解析和編排發揮作用的地方。

2.6 解析智能體的原始輸出

解析器是一種將原始數據轉換爲應用程序可以理解和處理的格式（如具有屬性的對象）的函數。

對於我們正在構建的智能體，解析器需要識別我們在 2.2 中定義的通信結構，並返回結構化輸出，如 JSON。這使應用程序更容易處理和執行智能體的下一步驟。

注意：一些模型提供商（如 OpenAI）默認可以返回可解析的輸出。對於其他模型，尤其是開源模型，這需要進行配置。

2.7 編排 Agent 的下一步驟

最後一步是設置編排邏輯。這決定了大語言模型輸出結果後會發生什麼。根據輸出，你將：

執行工具調用，或者
返回答案，即對用戶查詢的最終響應或要求更多信息的後續請求。

如果觸發了工具調用，工具的輸出將被髮送回大語言模型（作爲其工作記憶的一部分）。然後，大語言模型將決定如何處理這個新信息：要麼執行另一個工具調用，要麼向用戶返回答案。

以下是這種編排邏輯在代碼中可能呈現的示例：

def orchestrator(llm_agent, llm_output, tools, user_query):
    """
    Orchestrates the response based on LLM output and iterates if necessary.
    Parameters:
    - llm_agent (callable): The LLM agent function for processing tool outputs.
    - llm_output (dict): Initial output from the LLM, specifying the next action.
    - tools (dict): Dictionary of available tools with their execution methods.
    - user_query (str): The original user query.
    Returns:
    - str: The final response to the user.
    """
    while True:
        action = llm_output.get("action")
        if action == "tool_call":
            # Extract tool name and parameters
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})
            if tool_name in tools:
                try:
                    # Execute the tool
                    tool_result = tools[tool_name](**tool_params)
                    # Send tool output back to the LLM agent for further processing
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."
        elif action == "return_answer":
            # Return the final answer to the user
            return llm_output.get("answer", "No answer provided.")
        else:
            return "Error: Unrecognized action type from LLM output."

現在你擁有了一個能夠處理各種各樣場景的系統，從競爭分析和高級研究到自動化複雜工作流程。

2.8 多智能體系統在什麼情況下適用

雖然這一代大語言模型非常強大，但它們有一個關鍵限制：它們難以處理信息過載。過多的上下文或工具可能會使模型不堪重負，導致性能問題。通用型單智能體最終會遇到這個瓶頸，尤其是因爲 Agent 通常消耗大量 token。

對於某些應用場景而言，採用多智能體的設置可能更合理。通過將職責分配給多個智能體，你能夠避免單個大語言模型智能體的上下文負擔過重，並提高整體效率。

話雖如此，通用的單智能體設置對於製作原型來說是一個很好的開始。它可以幫助你快速測試你的使用案例，並確定在哪些地方開始出現問題。通過這個過程，你可以：

瞭解任務的哪些部分確實能從智能體方法中獲益。
確定在更大的工作流程中可作爲獨立流程拆分出來的組件。

從單個智能體入手能讓你獲得寶貴的見解，以便在擴展到更復雜的系統時改進你的方法。

3. 入門建議

使用框架是快速測試和迭代智能體配置的好方法。

如果你計劃使用像 Llama 3 這樣的開源模型，可以嘗試 Bee Agent Framework （https://github.com/i-am-bee/bee-agent-framework）的入門模板（https://github.com/i-am-bee/bee-agent-framework-starter）。
如果你計劃使用像 OpenAI 這樣的前沿模型，可以嘗試 LangGraph 的教程（https://langchain-ai.github.io/langgraph/how-tos/react-agent-from-scratch/#define-nodes-and-edges）。

原文鏈接：https://towardsdatascience.com/build-a-general-purpose-ai-agent-c40be49e7400

本文由 Readfog 進行 AMP 轉碼，版權歸原作者所有。
來源：https://mp.weixin.qq.com/s/seToClUrbHn9ZA2bjskvlA