* Added GPT-4-only agent functionality (more to follow); the Chinese readme has been written

* Fixed a bug reported in an issue

* Set the minimum temperature to 0; negative values should not be supported

* Adjusted the minimum temperature

* Added partial Agent support and fixed some bugs in the startup script

* Updated the GPU count configuration

* 1

* Fixed configuration file errors

* Updated readme; stability testing
zR 2023-09-28 20:19:26 +08:00 committed by GitHub
parent 8fa99026c8
commit efd8edda16
16 changed files with 229 additions and 144 deletions

View File

@ -57,6 +57,25 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
---
## Minimum Environment Requirements
To run this project smoothly, please meet the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
+ CUDA version: >= 11.7, with Python installed correctly
To run a local model (int4 quantization) on the GPU without problems, you need at least the following hardware:
+ chatglm2-6b & LLaMA-7B: minimum VRAM 7GB, recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B: minimum VRAM 11GB, recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat: minimum VRAM 13GB, recommended GPU: RTX 3090
+ LLaMA-30B: minimum VRAM 22GB, recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
+ LLaMA-65B: minimum VRAM 22GB, recommended GPUs: A100, A40, A6000
With int8 quantization the VRAM requirement is roughly 1.5x the int4 figure, and with fp16 roughly 2.5x.
For example, running Qwen-7B-Chat with fp16 inference requires about 16GB of VRAM.
These figures are estimates only; the actual usage reported by nvidia-smi is authoritative.
## Change Log
See the [release notes](https://github.com/imClumsyPanda/langchain-ChatGLM/releases).
@ -112,7 +131,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [Qwen/Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat) and others
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
@ -159,9 +178,11 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
- [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [nghuyong/ernie-3.0-nano-zh](https://huggingface.co/nghuyong/ernie-3.0-nano-zh)
- [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [OpenAI/text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings)
The default Embedding model in the project is `moka-ai/m3e-base`; to use a different Embedding model, modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
The default Embedding model in the project is `sensenova/piccolo-base-zh`; to use a different Embedding model, modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
---
@ -199,17 +220,17 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
### Build Your Own Agent Tools
See (docs/自定义Agent.md) for details.
See the [Custom Agent guide](docs/自定义Agent.md) for details.
## Docker Deployment
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3`
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5`
```shell
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
- The image for this version is `35.3GB`, built from `v0.2.3` on the `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` base image
- The image for this version is `35.3GB`, built from `v0.2.5` on the `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` base image
- This version bundles two `embedding` models, `m3e-large` and `text2vec-bge-large-chinese` (the latter enabled by default), plus `chatglm2-6b-32k`
- This version is aimed at convenient one-click deployment; make sure the NVIDIA driver is installed on your Linux distribution
- Note that you do not need to install the CUDA toolkit on the host, but you do need the `NVIDIA Driver` and the `NVIDIA Container Toolkit`; see the [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

View File

@ -56,6 +56,25 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
---
## Minimum Environment Requirements
To run this code smoothly, please meet the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
+ CUDA version: >= 11.7, with Python installed correctly.
To run a local model (int4 quantization) on the GPU without problems, you need at least the following hardware:
+ chatglm2-6b & LLaMA-7B: minimum VRAM 7GB, recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B: minimum VRAM 11GB, recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat: minimum VRAM 13GB, recommended GPU: RTX 3090
+ LLaMA-30B: minimum VRAM 22GB, recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
+ LLaMA-65B: minimum VRAM 22GB, recommended GPUs: A100, A40, A6000
With int8 quantization the VRAM requirement is roughly 1.5x the int4 figure, and with fp16 roughly 2.5x.
For example, running Qwen-7B-Chat with fp16 inference requires about 16GB of VRAM.
These figures are estimates only; the actual usage reported by nvidia-smi is authoritative.
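The rule of thumb above can be written down as a small helper. This is only a sketch of the estimate itself: the int4 baselines are the figures from the table, the multipliers are the ones stated above, and real usage should still be checked with nvidia-smi.

```python
# Rough VRAM estimator for the rule of thumb above: the table gives the int4
# baseline; int8 needs about 1.5x of it and fp16 about 2.5x.
INT4_BASELINE_GB = {
    "chatglm2-6b": 7,
    "LLaMA-7B": 7,
    "LLaMA-13B": 11,
    "Qwen-14B-Chat": 13,
    "LLaMA-30B": 22,
}
MULTIPLIER = {"int4": 1.0, "int8": 1.5, "fp16": 2.5}

def estimate_vram_gb(model: str, precision: str = "int4") -> float:
    """Estimated VRAM in GB; the value reported by nvidia-smi is authoritative."""
    return INT4_BASELINE_GB[model] * MULTIPLIER[precision]

print(estimate_vram_gb("LLaMA-13B", "fp16"))  # ~27.5 GB
```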
## Change Log
Please refer to the [version change log](https://github.com/imClumsyPanda/langchain-ChatGLM/releases).
@ -105,7 +124,7 @@ The project use [FastChat](https://github.com/lm-sys/FastChat) to provide the AP
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [Qwen/Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat) and other models of FlagAlpha
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
@ -117,7 +136,19 @@ The project use [FastChat](https://github.com/lm-sys/FastChat) to provide the AP
* Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)(任何 [EleutherAI](https://huggingface.co/EleutherAI) 的 pythia 模型,如 [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b))
* Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a model above. To activate, must have `peft` in the model path. Note: If loading multiple peft models, you can have them share the base model weights by setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model worker.
Please refer to `llm_model_dict` in `configs/model_config.py.example` to invoke the OpenAI API.
The above model support list may be updated continuously as [FastChat](https://github.com/lm-sys/FastChat) is updated; see the [FastChat supported models list](https://github.com/lm-sys/FastChat/blob/main/docs/model_support.md).
In addition to local models, this project also supports direct access to online models such as the OpenAI API and Zhipu AI. For specific settings, please refer to the configuration of `llm_model_dict` in `configs/model_config.py.example`.
Online LLM models are currently supported:
- [ChatGPT](https://api.openai.com)
- [Zhipu AI](http://open.bigmodel.cn)
- [MiniMax](https://api.minimax.chat)
- [iFLYTEK Spark (Xunfei Xinghuo)](https://xinghuo.xfyun.cn)
- [Baidu Qianfan](https://cloud.baidu.com/product/wenxinworkshop?track=dingbutonglan)
- [Alibaba Cloud Tongyi Qianwen](https://dashscope.aliyun.com/)
The default LLM type used in the project is `THUDM/chatglm2-6b`, if you need to use other LLM types, please modify `llm_model_dict` and `LLM_MODEL` in [configs/model_config.py].
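For illustration, a hypothetical excerpt of such a configuration is sketched below. The entry and field names are placeholders, not the project's exact schema; check `configs/model_config.py.example` in your checkout for the real keys.

```python
# Illustrative sketch only: entry names and field names are placeholders.
# See configs/model_config.py.example for the actual schema.
llm_model_dict = {
    "chatglm2-6b": {
        "local_model_path": "THUDM/chatglm2-6b",      # local model served through FastChat
    },
    "gpt-3.5-turbo": {
        "api_base_url": "https://api.openai.com/v1",  # online API endpoint
        "api_key": "sk-...",                          # fill in your own key
    },
}

LLM_MODEL = "chatglm2-6b"  # pick any key defined in llm_model_dict
```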
### Supported Embedding models
@ -130,6 +161,8 @@ Following models are tested by developers with Embedding class of [HuggingFace](
- [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh)
- [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh)
- [BAAI/bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [shibing624/text2vec-base-chinese-sentence](https://huggingface.co/shibing624/text2vec-base-chinese-sentence)
- [shibing624/text2vec-base-chinese-paraphrase](https://huggingface.co/shibing624/text2vec-base-chinese-paraphrase)
- [shibing624/text2vec-base-multilingual](https://huggingface.co/shibing624/text2vec-base-multilingual)
@ -138,16 +171,24 @@ Following models are tested by developers with Embedding class of [HuggingFace](
- [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [nghuyong/ernie-3.0-nano-zh](https://huggingface.co/nghuyong/ernie-3.0-nano-zh)
- [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [OpenAI/text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings)
The default Embedding model used in the project is `sensenova/piccolo-base-zh`; if you want to use another Embedding model, please modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
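A minimal sketch of that change, assuming `embedding_model_dict` maps a short name to a HuggingFace repo id or local path (verify the exact layout in your copy of `configs/model_config.py`):

```python
# Sketch: switch the default embedding model. EMBEDDING_MODEL must be one of
# the keys defined in embedding_model_dict.
embedding_model_dict = {
    "piccolo-base-zh": "sensenova/piccolo-base-zh",
    "m3e-base": "moka-ai/m3e-base",
}

EMBEDDING_MODEL = "m3e-base"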
### Build your own Agent tool!
See [Custom Agent Instructions](docs/自定义Agent.md) for details.
---
## Docker Deployment
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0`
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5`
```shell
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
- The image size of this version is `33.9GB`, using `v0.2.0`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image

View File

@ -83,12 +83,14 @@ MODEL_PATH = {
"opt-iml-max-30b":"facebook/opt-iml-max-30b",
"Qwen-7B":"Qwen/Qwen-7B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B":"Qwen/Qwen-14B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B-Chat":"Qwen/Qwen-14B-Chat",
},
}
# Name of the Embedding model to use
EMBEDDING_MODEL = "m3e-base"
EMBEDDING_MODEL = "piccolo-large-zh" # latest SOTA embedding model
# Device on which the Embedding model runs. "auto" detects automatically; it can also be set manually to "cuda", "mps", or "cpu".
EMBEDDING_DEVICE = "auto"
@ -221,6 +223,8 @@ VLLM_MODEL_DICT = {
"opt-iml-max-30b":"facebook/opt-iml-max-30b",
"Qwen-7B":"Qwen/Qwen-7B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B":"Qwen/Qwen-14B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B-Chat":"Qwen/Qwen-14B-Chat",
}

View File

@ -1,5 +1,5 @@
import sys
from configs.model_config import LLM_DEVICE
import httpx
# Default timeout (in seconds) for httpx requests. If loading a model or chatting is slow and you hit timeout errors, increase this value.
HTTPX_DEFAULT_TIMEOUT = 300.0

View File

@ -1,45 +1,50 @@
## Build Your Own Custom Agent
### 1. Create the .py file for your Agent
Create your own file under ```server/agent``` and register it in ```tools.py```.
### 1. Create your own Agent tool
+ Create your own file under ```server/agent``` and register it in ```tools.py```. That completes the Tools setup.
For example, if you create a ```custom_agent.py``` file containing a ```work``` function, you need to add the following to ```tools.py```:
+ If you have created a ```custom_agent.py``` file containing a ```work``` function, you need to add the following to ```tools.py```:
```python
from custom_agent import work
Tool.from_function(
func=work,
name="该函数的名字",
description=""
func=work,
name="该函数的名字",
description=""
)
```
+ Note: if you are sure a given tool will not be used in a project, remove it from Tools; this lowers the risk of the model misclassifying and calling the wrong tool. A sketch of what ```custom_agent.py``` itself might look like is shown below.
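A minimal, hypothetical sketch of ```custom_agent.py``` itself; the body of ```work``` is illustrative only and simply echoes its input:

```python
# server/agent/custom_agent.py -- hypothetical example of a custom tool function.
def work(query: str) -> str:
    """Receive the Action Input produced by the model and return the Observation text."""
    # Replace this with the real work of your tool; here we just echo the input.
    return f"custom tool received: {query}"
```

Whatever the function returns is fed back to the model as the Observation, so keep it short and textual. Give the matching `description` in ```tools.py``` a real sentence as well; an empty description makes it harder for the model to decide when to call the tool.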
### 2. Modify custom_template.py
Set an Agent prompt and a custom output format that suit the LLM you have chosen.
Our code provides two defaults; one is a prompt adapted for GPT:
Our code provides two defaults; one is a prompt adapted for GPT and Qwen:
```python
template = """Answer the following questions as best you can You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Previous conversation history:
{history}
New question: {input}
{agent_scratchpad}"""
"""
Answer the following questions as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history:
{history}
Question: {input}
Thought: {agent_scratchpad}
"""
```
The other is a prompt adapted for GLM-130B:
```python
template = """
"""
尽可能地回答以下问题。你可以使用以下工具:{tools}
请按照以下格式进行:
Question: 需要你回答的输入问题
@ -47,7 +52,7 @@ Thought: 你应该总是思考该做什么
Action: 需要使用的工具,应该是[{tool_names}]中的一个
Action Input: 传入工具的内容
Observation: 行动的结果
... (这个Thought/Action/Action Input/Observation可以重复N次)
... (这个Thought/Action/Action Input/Observation可以重复N次)
Thought: 我现在知道最后的答案
Final Answer: 对原始输入问题的最终答案
@ -57,7 +62,8 @@ Final Answer: 对原始输入问题的最终答案
{history}
New question: {input}
Thought: {agent_scratchpad}"""
Thought: {agent_scratchpad}
"""
```
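To see how a template like the ones above is wired into a runnable agent, the sketch below condenses the wiring used in `server/chat/agent_chat.py` in this commit (prompt template, output parser, LLM chain, single-action agent, executor); it is a simplified sketch, not the full streaming implementation:

```python
from configs.model_config import LLM_MODEL, TEMPERATURE
from langchain.chains import LLMChain
from langchain.agents import AgentExecutor, LLMSingleActionAgent
from server.utils import get_ChatOpenAI, get_prompt_template
from server.agent.tools import tools, tool_names
from server.agent.custom_template import CustomOutputParser, CustomPromptTemplate

model = get_ChatOpenAI(model_name=LLM_MODEL, temperature=TEMPERATURE)
prompt_template = CustomPromptTemplate(
    template=get_prompt_template("agent_chat"),   # one of the templates above
    tools=tools,
    input_variables=["input", "intermediate_steps", "history"],
)
agent = LLMSingleActionAgent(
    llm_chain=LLMChain(llm=model, prompt=prompt_template),
    output_parser=CustomOutputParser(),
    stop=["Observation:", "<|im_end|>"],  # adjust the stop tokens to your model
    allowed_tools=tool_names,
)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
print(agent_executor.run("上海今天天气怎么样?"))
```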
### 3. Limitations
@ -70,4 +76,5 @@ Thought: {agent_scratchpad}"""
We have written three LLM-driven Agent tools for developers:
1. A translation tool that translates input text from any language.
2. A math tool that uses LLMMathChain for calculations.
3. A weather tool that uses a custom LLMWetherChain to query the weather through the HeFeng Weather (QWeather) API.
4. In addition, Agent tools supported by Langchain can be used; the code already includes implementations of the Shell and Google Search tools.

View File

@ -56,7 +56,6 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
async def on_tool_error(self, error: Exception | KeyboardInterrupt, *, run_id: UUID,
parent_run_id: UUID | None = None, tags: List[str] | None = None, **kwargs: Any) -> None:
self.out = True
self.cur_tool.update(
status=Status.error,
error=str(error),
@ -65,19 +64,19 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
if token:
if token == "Action":
if "Action" in token:
self.out = False
self.cur_tool.update(
status=Status.running,
llm_token="\n\n",
)
self.queue.put_nowait(dumps(self.cur_tool))
if self.out:
self.cur_tool.update(
status=Status.running,
llm_token=token,
)
self.queue.put_nowait(dumps(self.cur_tool))
self.queue.put_nowait(dumps(self.cur_tool))
async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
self.cur_tool.update(
@ -95,6 +94,7 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
self.queue.put_nowait(dumps(self.cur_tool))
async def on_llm_error(self, error: Exception | KeyboardInterrupt, **kwargs: Any) -> None:
self.out = True
self.cur_tool.update(
status=Status.error,
error=str(error),

View File

@ -1,57 +1,9 @@
template = """
尽可能地回答以下问题你可以使用以下工具:{tools}
请按照以下格式进行:
Question: 需要你回答的输入问题
Thought: 你应该总是思考该做什么并告诉我你要用什么工具
Action: 需要使用的工具应该是[{tool_names}]中的一个
Action Input: 传入工具的内容
Observation: 行动的结果
... (这个Thought/Action/Action Input/Observation可以重复N次)
Thought: 通过使用工具我是否知道了答案如果知道就自然的回答问题如果不知道继续使用工具或者自己的知识 \n
Final Answer: 这个问题的答案是输出完整的句子
现在开始
之前的对话:
{history}
New question:
{input}
Thought:
{agent_scratchpad}"""
# ChatGPT 提示词模板
# template = """Answer the following questions as best you can You have access to the following tools:
# {tools}
# Use the following format:
#
# Question: the input question you must answer
# Thought: you should always think about what to do
# Action: the action to take, should be one of [{tool_names}]
# Action Input: the input to the action
# Observation: the result of the action
# ... (this Thought/Action/Action Input/Observation can repeat N times)
# Thought: I now know the final answer
# Final Answer: the final answer to the original input question
#
# Begin!
#
# Previous conversation history:
# {history}
#
# New question: {input}
# {agent_scratchpad}"""
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.agents import Tool, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain.llms import OpenAI
from langchain.utilities import SerpAPIWrapper
from langchain.chains import LLMChain
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish, OutputParserException
from server.agent.tools import tools
from langchain.schema import AgentAction, AgentFinish
import re
class CustomPromptTemplate(StringPromptTemplate):
# The template to use
template: str
@ -73,15 +25,9 @@ class CustomPromptTemplate(StringPromptTemplate):
# Create a list of tool names for the tools provided
kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
return self.template.format(**kwargs)
prompt = CustomPromptTemplate(
template=template,
tools=tools,
input_variables=["input", "intermediate_steps", "history"]
)
class CustomOutputParser(AgentOutputParser):
def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
def parse(self, llm_output: str) -> AgentFinish | AgentAction | str:
# Check if agent should finish
if "Final Answer:" in llm_output:
return AgentFinish(
@ -101,10 +47,18 @@ class CustomOutputParser(AgentOutputParser):
action = match.group(1).strip()
action_input = match.group(2)
# Return the action and action input
return AgentAction(
try:
ans = AgentAction(
tool=action,
tool_input=action_input.strip(" ").strip('"'),
log=llm_output
)
)
return ans
except:
return AgentFinish(
return_values={"output": f"调用agent失败: `{llm_output}`"},
log=llm_output,
)

View File

@ -0,0 +1,8 @@
import os
# Fill in your Google Programmable Search Engine ID and API key before using this tool.
os.environ["GOOGLE_CSE_ID"] = ""
os.environ["GOOGLE_API_KEY"] = ""
from langchain.tools import GoogleSearchResults

def google_search(query: str):
    # Run a Google search via LangChain's GoogleSearchResults tool and return its output.
    tool = GoogleSearchResults()
    return tool.run(tool_input=query)

server/agent/shell.py Normal file
View File

@ -0,0 +1,5 @@
from langchain.tools import ShellTool

def shell(query: str):
    # Execute the query as a shell command through LangChain's ShellTool and return its output.
    tool = ShellTool()
    return tool.run(tool_input=query)

View File

@ -1,28 +1,40 @@
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from server.agent.math import calculate
from server.agent.translator import translate
from server.agent.weather import weathercheck
from server.agent.shell import shell
from server.agent.google_search import google_search
from langchain.agents import Tool
tools = [
Tool.from_function(
func=calculate,
name="计算器工具",
description=""
description="进行简单的数学运算"
),
Tool.from_function(
func=translate,
name="翻译工具",
description=""
description="翻译各种语言"
),
Tool.from_function(
func=weathercheck,
name="天气查询工具",
description="",
description="查询天气",
),
Tool.from_function(
func=shell,
name="shell工具",
description="使用命令行工具输出",
),
Tool.from_function(
func=google_search,
name="谷歌搜索工具",
description="使用谷歌搜索",
)
]
tool_names = [tool.name for tool in tools]

View File

@ -23,13 +23,18 @@ ${{翻译结果}}
```
答案: ${{答案}}
以下是个例子
以下是个例子
问题: 翻译13成英语
```text
13 English
13 英语
```output
thirteen
答案: thirteen
以下是两个例子
问题: 翻译 我爱你 成法语
```text
我爱你 法语
```output
Je t'aime.
'''
PROMPT = PromptTemplate(

View File

@ -1,12 +1,14 @@
## 使用和风天气API查询天气
from __future__ import annotations
## 单独运行的时候需要添加
import sys
import os
# sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from server.utils import get_ChatOpenAI
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import re
import warnings
@ -25,6 +27,8 @@ import requests
from typing import List, Any, Optional
from configs.model_config import LLM_MODEL, TEMPERATURE
## 使用和风天气API查询天气
KEY = ""
def get_city_info(location, adm, key):
base_url = 'https://geoapi.qweather.com/v2/city/lookup?'
@ -109,11 +113,11 @@ def split_query(query):
def weather(query):
location, adm, time = split_query(query)
key = KEY
if time != "None" and int(time) > 24:
return "只能查看24小时内的天气无法回答"
if time == "None":
time = "24" # 免费的版本只能24小时内的天气
key = "315625cdca234137944d7f8956106a3e" # 和风天气API Key
if key == "":
return "请先在代码中填入和风天气API Key"
city_info = get_city_info(location=location, adm=adm, key=key)
@ -272,7 +276,7 @@ _PROMPT_TEMPLATE = """用户将会向您咨询天气问题,您不需要自己
${{拆分的区市和时间}}
```
... weather(query)...
... weather(提取后的关键字用空格隔开)...
```output
${{提取后的答案}}
@ -283,7 +287,6 @@ ${{提取后的答案}}
问题: 上海浦东未来1小时天气情况
```text
浦东 上海 1
```
...weather(浦东 上海 1)...
@ -353,3 +356,10 @@ def weathercheck(query: str):
ans = llm_weather.run(query)
return ans
if __name__ == '__main__':
    # Check that the weather API returns results correctly.
    query = "上海浦东未来1小时天气情况"
    # ans = weathercheck(query)
    ans = weather("浦东 上海 1")
    print(ans)

View File

@ -2,18 +2,19 @@ from langchain.memory import ConversationBufferWindowMemory
from server.agent.tools import tools, tool_names
from server.agent.callbacks import CustomAsyncIteratorCallbackHandler, Status, dumps
from langchain.agents import AgentExecutor, LLMSingleActionAgent
from server.agent.custom_template import CustomOutputParser, prompt
from server.agent.custom_template import CustomOutputParser, CustomPromptTemplate
from fastapi import Body
from fastapi.responses import StreamingResponse
from configs.model_config import LLM_MODEL, TEMPERATURE, HISTORY_LEN
from server.utils import wrap_done, get_ChatOpenAI
from server.utils import wrap_done, get_ChatOpenAI, get_prompt_template
from langchain.chains import LLMChain
from typing import AsyncIterable
from typing import AsyncIterable, Optional
import asyncio
from langchain.prompts.chat import ChatPromptTemplate
from typing import List
from server.chat.utils import History
import json
async def agent_chat(query: str = Body(..., description="用户输入", examples=["恼羞成怒"]),
history: List[History] = Body([],
description="历史对话",
@ -24,26 +25,40 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
stream: bool = Body(False, description="流式输出"),
model_name: str = Body(LLM_MODEL, description="LLM 模型名称。"),
temperature: float = Body(TEMPERATURE, description="LLM 采样温度", ge=0.0, le=1.0),
prompt_name: str = Body("agent_chat",
description="使用的prompt模板名称(在configs/prompt_config.py中配置)"),
# top_p: float = Body(TOP_P, description="LLM 核采样。勿与temperature同时设置", gt=0.0, lt=1.0),
):
history = [History.from_data(h) for h in history]
async def chat_iterator() -> AsyncIterable[str]:
async def agent_chat_iterator(
query: str,
history: Optional[List[History]],
model_name: str = LLM_MODEL,
prompt_name: str = prompt_name,
) -> AsyncIterable[str]:
callback = CustomAsyncIteratorCallbackHandler()
model = get_ChatOpenAI(
model_name=model_name,
temperature=temperature,
)
prompt_template = CustomPromptTemplate(
template=get_prompt_template(prompt_name),
tools=tools,
input_variables=["input", "intermediate_steps", "history"]
)
output_parser = CustomOutputParser()
llm_chain = LLMChain(llm=model, prompt=prompt)
llm_chain = LLMChain(llm=model, prompt=prompt_template)
agent = LLMSingleActionAgent(
llm_chain=llm_chain,
output_parser=output_parser,
stop=["\nObservation:"],
stop=["Observation:", "Observation:\n", "<|im_end|>"], # Qwen模型中使用这个
# stop=["Observation:", "Observation:\n"], # 其他模型,注意模板
allowed_tools=tool_names,
)
# 把history转成agent的memory
memory = ConversationBufferWindowMemory(k=100)
memory = ConversationBufferWindowMemory(k=HISTORY_LEN * 2)
for message in history:
# 检查消息的角色
@ -53,16 +68,12 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
else:
# 添加AI消息
memory.chat_memory.add_ai_message(message.content)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent,
tools=tools,
verbose=True,
memory=memory,
)
# TODO: history is not used
input_msg = History(role="user", content="{{ input }}").to_msg_template(False)
chat_prompt = ChatPromptTemplate.from_messages(
[i.to_msg_template() for i in history] + [input_msg])
task = asyncio.create_task(wrap_done(
agent_executor.acall(query, callbacks=[callback], include_run_info=True),
callback.done),
@ -72,6 +83,10 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
tools_use = []
# Use server-sent-events to stream the response
data = json.loads(chunk)
if data["status"] == Status.error:
tools_use.append("工具调用失败:\n" + data["error"])
yield json.dumps({"tools": tools_use}, ensure_ascii=False)
yield json.dumps({"answer": "(工具调用失败,请查看工具栏报错) \n\n"}, ensure_ascii=False)
if data["status"] == Status.start or data["status"] == Status.complete:
continue
if data["status"] == Status.agent_action:
@ -85,7 +100,7 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
else:
pass
# agent必须要steram=True
# The agent requires stream=True; the non-streaming branch below is not finished yet
# result = []
# async for chunk in callback.aiter():
# data = json.loads(chunk)
@ -104,5 +119,8 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
await task
return StreamingResponse(chat_iterator(),
return StreamingResponse(agent_chat_iterator(query=query,
history=history,
model_name=model_name,
prompt_name=prompt_name),
media_type="text/event-stream")

View File

@ -58,7 +58,7 @@ def create_controller_app(
def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
"""
"""
kwargs包含的字段如下
host:
port:
@ -66,7 +66,8 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
controller_address:
worker_address:
对于online_api:
对于online_api:
online_api:True
worker_class: `provider`
对于离线模型
@ -77,7 +78,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
fastchat.constants.LOGDIR = LOG_PATH
from fastchat.serve.model_worker import worker_id, logger
import argparse
import fastchat.serve.model_worker
logger.setLevel(log_level)
parser = argparse.ArgumentParser()
@ -101,7 +101,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
from fastchat.serve.vllm_worker import VLLMWorker,app
from vllm import AsyncLLMEngine
from vllm.engine.arg_utils import AsyncEngineArgs,EngineArgs
args.tokenizer = args.model_path # 如果tokenizer与model_path不一致在此处添加
args.tokenizer_mode = 'auto'
args.trust_remote_code= True
@ -121,7 +120,7 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
args.conv_template = None
args.limit_worker_concurrency = 5
args.no_register = False
args.num_gpus = 1
args.num_gpus = 1 # the vllm worker splits the model with tensor parallelism; set this to the number of GPUs
args.engine_use_ray = False
args.disable_log_requests = False
if args.model_path:
@ -148,11 +147,13 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
)
sys.modules["fastchat.serve.vllm_worker"].engine = engine
sys.modules["fastchat.serve.vllm_worker"].worker = worker
else:
from fastchat.serve.model_worker import app, GptqConfig, AWQConfig, ModelWorker
args.gpus = "1"
args.gpus = "0" # GPU的编号,如果有多个GPU可以设置为"0,1,2,3"
args.max_gpu_memory = "20GiB"
args.num_gpus = 1 # the model worker splits the model with model parallelism; set this to the number of GPUs
args.load_8bit = False
args.cpu_offloading = None
args.gptq_ckpt = None
@ -162,7 +163,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
args.awq_ckpt = None
args.awq_wbits = 16
args.awq_groupsize = -1
args.num_gpus = 1
args.model_names = []
args.conv_template = None
args.limit_worker_concurrency = 5
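Putting the GPU settings above together, a hypothetical two-GPU configuration for the non-vllm model worker branch might look like the sketch below; the values are illustrative, so adjust the device ids and memory cap to your machine:

```python
# Hypothetical two-GPU settings for the model worker branch above
# (inside create_model_worker_app, where `args` already exists).
args.gpus = "0,1"              # CUDA device ids to expose to the worker
args.num_gpus = 2              # number of cards the model is split across (model parallel)
args.max_gpu_memory = "20GiB"  # per-card memory cap
```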