* Added GPT-4-only agent functionality (more to follow); the Chinese readme has been written

* Fixed a bug reported in an issue

* Set the minimum temperature to 0; negative values should not be supported

* Adjusted the minimum temperature

* Added partial Agent support and fixed some bugs in the startup script

* Updated the GPU count configuration

* 1

* Fixed configuration file errors

* Updated readme; stability testing
zR 2023-09-28 20:19:26 +08:00 committed by GitHub
parent 8fa99026c8
commit efd8edda16
16 changed files with 229 additions and 144 deletions

View File

@ -57,6 +57,25 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
---
## Minimum Environment Requirements
To run this project smoothly, please meet the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
+ CUDA version: >= 11.7, with Python installed correctly
To run a local model (int4 quantization) on the GPU without problems, you need at least the following hardware:
+ chatglm2-6b & LLaMA-7B: minimum VRAM 7GB, recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B: minimum VRAM 11GB, recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat: minimum VRAM 13GB, recommended GPU: RTX 3090
+ LLaMA-30B: minimum VRAM 22GB, recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
+ LLaMA-65B: minimum VRAM 22GB, recommended GPUs: A100, A40, A6000
With int8 quantization the VRAM requirement is roughly 1.5x the int4 figure, and with fp16 roughly 2.5x.
For example, running Qwen-7B-Chat with fp16 inference requires about 16GB of VRAM.
These figures are estimates only; the actual usage reported by nvidia-smi is authoritative.
## Change Log
See the [release notes](https://github.com/imClumsyPanda/langchain-ChatGLM/releases).
@ -112,7 +131,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [Qwen/Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat) and others
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
@ -159,9 +178,11 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
- [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [nghuyong/ernie-3.0-nano-zh](https://huggingface.co/nghuyong/ernie-3.0-nano-zh)
- [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [OpenAI/text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings)
The default Embedding model in the project is `moka-ai/m3e-base`; to use a different Embedding model, modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
The default Embedding model in the project is `sensenova/piccolo-base-zh`; to use a different Embedding model, modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
---
@ -199,17 +220,17 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
### Build Your Own Agent Tools
See (docs/自定义Agent.md) for details.
See the [Custom Agent guide](docs/自定义Agent.md) for details.
## Docker Deployment
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3`
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5`
```shell
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
- The image for this version is `35.3GB`, built from `v0.2.3` on the `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` base image
- The image for this version is `35.3GB`, built from `v0.2.5` on the `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` base image
- This version bundles two `embedding` models, `m3e-large` and `text2vec-bge-large-chinese` (the latter enabled by default), plus `chatglm2-6b-32k`
- This version is aimed at convenient one-click deployment; make sure the NVIDIA driver is installed on your Linux distribution
- Note that you do not need to install the CUDA toolkit on the host, but you do need the `NVIDIA Driver` and the `NVIDIA Container Toolkit`; see the [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

View File

@ -56,6 +56,25 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
---
## Minimum Environment Requirements
To run this code smoothly, please meet the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
+ CUDA version: >= 11.7, with Python installed correctly.
To run a local model (int4 quantization) on the GPU without problems, you need at least the following hardware:
+ chatglm2-6b & LLaMA-7B: minimum VRAM 7GB, recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B: minimum VRAM 11GB, recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat: minimum VRAM 13GB, recommended GPU: RTX 3090
+ LLaMA-30B: minimum VRAM 22GB, recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
+ LLaMA-65B: minimum VRAM 22GB, recommended GPUs: A100, A40, A6000
With int8 quantization the VRAM requirement is roughly 1.5x the int4 figure, and with fp16 roughly 2.5x.
For example, running Qwen-7B-Chat with fp16 inference requires about 16GB of VRAM.
These figures are estimates only; the actual usage reported by nvidia-smi is authoritative.
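The rule of thumb above can be written down as a small helper. This is only a sketch of the estimate itself: the int4 baselines are the figures from the table, the multipliers are the ones stated above, and real usage should still be checked with nvidia-smi.

```python
# Rough VRAM estimator for the rule of thumb above: the table gives the int4
# baseline; int8 needs about 1.5x of it and fp16 about 2.5x.
INT4_BASELINE_GB = {
    "chatglm2-6b": 7,
    "LLaMA-7B": 7,
    "LLaMA-13B": 11,
    "Qwen-14B-Chat": 13,
    "LLaMA-30B": 22,
}
MULTIPLIER = {"int4": 1.0, "int8": 1.5, "fp16": 2.5}

def estimate_vram_gb(model: str, precision: str = "int4") -> float:
    """Estimated VRAM in GB; the value reported by nvidia-smi is authoritative."""
    return INT4_BASELINE_GB[model] * MULTIPLIER[precision]

print(estimate_vram_gb("LLaMA-13B", "fp16"))  # ~27.5 GB
```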
## Change Log
Please refer to the [version change log](https://github.com/imClumsyPanda/langchain-ChatGLM/releases).
@ -105,7 +124,7 @@ The project use [FastChat](https://github.com/lm-sys/FastChat) to provide the AP
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [Qwen/Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat) and other models of FlagAlpha
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
@ -117,7 +136,19 @@ The project use [FastChat](https://github.com/lm-sys/FastChat) to provide the AP
* Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)(任何 [EleutherAI](https://huggingface.co/EleutherAI) 的 pythia 模型,如 [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b))
* Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a model above. To activate, must have `peft` in the model path. Note: If loading multiple peft models, you can have them share the base model weights by setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model worker.
Please refer to `llm_model_dict` in `configs/model_config.py.example` to invoke the OpenAI API.
The above model support list may be updated continuously as [FastChat](https://github.com/lm-sys/FastChat) is updated; see the [FastChat supported models list](https://github.com/lm-sys/FastChat/blob/main/docs/model_support.md).
In addition to local models, this project also supports direct access to online models such as the OpenAI API and Zhipu AI. For specific settings, please refer to the configuration of `llm_model_dict` in `configs/model_config.py.example`.
Online LLM models are currently supported:
- [ChatGPT](https://api.openai.com)
- [Zhipu AI](http://open.bigmodel.cn)
- [MiniMax](https://api.minimax.chat)
- [iFLYTEK Spark (Xunfei Xinghuo)](https://xinghuo.xfyun.cn)
- [Baidu Qianfan](https://cloud.baidu.com/product/wenxinworkshop?track=dingbutonglan)
- [Alibaba Cloud Tongyi Qianwen](https://dashscope.aliyun.com/)
The default LLM type used in the project is `THUDM/chatglm2-6b`, if you need to use other LLM types, please modify `llm_model_dict` and `LLM_MODEL` in [configs/model_config.py].
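For illustration, a hypothetical excerpt of such a configuration is sketched below. The entry and field names are placeholders, not the project's exact schema; check `configs/model_config.py.example` in your checkout for the real keys.

```python
# Illustrative sketch only: entry names and field names are placeholders.
# See configs/model_config.py.example for the actual schema.
llm_model_dict = {
    "chatglm2-6b": {
        "local_model_path": "THUDM/chatglm2-6b",      # local model served through FastChat
    },
    "gpt-3.5-turbo": {
        "api_base_url": "https://api.openai.com/v1",  # online API endpoint
        "api_key": "sk-...",                          # fill in your own key
    },
}

LLM_MODEL = "chatglm2-6b"  # pick any key defined in llm_model_dict
```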
### Supported Embedding models
@ -130,6 +161,8 @@ Following models are tested by developers with Embedding class of [HuggingFace](
- [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh)
- [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh)
- [BAAI/bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [shibing624/text2vec-base-chinese-sentence](https://huggingface.co/shibing624/text2vec-base-chinese-sentence)
- [shibing624/text2vec-base-chinese-paraphrase](https://huggingface.co/shibing624/text2vec-base-chinese-paraphrase)
- [shibing624/text2vec-base-multilingual](https://huggingface.co/shibing624/text2vec-base-multilingual)
@ -138,16 +171,24 @@ Following models are tested by developers with Embedding class of [HuggingFace](
- [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [nghuyong/ernie-3.0-nano-zh](https://huggingface.co/nghuyong/ernie-3.0-nano-zh)
- [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
- [sensenova/piccolo-base-zh](https://huggingface.co/sensenova/piccolo-base-zh)
- [sensenova/piccolo-large-zh](https://huggingface.co/sensenova/piccolo-large-zh)
- [OpenAI/text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings)
The default Embedding model used in the project is `sensenova/piccolo-base-zh`; if you want to use another Embedding model, please modify `embedding_model_dict` and `EMBEDDING_MODEL` in [configs/model_config.py].
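A minimal sketch of that change, assuming `embedding_model_dict` maps a short name to a HuggingFace repo id or local path (verify the exact layout in your copy of `configs/model_config.py`):

```python
# Sketch: switch the default embedding model. EMBEDDING_MODEL must be one of
# the keys defined in embedding_model_dict.
embedding_model_dict = {
    "piccolo-base-zh": "sensenova/piccolo-base-zh",
    "m3e-base": "moka-ai/m3e-base",
}

EMBEDDING_MODEL = "m3e-base"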
### Build your own Agent tool!
See [Custom Agent Instructions](docs/自定义Agent.md) for details.
---
## Docker Deployment
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0`
🐳 Docker image path: `registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5`
```shell
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
- The image size of this version is `33.9GB`, using `v0.2.0`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image

View File

@ -83,12 +83,14 @@ MODEL_PATH = {
"opt-iml-max-30b":"facebook/opt-iml-max-30b",
"Qwen-7B":"Qwen/Qwen-7B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B":"Qwen/Qwen-14B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B-Chat":"Qwen/Qwen-14B-Chat",
},
}
# Name of the Embedding model to use
EMBEDDING_MODEL = "m3e-base"
EMBEDDING_MODEL = "piccolo-large-zh" # latest SOTA embedding model
# Device on which the Embedding model runs. "auto" detects automatically; it can also be set manually to "cuda", "mps", or "cpu".
EMBEDDING_DEVICE = "auto"
@ -221,6 +223,8 @@ VLLM_MODEL_DICT = {
"opt-iml-max-30b":"facebook/opt-iml-max-30b",
"Qwen-7B":"Qwen/Qwen-7B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B":"Qwen/Qwen-14B",
"Qwen-7B-Chat":"Qwen/Qwen-7B-Chat",
"Qwen-14B-Chat":"Qwen/Qwen-14B-Chat",
}

View File

@ -1,5 +1,5 @@
import sys
from configs.model_config import LLM_DEVICE
import httpx
# Default timeout (in seconds) for httpx requests. If loading a model or chatting is slow and you hit timeout errors, increase this value.
HTTPX_DEFAULT_TIMEOUT = 300.0

View File

@ -1,45 +1,50 @@
## Build Your Own Custom Agent
### 1. Create the .py file for your Agent
Create your own file under ```server/agent``` and register it in ```tools.py```.
### 1. Create your own Agent tool
+ Create your own file under ```server/agent``` and register it in ```tools.py```. That completes the Tools setup.
For example, if you create a ```custom_agent.py``` file containing a ```work``` function, you need to add the following to ```tools.py```:
+ If you have created a ```custom_agent.py``` file containing a ```work``` function, you need to add the following to ```tools.py```:
```python
from custom_agent import work
Tool.from_function(
func=work,
name="该函数的名字",
description=""
func=work,
name="该函数的名字",
description=""
)
```
+ Note: if you are sure a given tool will not be used in a project, remove it from Tools; this lowers the risk of the model misclassifying and calling the wrong tool. A sketch of what ```custom_agent.py``` itself might look like is shown below.
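A minimal, hypothetical sketch of ```custom_agent.py``` itself; the body of ```work``` is illustrative only and simply echoes its input:

```python
# server/agent/custom_agent.py -- hypothetical example of a custom tool function.
def work(query: str) -> str:
    """Receive the Action Input produced by the model and return the Observation text."""
    # Replace this with the real work of your tool; here we just echo the input.
    return f"custom tool received: {query}"
```

Whatever the function returns is fed back to the model as the Observation, so keep it short and textual. Give the matching `description` in ```tools.py``` a real sentence as well; an empty description makes it harder for the model to decide when to call the tool.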
### 2. Modify custom_template.py
Set an Agent prompt and a custom output format that suit the LLM you have chosen.
Our code provides two defaults; one is a prompt adapted for GPT:
Our code provides two defaults; one is a prompt adapted for GPT and Qwen:
```python
template = """Answer the following questions as best you can You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Previous conversation history:
{history}
New question: {input}
{agent_scratchpad}"""
"""
Answer the following questions as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history:
{history}
Question: {input}
Thought: {agent_scratchpad}
"""
```
The other is a prompt adapted for GLM-130B:
```python
template = """
"""
尽可能地回答以下问题。你可以使用以下工具:{tools}
请按照以下格式进行:
Question: 需要你回答的输入问题
@ -47,7 +52,7 @@ Thought: 你应该总是思考该做什么
Action: 需要使用的工具,应该是[{tool_names}]中的一个
Action Input: 传入工具的内容
Observation: 行动的结果
... (这个Thought/Action/Action Input/Observation可以重复N次)
... (这个Thought/Action/Action Input/Observation可以重复N次)
Thought: 我现在知道最后的答案
Final Answer: 对原始输入问题的最终答案
@ -57,7 +62,8 @@ Final Answer: 对原始输入问题的最终答案
{history}
New question: {input}
Thought: {agent_scratchpad}"""
Thought: {agent_scratchpad}
"""
```
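To see how a template like the ones above is wired into a runnable agent, the sketch below condenses the wiring used in `server/chat/agent_chat.py` in this commit (prompt template, output parser, LLM chain, single-action agent, executor); it is a simplified sketch, not the full streaming implementation:

```python
from configs.model_config import LLM_MODEL, TEMPERATURE
from langchain.chains import LLMChain
from langchain.agents import AgentExecutor, LLMSingleActionAgent
from server.utils import get_ChatOpenAI, get_prompt_template
from server.agent.tools import tools, tool_names
from server.agent.custom_template import CustomOutputParser, CustomPromptTemplate

model = get_ChatOpenAI(model_name=LLM_MODEL, temperature=TEMPERATURE)
prompt_template = CustomPromptTemplate(
    template=get_prompt_template("agent_chat"),   # one of the templates above
    tools=tools,
    input_variables=["input", "intermediate_steps", "history"],
)
agent = LLMSingleActionAgent(
    llm_chain=LLMChain(llm=model, prompt=prompt_template),
    output_parser=CustomOutputParser(),
    stop=["Observation:", "<|im_end|>"],  # adjust the stop tokens to your model
    allowed_tools=tool_names,
)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
print(agent_executor.run("上海今天天气怎么样?"))
```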
### 3. Limitations
@ -70,4 +76,5 @@ Thought: {agent_scratchpad}"""
We have written three LLM-driven Agent tools for developers:
1. A translation tool that translates input text from any language.
2. A math tool that uses LLMMathChain for calculations.
3. A weather tool that uses a custom LLMWetherChain to query the weather through the HeFeng Weather (QWeather) API.
4. In addition, Agent tools supported by Langchain can be used; the code already includes implementations of the Shell and Google Search tools.

View File

@ -56,7 +56,6 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
async def on_tool_error(self, error: Exception | KeyboardInterrupt, *, run_id: UUID,
parent_run_id: UUID | None = None, tags: List[str] | None = None, **kwargs: Any) -> None:
self.out = True
self.cur_tool.update(
status=Status.error,
error=str(error),
@ -65,19 +64,19 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
if token:
if token == "Action":
if "Action" in token:
self.out = False
self.cur_tool.update(
status=Status.running,
llm_token="\n\n",
)
self.queue.put_nowait(dumps(self.cur_tool))
if self.out:
self.cur_tool.update(
status=Status.running,
llm_token=token,
)
self.queue.put_nowait(dumps(self.cur_tool))
self.queue.put_nowait(dumps(self.cur_tool))
async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
self.cur_tool.update(
@ -95,6 +94,7 @@ class CustomAsyncIteratorCallbackHandler(AsyncIteratorCallbackHandler):
self.queue.put_nowait(dumps(self.cur_tool))
async def on_llm_error(self, error: Exception | KeyboardInterrupt, **kwargs: Any) -> None:
self.out = True
self.cur_tool.update(
status=Status.error,
error=str(error),

View File

@ -1,57 +1,9 @@
template = """
尽可能地回答以下问题你可以使用以下工具:{tools}
请按照以下格式进行:
Question: 需要你回答的输入问题
Thought: 你应该总是思考该做什么并告诉我你要用什么工具
Action: 需要使用的工具应该是[{tool_names}]中的一个
Action Input: 传入工具的内容
Observation: 行动的结果
... (这个Thought/Action/Action Input/Observation可以重复N次)
Thought: 通过使用工具我是否知道了答案如果知道就自然的回答问题如果不知道继续使用工具或者自己的知识 \n
Final Answer: 这个问题的答案是输出完整的句子
现在开始
之前的对话:
{history}
New question:
{input}
Thought:
{agent_scratchpad}"""
# ChatGPT 提示词模板
# template = """Answer the following questions as best you can You have access to the following tools:
# {tools}
# Use the following format:
#
# Question: the input question you must answer
# Thought: you should always think about what to do
# Action: the action to take, should be one of [{tool_names}]
# Action Input: the input to the action
# Observation: the result of the action
# ... (this Thought/Action/Action Input/Observation can repeat N times)
# Thought: I now know the final answer
# Final Answer: the final answer to the original input question
#
# Begin!
#
# Previous conversation history:
# {history}
#
# New question: {input}
# {agent_scratchpad}"""
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.agents import Tool, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain.llms import OpenAI
from langchain.utilities import SerpAPIWrapper
from langchain.chains import LLMChain
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish, OutputParserException
from server.agent.tools import tools
from langchain.schema import AgentAction, AgentFinish
import re
class CustomPromptTemplate(StringPromptTemplate):
# The template to use
template: str
@ -73,15 +25,9 @@ class CustomPromptTemplate(StringPromptTemplate):
# Create a list of tool names for the tools provided
kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
return self.template.format(**kwargs)
prompt = CustomPromptTemplate(
template=template,
tools=tools,
input_variables=["input", "intermediate_steps", "history"]
)
class CustomOutputParser(AgentOutputParser):
def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
def parse(self, llm_output: str) -> AgentFinish | AgentAction | str:
# Check if agent should finish
if "Final Answer:" in llm_output:
return AgentFinish(
@ -101,10 +47,18 @@ class CustomOutputParser(AgentOutputParser):
action = match.group(1).strip()
action_input = match.group(2)
# Return the action and action input
return AgentAction(
try:
ans = AgentAction(
tool=action,
tool_input=action_input.strip(" ").strip('"'),
log=llm_output
)
)
return ans
except:
return AgentFinish(
return_values={"output": f"调用agent失败: `{llm_output}`"},
log=llm_output,
)

View File

@ -0,0 +1,8 @@
import os
# Fill in your Google Programmable Search Engine ID and API key before using this tool.
os.environ["GOOGLE_CSE_ID"] = ""
os.environ["GOOGLE_API_KEY"] = ""
from langchain.tools import GoogleSearchResults

def google_search(query: str):
    # Run a Google search via LangChain's GoogleSearchResults tool and return its output.
    tool = GoogleSearchResults()
    return tool.run(tool_input=query)

server/agent/shell.py Normal file
View File

@ -0,0 +1,5 @@
from langchain.tools import ShellTool

def shell(query: str):
    # Execute the query as a shell command through LangChain's ShellTool and return its output.
    tool = ShellTool()
    return tool.run(tool_input=query)

View File

@ -1,28 +1,40 @@
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from server.agent.math import calculate
from server.agent.translator import translate
from server.agent.weather import weathercheck
from server.agent.shell import shell
from server.agent.google_search import google_search
from langchain.agents import Tool
tools = [
Tool.from_function(
func=calculate,
name="计算器工具",
description=""
description="进行简单的数学运算"
),
Tool.from_function(
func=translate,
name="翻译工具",
description=""
description="翻译各种语言"
),
Tool.from_function(
func=weathercheck,
name="天气查询工具",
description="",
description="查询天气",
),
Tool.from_function(
func=shell,
name="shell工具",
description="使用命令行工具输出",
),
Tool.from_function(
func=google_search,
name="谷歌搜索工具",
description="使用谷歌搜索",
)
]
tool_names = [tool.name for tool in tools]

View File

@ -23,13 +23,18 @@ ${{翻译结果}}
```
答案: ${{答案}}
以下是个例子
以下是个例子
问题: 翻译13成英语
```text
13 English
13 英语
```output
thirteen
答案: thirteen
以下是两个例子
问题: 翻译 我爱你 成法语
```text
我爱你 法语
```output
Je t'aime.
'''
PROMPT = PromptTemplate(

View File

@ -1,12 +1,14 @@
## 使用和风天气API查询天气
from __future__ import annotations
## 单独运行的时候需要添加
import sys
import os
# sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from server.utils import get_ChatOpenAI
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import re
import warnings
@ -25,6 +27,8 @@ import requests
from typing import List, Any, Optional
from configs.model_config import LLM_MODEL, TEMPERATURE
## 使用和风天气API查询天气
KEY = ""
def get_city_info(location, adm, key):
base_url = 'https://geoapi.qweather.com/v2/city/lookup?'
@ -109,11 +113,11 @@ def split_query(query):
def weather(query):
location, adm, time = split_query(query)
key = KEY
if time != "None" and int(time) > 24:
return "只能查看24小时内的天气无法回答"
if time == "None":
time = "24" # 免费的版本只能24小时内的天气
key = "315625cdca234137944d7f8956106a3e" # 和风天气API Key
if key == "":
return "请先在代码中填入和风天气API Key"
city_info = get_city_info(location=location, adm=adm, key=key)
@ -272,7 +276,7 @@ _PROMPT_TEMPLATE = """用户将会向您咨询天气问题,您不需要自己
${{拆分的区市和时间}}
```
... weather(query)...
... weather(提取后的关键字用空格隔开)...
```output
${{提取后的答案}}
@ -283,7 +287,6 @@ ${{提取后的答案}}
问题: 上海浦东未来1小时天气情况
```text
浦东 上海 1
```
...weather(浦东 上海 1)...
@ -353,3 +356,10 @@ def weathercheck(query: str):
ans = llm_weather.run(query)
return ans
if __name__ == '__main__':
    # Check that the weather API returns results correctly.
    query = "上海浦东未来1小时天气情况"
    # ans = weathercheck(query)
    ans = weather("浦东 上海 1")
    print(ans)

View File

@ -2,18 +2,19 @@ from langchain.memory import ConversationBufferWindowMemory
from server.agent.tools import tools, tool_names
from server.agent.callbacks import CustomAsyncIteratorCallbackHandler, Status, dumps
from langchain.agents import AgentExecutor, LLMSingleActionAgent
from server.agent.custom_template import CustomOutputParser, prompt
from server.agent.custom_template import CustomOutputParser, CustomPromptTemplate
from fastapi import Body
from fastapi.responses import StreamingResponse
from configs.model_config import LLM_MODEL, TEMPERATURE, HISTORY_LEN
from server.utils import wrap_done, get_ChatOpenAI
from server.utils import wrap_done, get_ChatOpenAI, get_prompt_template
from langchain.chains import LLMChain
from typing import AsyncIterable
from typing import AsyncIterable, Optional
import asyncio
from langchain.prompts.chat import ChatPromptTemplate
from typing import List
from server.chat.utils import History
import json
async def agent_chat(query: str = Body(..., description="用户输入", examples=["恼羞成怒"]),
history: List[History] = Body([],
description="历史对话",
@ -24,26 +25,40 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
stream: bool = Body(False, description="流式输出"),
model_name: str = Body(LLM_MODEL, description="LLM 模型名称。"),
temperature: float = Body(TEMPERATURE, description="LLM 采样温度", ge=0.0, le=1.0),
prompt_name: str = Body("agent_chat",
description="使用的prompt模板名称(在configs/prompt_config.py中配置)"),
# top_p: float = Body(TOP_P, description="LLM 核采样。勿与temperature同时设置", gt=0.0, lt=1.0),
):
history = [History.from_data(h) for h in history]
async def chat_iterator() -> AsyncIterable[str]:
async def agent_chat_iterator(
query: str,
history: Optional[List[History]],
model_name: str = LLM_MODEL,
prompt_name: str = prompt_name,
) -> AsyncIterable[str]:
callback = CustomAsyncIteratorCallbackHandler()
model = get_ChatOpenAI(
model_name=model_name,
temperature=temperature,
)
prompt_template = CustomPromptTemplate(
template=get_prompt_template(prompt_name),
tools=tools,
input_variables=["input", "intermediate_steps", "history"]
)
output_parser = CustomOutputParser()
llm_chain = LLMChain(llm=model, prompt=prompt)
llm_chain = LLMChain(llm=model, prompt=prompt_template)
agent = LLMSingleActionAgent(
llm_chain=llm_chain,
output_parser=output_parser,
stop=["\nObservation:"],
stop=["Observation:", "Observation:\n", "<|im_end|>"], # Qwen模型中使用这个
# stop=["Observation:", "Observation:\n"], # 其他模型,注意模板
allowed_tools=tool_names,
)
# 把history转成agent的memory
memory = ConversationBufferWindowMemory(k=100)
memory = ConversationBufferWindowMemory(k=HISTORY_LEN * 2)
for message in history:
# 检查消息的角色
@ -53,16 +68,12 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
else:
# 添加AI消息
memory.chat_memory.add_ai_message(message.content)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent,
tools=tools,
verbose=True,
memory=memory,
)
# TODO: history is not used
input_msg = History(role="user", content="{{ input }}").to_msg_template(False)
chat_prompt = ChatPromptTemplate.from_messages(
[i.to_msg_template() for i in history] + [input_msg])
task = asyncio.create_task(wrap_done(
agent_executor.acall(query, callbacks=[callback], include_run_info=True),
callback.done),
@ -72,6 +83,10 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
tools_use = []
# Use server-sent-events to stream the response
data = json.loads(chunk)
if data["status"] == Status.error:
tools_use.append("工具调用失败:\n" + data["error"])
yield json.dumps({"tools": tools_use}, ensure_ascii=False)
yield json.dumps({"answer": "(工具调用失败,请查看工具栏报错) \n\n"}, ensure_ascii=False)
if data["status"] == Status.start or data["status"] == Status.complete:
continue
if data["status"] == Status.agent_action:
@ -85,7 +100,7 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
else:
pass
# agent必须要steram=True
# The agent requires stream=True; the non-streaming branch below is not finished yet
# result = []
# async for chunk in callback.aiter():
# data = json.loads(chunk)
@ -104,5 +119,8 @@ async def agent_chat(query: str = Body(..., description="用户输入", examples
await task
return StreamingResponse(chat_iterator(),
return StreamingResponse(agent_chat_iterator(query=query,
history=history,
model_name=model_name,
prompt_name=prompt_name),
media_type="text/event-stream")

View File

@ -58,7 +58,7 @@ def create_controller_app(
def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
"""
"""
kwargs包含的字段如下
host:
port:
@ -66,7 +66,8 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
controller_address:
worker_address:
对于online_api:
对于online_api:
online_api:True
worker_class: `provider`
对于离线模型
@ -77,7 +78,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
fastchat.constants.LOGDIR = LOG_PATH
from fastchat.serve.model_worker import worker_id, logger
import argparse
import fastchat.serve.model_worker
logger.setLevel(log_level)
parser = argparse.ArgumentParser()
@ -101,7 +101,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
from fastchat.serve.vllm_worker import VLLMWorker,app
from vllm import AsyncLLMEngine
from vllm.engine.arg_utils import AsyncEngineArgs,EngineArgs
args.tokenizer = args.model_path # 如果tokenizer与model_path不一致在此处添加
args.tokenizer_mode = 'auto'
args.trust_remote_code= True
@ -121,7 +120,7 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
args.conv_template = None
args.limit_worker_concurrency = 5
args.no_register = False
args.num_gpus = 1
args.num_gpus = 1 # the vllm worker splits the model with tensor parallelism; set this to the number of GPUs
args.engine_use_ray = False
args.disable_log_requests = False
if args.model_path:
@ -148,11 +147,13 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
)
sys.modules["fastchat.serve.vllm_worker"].engine = engine
sys.modules["fastchat.serve.vllm_worker"].worker = worker
else:
from fastchat.serve.model_worker import app, GptqConfig, AWQConfig, ModelWorker
args.gpus = "1"
args.gpus = "0" # GPU的编号,如果有多个GPU可以设置为"0,1,2,3"
args.max_gpu_memory = "20GiB"
args.num_gpus = 1 # the model worker splits the model with model parallelism; set this to the number of GPUs
args.load_8bit = False
args.cpu_offloading = None
args.gptq_ckpt = None
@ -162,7 +163,6 @@ def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
args.awq_ckpt = None
args.awq_wbits = 16
args.awq_groupsize = -1
args.num_gpus = 1
args.model_names = []
args.conv_template = None
args.limit_worker_concurrency = 5
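Putting the GPU settings above together, a hypothetical two-GPU configuration for the non-vllm model worker branch might look like the sketch below; the values are illustrative, so adjust the device ids and memory cap to your machine:

```python
# Hypothetical two-GPU settings for the model worker branch above
# (inside create_model_worker_app, where `args` already exists).
args.gpus = "0,1"              # CUDA device ids to expose to the worker
args.num_gpus = 2              # number of cards the model is split across (model parallel)
args.max_gpu_memory = "20GiB"  # per-card memory cap
```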