Merge branch 'dev'

zR 2023-09-29 16:44:10 +08:00
commit b077085fbe
8 changed files with 201 additions and 13 deletions


@@ -45,14 +45,13 @@
🚩 This project does not involve fine-tuning or training, but fine-tuning or training can be used to improve its performance.
- 🌐 The code used by the `v8` version of the [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) has been updated to `v0.2.4` of this project.
+ 🌐 The code used by the `v9` version of the [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) has been updated to `v0.2.5` of this project.
- 🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3)
+ 🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5)
💻 Run Docker with a single command 🌲:
```shell
- docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3
+ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
---
@@ -61,14 +60,15 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
To run this code smoothly, please configure it according to the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
- + CUDA version: >= 11.7, and Python can be installed successfully
+ + CUDA version: >= 11.7
+ + Python 3.10 is strongly recommended; some Agent features may not be fully supported on versions below Python 3.10.
If you want to run local models (int4 version) smoothly on a GPU, you need at least the following hardware configuration:
+ chatglm2-6b & LLaMA-7B Minimum VRAM requirement: 7GB Recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B Minimum VRAM requirement: 11GB Recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat Minimum VRAM requirement: 13GB Recommended GPU: RTX 3090
+ LLaMA-30B Minimum VRAM requirement: 22GB Recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, RTX Tesla P40
+ LLaMA-65B Minimum VRAM requirement: 40GB Recommended GPUs: A100, A40, A6000
For int8, multiply the VRAM requirement by 1.5; for fp16, by 2.5.
@@ -249,7 +249,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
See [Development Environment Preparation](docs/INSTALL.md).
- **Please note:** the dependencies of `0.2.3` and later versions may conflict with the dependencies of `0.1.x` versions; it is strongly recommended to create a new environment and reinstall the dependencies.
+ **Please note:** the dependencies of `0.2.5` and later versions may conflict with the dependencies of `0.1.x` versions; it is strongly recommended to create a new environment and reinstall the dependencies.
### 2. Download the models locally
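
For illustration only (not part of this diff): one way to download the models locally is via `huggingface_hub`, assuming the model IDs mentioned elsewhere in this README (`chatglm2-6b`, `m3e-base`); the repo IDs and target paths below are assumptions, not prescribed by the project.

```python
# Hypothetical download helper; adjust repo IDs and local paths as needed.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="THUDM/chatglm2-6b", local_dir="models/chatglm2-6b")
snapshot_download(repo_id="moka-ai/m3e-base", local_dir="models/m3e-base")
```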


@@ -44,14 +44,14 @@ The main process analysis from the aspect of document process:
🚩 Training and fine-tuning are not involved in this project, but performance can still be improved by doing so.
- 🌐 [AutoDL image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0) is supported; in v7 the code was updated to v0.2.3.
+ 🌐 [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) is supported; in v9 the code has been updated to v0.2.5.
- 🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0)
+ 🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5)
💻 Run Docker with one command:
```shell
- docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0
+ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
---
@@ -60,16 +60,17 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
To run this code smoothly, please configure it according to the following minimum requirements:
+ Python version: >= 3.8.5, < 3.11
- + CUDA version: >= 11.7, with Python installed.
+ + CUDA version: >= 11.7
+ + Python 3.10 is highly recommended; some Agent features may not be fully supported below Python 3.10.
If you want to run local models (int4 version) on a GPU without problems, you need at least the following hardware configuration.
+ chatglm2-6b & LLaMA-7B Minimum VRAM requirement: 7GB Recommended GPUs: RTX 3060, RTX 2060
+ LLaMA-13B Minimum VRAM requirement: 11GB Recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
+ Qwen-14B-Chat Minimum VRAM requirement: 13GB Recommended GPU: RTX 3090
+ LLaMA-30B Minimum VRAM requirement: 22GB Recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, RTX Tesla P40
+ LLaMA-65B Minimum VRAM requirement: 40GB Recommended GPUs: A100, A40, A6000
For int8, multiply the VRAM requirement by 1.5; for fp16, by 2.5.
For example: running the Qwen-7B-Chat model in fp16 for inference requires 16GB of VRAM.
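
A minimal sketch of the scaling rule above (not part of this diff), assuming the int4 figures in the list as the baseline; the multipliers are the README's rules of thumb, not measurements:

```python
# Hypothetical helper: estimate VRAM from the int4 baseline using the
# rule of thumb above (int8 = x1.5, fp16 = x2.5).
def estimate_vram_gb(int4_gb: float, precision: str = "int4") -> float:
    multipliers = {"int4": 1.0, "int8": 1.5, "fp16": 2.5}
    return int4_gb * multipliers[precision]

# Example: a model needing 7 GB in int4 would need roughly
# 10.5 GB in int8 and 17.5 GB in fp16 under this rule.
print(estimate_vram_gb(7, "int8"), estimate_vram_gb(7, "fp16"))
```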
@@ -191,7 +192,7 @@ See [Custom Agent Instructions](docs/自定义Agent.md) for details.
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
```
- - The image size of this version is `33.9GB`, using `v0.2.0`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image
+ - The image size of this version is `33.9GB`, using `v0.2.5`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image
- This version has a built-in `embedding` model, `m3e-large`, and a built-in `chatglm2-6b-32k`
- This version is designed to facilitate one-click deployment. Please make sure you have installed the NVIDIA driver on your Linux distribution.
- Please note that you do not need to install the CUDA toolkit on the host system, but you need to install the `NVIDIA Driver` and the `NVIDIA Container Toolkit`; please refer to the [Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)


@@ -92,6 +92,7 @@ MODEL_PATH = {
# Name of the Embedding model to use
EMBEDDING_MODEL = "m3e-base"  # You can also try the latest SOTA embedding model: piccolo-large-zh
# Device the Embedding model runs on. "auto" detects automatically; it can also be set manually to one of "cuda", "mps", or "cpu".
EMBEDDING_DEVICE = "auto"
@@ -174,6 +175,14 @@ ONLINE_LLM_MODEL = {
        "api_key": "",  # Create one in the Alibaba Cloud console, on the Model Service DashScope (灵积) API-KEY management page
        "provider": "QwenWorker",
    },
+     # Baichuan API. For how to apply for access, see https://www.baichuan-ai.com/home#api-enter
+     "baichuan-api": {
+         "version": "Baichuan2-53B",  # Currently "Baichuan2-53B" is supported; see the official documentation.
+         "api_key": "",
+         "secret_key": "",
+         "provider": "BaiChuanWorker",
+     },
}
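
The `api_key`, `secret_key`, and `version` values added here are read back by the `BaiChuanWorker` defined later in this commit via `get_config()`, so they are presumably meant to be filled in with the credentials issued by the Baichuan platform.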

copy_config_example.py (new file)

@@ -0,0 +1,13 @@
# Batch-copy the .example files under configs/ and save them as .py files
import os
import shutil

files = os.listdir("configs")
src_files = [os.path.join("configs", file) for file in files if ".example" in file]

for src_file in src_files:
    tar_file = src_file.replace(".example", "")
    shutil.copy(src_file, tar_file)
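
Presumably this script is run from the repository root (for example, `python copy_config_example.py`) so that the relative `configs` path resolves; note that `shutil.copy` overwrites any existing `.py` config files with the same name.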

Binary image file changed (not shown): 225 KiB before, 84 KiB after.


@@ -1,3 +1,4 @@
+ from __future__ import annotations
from uuid import UUID
from langchain.callbacks import AsyncIteratorCallbackHandler
import json


@@ -1,3 +1,4 @@
+ from __future__ import annotations
from langchain.agents import Tool, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from typing import List, Union


@@ -0,0 +1,163 @@
# import os
# import sys
# sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
import requests
import json
import time
import hashlib
import sys
from typing import List, Literal

from fastchat import conversation as conv

from configs import TEMPERATURE
from server.model_workers.base import ApiModelWorker


def calculate_md5(input_string):
    md5 = hashlib.md5()
    md5.update(input_string.encode('utf-8'))
    encrypted = md5.hexdigest()
    return encrypted
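

# Note: the Baichuan API signs each request with MD5(secret_key + request body + timestamp),
# sent via the X-BC-Signature / X-BC-Timestamp headers below.
# do_request() is a standalone smoke test of that request flow, independent of the worker class.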
def do_request():
    url = "https://api.baichuan-ai.com/v1/stream/chat"
    api_key = ""
    secret_key = ""

    data = {
        "model": "Baichuan2-53B",
        "messages": [
            {
                "role": "user",
                "content": "The world's highest mountain is"
            }
        ],
        "parameters": {
            "temperature": 0.1,
            "top_k": 10
        }
    }

    json_data = json.dumps(data)
    time_stamp = int(time.time())
    signature = calculate_md5(secret_key + json_data + str(time_stamp))
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
        "X-BC-Request-Id": "your requestId",
        "X-BC-Timestamp": str(time_stamp),
        "X-BC-Signature": signature,
        "X-BC-Sign-Algo": "MD5",
    }

    response = requests.post(url, data=json_data, headers=headers)

    if response.status_code == 200:
        print("Request succeeded!")
        print("Response headers:", response.headers)
        print("Response body:", response.text)
    else:
        print("Request failed, status code:", response.status_code)


class BaiChuanWorker(ApiModelWorker):
    BASE_URL = "https://api.baichuan-ai.com/v1/chat"
    SUPPORT_MODELS = ["Baichuan2-53B"]

    def __init__(
        self,
        *,
        controller_addr: str,
        worker_addr: str,
        model_names: List[str] = ["baichuan-api"],
        version: Literal["Baichuan2-53B"] = "Baichuan2-53B",
        **kwargs,
    ):
        kwargs.update(model_names=model_names, controller_addr=controller_addr, worker_addr=worker_addr)
        kwargs.setdefault("context_len", 32768)
        super().__init__(**kwargs)

        # TODO: confirm whether this conversation template needs to be adjusted
        self.conv = conv.Conversation(
            name=self.model_names[0],
            system_message="",
            messages=[],
            roles=["user", "assistant"],
            sep="\n### ",
            stop_str="###",
        )

        config = self.get_config()
        self.version = config.get("version", version)
        self.api_key = config.get("api_key")
        self.secret_key = config.get("secret_key")
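        # These values presumably come from the "baichuan-api" entry added to
        # ONLINE_LLM_MODEL in this same commit (version / api_key / secret_key).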

    def generate_stream_gate(self, params):
        data = {
            "model": self.version,
            "messages": [
                {
                    "role": "user",
                    "content": params["prompt"]
                }
            ],
            "parameters": {
                "temperature": params.get("temperature", TEMPERATURE),
                "top_k": params.get("top_k", 1)
            }
        }

        json_data = json.dumps(data)
        time_stamp = int(time.time())
        signature = calculate_md5(self.secret_key + json_data + str(time_stamp))
        headers = {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + self.api_key,
            "X-BC-Request-Id": "your requestId",
            "X-BC-Timestamp": str(time_stamp),
            "X-BC-Signature": signature,
            "X-BC-Sign-Algo": "MD5",
        }

        response = requests.post(self.BASE_URL, data=json_data, headers=headers)

        if response.status_code == 200:
            # Parse the JSON body instead of eval()-ing untrusted response text.
            resp = json.loads(response.text)
            yield json.dumps(
                {
                    "error_code": resp["code"],
                    "text": resp["data"]["messages"][-1]["content"]
                },
                ensure_ascii=False
            ).encode() + b"\0"
        else:
            # On a non-200 response there is no parsed body, so report the HTTP error itself.
            yield json.dumps(
                {
                    "error_code": response.status_code,
                    "text": response.text
                },
                ensure_ascii=False
            ).encode() + b"\0"

    def get_embeddings(self, params):
        # TODO: support embeddings
        print("embedding")
        print(params)


if __name__ == "__main__":
    import uvicorn
    from server.utils import MakeFastAPIOffline
    from fastchat.serve.model_worker import app

    worker = BaiChuanWorker(
        controller_addr="http://127.0.0.1:20001",
        worker_addr="http://127.0.0.1:21001",
    )
    sys.modules["fastchat.serve.model_worker"].worker = worker
    MakeFastAPIOffline(app)
    uvicorn.run(app, port=21001)
    # do_request()
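
For illustration only (not part of this commit): a minimal client sketch for consuming the worker's null-byte-delimited stream, assuming the standard fastchat model-worker route `/worker_generate_stream` and the `prompt`/`temperature` fields that `generate_stream_gate()` reads.

```python
# Hypothetical client; the endpoint and payload fields are assumptions based on
# generate_stream_gate() above, not an API defined by this commit.
import json
import requests

def stream_chat(prompt: str, worker_addr: str = "http://127.0.0.1:21001"):
    payload = {"prompt": prompt, "temperature": 0.7}
    with requests.post(f"{worker_addr}/worker_generate_stream",
                       json=payload, stream=True) as resp:
        # Each message is a JSON object terminated by a null byte.
        for chunk in resp.iter_lines(decode_unicode=False, delimiter=b"\0"):
            if chunk:
                data = json.loads(chunk.decode("utf-8"))
                print(data.get("text", ""))

if __name__ == "__main__":
    stream_chat("The world's highest mountain is")
```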