Merge branch 'dev'

Commit: b077085fbe

README.md (14 changes)
@@ -45,14 +45,13 @@
 🚩 This project does not involve fine-tuning or training, but fine-tuning or training can be applied to improve its results.

-🌐 The code used by the `v8` release of the [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) has been updated to project version `v0.2.4`.
+🌐 The code used by the `v9` release of the [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) has been updated to project version `v0.2.5`.

-🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3)
+🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5)

 💻 Run Docker with one command 🌲:

 ```shell
-docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.3
+docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
 ```

 ---
@@ -61,14 +60,15 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
 To run this code smoothly, please configure at least the following minimum requirements:
 + Python version: >= 3.8.5, < 3.11
-+ CUDA version: >= 11.7, and Python can be installed successfully
++ CUDA version: >= 11.7
++ Python 3.10 is strongly recommended; some Agent features may not be fully supported on versions below 3.10.

 To run local models (int4 version) smoothly on a GPU, you need at least the following hardware:

 + chatglm2-6b & LLaMA-7B minimum VRAM: 7GB; recommended GPUs: RTX 3060, RTX 2060
 + LLaMA-13B minimum VRAM: 11GB; recommended GPUs: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
 + Qwen-14B-Chat minimum VRAM: 13GB; recommended GPU: RTX 3090
 + LLaMA-30B minimum VRAM: 22GB; recommended GPUs: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
 + LLaMA-65B minimum VRAM: 40GB; recommended GPUs: A100, A40, A6000

 For int8, multiply the VRAM requirement by 1.5; for fp16, by 2.5.
@@ -249,7 +249,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
 See [Development Environment Setup](docs/INSTALL.md).

-**Please note:** the dependencies of `0.2.3` and later may conflict with the `0.1.x` dependencies; it is strongly recommended to create a new environment and reinstall the dependencies.
+**Please note:** the dependencies of `0.2.5` and later may conflict with the `0.1.x` dependencies; it is strongly recommended to create a new environment and reinstall the dependencies.

 ### 2. Download models to local storage
README_en.md (13 changes)
@@ -44,14 +44,14 @@ The main process analysis from the aspect of document process:
 🚩 Training and fine-tuning are not part of this project, but performance can still be improved by applying them.

-🌐 [AutoDL image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0) is supported, and in v7 the code has been updated to v0.2.3.
+🌐 [AutoDL image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5) is supported, and in v9 the code has been updated to v0.2.5.

-🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0)
+🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5)

 💻 Run Docker with one command:

 ```shell
-docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0
+docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
 ```

 ---
@@ -60,16 +60,17 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch
 To run this code smoothly, please configure it according to the following minimum requirements:
 + Python version: >= 3.8.5, < 3.11
-+ CUDA version: >= 11.7, with Python installed.
++ CUDA version: >= 11.7
++ Python 3.10 is highly recommended; some Agent features may not be fully supported below Python 3.10.

 If you want to run the local models (int4 version) on the GPU without problems, you need at least the following hardware configuration:

 + chatglm2-6b & LLaMA-7B minimum VRAM requirement: 7GB; recommended graphics cards: RTX 3060, RTX 2060
 + LLaMA-13B minimum VRAM requirement: 11GB; recommended cards: RTX 2060 12GB, RTX 3060 12GB, RTX 3080, RTX A2000
 + Qwen-14B-Chat minimum VRAM requirement: 13GB; recommended card: RTX 3090

 + LLaMA-30B minimum VRAM requirement: 22GB; recommended cards: RTX A5000, RTX 3090, RTX 4090, RTX 6000, Tesla V100, Tesla P40
 + LLaMA-65B minimum VRAM requirement: 40GB; recommended cards: A100, A40, A6000

 If int8 is used, multiply the VRAM requirement by 1.5; for fp16, by 2.5.
 For example: running the Qwen-7B-Chat model in fp16 requires 16GB of VRAM.
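The multiplier rule can be sanity-checked with a small illustrative sketch (not part of either README); the table values and factors are taken from the text above, everything else is plain arithmetic:

```python
# Back-of-the-envelope VRAM estimate: the table above lists int4 minimums,
# int8 needs roughly 1.5x that figure and fp16 roughly 2.5x.
INT4_MIN_VRAM_GB = {
    "chatglm2-6b": 7, "LLaMA-7B": 7, "LLaMA-13B": 11,
    "Qwen-14B-Chat": 13, "LLaMA-30B": 22, "LLaMA-65B": 40,
}

def estimate_vram_gb(model: str, quant: str = "int4") -> float:
    factor = {"int4": 1.0, "int8": 1.5, "fp16": 2.5}[quant]
    return INT4_MIN_VRAM_GB[model] * factor

# A 7B-class model in fp16: 7 GB * 2.5 = 17.5 GB, the same ballpark as the
# "16GB of VRAM for fp16 Qwen-7B-Chat" example quoted above.
print(estimate_vram_gb("LLaMA-7B", "fp16"))
```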
@@ -191,7 +192,7 @@ See [Custom Agent Instructions](docs/自定义Agent.md) for details.
 docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.5
 ```

-- The image size of this version is `33.9GB`, using `v0.2.0`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image
+- The image size of this version is `33.9GB`, using `v0.2.5`, with `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` as the base image
 - This version has a built-in `embedding` model: `m3e-large`, built-in `chatglm2-6b-32k`
 - This version is designed to facilitate one-click deployment. Please make sure you have installed the NVIDIA driver on your Linux distribution.
 - Please note that you do not need to install the CUDA toolkit on the host system, but you do need to install the `NVIDIA Driver` and the `NVIDIA Container Toolkit`; please refer to the [Installation Guide](https://docs.nvidia.com/datacenter/cloud -native/container-toolkit/latest/install-guide.html)
@@ -92,6 +92,7 @@ MODEL_PATH = {
 # Name of the selected Embedding model
 EMBEDDING_MODEL = "m3e-base"  # you can also try the latest SOTA embedding model: piccolo-large-zh

 # Device the Embedding model runs on. "auto" detects automatically; it can also be set manually to "cuda", "mps" or "cpu".
 EMBEDDING_DEVICE = "auto"
@@ -174,6 +175,14 @@ ONLINE_LLM_MODEL = {
         "api_key": "",  # create this on the API-KEY management page of the Alibaba Cloud DashScope (灵积) console
         "provider": "QwenWorker",
     },
+
+    # Baichuan API; for how to apply for access, see https://www.baichuan-ai.com/home#api-enter
+    "baichuan-api": {
+        "version": "Baichuan2-53B",  # currently supports "Baichuan2-53B"; see the official docs
+        "api_key": "",
+        "secret_key": "",
+        "provider": "BaiChuanWorker",
+    },
 }
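The new `baichuan-api` entry is consumed by the `BaiChuanWorker` added later in this commit, which reads `version`, `api_key` and `secret_key` from it via `self.get_config()`.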
New file (13 lines added):

@@ -0,0 +1,13 @@
```python
# Batch-copy the .example files under configs/ and rename the copies to .py files
import os
import shutil

files = os.listdir("configs")

src_files = [os.path.join("configs", file) for file in files if ".example" in file]

for src_file in src_files:
    tar_file = src_file.replace(".example", "")
    shutil.copy(src_file, tar_file)
```
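Run the script from the repository root so that the relative `configs` path resolves; note that `shutil.copy` overwrites any existing `.py` config with its `.example` counterpart.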
Binary image file changed (not shown): size 225 KiB before, 84 KiB after.
@@ -1,3 +1,4 @@
+from __future__ import annotations
 from uuid import UUID
 from langchain.callbacks import AsyncIteratorCallbackHandler
 import json
@@ -1,3 +1,4 @@
+from __future__ import annotations
 from langchain.agents import Tool, AgentOutputParser
 from langchain.prompts import StringPromptTemplate
 from typing import List, Union
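Both import-only hunks add `from __future__ import annotations`, which postpones evaluation of type annotations (PEP 563) so that newer-style annotations in these modules keep working on the older Python versions (3.8–3.9) the project still supports.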
New file (163 lines added):

@@ -0,0 +1,163 @@
```python
# import os
# import sys
# sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
import hashlib
import json
import sys
import time
from typing import List, Literal

import requests
from fastchat import conversation as conv

from configs import TEMPERATURE
from server.model_workers.base import ApiModelWorker


def calculate_md5(input_string):
    # Baichuan signs each request with md5(secret_key + request_body + timestamp)
    md5 = hashlib.md5()
    md5.update(input_string.encode('utf-8'))
    encrypted = md5.hexdigest()
    return encrypted


def do_request():
    # Standalone smoke test against the streaming endpoint; fill in api_key/secret_key before use.
    url = "https://api.baichuan-ai.com/v1/stream/chat"
    api_key = ""
    secret_key = ""

    data = {
        "model": "Baichuan2-53B",
        "messages": [
            {
                "role": "user",
                "content": "世界第一高峰是"
            }
        ],
        "parameters": {
            "temperature": 0.1,
            "top_k": 10
        }
    }

    json_data = json.dumps(data)
    time_stamp = int(time.time())
    signature = calculate_md5(secret_key + json_data + str(time_stamp))

    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
        "X-BC-Request-Id": "your requestId",
        "X-BC-Timestamp": str(time_stamp),
        "X-BC-Signature": signature,
        "X-BC-Sign-Algo": "MD5",
    }

    response = requests.post(url, data=json_data, headers=headers)

    if response.status_code == 200:
        print("请求成功!")
        print("响应header:", response.headers)
        print("响应body:", response.text)
    else:
        print("请求失败,状态码:", response.status_code)


class BaiChuanWorker(ApiModelWorker):
    BASE_URL = "https://api.baichuan-ai.com/v1/chat"
    SUPPORT_MODELS = ["Baichuan2-53B"]

    def __init__(
            self,
            *,
            controller_addr: str,
            worker_addr: str,
            model_names: List[str] = ["baichuan-api"],
            version: Literal["Baichuan2-53B"] = "Baichuan2-53B",
            **kwargs,
    ):
        kwargs.update(model_names=model_names, controller_addr=controller_addr, worker_addr=worker_addr)
        kwargs.setdefault("context_len", 32768)
        super().__init__(**kwargs)

        # TODO: confirm whether this conversation template needs adjusting
        self.conv = conv.Conversation(
            name=self.model_names[0],
            system_message="",
            messages=[],
            roles=["user", "assistant"],
            sep="\n### ",
            stop_str="###",
        )

        # version / api_key / secret_key come from the "baichuan-api" entry in ONLINE_LLM_MODEL
        config = self.get_config()
        self.version = config.get("version", version)
        self.api_key = config.get("api_key")
        self.secret_key = config.get("secret_key")

    def generate_stream_gate(self, params):
        data = {
            "model": self.version,
            "messages": [
                {
                    "role": "user",
                    "content": params["prompt"]
                }
            ],
            "parameters": {
                "temperature": params.get("temperature", TEMPERATURE),
                "top_k": params.get("top_k", 1)
            }
        }

        json_data = json.dumps(data)
        time_stamp = int(time.time())
        signature = calculate_md5(self.secret_key + json_data + str(time_stamp))
        headers = {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + self.api_key,
            "X-BC-Request-Id": "your requestId",
            "X-BC-Timestamp": str(time_stamp),
            "X-BC-Signature": signature,
            "X-BC-Sign-Algo": "MD5",
        }

        response = requests.post(self.BASE_URL, data=json_data, headers=headers)

        if response.status_code == 200:
            resp = json.loads(response.text)
            yield json.dumps(
                {
                    "error_code": resp["code"],
                    "text": resp["data"]["messages"][-1]["content"]
                },
                ensure_ascii=False
            ).encode() + b"\0"
        else:
            # report the HTTP failure instead of referencing an undefined response body
            yield json.dumps(
                {
                    "error_code": response.status_code,
                    "text": response.text
                },
                ensure_ascii=False
            ).encode() + b"\0"

    def get_embeddings(self, params):
        # TODO: support embeddings
        print("embedding")
        print(params)


if __name__ == "__main__":
    import uvicorn
    from server.utils import MakeFastAPIOffline
    from fastchat.serve.model_worker import app

    worker = BaiChuanWorker(
        controller_addr="http://127.0.0.1:20001",
        worker_addr="http://127.0.0.1:21001",
    )
    sys.modules["fastchat.serve.model_worker"].worker = worker
    MakeFastAPIOffline(app)
    uvicorn.run(app, port=21001)
    # do_request()
```
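For a quick end-to-end check of the worker started in the `__main__` block, a minimal client sketch (not part of this commit) could look like the following; it assumes FastChat's standard `/worker_generate_stream` route and the port used above:

```python
import json
import requests

# Stream the null-byte-delimited JSON chunks yielded by generate_stream_gate (worker on port 21001 above).
resp = requests.post(
    "http://127.0.0.1:21001/worker_generate_stream",
    json={"prompt": "你好", "temperature": 0.7, "top_k": 1},
    stream=True,
)
for chunk in resp.iter_lines(delimiter=b"\0"):
    if chunk:
        print(json.loads(chunk.decode("utf-8")))
```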