1. Update README; 2. Fix multi-GPU startup; 3. Update LoRA loading instructions (#1079)
* fix chat and knowledge_base_chat
* update multi-GPU deployment docs
* update readme
* update api and webui: 1. add download_doc to api; 2. return a local path or an HTTP URL in knowledge_base_chat depending on no_remote_api; 3. change the assistant avatar in webui
* fix multi-GPU startup
* fix chat and knowledge_base_chat
* update the LoRA loading instructions in the README
* update readme
* update readme
---------
Co-authored-by: liunux4odoo <liunu@qq.com>
This commit is contained in:
parent: d14d80d759
commit: 71b528a2d1

README.md (87 lines changed)
@@ -46,6 +46,7 @@

🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0)

💻 Run it with a single Docker command:

```shell
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.0
```
@@ -56,7 +57,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch

See the [release notes](https://github.com/imClumsyPanda/langchain-ChatGLM/releases) for version updates.

Users upgrading from `0.1.x`, please note: after completing ["Development & Deployment, step 3: configure the settings"](docs/INSTALL.md), you also need to migrate existing knowledge bases to the new format; see [knowledge base initialization and migration](docs/INSTALL.md#知识库初始化与迁移) for details.

### Differences between `0.2.0` and `0.1.x`
@@ -147,7 +148,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch

- This image is meant for easy one-command deployment; make sure the NVIDIA driver is installed on your Linux distribution first
- Note that you do not need the CUDA toolkit on the host system, but you do need the `NVIDIA Driver` and the `NVIDIA Container Toolkit`; see the [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- The first pull and the first start both take some time; on first start, follow the logs with `docker logs -f <container id>` as shown below
- If startup hangs at the `Waiting..` step, use `docker exec -it <container id> bash` to enter the container and check the per-stage logs under `/logs/`

---
@@ -161,8 +162,7 @@ docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/ch

See [Preparing the development environment](docs/INSTALL.md).

**Please note:** the dependencies of `0.2.0` and later may conflict with those of `0.1.x`; we strongly recommend creating a fresh environment and reinstalling the dependencies.

### 2. Download models locally
@@ -209,26 +209,55 @@ embedding_model_dict = {

The project now stores knowledge base metadata in a database; please initialize the database before running the project for the first time (we strongly recommend backing up your knowledge files before doing so).

- If you are upgrading from `0.1.x` and keep existing knowledge bases, first confirm that their vector store type and Embedding model match the defaults in `configs/model_config.py`. If nothing has changed, the following command simply registers the existing knowledge bases in the database:

  ```shell
  $ python init_database.py
  ```
- If this is your first run, the knowledge base has not been created yet, or the knowledge base type / embedding model in the configuration has changed, initialize or rebuild the knowledge base with:

  ```shell
  $ python init_database.py --recreate-vs
  ```

### 5. Start the API service or Web UI

#### 5.1 Start the LLM service

**!!!Note 1**: choose only one of the three options described in 5.1.1-5.1.3.

**!!!Note 2**: if you use an online API service (such as OpenAI's API), you do not need to start a local LLM service, i.e. none of the commands in section 5.1 are needed.

##### 5.1.1 Start the LLM service with the multi-process script llm_api.py

From the project root, run the [server/llm_api.py](server/llm_api.py) script to start the **LLM model** service:

```shell
$ python server/llm_api.py
```

The project supports loading a model across multiple GPUs. In llm_api.py, edit the three parameters gpus, num_gpus and max_gpu_memory passed to create_model_worker_app: gpus selects the GPU IDs to use, e.g. "0,1"; num_gpus sets how many GPUs to use; max_gpu_memory caps the VRAM used on each GPU (e.g. "20GiB").
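For illustration, a minimal sketch of the three settings as they might be edited inside the `create_model_worker_app(...)` call in `server/llm_api.py`; the values are examples taken from the text above, not required defaults:

```python
# Example values for the three multi-GPU parameters discussed above
# (edit them inside the create_model_worker_app(...) call in server/llm_api.py):
gpus = "0,1"               # comma-separated GPU IDs to use
num_gpus = 2               # number of GPUs to spread the model across
max_gpu_memory = "20GiB"   # VRAM budget per GPU
```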
##### 5.1.2 Start the LLM service with the command-line script llm_api_launch.py

From the project root, run the [server/llm_api_launch.py](server/llm_api_launch.py) script to start the **LLM model** service:

```shell
$ python server/llm_api_launch.py
```

This script can launch multiple workers, for example:

```shell
$ python server/llm_api_launch.py --model-path-addresss model1@host1@port1 model2@host2@port2
```

To enable multi-GPU loading, an example command is:

```shell
$ python server/llm_api_launch.py --gpus 0,1 --num-gpus 2 --max-gpu-memory 10GiB
```

Note: starting the LLM service this way runs the fastchat service in the background via nohup; to stop it, run:

```shell
$ python server/llm_api_shutdown.py --serve all
```
@@ -236,9 +265,32 @@ $ python server/llm_api_shutdown.py --serve all

You can also stop a single fastchat module; valid options are [`all`, `controller`, `model_worker`, `openai_api_server`].

##### 5.1.3 Loading LoRA weights

This project serves LLMs through fastchat, so LoRA weights must be laid out the way fastchat expects: the path name must contain the word "peft", the config file must be named adapter_config.json, and the peft directory must contain LoRA weights in model.bin format.

Example:

```shell
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /data/chris/peft-llama-dummy-1 \
    --model-names peft-dummy-1 \
    --model-path /data/chris/peft-llama-dummy-2 \
    --model-names peft-dummy-2 \
    --model-path /data/chris/peft-llama-dummy-3 \
    --model-names peft-dummy-3 \
    --num-gpus 2
```

See https://github.com/lm-sys/FastChat/pull/1905#issuecomment-1627801216 for details.
#### 5.2 Start the API service

For local deployment, **!!!start the LLM service first!!!**, then run the [server/api.py](server/api.py) script to start the **API** service;

when calling an online API service instead, run [server/api.py](server/api.py) directly to start the **API** service;

Example command:

```shell
$ python server/api.py
```
@@ -252,13 +304,13 @@ $ python server/api.py

#### 5.3 Start the Web UI service

**!!!After the API service is running!!!**, run [webui.py](webui.py) to start the **Web UI** service (port `8501` by default):

```shell
$ streamlit run webui.py
```

To start the **Web UI** service with the Langchain-Chatchat theme colors (port `8501` by default):

```shell
$ streamlit run webui.py --theme.base "light" --theme.primaryColor "#165dff" --theme.secondaryBackgroundColor "#f5f5f5" --theme.textColor "#000000"
```

@@ -273,7 +325,6 @@ $ streamlit run webui.py --server.port 666

- Web UI dialogue page:

  

- Web UI knowledge base management page:

  
@@ -308,11 +359,11 @@ $ streamlit run webui.py --server.port 666

- [X] Bing search
- [X] DuckDuckGo search
- [ ] Agent implementation
- [X] LLM model integration
  - [X] Support calling LLMs through the [fastchat](https://github.com/lm-sys/FastChat) api
  - [ ] Support ChatGLM API and other LLM APIs
- [X] Embedding model integration
  - [X] Support open-source Embedding models from HuggingFace
  - [ ] Support OpenAI Embedding API and other Embedding APIs
- [X] API access based on FastAPI
- [X] Web UI
@@ -14,7 +14,7 @@ from server.chat import (chat, knowledge_base_chat, openai_chat,
                         search_engine_chat)
from server.knowledge_base.kb_api import list_kbs, create_kb, delete_kb
from server.knowledge_base.kb_doc_api import (list_docs, upload_doc, delete_doc,
-                                             update_doc, recreate_vector_store)
+                                             update_doc, download_doc, recreate_vector_store)
from server.utils import BaseResponse, ListResponse

nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path
@@ -101,6 +101,10 @@ def create_app():
            summary="更新现有文件到知识库"
            )(update_doc)

+   app.get("/knowledge_base/download_doc",
+           tags=["Knowledge Base Management"],
+           summary="下载对应的知识文件")(download_doc)
+
    app.post("/knowledge_base/recreate_vector_store",
             tags=["Knowledge Base Management"],
             summary="根据content中文档重建向量库,流式输出处理进度。"
@@ -19,6 +19,7 @@ def chat(query: str = Body(..., description="用户输入", examples=["恼羞成
                      {"role": "user", "content": "我们来玩成语接龙,我先来,生龙活虎"},
                      {"role": "assistant", "content": "虎头虎脑"}]]
                  ),
+        stream: bool = Body(False, description="流式输出"),
         ):
    history = [History(**h) if isinstance(h, dict) else h for h in history]

@@ -46,9 +47,16 @@ def chat(query: str = Body(..., description="用户输入", examples=["恼羞成
                            callback.done),
        )

-        async for token in callback.aiter():
-            # Use server-sent-events to stream the response
-            yield token
+        if stream:
+            async for token in callback.aiter():
+                # Use server-sent-events to stream the response
+                yield token
+        else:
+            answer = ""
+            async for token in callback.aiter():
+                answer += token
+            yield answer

        await task

    return StreamingResponse(chat_iterator(query, history),
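To illustrate the effect of the new stream flag, a hypothetical client for the /chat/chat endpoint; the base URL is an assumption about your local deployment, not something fixed by this commit:

```python
# Hypothetical client for the /chat/chat endpoint shown above.
import requests

BASE_URL = "http://127.0.0.1:7861"
payload = {"query": "你好", "history": [], "stream": True}

# With stream=True the server yields tokens one by one; with stream=False it
# sends the accumulated answer once at the end.
with requests.post(f"{BASE_URL}/chat/chat", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```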
@@ -1,4 +1,4 @@
-from fastapi import Body
+from fastapi import Body, Request
from fastapi.responses import StreamingResponse
from configs.model_config import (llm_model_dict, LLM_MODEL, PROMPT_TEMPLATE,
                                  VECTOR_SEARCH_TOP_K)

@@ -14,6 +14,8 @@ from typing import List, Optional
from server.chat.utils import History
from server.knowledge_base.kb_service.base import KBService, KBServiceFactory
import json
+import os
+from urllib.parse import urlencode


def knowledge_base_chat(query: str = Body(..., description="用户输入", examples=["你好"]),
@@ -28,6 +30,8 @@ def knowledge_base_chat(query: str = Body(..., description="用户输入", examp
                                          "content": "虎头虎脑"}]]
                           ),
                        stream: bool = Body(False, description="流式输出"),
+                       local_doc_url: bool = Body(False, description="知识文件返回本地路径(true)或URL(false)"),
+                       request: Request = None,
                        ):
    kb = KBServiceFactory.get_service_by_name(knowledge_base_name)
    if kb is None:
@@ -63,10 +67,16 @@ def knowledge_base_chat(query: str = Body(..., description="用户输入", examp
                            callback.done),
        )

-        source_documents = [
-            f"""出处 [{inum + 1}] [{doc.metadata["source"]}]({doc.metadata["source"]}) \n\n{doc.page_content}\n\n"""
-            for inum, doc in enumerate(docs)
-        ]
+        source_documents = []
+        for inum, doc in enumerate(docs):
+            filename = os.path.split(doc.metadata["source"])[-1]
+            if local_doc_url:
+                url = "file://" + doc.metadata["source"]
+            else:
+                parameters = urlencode({"knowledge_base_name": knowledge_base_name, "file_name": filename})
+                url = f"{request.base_url}knowledge_base/download_doc?" + parameters
+            text = f"""出处 [{inum + 1}] [{filename}]({url}) \n\n{doc.page_content}\n\n"""
+            source_documents.append(text)

        if stream:
            async for token in callback.aiter():
@@ -78,7 +88,7 @@ def knowledge_base_chat(query: str = Body(..., description="用户输入", examp
            answer = ""
            async for token in callback.aiter():
                answer += token
-            yield json.dumps({"answer": token,
+            yield json.dumps({"answer": answer,
                              "docs": source_documents},
                             ensure_ascii=False)
@@ -1,10 +1,10 @@
import os
import urllib
-from fastapi import File, Form, Body, UploadFile
+from fastapi import File, Form, Body, Query, UploadFile
from configs.model_config import DEFAULT_VS_TYPE, EMBEDDING_MODEL
from server.utils import BaseResponse, ListResponse
from server.knowledge_base.utils import validate_kb_name, list_docs_from_folder, KnowledgeFile
-from fastapi.responses import StreamingResponse
+from fastapi.responses import StreamingResponse, FileResponse
import json
from server.knowledge_base.kb_service.base import KBServiceFactory
from typing import List
@@ -104,9 +104,32 @@ async def update_doc(
        return BaseResponse(code=500, msg=f"{kb_file.filename} 文件更新失败")


-async def download_doc():
-    # TODO: download the file
-    pass
+async def download_doc(
+    knowledge_base_name: str = Query(..., examples=["samples"]),
+    file_name: str = Query(..., examples=["test.txt"]),
+):
+    '''
+    Download a knowledge base document.
+    '''
+    if not validate_kb_name(knowledge_base_name):
+        return BaseResponse(code=403, msg="Don't attack me")
+
+    kb = KBServiceFactory.get_service_by_name(knowledge_base_name)
+    if kb is None:
+        return BaseResponse(code=404, msg=f"未找到知识库 {knowledge_base_name}")
+
+    kb_file = KnowledgeFile(filename=file_name,
+                            knowledge_base_name=knowledge_base_name)
+
+    if os.path.exists(kb_file.filepath):
+        return FileResponse(
+            path=kb_file.filepath,
+            filename=kb_file.filename,
+            media_type="multipart/form-data")
+    else:
+        return BaseResponse(code=500, msg=f"{kb_file.filename} 读取文件失败")


async def recreate_vector_store(
@@ -44,7 +44,7 @@ def create_model_worker_app(
        gptq_act_order=None,
        gpus=None,
        num_gpus=1,
-       max_gpu_memory=None,
+       max_gpu_memory="20GiB",
        cpu_offloading=None,
        worker_address=base_url.format(model_worker_port),
        controller_address=base_url.format(controller_port),
@@ -76,6 +76,7 @@ parser.add_argument("--num-gpus", type=int, default=1)
parser.add_argument(
    "--max-gpu-memory",
    type=str,
+   default="20GiB",
    help="The maximum memory per gpu. Use a string like '13Gib'",
)
parser.add_argument(
@@ -131,11 +132,11 @@ worker_args = [
            "gptq-ckpt", "gptq-wbits", "gptq-groupsize",
            "gptq-act-order", "model-names", "limit-worker-concurrency",
            "stream-interval", "no-register",
-           "controller-address"
+           "controller-address", "worker-address"
            ]
# -----------------openai server---------------------------

-parser.add_argument("--server-host", type=str, default="127.0.0.1", help="host name")
+parser.add_argument("--server-host", type=str, default="localhost", help="host name")
parser.add_argument("--server-port", type=int, default=8888, help="port number")
parser.add_argument(
    "--allow-credentials", action="store_true", help="allow credentials"
@@ -214,6 +215,7 @@ def launch_worker(item):
    log_name = item.split("/")[-1].split("\\")[-1].replace("-", "_").replace("@", "_").replace(".", "_")
    # split the model-path-address item first, then pass the pieces to string_args to build the CLI arguments
    args.model_path, args.worker_host, args.worker_port = item.split("@")
    args.worker_address = f"http://{args.worker_host}:{args.worker_port}"
    print("*" * 80)
    worker_str_args = string_args(args, worker_args)
    print(worker_str_args)
webui.py (1 line changed)
@@ -35,7 +35,6 @@ if __name__ == "__main__":
    with st.sidebar:
        st.image(
            os.path.join(
                os.path.dirname(__file__),
                "img",
                "logo-long-chatchat-trans-v2.png"
            ),
@@ -8,7 +8,6 @@ import os

chat_box = ChatBox(
    assistant_avatar=os.path.join(
        os.path.dirname(os.path.dirname(os.path.dirname(__file__))),
        "img",
        "chatchat_icon_blue_square_v2.png"
    )
@@ -259,6 +259,7 @@ class ApiRequest:
        self,
        query: str,
        history: List[Dict] = [],
        stream: bool = True,
        no_remote_api: bool = None,
    ):
        '''
@@ -267,12 +268,18 @@ class ApiRequest:
        if no_remote_api is None:
            no_remote_api = self.no_remote_api

+       data = {
+           "query": query,
+           "history": history,
+           "stream": stream,
+       }
+
        if no_remote_api:
            from server.chat.chat import chat
-           response = chat(query, history)
+           response = chat(**data)
            return self._fastapi_stream2generator(response)
        else:
-           response = self.post("/chat/chat", json={"query": query, "history": history}, stream=True)
+           response = self.post("/chat/chat", json=data, stream=True)
            return self._httpx_stream2generator(response)
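As a usage sketch of the helper above: this assumes ApiRequest lives in webui_pages/utils.py and can be constructed without arguments, neither of which is shown in this diff:

```python
# Hypothetical usage of ApiRequest.chat; the import path and the no-argument
# constructor are assumptions about the surrounding webui code.
from webui_pages.utils import ApiRequest

api = ApiRequest()
# no_remote_api=True calls server.chat.chat in-process; False goes through the HTTP API.
for chunk in api.chat("你好", history=[], stream=True, no_remote_api=True):
    print(chunk, end="", flush=True)
```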
    def knowledge_base_chat(
@@ -296,6 +303,7 @@ class ApiRequest:
            "top_k": top_k,
            "history": history,
            "stream": stream,
+           "local_doc_url": no_remote_api,
        }

        if no_remote_api: