update README.md and INSTALL.md

README.md

@@ -1,14 +1,29 @@

# ChatGLM and Other Large Language Model Applications Based on a Local Knowledge Base
## Table of Contents

* [Introduction](#introduction)
* [Changelog](#changelog)
* [Model Support](#model-support)
* [Docker Package](#docker-package)
* [Docker Deployment](#docker-deployment)
* [Development Deployment](#development-deployment)
  * [Software Requirements](#software-requirements)
  * [1. Prepare the Development Environment](#1-prepare-the-development-environment)
  * [2. Download Models Locally](#2-download-models-locally)
  * [3. Configure Settings](#3-configure-settings)
  * [4. Start the API Service or Web UI](#4-start-the-api-service-or-web-ui)
* [FAQ](#faq)
* [Roadmap](#roadmap)
* [Project Communication Group](#project-communication-group)
## Introduction

🌍 [_READ THIS IN ENGLISH_](README_en.md)

🤖️ A question-answering application over local knowledge bases, built on the ideas of [langchain](https://github.com/hwchase17/langchain). The goal is a knowledge-base Q&A solution that is friendly to Chinese scenarios and open-source models and can run fully offline.

💡 Inspired by [GanymedeNil](https://github.com/GanymedeNil)'s project [document.ai](https://github.com/GanymedeNil/document.ai) and the [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216) created by [AlexZhangji](https://github.com/AlexZhangji), this project implements a local knowledge-base question-answering application whose full pipeline can be realized with open-source models. The latest version uses [FastChat](https://github.com/lm-sys/FastChat) to access models such as Vicuna, Alpaca, LLaMA, Koala, and RWKV, relies on the [langchain](https://github.com/langchain-ai/langchain) framework, and supports calls through an API service based on [FastAPI](https://github.com/tiangolo/fastapi) or a WebUI based on [Streamlit](https://github.com/streamlit/streamlit).

✅ Relying on the open-source LLM and Embedding models supported by this project, everything can be deployed **privately and offline** using **open-source** models only. The project also supports calling the OpenAI GPT API, and access to more models and model APIs will be added over time.
⛓️ The implementation principle of the project is shown in the figure below. The process includes: loading files -> reading the text -> splitting the text -> embedding the text chunks -> embedding the question -> matching the `top k` text chunks most similar to the question vector -> adding the matched text to the `prompt` as context together with the question -> submitting it to the `LLM` to generate an answer.
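As a rough illustration, the sketch below strings these steps together with langchain-style components. It is a minimal example under stated assumptions, not the project's actual code: the sample file path, chunk sizes, and the final LLM call are placeholders.

```python
# Minimal sketch of the load -> split -> embed -> retrieve -> prompt -> LLM pipeline.
# Assumes `pip install langchain sentence-transformers faiss-cpu`; file path, chunk
# sizes, and model names are illustrative only.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

docs = TextLoader("knowledge/sample.txt", encoding="utf-8").load()  # load file, read text
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)                                             # split the text

embeddings = HuggingFaceEmbeddings(model_name="moka-ai/m3e-base")   # embed the chunks
store = FAISS.from_documents(chunks, embeddings)

question = "What does this project do?"
top_k = store.similarity_search(question, k=3)                      # match top-k chunks

context = "\n".join(d.page_content for d in top_k)
prompt = f"Answer based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# `llm` would be the loaded model, e.g. a langchain wrapper around ChatGLM:
# answer = llm(prompt)
```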
@@ -20,51 +35,74 @@

![Implementation schematic](img/langchain+chatglm.png)

🚩 The project does not involve fine-tuning or training, but fine-tuning or training can be applied to improve its results.

🐳 Docker image: registry.cn-beijing.aliyuncs.com/isafetech/chatmydata:1.0 (thanks to @InkSong🌲)

💻 Run it with: docker run -d -p 80:7860 --gpus all registry.cn-beijing.aliyuncs.com/isafetech/chatmydata:1.0
## Changelog

See the [release notes](https://github.com/imClumsyPanda/langchain-ChatGLM/releases).

## Model Support
The default LLM model in this project is [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b), and the default Embedding model is [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base).

Note: for other optional startup options, see [Startup Options](docs/StartOption.md).

### LLM Model Support
The latest version of this project accesses local LLM models through [FastChat](https://github.com/lm-sys/FastChat). The supported models are:
- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- Vicuna, Alpaca, LLaMA, Koala
- [BlinkDL/RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
- [camel-ai/CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
- [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
- [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b)
- [h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
- [lcw99/polyglot-ko-12.8b-chang-instruct-chat](https://huggingface.co/lcw99/polyglot-ko-12.8b-chang-instruct-chat)
- [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
- [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
- [Neutralzz/BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT)
- [nomic-ai/gpt4all-13b-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy)
- [NousResearch/Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
- [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
- [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
- [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
- [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
- [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
- [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
- [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
- [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- Any pythia model from [EleutherAI](https://huggingface.co/EleutherAI), such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)
- Any [Peft](https://github.com/huggingface/peft) adapter trained on top of the models above. To activate it, the model path must contain `peft`. Note: if you load multiple peft models, you can make them share the base model's weights by setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model worker.
The list above may change as [FastChat](https://github.com/lm-sys/FastChat) is updated; see the [FastChat supported model list](https://github.com/lm-sys/FastChat/blob/main/docs/model_support.md) for the latest status.
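For reference, once FastChat's controller, a model worker, and its OpenAI-compatible API server are running, a served model can be queried as sketched below. The port, model name, and the use of the pre-1.0 `openai` client are assumptions for illustration, not part of this project's own API.

```python
# Sketch: query a locally served model through FastChat's OpenAI-compatible
# API server (assumes `openai<1.0` and a server already running on port 8000).
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed address of FastChat's openai_api_server
openai.api_key = "EMPTY"                      # the local server does not check keys

resp = openai.ChatCompletion.create(
    model="chatglm2-6b",                      # must match the worker's registered model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp["choices"][0]["message"]["content"])
```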
### Embedding Model Support

This project supports the Embedding models available on [HuggingFace](https://huggingface.co/models?pipeline_tag=sentence-similarity). The supported Embedding models are:
- [moka-ai/m3e-small](https://huggingface.co/moka-ai/m3e-small)
- [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base)
- [shibing624/text2vec-base-chinese-sentence](https://huggingface.co/shibing624/text2vec-base-chinese-sentence)
- [shibing624/text2vec-base-chinese-paraphrase](https://huggingface.co/shibing624/text2vec-base-chinese-paraphrase)
- [shibing624/text2vec-base-multilingual](https://huggingface.co/shibing624/text2vec-base-multilingual)
- [shibing624/text2vec-base-chinese](https://huggingface.co/shibing624/text2vec-base-chinese)
- [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [nghuyong/ernie-3.0-nano-zh](https://huggingface.co/nghuyong/ernie-3.0-nano-zh)
- [nghuyong/ernie-3.0-base-zh](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
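Models like these can be loaded through langchain's sentence-transformers wrapper. The snippet below is a minimal sketch, assuming `sentence-transformers` is installed; the model name and query text are chosen for illustration.

```python
# Sketch: load one of the supported Embedding models via langchain
# (assumes `pip install langchain sentence-transformers`).
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="moka-ai/m3e-base")
vector = embeddings.embed_query("为什么要使用本地知识库?")  # one query -> one vector
print(len(vector))  # dimensionality of the embedding
```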
## Docker Package

🐳 Docker image: `registry.cn-beijing.aliyuncs.com/isafetech/chatmydata:1.0` 🌲

💻 Run it with a single command:
@@ -94,129 +132,77 @@ sudo systemctl restart docker

```shell
docker build -f Dockerfile-cuda -t chatglm-cuda:latest .
docker run --gpus all -d --name chatglm -p 7860:7860 chatglm-cuda:latest

# To use offline models, configure the model path and mount this repo into the container
docker run --gpus all -d --name chatglm -p 7860:7860 -v ~/github/langchain-ChatGLM:/chatGLM chatglm-cuda:latest
```
## Development Deployment

### Software Requirements

This project has been tested with Python 3.8.1 - 3.10 and CUDA 11.7, on Windows, ARM-based macOS, and Linux.
### 1. Prepare the Development Environment

See [Development Environment Setup](docs/INSTALL.md).
### 2. Download Models Locally

To run this project locally or in an offline environment, first download the required models. Open-source LLM and Embedding models can usually be downloaded from [HuggingFace](https://huggingface.co/models).

Take the project's default LLM model [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) and default Embedding model [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base) as an example:
To download the models, first [install Git LFS](https://docs.github.com/zh/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) and then run:

```shell
$ git clone https://huggingface.co/THUDM/chatglm2-6b

$ git clone https://huggingface.co/moka-ai/m3e-base
```
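Alternatively, the models can be fetched programmatically. The sketch below uses `huggingface_hub`, assuming a reasonably recent version that supports `local_dir`; the target directories are illustrative.

```python
# Sketch: download the default models with huggingface_hub instead of git clone
# (assumes `pip install huggingface_hub`; target directories are placeholders).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="THUDM/chatglm2-6b", local_dir="models/chatglm2-6b")
snapshot_download(repo_id="moka-ai/m3e-base", local_dir="models/m3e-base")
```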
### 3. Configure Settings

Before starting the Web UI or command-line interaction, check whether the model parameters in [configs/model_config.py](configs/model_config.py) meet your needs.
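For example, pointing the project at locally downloaded models typically means editing the model paths in that file. The excerpt below is purely hypothetical — the actual key names and structure in `configs/model_config.py` may differ — and only illustrates the kind of change to look for.

```python
# Hypothetical excerpt of configs/model_config.py -- the key names here are
# illustrative; check the real file for the actual structure.
embedding_model_dict = {
    "m3e-base": "/path/to/models/m3e-base",   # local path instead of the HF hub id
}
llm_model_dict = {
    "chatglm2-6b": {
        "local_model_path": "/path/to/models/chatglm2-6b",
    },
}
```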
### 4. Start the API Service or Web UI

From the project root directory, run the [server/llm_api.py](server/llm_api.py) script to start the **LLM model** service:

```shell
$ python server/llm_api.py
```
Then run the [server/api.py](server/api.py) script to start the **API** service:

```shell
$ python server/api.py
```
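Once the API service is up, it can be exercised with any HTTP client. The sketch below is hypothetical — the port and the `/chat` endpoint are assumptions, since the actual routes are defined in `server/api.py` — and only shows the general shape of such a call.

```python
# Hypothetical sketch of calling the API service; the port and the /chat
# endpoint are assumptions -- check server/api.py for the real routes.
import requests

resp = requests.post(
    "http://127.0.0.1:7861/chat",           # assumed address of the API service
    json={"query": "你好", "history": []},   # assumed request schema
)
print(resp.json())
```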
Finally, run [webui.py](webui.py) to start the **Web UI** service:

```shell
$ python webui.py
```
## FAQ

See the [FAQ](docs/FAQ.md).
## Demo

Take the question "What are the limitations of chatglm-6b, and how can they be improved?" as an example.

Without using langchain to access local documents:

> ChatGLM-6B is a chatbot based on GPT-3, and its limitations are mainly reflected in the following aspects:
> 1. Language model accuracy: GPT-3 is a large language model with relatively high accuracy, but it still has limitations. For instance, it may fail to accurately understand certain dialects, accents, or complex sentence structures.
> 2. Completeness of the knowledge graph: GPT-3 can generate some knowledge graphs, but their completeness and accuracy may be insufficient. For example, it may struggle to accurately understand domain-specific expertise and terminology.
> 3. Coherence of dialogue generation: GPT-3 can generate fairly coherent dialogue, but it may not always sound natural. For instance, it may produce false or incoherent responses, or misunderstand the user's intent.
>
> To improve ChatGLM-6B, the following aspects can be considered:
> 1. Improve the accuracy of the language model, by adding training data, using better language model architectures, and optimizing the training process.
> 2. Strengthen the completeness of the knowledge graph, by increasing its dimensions, using better knowledge graph construction methods, and optimizing its matching algorithms.
> 3. Improve the coherence of dialogue generation, by giving the dialogue model more context, using better context-extraction methods, and optimizing the generation algorithm.
> 4. Introduce more human feedback: human feedback can help ChatGLM-6B better understand users' intentions and needs, improving the quality and accuracy of its responses. This can be done by involving more human agents and using human feedback collectors.

ChatGLM's answer after using langchain to access the README.md file of the ChatGLM-6B project:

> ChatGLM-6B is a deep-learning-based natural language processing model that performs well at answering questions. However, it also has some limitations:
> 1. Limited language understanding: ChatGLM-6B was trained only on Chinese natural language, so its understanding of other languages may be limited.
> 2. Insufficient knowledge: the training data of ChatGLM-6B contains only Chinese corpora, so it may be unable to answer non-Chinese questions or questions in certain specialized domains.
> 3. Limited data volume: the training dataset of ChatGLM-6B has only a few million records, so it may be unable to answer very specific or complex questions.
>
> To improve ChatGLM-6B, the following aspects can be considered:
> 1. Learn more linguistic knowledge: study natural language processing techniques for other languages to broaden language understanding.
> 2. Expand the knowledge base: collect more Chinese corpora, or use datasets in other languages to enlarge the knowledge base.
> 3. Increase data volume: train ChatGLM-6B on larger datasets to improve its performance.
> 4. Introduce more evaluation metrics to assess the model and uncover its weaknesses and limitations.
> 5. Improve the model architecture, for example by using larger neural networks or improved convolutional network structures, to boost performance.
## Roadmap

- [x] Langchain applications
  - [x] Local data ingestion
    - [x] Unstructured document ingestion
      - [x] .md
      - [x] .txt
      - [x] .docx
    - [ ] Structured data ingestion
      - [x] .csv
      - [ ] .xlsx
    - [ ] Local web page ingestion
    - [ ] SQL ingestion
    - [ ] Knowledge graph / graph database ingestion
  - [x] Search engine integration
    - [x] Bing search
    - [x] DuckDuckGo search
  - [ ] Agent implementation
- [x] Support for more LLM models
  - [x] [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
@@ -238,21 +224,9 @@ Web UI features:

- [x] [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese)
- [x] [moka-ai/m3e-small](https://huggingface.co/moka-ai/m3e-small)
- [x] [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base)
- [x] API calls based on FastAPI
- [x] Web UI
  - [x] Streamlit-based Web UI

## Project Communication Group

<img src="img/qr_code_46.jpg" alt="QR code" width="300" height="300" />
README_en.md (the entire file is removed by this update)

@@ -1,251 +0,0 @@
# ChatGLM Application with Local Knowledge Implementation

## Introduction

[![Telegram](https://img.shields.io/badge/Telegram-2CA5E0?style=flat-squeare&logo=telegram&logoColor=white "langchain-chatglm")](https://t.me/+RjliQ3jnJ1YyN2E9)

🌍 [_中文文档_](README.md)

🤖️ This is a ChatGLM application based on local knowledge, implemented using [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) and [langchain](https://github.com/hwchase17/langchain).

💡 Inspired by [document.ai](https://github.com/GanymedeNil/document.ai) and [Alex Zhangji](https://github.com/AlexZhangji)'s [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216), this project establishes a local knowledge question-answering application using open-source models.

✅ The embeddings used in this project are [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main), and the LLM is [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). Relying on these models, this project enables the use of **open-source** models for **offline private deployment**.

⛓️ The implementation principle of this project is illustrated in the figure below. The process includes loading files -> reading text -> text segmentation -> text vectorization -> question vectorization -> matching the top k most similar text vectors to the question vector -> adding the matched text to `prompt` along with the question as context -> submitting to `LLM` to generate an answer.

![Implementation schematic](img/langchain+chatglm.png)

🚩 This project does not involve fine-tuning or training; however, fine-tuning or training can be employed to optimize the effectiveness of this project.

📓 [ModelWhale online notebook](https://www.heywhale.com/mw/project/643977aa446c45f4592a1e59)

## Changelog
**[2023/04/15]**

1. Refactored the project structure to keep the command-line demo [cli_demo.py](cli_demo.py) and the Web UI demo [webui.py](webui.py) in the root directory;
2. Improved the Web UI so that it first loads the model according to the default option in [configs/model_config.py](configs/model_config.py), and added error messages, etc.;
3. Updated the FAQ.

**[2023/04/12]**

1. Replaced the sample files in the Web UI to avoid issues with unreadable files due to encoding problems in Ubuntu;
2. Replaced the prompt template in `knowledge_based_chatglm.py` to prevent confusion in the content returned by ChatGLM, which may arise from a prompt template containing both Chinese and English text.

**[2023/04/11]**

1. Added Web UI V0.1 version (thanks to [@liangtongt](https://github.com/liangtongt));
2. Added Frequently Asked Questions to `README.md` (thanks to [@calcitem](https://github.com/calcitem) and [@bolongliu](https://github.com/bolongliu));
3. Enhanced automatic detection of `cuda`, `mps`, and `cpu` availability for the devices running the LLM and Embedding models;
4. Added a check for `filepath` in `knowledge_based_chatglm.py`. In addition to a single file, it now accepts a single folder path as input; after input, it traverses each file in the folder and prints a command-line message indicating whether each file loaded successfully.

**[2023/04/09]**

1. Replaced the previously used `ChatVectorDBChain` with `RetrievalQA` from `langchain`, effectively reducing the problem of runs stopping due to insufficient video memory after 2-3 questions;
2. Added `EMBEDDING_MODEL`, `VECTOR_SEARCH_TOP_K`, `LLM_MODEL`, `LLM_HISTORY_LEN`, and `REPLY_WITH_SOURCE` parameter settings in `knowledge_based_chatglm.py`;
3. Added `chatglm-6b-int4` and `chatglm-6b-int4-qe`, which require less GPU memory, as LLM model options;
4. Corrected code errors in `README.md` (thanks to [@calcitem](https://github.com/calcitem)).

**[2023/04/07]**

1. Resolved the issue of doubled video memory usage when loading the ChatGLM model (thanks to [@suc16](https://github.com/suc16) and [@myml](https://github.com/myml));
2. Added a mechanism to clear video memory;
3. Added `nghuyong/ernie-3.0-nano-zh` and `nghuyong/ernie-3.0-base-zh` as Embedding model options, which consume less video memory than `GanymedeNil/text2vec-large-chinese` (thanks to [@lastrei](https://github.com/lastrei)).
## How to Use

### Hardware Requirements

- ChatGLM-6B model hardware requirements

| **Quantization Level** | **Minimum GPU Memory** (inference) | **Minimum GPU Memory** (efficient parameter fine-tuning) |
| ---------------------- | ---------------------------------- | --------------------------------------------------------- |
| FP16 (no quantization) | 13 GB                               | 14 GB                                                      |
| INT8                   | 8 GB                                | 9 GB                                                       |
| INT4                   | 6 GB                                | 7 GB                                                       |

- Embedding model hardware requirements

The default Embedding model [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main) in this project occupies around 3 GB of video memory and can also be configured to run on a CPU.

### Software Requirements

This repository has been tested with Python 3.8 and CUDA 11.7 environments.
### 1. Setting up the environment

* Environment check

```shell
# First, make sure your machine has Python 3.8 or higher installed
$ python --version
Python 3.8.13

# If your version is lower, you can use conda to install the environment
$ conda create -p /your_path/env_name python=3.8

# Activate the environment
$ source activate /your_path/env_name

# Or, without specifying an env path; note that /your_path/env_name below is then replaced with env_name
$ conda create -n env_name python=3.8
$ conda activate env_name  # activate the environment

# Deactivate the environment
$ source deactivate /your_path/env_name

# Remove the environment
$ conda env remove -p /your_path/env_name
```
* Project dependencies

```shell
# Clone the repository
$ git clone https://github.com/imClumsyPanda/langchain-ChatGLM.git

# Install dependencies
$ pip install -r requirements.txt
```

Note: When using `langchain.document_loaders.UnstructuredFileLoader` for unstructured file integration, you may need to install other dependency packages according to the documentation. Please refer to the [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html).
### 2. Run Scripts to Experience the Web UI or Command-Line Interaction

Execute the [webui.py](webui.py) script to experience **Web interaction** <img src="https://img.shields.io/badge/Version-0.1-brightgreen">

```shell
$ python webui.py
```

Or execute the [api.py](api.py) script to deploy the web API:

```shell
$ python api.py
```

Note: Before executing, check the remaining space in the `$HOME/.cache/huggingface/` folder; at least 15 GB is needed.

Or execute the following commands to run the VUE front end after api.py has been started:

```shell
$ cd views

$ pnpm i

$ npm run dev
```
VUE interface screenshots:

![](img/vue_0521_0.png)

![](img/vue_0521_1.png)

![](img/vue_0521_2.png)

Web UI interface screenshots:

![](img/webui_0419.png)

![](img/webui_0419_0.png)

![](img/webui_0510_1.png)

The Web UI supports the following features:

1. Automatically reads the `LLM` and `embedding` model enumerations in `configs/model_config.py`, allowing you to select and reload the model by clicking `重新加载模型`.
2. The length of retained dialogue history can be adjusted manually according to the available video memory.
3. Adds a file-upload function. Select the file to upload through the drop-down box, click `加载文件` to load it, and change the loaded file at any time during the process.
Alternatively, execute the [knowledge_based_chatglm.py](knowledge_based_chatglm.py) script to experience **command-line interaction**:

```shell
$ python knowledge_based_chatglm.py
```
### FAQ

Q1: What file formats does this project support?

A1: Currently, this project has been tested with txt, docx, and md file formats. For more file formats, please refer to the [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html). It is known that if a document contains special characters, there might be issues loading the file.

Q2: How can I resolve the `detectron2` dependency issue when reading specific file formats?

A2: As the installation process for this package can be problematic and it is only required for some file formats, it is not included in `requirements.txt`. You can install it with the following command:

```shell
$ pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"
```

Q3: How can I solve the `Resource punkt not found.` error?

A3: Unzip the `packages/tokenizers` folder from https://github.com/nltk/nltk_data/raw/gh-pages/packages/tokenizers/punkt.zip and place it in the `nltk_data/tokenizers` storage path.

The `nltk_data` storage path can be found using `nltk.data.path`.

Q4: How can I solve the `Resource averaged_perceptron_tagger not found.` error?

A4: Download https://github.com/nltk/nltk_data/blob/gh-pages/packages/taggers/averaged_perceptron_tagger.zip, extract it, and place it in the `nltk_data/taggers` storage path.

The `nltk_data` storage path can be found using `nltk.data.path`.
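As an alternative to unpacking the archives by hand, both resources can usually be fetched with NLTK's own downloader. A minimal sketch, assuming `nltk` is installed and the machine has network access:

```python
# Sketch: fetch the two NLTK resources programmatically instead of manual unzipping.
import nltk

print(nltk.data.path)                          # where NLTK looks for data
nltk.download("punkt")                         # fixes "Resource punkt not found."
nltk.download("averaged_perceptron_tagger")    # fixes "Resource averaged_perceptron_tagger not found."
```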
Q5: Can this project run in Google Colab?

A5: You can try running the chatglm-6b-int4 model in Google Colab. Please note that if you want to run the Web UI in Colab, you need to set the `share` parameter in `demo.queue(concurrency_count=3).launch(server_name='0.0.0.0', share=False, inbrowser=False)` to `True`.

This issue is related to the system environment. For more details, please refer to [Issues with installing packages using pip in Anaconda](docs/Issue-with-Installing-Packages-Using-pip-in-Anaconda.md).
## DEMO

Using the question "What are the limitations of ChatGLM-6B and how can they be improved?" as an example:

Without using langchain to access local documents:

> ChatGLM-6B is a chatbot based on GPT-3, and its limitations are mainly reflected in the following aspects:
>
> 1. Language model accuracy: Although GPT-3 is a large language model with relatively high accuracy, it still has some limitations. For instance, it may not be able to accurately understand certain dialects, accents, or complex sentence structures.
> 2. Completeness of the knowledge graph: GPT-3 can generate some knowledge graphs, but its completeness and accuracy may not be sufficient. For example, it may struggle to accurately understand specific domain expertise and terminology.
> 3. Coherence of dialogue generation: While GPT-3 can generate somewhat coherent dialogues, they may not always be natural. For instance, it may generate false, incoherent dialogue or misinterpret the user's intent.
>
> To improve ChatGLM-6B, consider the following aspects:
>
> 1. Enhance the accuracy of the language model: The accuracy of the language model can be improved by increasing the training data, using better language model architectures, and optimizing the model training process.
> 2. Strengthen the integrity of the knowledge graph: The integrity of the knowledge graph can be enhanced by increasing its dimensions, employing better knowledge graph construction methods, and optimizing the knowledge graph's matching algorithm.
> 3. Boost the coherence of dialogue generation: The coherence of dialogue generation can be improved by augmenting the context information of the dialogue generation model, utilizing better context information extraction methods, and optimizing the dialogue generation model's algorithm.
> 4. Incorporate more human feedback: Human feedback can help ChatGLM-6B better understand users' intentions and needs, thereby improving the quality and accuracy of dialogue generation. More human feedback can be introduced by involving more human agents and using human feedback collectors.

ChatGLM's answer after using LangChain to access the README.md file of the ChatGLM-6B project:

> ChatGLM-6B is a deep learning-based natural language processing model that excels at answering questions. However, it also has some limitations, as follows:
> 1. Limited language understanding: ChatGLM-6B has been primarily trained on Chinese natural language, and its understanding of other languages may be limited.
> 2. Insufficient knowledge base: The training dataset of ChatGLM-6B contains only a Chinese corpus, so it may not be able to answer non-Chinese questions or queries in specific domains.
> 3. Limited data volume: ChatGLM-6B's training dataset has only a few million records, which may hinder its ability to answer very specific or complex questions.
>
> To improve ChatGLM-6B, consider the following aspects:
> 1. Expand language knowledge: Learn natural language processing techniques in other languages to broaden the model's language understanding capabilities.
> 2. Broaden the knowledge base: Collect more Chinese corpora or use datasets in other languages to expand the model's knowledge base.
> 3. Increase data volume: Use larger datasets to train ChatGLM-6B to improve the model's performance.
> 4. Introduce more evaluation metrics: Incorporate additional evaluation metrics to assess the model's performance, helping to identify the shortcomings and limitations of ChatGLM-6B.
> 5. Enhance the model architecture: Improve ChatGLM-6B's model architecture to boost its performance and capabilities, for example by employing larger neural networks or refined convolutional neural network structures.
## Roadmap

- [x] Implement LangChain + ChatGLM-6B for local knowledge application
  - [x] Unstructured file access based on langchain
    - [x] .md
    - [x] .pdf
    - [x] .docx
    - [x] .txt
- [ ] Add support for more LLM models
  - [x] THUDM/chatglm-6b
  - [x] THUDM/chatglm-6b-int4
  - [x] THUDM/chatglm-6b-int4-qe
- [ ] Add Web UI DEMO
  - [x] Implement Web UI DEMO using Gradio
  - [x] Add output and error messages
  - [x] Citation callout
  - [ ] Knowledge base management
    - [x] QA based on selected knowledge base
    - [x] Add files/folder to knowledge base
    - [ ] Delete files/folder from knowledge base
  - [ ] Implement Web UI DEMO using Streamlit
- [ ] Add support for API deployment
  - [x] Use fastapi to implement the API
  - [ ] Implement a Web UI DEMO for API calls
INSTALL.md

@@ -3,7 +3,7 @@

## Environment Check

```shell
# First, make sure that Python 3.8 - 3.10 is installed on your machine
$ python --version
Python 3.8.13
```
@@ -36,26 +36,9 @@ $ git clone https://github.com/imClumsyPanda/langchain-ChatGLM.git

```shell
# Enter the project directory
$ cd langchain-ChatGLM

# Install dependencies
$ pip install -r requirements.txt
```
Note: when using `langchain.document_loaders.UnstructuredFileLoader` to ingest unstructured files in formats such as `.docx`, you may need to install additional dependency packages according to the documentation; see the [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html).
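As a quick check that the unstructured-file path works, the sketch below loads a single document through `UnstructuredFileLoader`; the file path is a placeholder.

```python
# Sketch: load one unstructured file via langchain's UnstructuredFileLoader
# (assumes `unstructured` and its format-specific extras are installed;
# the file path is a placeholder).
from langchain.document_loaders import UnstructuredFileLoader

docs = UnstructuredFileLoader("docs/sample.docx").load()
print(docs[0].page_content[:200])  # preview the extracted text
```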