Chroma + Ollama: Building a Local RAG Application

The author is a front-end development engineer at the 360 Qiwu Team (奇舞團).

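In this article we run a large language model (LLM) locally with Ollama and combine it with ChromaDB and LangChain to build a small RAG application that answers questions locally, based on the content of a web page.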

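Concepts

A quick look at the terms first:

An LLM (large language model) is trained on massive text datasets (books, websites, and so on) and has general-purpose language understanding and generation abilities. Although it can reason about many topics, its knowledge is limited to the data used for training up to a specific point in time.

LangChain is a framework for developing applications powered by large language models (LLMs). It provides a rich set of interfaces, components, and capabilities that simplify building LLM applications.

Ollama is a free, open-source framework that makes it easy to run large models on a local computer.

RAG (Retrieval-Augmented Generation) is a technique for augmenting an LLM's knowledge with additional data. It retrieves current or relevant context from an external database and presents it to the LLM when asking it to generate a response, which reduces the risk of incorrect or misleading output.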

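The workflow runs as follows: load the web page, split it into chunks, embed the chunks into a vector database, retrieve the chunks most relevant to a question, and hand them to the LLM together with the question.

The rest of this article implements these RAG steps in code.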

Getting Started

1. Follow the Ollama documentation to download the models and run them locally.

# LLM
ollama pull llama3
# Embedding Model
ollama pull nomic-embed-text

2. Install langchain, langchain-community, bs4, and chromadb (the Chroma vector store used later depends on it)

pip install langchain langchain-community bs4 chromadb

3. Initialize the Ollama object provided by LangChain

from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# Initialize the LLM and stream its output to stdout
llm = Ollama(model="llama3", 
             temperature=0.1, 
             top_p=0.4, 
             callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
             )

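temperature controls how creative the generated text is. At 0 the response is predictable: the model always picks the most likely next word, which is useful for answers where facts and accuracy matter. At 1 the model samples from a wider range of words, giving more creative but less predictable answers.

top_p, or nucleus sampling, determines how many candidate words are considered during generation. A high top_p means the model considers more candidate words, including less likely ones, which makes the generated text more varied.

A lower temperature with a higher top_p can produce coherent text that is still creative. Because temperature is low, the answers are usually logical and coherent; because top_p is high, they still draw on a rich vocabulary and range of ideas. This combination suits informational text that should be clear yet engaging.

A higher temperature with a lower top_p may combine words in unpredictable ways. The generated text is highly creative and can produce unexpected results, which suits creative writing.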
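To make the two parameters concrete, here is a minimal sketch of temperature scaling followed by top_p (nucleus) filtering over a toy next-word distribution. The candidate words, their scores, and the helper names `apply_temperature` and `top_p_filter` are invented for illustration; this is not how Ollama implements sampling internally, only the general idea.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability mass on the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely words until their cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical next-word scores for four candidate words.
words = ["framework", "library", "language", "banana"]
logits = [2.0, 1.0, 0.5, -1.0]

for temperature, top_p in [(0.1, 0.4), (1.0, 0.9)]:
    probs = apply_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p)
    print(temperature, top_p, {words[i]: round(p, 3) for i, p in candidates.items()})
```

With temperature=0.1 and top_p=0.4 only the single most likely word survives the filter; with temperature=1.0 and top_p=0.9 three candidates remain, so the sampled word can vary between runs.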

4. Load the content for RAG retrieval and split it into chunks

# BeautifulSoup parses the page content; locate and extract what you need by tag, class name, ID, etc.
import bs4
# WebBaseLoader loads HTML pages using `urllib` and parses them with `BeautifulSoup`
from langchain_community.document_loaders import WebBaseLoader
# Text splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    web_paths=("https://vuejs.org/guide/introduction.html#html",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("content",),
            # id=("article-root",)
        )
    ),
)
docs = loader.load()
# chunk_overlap: the overlap between adjacent chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

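chunk_overlap is the overlap between adjacent chunks; it lowers the chance that a statement gets separated from important related context. chunk_size is the size of each chunk. Sensible chunking settings improve RAG quality.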
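The effect of chunk_size and chunk_overlap is easier to see on a tiny input. The sketch below uses a plain fixed-size sliding window, which is a simplification: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and line breaks before falling back to raw characters. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "Vue is a JavaScript framework for building user interfaces."
for chunk in split_with_overlap(text, chunk_size=20, chunk_overlap=5):
    print(repr(chunk))
```

Each printed chunk starts with the last five characters of the previous one, so a phrase cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.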

Embed the content into the vector database using the local embedding model nomic-embed-text

# Vector embedding (if onnxruntime is missing: conda install onnxruntime -c conda-forge)
from langchain_community.vectorstores import Chroma
# Many embedding models are available
from langchain_community.embeddings import OllamaEmbeddings
# Run the embedding model nomic-embed-text through Ollama: a high-performing open embedding model with a large token context window.
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=OllamaEmbeddings(model="nomic-embed-text"))
# Similarity search
# vectorstore.similarity_search("vue")

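Other models such as llama3 or mistral can also serve as the embedding model here, but they run too slowly locally, and like nomic-embed-text they do not support Chinese embeddings. To build a Chinese document store, try the herald/dmeta-embedding-zh embedding model, which supports Chinese.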

ollama pull herald/dmeta-embedding-zh:latest
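As an illustration of what a similarity search over embedded chunks does, the following sketch ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors are made up (real embedding models produce hundreds of dimensions), the helper names are hypothetical, and cosine similarity is only one common choice: the metric Chroma actually uses depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical embeddings for three chunks.
store = [
    ("Vue is a JavaScript framework.", [0.9, 0.1, 0.0]),
    ("Single-File Components use the .vue extension.", [0.7, 0.3, 0.1]),
    ("Bananas are rich in potassium.", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]  # pretend this is the embedding of "what is vue?"

for score, text in top_k(query, store):
    print(f"{score:.3f}  {text}")
```

The two Vue-related chunks rank first because their vectors point in nearly the same direction as the query, while the unrelated chunk scores close to zero.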

Define a prompt to constrain the output

from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=['context', 'question'],
    template=
    """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the
    question. If you don't know the answer, just say you don't know
    without any explanation. Question: {question} Context: {context} Answer:""",
)

Implement retrieval-based question answering with LangChain

from langchain.chains import RetrievalQA
# Retriever backed by the vector database
retriever = vectorstore.as_retriever()

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)
# Another question to try: what is Composition API?
question = "what is vue?"
result = qa_chain.invoke({"query": question})

# output
# I think I know this one! Based on the context, 
# Vue is a JavaScript framework for building user interfaces 
# that builds on top of standard HTML, CSS, and JavaScript. 
# It provides a declarative way to use Vue primarily in 
# low-complexity scenarios or for building full applications with 
# Composition API + Single-File Components.
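To see roughly what such a chain does with the retriever and the prompt, here is a simplified sketch: fetch the most relevant chunks, join them into one context string, fill the prompt template, and send the result to the model. `fake_retriever` and `fake_llm` are hypothetical stand-ins (word overlap instead of vector search, no real model call), and this is not LangChain's actual implementation.

```python
PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say you don't know without any explanation. "
    "Question: {question} Context: {context} Answer:"
)

def fake_retriever(question, chunks, k=2):
    """Hypothetical stand-in for a vector retriever: rank chunks by shared words."""
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]

def fake_llm(prompt):
    """Hypothetical stand-in for the model call; it only reports the prompt size."""
    return f"(model would answer here, prompt length = {len(prompt)} characters)"

def answer(question, chunks):
    # Retrieved chunks are concatenated and inserted into the {context} slot.
    context = "\n\n".join(fake_retriever(question, chunks))
    prompt = PROMPT.format(question=question, context=context)
    return prompt, fake_llm(prompt)

chunks = [
    "Vue is a JavaScript framework for building user interfaces.",
    "Vue builds on top of standard HTML, CSS, and JavaScript.",
    "Bananas are rich in potassium.",
]
prompt, reply = answer("what is vue?", chunks)
print(prompt)
print(reply)
```

The model only ever sees the chunks that were retrieved, which is why the quality of chunking and embedding directly limits the quality of the answer.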

What does it answer if the question has nothing to do with the document?

question = "what is react?"
result = qa_chain.invoke({"query": question})

Running this produced the output I don't know.

Building the User Interface

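Gradio is a Python library for building interactive machine-learning interfaces, and it is very simple to use. You only need to define a function with inputs and outputs, and Gradio generates an interface for it automatically. Users enter data in the interface and observe the model's output.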
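As a minimal sketch of that idea, the function below takes one text input and returns one text output, and `gr.Interface` wraps it. The function `shout` and the helper `build_demo` are invented for illustration; Gradio is imported inside the helper so the plain function can be tried without Gradio installed.

```python
def shout(text):
    """The function Gradio wraps: one text input, one text output."""
    return text.upper() + "!"

def build_demo():
    # Imported here so shout() can be used and tested without Gradio installed.
    import gradio as gr
    return gr.Interface(fn=shout, inputs="text", outputs="text")

print(shout("hello gradio"))
```

Calling `build_demo().launch()` starts a local web server with a text box for the input and a text area for the result.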

Putting the code above together into an interactive UI (Gradio itself is installed with pip install gradio):

import gradio as gr
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

def init_ollama_llm(model, temperature, top_p):
    return Ollama(model=model,
                  temperature=temperature,
                  top_p=top_p,
                  callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
                  )

def content_web(url):
    loader = WebBaseLoader(
        web_paths=(url,),
    )
    docs = loader.load()
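    # chunk_overlap: the overlap between adjacent chunks. Overlap lowers the chance of
    # separating a statement from important related context, so setting it gives better results.
    # Sensible chunking improves RAG quality.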
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    return splits

def chroma_retriever_store_content(splits):
    # Run the embedding model nomic-embed-text through Ollama: a high-performing open embedding model with a large token context window.
    vectorstore = Chroma.from_documents(documents=splits,
                                        embedding=OllamaEmbeddings(model="nomic-embed-text"))
    return vectorstore.as_retriever()

def rag_prompt():
    return PromptTemplate(
        input_variables=['context', 'question'],
        template=
        """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the
        question. If you don't know the answer, just say you don't know
        without any explanation. Question: {question} Context: {context} Answer:""",
    )

def ollama_rag_chroma_web_content(web_url, question,temperature,top_p):
    llm = init_ollama_llm('llama3', temperature, top_p)
    splits = content_web(web_url)
    retriever = chroma_retriever_store_content(splits)
    qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={"prompt": rag_prompt()})
    return qa_chain.invoke({"query": question})["result"]

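demo = gr.Interface(
    fn=ollama_rag_chroma_web_content,
    inputs=[gr.Textbox(label="web_url", value="https://vuejs.org/guide/introduction.html", info="URL of the web page to fetch content from"),
            gr.Textbox(label="question"),
            # Defaults mirror the values used earlier in the article
            gr.Slider(0, 1, step=0.1, value=0.1, label="temperature"),
            gr.Slider(0, 1, step=0.1, value=0.4, label="top_p")],
    outputs="text",
    title="Ollama+RAG Example",
    description="Enter a web page URL, ask a question, and get an answer"
)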

demo.launch()

After launching, the console prints Running on local URL: http://127.0.0.1:7860. Open that address in a browser to use the interface.

References

https://github.com/ollama/ollama

https://python.langchain.com/

https://partee.io/2022/08/11/vector-embeddings/

https://jalammar.github.io/illustrated-word2vec/

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/N-h4TntVsnhNLjP1NpxVKQ
