This blog is a simple, practical guide to building your own smart chatbot that can answer questions using your private documents. It shows how to connect your data in AWS Bedrock with a FastAPI app written in Python, so the chatbot can search, find, and explain answers clearly, complete with citations back to the source files. You’ll learn how to run it either on your computer or inside a Docker container, making it easy to use and share anywhere. In short, it’s like teaching your computer to “read your files” and talk back with real answers, step by step.
This comprehensive runbook will guide you through building a complete, end-to-end RAG application. The final solution will include:
- A streaming FastAPI backend that provides in-text citations.
- An interactive Streamlit chatbot interface for users.
- Integration with AWS Bedrock Knowledge Base for efficient document retrieval.
- Instructions for running the backend with Docker or locally.
Project Structure#
Your final project directory (rag_chatbot_api/) will contain the following files:
rag_chatbot_api/
├── main.py # The FastAPI backend application
├── chatbot_ui.py # The Streamlit frontend application
├── environment.yml # Conda dependencies for both apps
├── litellm_config.yaml # Configuration for the LLM proxy
├── Dockerfile # Instructions to containerize the FastAPI backend
└── .env # Environment variables for configuration
Step 1: Prerequisites - Your Bedrock Knowledge Base#
This is the foundation. You must have a Bedrock Knowledge Base (KB) set up and synced with your documents in S3.
Your Action Item:
Navigate to the AWS Console -> Bedrock -> Knowledge bases. Create your KB if you haven’t already. Once it’s ready, find and copy the Knowledge base ID (e.g., ABC123XYZ).
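If you want to sanity-check that ID before wiring it into the app, a minimal boto3 sketch like the one below can confirm the KB exists and is ACTIVE. It assumes your AWS credentials and region are already configured locally, and YOUR_KNOWLEDGE_BASE_ID is a placeholder.

```python
# verify_kb.py -- optional sanity check for the Knowledge Base ID (sketch)
import boto3

# bedrock-agent is the control-plane client; the app itself uses bedrock-agent-runtime
client = boto3.client("bedrock-agent", region_name="us-east-1")

kb = client.get_knowledge_base(knowledgeBaseId="YOUR_KNOWLEDGE_BASE_ID")
print(kb["knowledgeBase"]["name"], kb["knowledgeBase"]["status"])  # expect status "ACTIVE"
```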
Step 2: Set up the LiteLLM Proxy#
To start the LiteLLM proxy, run the following commands in a shell:
```bash
# Get the code
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/prometheus.yml

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommend - https://1password.com/password-generator/
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker compose up
```
For further details, please refer to the LiteLLM docs.
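The project layout above also lists a litellm_config.yaml, which tells the proxy which models to expose. The exact contents depend on your account, but a minimal sketch that maps the claude-v2-instruct alias used later in main.py to a Bedrock-hosted Anthropic model might look like this (the Bedrock model ID and region here are assumptions; substitute a model your account actually has enabled):

```yaml
# litellm_config.yaml -- example model mapping for the LiteLLM proxy (sketch)
model_list:
  - model_name: claude-v2-instruct        # alias the FastAPI backend will request
    litellm_params:
      model: bedrock/anthropic.claude-v2  # assumed Bedrock model ID; adjust to your account
      aws_region_name: us-east-1
```

If you use the docker-compose setup above, point the LiteLLM container at this file (for example by mounting it and passing it via the proxy’s --config option) so the alias is available when the backend calls it.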
Step 3: Prepare the FastAPI Application & Dependencies#
Here we set up the project files for both the backend and the new Streamlit frontend.
Create Project Directory:
```bash
mkdir rag_chatbot_api
cd rag_chatbot_api
```
Define Conda Environment (environment.yml): This file now includes dependencies for both FastAPI and Streamlit.

```yaml
name: rag-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - fastapi
  - uvicorn-standard
  - pydantic
  - boto3
  - openai
  - streamlit  # Added for the UI
  - requests   # Added for the UI to call the backend
```
Write the Backend Code (main.py): This code remains unchanged from the previous version. It is the streaming heart of our application.

```python
# main.py
import os
import boto3
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import AsyncOpenAI
from fastapi.responses import StreamingResponse

# ... (The entire main.py code, including Pydantic models, create_prompt function,
# and the API endpoint, is identical to the previous version) ...

# --- Configuration & Clients ---
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_KB_ID = os.getenv("BEDROCK_KB_ID")

# Client for Bedrock Knowledge Base
bedrock_agent_client = boto3.client("bedrock-agent-runtime", region_name=AWS_REGION)

# Client for LiteLLM Proxy (OpenAI SDK compatible)
litellm_client = AsyncOpenAI(
    base_url=os.getenv("LITELLM_PROXY_URL"),
    api_key=os.getenv("LITELLM_API_KEY"),
)

app = FastAPI(title="Streaming Bedrock KB RAG API")


class QueryRequest(BaseModel):
    query: str
    model: str = "claude-v2-instruct"  # The alias from our LiteLLM config
    numberOfResults: int = 5


def create_prompt(query: str, retrieval_results: list) -> str:
    source_map = {}
    context_for_llm = ""
    citation_counter = 1
    for result in retrieval_results:
        uri = result.get('location', {}).get('s3Location', {}).get('uri', 'Unknown Source')
        if uri not in source_map:
            source_map[uri] = citation_counter
            citation_counter += 1
        citation_num = source_map[uri]
        text_chunk = result['content']['text']
        context_for_llm += f"Source [{citation_num}]: \"{text_chunk}\"\n---\n"

    citation_list = "\n".join([f"[{num}] {uri}" for uri, num in sorted(source_map.items(), key=lambda item: item[1])])

    return (
        "You are a helpful assistant. Your task is to answer the user's question based *only* on the provided sources. "
        "Follow these instructions exactly:\n"
        "1. Synthesize a comprehensive answer using information from the sources.\n"
        "2. For every piece of information you use, you MUST cite the source by appending the corresponding source number in brackets, like `[1]`, `[2]`, etc.\n"
        "3. After you have finished the answer, list all the sources you used under a `Sources:` heading. Use the exact mapping provided below.\n"
        "4. Do not include any information that is not from the provided sources.\n\n"
        f"---SOURCES---\n{context_for_llm}\n"
        f"---SOURCE MAPPING---\n{citation_list}\n\n"
        f"---USER QUESTION---\n{query}\n\n"
        "ANSWER:"
    )


@app.post("/document-chat-stream")
async def document_chat_stream(request: QueryRequest):
    if not BEDROCK_KB_ID:
        raise HTTPException(status_code=500, detail="BEDROCK_KB_ID not configured.")

    try:
        response = bedrock_agent_client.retrieve(
            knowledgeBaseId=BEDROCK_KB_ID,
            retrievalQuery={'text': request.query},
            retrievalConfiguration={'vectorSearchConfiguration': {'numberOfResults': request.numberOfResults}}
        )
        retrieval_results = response.get('retrievalResults', [])
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error retrieving from Bedrock KB: {e}")

    prompt = create_prompt(request.query, retrieval_results)

    async def stream_generator():
        try:
            stream = await litellm_client.chat.completions.create(
                model=request.model,
                messages=[{"role": "user", "content": prompt}],
                stream=True
            )
            async for chunk in stream:
                content = chunk.choices[0].delta.content
                if content:
                    yield content
        except Exception as e:
            yield f"\n\nERROR: Could not get response from LLM. Details: {str(e)}"

    return StreamingResponse(stream_generator(), media_type="text/plain")
```
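To make the citation mechanics in create_prompt concrete, here is a small standalone example (not part of the app) that feeds it a hand-built list shaped like Bedrock’s retrieval results; the S3 URIs and text chunks are invented for illustration. Because importing main.py initializes its clients, set LITELLM_API_KEY (and ideally the other variables from Step 5) before running it.

```python
# demo_prompt.py -- illustrates the [1], [2] citation numbering (example data only)
from main import create_prompt

fake_results = [
    {"content": {"text": "Employees accrue 20 vacation days per year."},
     "location": {"s3Location": {"uri": "s3://my-bucket/hr/leave_policy.pdf"}}},
    {"content": {"text": "Vacation requests need manager approval."},
     "location": {"s3Location": {"uri": "s3://my-bucket/hr/approvals.pdf"}}},
]

# Each chunk is tagged Source [1], Source [2], ... and the SOURCE MAPPING section
# at the end of the prompt lists which number maps to which S3 URI.
print(create_prompt("How many vacation days do I get?", fake_results))
```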
Step 4: Create the Streamlit Chatbot Client#
Now, let’s create the user-facing application.
Create the Frontend Code (chatbot_ui.py): This script uses Streamlit’s chat components and the requests library to communicate with our FastAPI backend.
```python
# chatbot_ui.py
import streamlit as st
import requests
import json

# --- Page Configuration ---
st.set_page_config(
    page_title="RAG Chatbot",
    page_icon="🤖",
    layout="wide"
)
st.title("RAG Chatbot with Citations 🤖")

# --- Constants ---
FASTAPI_URL = "http://localhost:8000/document-chat-stream"

# --- Session State Initialization ---
# Ensures that the message history is preserved across reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# --- Display Chat History ---
# Renders the existing chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# --- Main Chat Interaction Logic ---
if prompt := st.chat_input("Ask me anything about your documents..."):
    # Add user message to session state and display it
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Display assistant response in a streaming fashion
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        try:
            # Prepare the data for the POST request
            data = {"query": prompt}

            # Use requests to stream the response from the FastAPI endpoint
            with requests.post(FASTAPI_URL, json=data, stream=True) as r:
                r.raise_for_status()  # Raise an exception for bad status codes
                for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
                    full_response += chunk
                    message_placeholder.markdown(full_response + "▌")  # "▌" gives a typing cursor effect
            message_placeholder.markdown(full_response)
        except requests.exceptions.RequestException as e:
            st.error(f"Could not connect to the backend: {e}")
            full_response = "Sorry, I couldn't connect to the processing service."
        except Exception as e:
            st.error(f"An unexpected error occurred: {e}")
            full_response = "An unexpected error occurred."

    # Add the final, complete assistant response to the session state
    st.session_state.messages.append({"role": "assistant", "content": full_response})
```
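One small optional tweak: the FASTAPI_URL above is hard-coded to localhost. If the backend ends up running somewhere else (for example in a container on another host), you could read the URL from an environment variable instead; a possible adjustment, with FASTAPI_URL as a suggested variable name:

```python
# Optional: make the backend URL configurable instead of hard-coding localhost
import os

FASTAPI_URL = os.getenv("FASTAPI_URL", "http://localhost:8000/document-chat-stream")
```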
Step 5: Run the Backend (Choose One Option)#
You need to run the FastAPI backend so the Streamlit UI can talk to it. Open your second terminal for this.
Option A: Run with Docker (Recommended for Deployment) 🐳#
Create the Dockerfile: (Same as before)

```dockerfile
FROM continuumio/miniconda3
WORKDIR /app
COPY environment.yml .
RUN conda env create -f environment.yml
COPY main.py .
EXPOSE 8000
CMD ["conda", "run", "-n", "rag-env", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
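A note on the CMD line: conda run captures output by default, so the container’s uvicorn logs may look quiet. One optional variant (not required by this tutorial) is to add the --no-capture-output flag:

```dockerfile
# Optional variant: stream uvicorn logs directly instead of letting conda capture them
CMD ["conda", "run", "--no-capture-output", "-n", "rag-env", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```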
Create the .env file for Docker:

```bash
# .env file
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY
BEDROCK_KB_ID=YOUR_KNOWLEDGE_BASE_ID
LITELLM_PROXY_URL=http://host.docker.internal:4000
LITELLM_API_KEY=sk-1234
```
Build and Run the Container (Terminal 2):
```bash
docker build -t rag-chatbot-api .
docker run --env-file .env -p 8000:8000 rag-chatbot-api
```
Option B: Run Locally (For Development) 💻#
Create & Activate Conda Environment (Terminal 2):
```bash
conda env create -f environment.yml
conda activate rag-env
```
Set Local Environment Variables (Terminal 2):
- PowerShell (Windows):

```powershell
$env:LITELLM_PROXY_URL="http://localhost:4000"
# ... set other AWS and Bedrock variables ...
```

- Bash (macOS/Linux):

```bash
export LITELLM_PROXY_URL="http://localhost:4000"
# ... set other AWS and Bedrock variables ...
```
Run the FastAPI Server (Terminal 2):
uvicorn main:app --reload --host 0.0.0.0 --port 8000
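Whichever option you chose, you can give the backend a quick smoke test before launching the UI. This sketch mirrors what the Streamlit client does; the question is just an example.

```python
# test_backend.py -- streaming smoke test for the /document-chat-stream endpoint (sketch)
import requests

resp = requests.post(
    "http://localhost:8000/document-chat-stream",
    json={"query": "What is our vacation policy?"},  # example question
    stream=True,
)
resp.raise_for_status()

# Print the answer as it streams in, including the [n] citations and Sources list
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)
```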
Step 6: Run the Streamlit UI#
Finally, let’s launch the user interface.
Open a Third Terminal (Terminal 3).
Activate the Conda Environment:
conda activate rag-env
Run the Streamlit App:
streamlit run chatbot_ui.py
Open Your Browser: Streamlit will provide a URL, typically http://localhost:8501. Open this link to start chatting with your documents!