ChatPDF: Chat with your PDF

Revolutionize Your PDF Interaction with ChatPDF: The Magical Question-Answering System (Proof of Concept)


git clone

Requirements
  1. Python 3.6 or later

  2. OpenAI API Key

  3. LangChain library

  4. Faiss VectorDB


Create a virtual environment

python3 -m venv venv

Activate the virtual environment

source venv/bin/activate

Install packages

pip install -r requirements.txt


  • Make sure you have an OpenAI API key. You can get one by signing up for OpenAI at

  • Store your OpenAI API key in a .env file in the root directory of your project, under the variable name OPENAI_API_KEY (the name the script reads).

  • Replace the file path passed to the loader with the path to your own PDF document, e.g. loader = PyPDFLoader("data/resume.pdf"). This PDF is the document the question-answering system will use.

  • Run the script and input a question to get an answer from the PDF document.
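
To make the .env step concrete: the file is assumed to contain a single line such as OPENAI_API_KEY=<your key>. The script itself relies on python-dotenv for this; the sketch below is only a stdlib stand-in that shows roughly what that loading step amounts to. The helper names read_env_file and load_api_key are hypothetical and not part of the project.

```python
import os


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(path=".env", name="OPENAI_API_KEY"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get(name) or read_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

Failing early with a clear error tends to be friendlier than letting the first API call fail with an authentication error.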


How it works

  • Import necessary libraries and load the OpenAI API key from a .env file.
import os
import openai
from dotenv import load_dotenv

# Read the variables in .env into the process environment
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
  • Import the necessary classes from the LangChain library: PyPDFLoader, OpenAIEmbeddings, FAISS, OpenAI, and RetrievalQA. These classes are used to load a PDF document, convert the text into embeddings, create a vector store, and set up the question-answering model.
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

from langchain import OpenAI
from langchain.chains import RetrievalQA
  • These lines load the PDF with PyPDFLoader and split it into pages, create an OpenAIEmbeddings instance to convert the text into embeddings, build a FAISS vector store from the pages, and set up a RetrievalQA chain as the question-answering model.
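# Load the PDF and split it into page-level documents
loader = PyPDFLoader("data/resume.pdf")
pages = loader.load_and_split()

# Embed the pages and index the vectors in FAISS
embeddings = OpenAIEmbeddings()
index = FAISS.from_documents(pages, embeddings)

# Assumed arguments (a common configuration): default OpenAI LLM, "stuff" chain, FAISS retriever
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), chain_type="stuff", retriever=index.as_retriever()
)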
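
To give a feel for what the embedding and vector-store steps do, here is a minimal stdlib-only sketch. It is not how OpenAIEmbeddings or FAISS work internally: the embed function is a toy word-count stand-in for a real embedding model, and ToyIndex is a hypothetical brute-force search, whereas FAISS provides much faster similarity search over dense vectors. The sample pages are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Brute-force vector store: embed every page once, rank at query time."""

    def __init__(self, pages):
        self.entries = [(page, embed(page)) for page in pages]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [page for page, _ in ranked[:k]]


# Made-up page texts standing in for the pages of a PDF
pages = [
    "Worked as a data engineer building ETL pipelines",
    "Education: BSc in computer science",
    "Hobbies include hiking and photography",
]
index = ToyIndex(pages)
print(index.search("what did they study in computer science", k=1))
# → ['Education: BSc in computer science']
```

The shape is the same as in the real script: pages go in once, and each question is matched against them by vector similarity rather than by exact keyword lookup.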
  • These lines prompt the user to input a question, pass the question to the RetrievalQA model, and print the answer to the console.
query = input("Ask me anything? ")
# Pass the question to the RetrievalQA chain and print its answer
print(qa.run(query))
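
Conceptually, a retrieval QA chain fetches the most relevant pages for the question and hands them to the language model together with the question in a single prompt (LangChain calls this strategy the "stuff" chain type). The sketch below illustrates that flow under stated assumptions: the prompt wording is made up and is not LangChain's actual template, and retrieve and llm are hypothetical callables standing in for the vector store and the OpenAI model.

```python
def build_prompt(question, passages):
    """Concatenate the retrieved passages and the question into one prompt."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, retrieve, llm, k=2):
    """Retrieve k passages, build the prompt, and ask the model."""
    passages = retrieve(question, k)
    return llm(build_prompt(question, passages))


# Stand-ins for the vector store and the model (both hypothetical)
def fake_retrieve(question, k):
    return ["Education: BSc in computer science", "Worked as a data engineer"][:k]


def fake_llm(prompt):
    return f"(model reply to a prompt of {len(prompt)} characters)"


print(answer("What did they study?", fake_retrieve, fake_llm))
```

Because everything is stuffed into one prompt, this approach suits short documents; for long PDFs the retrieved text may need to be limited so it fits the model's context window.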



In conclusion, this code demonstrates how to build a proof-of-concept question-answering system for PDF documents. OpenAI embeddings turn the pages into vectors, FAISS indexes them for efficient retrieval, and an OpenAI language model answers the user's question from the retrieved pages.

Note: To use OpenAI's GPT-3 language model and API, you'll need an API key, which can be obtained by signing up for their API program. You should take care to keep your API key secure and not share it publicly.

If you found this project useful, please consider giving it a star on GitHub at Your support will help to encourage further development and improvements to the project.