Building MiniCPM Financial RAG: Financial Document Intelligence with Small Models

Community Article Published June 15, 2026

🎬 Demo Video https://youtu.be/0z1i5ESbgYk

🚀 Live Demo https://huggingface.co/spaces/build-small-hackathon/MiniCPM_Financial_RAG

🐦 Social Post https://x.com/gajanand2004/status/2066422082725163265


Introduction

Financial reports, insurance documents, annual reports, SEC filings, balance sheets, and investment documents often contain hundreds of pages of information.

Finding specific financial insights manually is time-consuming, repetitive, and prone to errors.

I wanted to build a system that allows users to upload a financial document and interact with it using natural language. Instead of searching through pages of reports, users can simply ask questions and receive grounded answers directly from the uploaded document.

The result is MiniCPM Financial RAG, a lightweight Financial Document Intelligence platform powered by Retrieval-Augmented Generation.



The Problem

Financial professionals, investors, researchers, and students frequently work with lengthy documents.

Typical questions include:

  • What is the company's total revenue?
  • What is the net income for this period?
  • What liabilities are reported?
  • What are the major risk factors?
  • What is the operating cash flow?
  • Summarize the financial outlook.

Answering these questions manually requires significant time and effort.

A more efficient approach is to allow AI to retrieve the relevant information and generate answers grounded in the document itself.


What MiniCPM Financial RAG Does

The application allows users to upload financial PDF documents and ask questions in natural language.

The system automatically:

  1. Extracts text from the PDF
  2. Splits content into meaningful chunks
  3. Creates vector embeddings
  4. Stores embeddings in FAISS
  5. Retrieves the most relevant information
  6. Generates context-aware answers

This transforms static financial reports into an interactive question-answering experience.
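To make the chunking step concrete, here is a minimal sketch using only the Python standard library. The article does not say which splitter or chunk sizes the app uses, so the `chunk_text` function, the character-based windows, and the `chunk_size` and `overlap` values are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Hypothetical helper: sizes are illustrative, not the app's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "Total revenue increased compared with the prior year. " * 10
pieces = chunk_text(sample, chunk_size=120, overlap=20)
print(len(pieces), "chunks")
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap. A splitter that respects paragraphs or sentences would likely produce more meaningful chunks than fixed character windows.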


Powered by Small Models

One of the goals of this project was to demonstrate the capabilities of compact AI models.

The application uses:

MiniCPM-2B-128K

A lightweight language model used for:

  • Financial reasoning
  • Question answering
  • Long-context understanding

MiniCPM-Embedding-Light

Used for:

  • Embedding generation
  • Semantic retrieval
  • Vector similarity search

Despite their compact size, these models provide strong performance for real-world document intelligence tasks.


Why Retrieval-Augmented Generation?

A language model used on its own may generate answers that are not supported by the source documents.

Retrieval-Augmented Generation (RAG) addresses this by first retrieving the relevant document sections and then instructing the model to answer using only that retrieved context.

Benefits include:

  • Higher factual accuracy
  • Reduced hallucinations
  • Better transparency
  • Improved reliability for financial analysis
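One common way to keep generation tied to the retrieved context is to put that instruction directly in the prompt. The sketch below shows the idea; the `build_grounded_prompt` function and the prompt wording are hypothetical and are not taken from the app.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the given chunks only.

    Hypothetical helper: the wording is illustrative, not the app's prompt.
    """
    # Number each chunk so an answer can point back to its source passage.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What is the net income for this period?",
    ["Net income for the period is reported in the income statement."],
)
print(prompt)
```

Numbering the chunks also supports transparency, since the passages behind an answer can be shown to the user.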

Knowledge Pipeline

The complete workflow follows a Retrieval-Augmented Generation pipeline:

PDF Upload
     ↓
Text Extraction
     ↓
Chunking
     ↓
Embeddings
     ↓
FAISS Storage
     ↓
Similarity Search
     ↓
MiniCPM Answer Generation

Each stage contributes to producing accurate and context-grounded answers.
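The embedding and similarity-search stages can be illustrated with a toy example that runs on the standard library alone. This is a deliberate stand-in: word counts replace the MiniCPM embeddings, and a brute-force cosine comparison replaces the FAISS index. The names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a neural embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the question and return the k best matches."""
    query_vec = embed(question)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "Total revenue was 10 million dollars.",
    "The company faces currency risk.",
    "Operating cash flow improved.",
]
for score, chunk in top_k("What is the total revenue?", chunks, k=2):
    print(f"{score:.2f}  {chunk}")
```

A vector index such as FAISS serves the same purpose as `top_k` here but avoids rescoring every chunk by hand, and dense embeddings can match passages that share meaning without sharing exact words.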


Building the System

The frontend was built using Gradio and deployed on Hugging Face Spaces.

Backend inference runs on Modal, enabling scalable model execution.

The retrieval pipeline uses:

  • LangChain
  • FAISS
  • MiniCPM Embeddings
  • PyPDFLoader

The language model receives only the most relevant retrieved chunks, reducing token usage and improving response quality.


Technical Architecture

Hugging Face Spaces
        │
        ▼
   Gradio Frontend
        │
        ▼
    Modal Backend
        │
 ┌──────┴──────┐
 ▼             ▼
MiniCPM QA   FAISS Retrieval

This architecture separates user interaction, retrieval, and generation while keeping the system lightweight and efficient.
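The separation between retrieval and generation can be sketched as two small interfaces wired together by one function. Everything below is hypothetical: the `Retriever` and `Generator` protocols, the stub classes, and `answer_question` only illustrate the shape of the design. The stubs stand in for the FAISS search and the MiniCPM call, which are not reproduced here.

```python
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, question: str, k: int) -> list[str]: ...


class Generator(Protocol):
    def generate(self, question: str, context: list[str]) -> str: ...


class KeywordRetriever:
    """Stub retriever: ranks chunks by shared words (stands in for vector search)."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, question: str, k: int) -> list[str]:
        words = set(question.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda chunk: len(words & set(chunk.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class EchoGenerator:
    """Stub generator: returns the best chunk (stands in for the language model)."""

    def generate(self, question: str, context: list[str]) -> str:
        return context[0] if context else "Not found in the document."


def answer_question(
    question: str, retriever: Retriever, generator: Generator, k: int = 3
) -> str:
    """Retrieve first, then generate from the retrieved context only."""
    context = retriever.retrieve(question, k)
    return generator.generate(question, context)


retriever = KeywordRetriever(["net income was 5 million", "revenue was 10 million"])
print(answer_question("what was revenue", retriever, EchoGenerator(), k=1))
```

Because `answer_question` depends only on the two interfaces, either side can be swapped (a different index, a different model) without touching the frontend code that calls it.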


Challenges

One challenge was tuning how much context to retrieve for each question.

Retrieving too little information can miss important details, while retrieving too much information increases noise.

Another challenge was ensuring that answers remain grounded in the uploaded document instead of relying on model assumptions.

Careful chunking and retrieval strategies were important for achieving reliable results.
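One possible way to handle the too-little versus too-much trade-off is to combine a cap on the number of chunks with a minimum similarity score. This is an illustration of the idea, not the strategy used in the app; `select_chunks`, `k`, and `min_score` are hypothetical names and values.

```python
def select_chunks(
    scored: list[tuple[float, str]], k: int = 4, min_score: float = 0.3
) -> list[str]:
    """Keep at most k chunks, dropping any whose score falls below min_score.

    `scored` holds (similarity, chunk) pairs from an earlier retrieval step.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:k] if score >= min_score]


scored = [
    (0.82, "Revenue by segment table"),
    (0.12, "Cover page"),
    (0.55, "Management discussion of revenue"),
    (0.41, "Auditor's note"),
]
print(select_chunks(scored, k=2, min_score=0.3))
print(select_chunks(scored, k=4, min_score=0.5))
```

Raising `k` protects against missing details, while the score cutoff keeps weakly related passages out of the prompt. Suitable values depend on the embedding model and the documents, so they would need to be tuned on real questions.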


What I Learned

This project reinforced an important lesson:

Small models become significantly more powerful when combined with retrieval systems.

Overall performance does not depend on model size alone. System design, retrieval quality, and document grounding all play a major role.

On document-specific questions, a well-designed RAG pipeline built on a small model can often outperform a larger model that lacks access to the relevant context.


Why Small Models Matter

Small models offer several advantages:

  • Faster inference
  • Lower deployment costs
  • Reduced hardware requirements
  • Easier experimentation
  • Greater accessibility

MiniCPM Financial RAG demonstrates how compact open-source models can solve real-world business and financial problems efficiently.


Target Users

This application can help:

  • Financial Analysts
  • Investors
  • Accountants
  • Auditors
  • Researchers
  • Students

Anyone working with financial documents can benefit from faster information retrieval and natural language interaction.


Conclusion

MiniCPM Financial RAG transforms financial documents into an intelligent conversational system.

By combining MiniCPM models, FAISS retrieval, LangChain, Modal, and Hugging Face Spaces, the project delivers efficient and context-aware financial question answering while remaining lightweight and accessible.

The project demonstrates that small models, when combined with retrieval and thoughtful system design, can provide practical solutions to real-world document intelligence challenges.


