
Setting Up Ollama with FileMind for Private Q&A

FileMind Team · 2 min read

FileMind’s “Ask My Library” feature uses a local AI model to answer questions about your research papers — with citations. The setup wizard handles most of this automatically, but it helps to understand what’s happening under the hood.

What is Ollama?

Ollama is a tool for running large language models locally. It manages model downloads, provides a simple REST API, and handles GPU/CPU inference. FileMind uses Ollama to run the language model that powers semantic search and Q&A.
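Ollama's REST API is simple enough to call directly. As a minimal sketch using only the Python standard library (the prompt text and model name here are illustrative), a single non-streaming question to a local Ollama server looks like this:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address

def build_generate_request(prompt, model="llama3.2:3b"):
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt, model="llama3.2:3b"):
    """Send one question to the local Ollama server and return the answer text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

FileMind layers citation handling and semantic search on top of calls like this, but the request/response shape is the same.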

Automatic setup

When you first launch FileMind, the guided setup wizard:

  1. Detects whether Ollama is already installed
  2. Downloads and runs the Ollama installer if needed
  3. Recommends a model based on your available RAM and GPU
  4. Downloads the model (~2–5 GB depending on model size)
  5. Runs a test query to verify everything works

Most users won't need to do anything manually.
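Step 1 of the wizard, detecting an existing install, amounts to checking whether the `ollama` executable is on your PATH. An illustrative sketch (not FileMind's actual code):

```python
import shutil

def ollama_installed():
    """Return True if the `ollama` executable is found on the PATH."""
    return shutil.which("ollama") is not None
```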

Manual setup

If you prefer to install Ollama yourself, download it from ollama.ai and run:

ollama pull llama3.2:3b

Then launch FileMind and it will automatically detect the running Ollama instance.
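If you want to confirm the server is up before launching FileMind, you can probe Ollama's default port (11434). A minimal check, assuming the default address:

```python
import urllib.request
import urllib.error

def ollama_running(url="http://localhost:11434", timeout=2):
    """Return True if an HTTP server answers at the given Ollama address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A running Ollama server answers its root path with HTTP 200.
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```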

Choosing a model

FileMind recommends models based on your hardware. For most researchers, llama3.2:3b (2 GB) works well on 8 GB RAM machines. If you have 16 GB+ RAM or a GPU, llama3.1:8b gives noticeably better Q&A quality.
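The hardware-based recommendation can be sketched as a simple threshold rule. The thresholds below are drawn from the RAM figures above and are illustrative, not FileMind's actual selection logic:

```python
def recommend_model(ram_gb, has_gpu=False):
    """Pick an Ollama model tag from available RAM, per the guidance above."""
    if ram_gb >= 16 or has_gpu:
        return "llama3.1:8b"   # noticeably better Q&A quality
    return "llama3.2:3b"       # ~2 GB model, comfortable on 8 GB RAM
```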