# Configuration
FileMind stores its configuration in a TOML file. All settings have sensible defaults — most users won't need to change anything beyond what the setup wizard configures.
## Config File Location
| Platform | Path |
|---|---|
| Windows | `%APPDATA%\FileMind\config.toml` |
| macOS | `~/Library/Application Support/FileMind/config.toml` |
You can also specify a custom path with the `--config` flag.
## Environment Variable Overrides

Any setting can be overridden with an environment variable using the `FILEMIND_` prefix and double underscores for nesting:

```shell
FILEMIND_LLM__PROVIDER=anthropic
FILEMIND_LLM__API_KEY=sk-ant-...
FILEMIND_RENAME__AUTO_APPROVE_THRESHOLD=0.8
```
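The mapping from variable names to nested TOML keys can be sketched as follows. This is an illustrative helper, not part of FileMind's API, and it leaves values as strings (a real loader would also coerce types):

```python
import os

def env_overrides(environ=None, prefix="FILEMIND_"):
    """Fold FILEMIND_* variables into a nested dict mirroring the TOML layout.
    A double underscore separates the table name from the key name."""
    overrides = {}
    for name, value in (environ or os.environ).items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return overrides

env_overrides({"FILEMIND_LLM__PROVIDER": "anthropic"})
# -> {'llm': {'provider': 'anthropic'}}
```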
## Database

```toml
[db]
path = "" # Empty = auto-detect app data directory
```
## PDF Ingestion

```toml
[ingestion]
metadata_pages = 2 # Pages to extract for metadata (1-10)
text_quality_min_chars = 400 # Min chars for good text quality (50-5000)
text_quality_min_alpha = 0.55 # Min alphabetic character ratio (0.1-0.95)
text_quality_max_replacement = 0.01 # Max replacement character ratio
```
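How the three text-quality thresholds might combine into a single accept/reject decision — an illustrative sketch; the function name and exact rule are assumptions, not FileMind's actual code:

```python
def text_quality_ok(text, min_chars=400, min_alpha=0.55, max_replacement=0.01):
    """Accept extracted PDF text only if it is long enough, mostly alphabetic,
    and nearly free of U+FFFD replacement characters (a symptom of broken
    text encoding in the source PDF)."""
    if len(text) < min_chars:
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / len(text)
    replacement_ratio = text.count("\ufffd") / len(text)
    return alpha_ratio >= min_alpha and replacement_ratio <= max_replacement
```

Pages whose extracted text fails checks like these are the natural candidates for OCR.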
## OCR

```toml
[ocr]
provider = "paddleocr" # "paddleocr", "null", or "external"
render_dpi = 300 # Page render DPI for OCR (72-600)
paddleocr_lang = "en" # Language code
paddleocr_use_gpu = false # Enable GPU acceleration
paddleocr_use_angle_cls = false # Enable angle classification
paddleocr_preprocess = true # Enable image preprocessing
```
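To see what `render_dpi` means in practice: pixel dimensions scale linearly with DPI, so pixel count (and OCR cost) grows quadratically. A worked example, not FileMind code:

```python
def render_size_px(width_in, height_in, render_dpi=300):
    """Pixel dimensions a page is rasterized to before OCR. Higher DPI gives
    the OCR engine more detail at a quadratic cost in pixels."""
    return round(width_in * render_dpi), round(height_in * render_dpi)

render_size_px(8.5, 11)  # US Letter at 300 DPI -> (2550, 3300)
```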
## Embeddings

```toml
[embeddings]
provider = "sentence-transformers"
model = "all-MiniLM-L6-v2" # Downloaded automatically on first use
dims = 384 # Must match model output dimensions
batch_size = 32 # Chunks per embedding batch
```
## Language Model

```toml
[llm]
provider = "ollama" # "ollama", "llamacpp", "openai", "anthropic", "gemini"
model = "mistral" # Model name or path
base_url = "" # API base URL override (optional)
api_key = "" # API key for cloud providers
temperature = 0.1 # Sampling temperature (0.0-2.0)
max_retries = 2 # Retry count on parse failure
timeout_seconds = 120 # Per-request timeout (5-600)
```

For cloud providers, use the provider-specific key fields:
```toml
[llm]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
anthropic_api_key = "sk-ant-..."
```

```toml
# Or for OpenAI:
[llm]
provider = "openai"
model = "gpt-4o"
api_key = "sk-..."
```

```toml
# Or for Gemini:
[llm]
provider = "gemini"
model = "gemini-2.0-flash"
gemini_api_key = "..."
```
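The `max_retries` setting applies when a model reply fails to parse. The loop below is a minimal sketch of that behavior, assuming a JSON reply format; `call_model` is a hypothetical callable that returns raw model text:

```python
import json

def parse_with_retries(call_model, max_retries=2):
    """Invoke the model up to max_retries + 1 times, returning the first reply
    that parses as JSON; re-raise the last parse error if all attempts fail."""
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_model()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
    raise last_error
```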
## OpenAlex Integration

```toml
[openalex]
enabled = true # Enable title matching against OpenAlex
api_key = "" # Optional API key
mailto = "" # Contact email (recommended for higher rate limits)
timeout_seconds = 10 # API timeout (1-60)
title_match_threshold = 0.85 # Minimum match confidence (0.5-1.0)
```
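An illustrative way to read `title_match_threshold`: match confidence is a similarity score in [0, 1], so a 0.85 cutoff tolerates case and spacing differences but rejects different titles. FileMind's actual scorer may differ; this sketch uses Python's `difflib`:

```python
from difflib import SequenceMatcher

def title_similarity(local_title, candidate_title):
    """Compare whitespace- and case-normalized titles; 1.0 is an exact match."""
    a = " ".join(local_title.lower().split())
    b = " ".join(candidate_title.lower().split())
    return SequenceMatcher(None, a, b).ratio()

title_similarity("Attention Is All  You Need", "attention is all you need")  # 1.0
```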
## Rename Settings

```toml
[rename]
template = "default" # Template style
title_max_chars = 50 # Max title portion in filename (20-200)
filename_max_chars = 160 # Max total filename length (60-255)
auto_approve_threshold = 0.75 # Confidence for auto-approve
propose_threshold = 0.55 # Minimum confidence to propose
```

`auto_approve_threshold` must be greater than `propose_threshold`.
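The two thresholds partition confidence scores into three outcomes. A sketch of that logic (the exact boundary handling is an assumption):

```python
def rename_action(confidence, auto_approve_threshold=0.75, propose_threshold=0.55):
    """Map a rename-confidence score to an action: apply the rename
    automatically, propose it for manual review, or skip the file."""
    if confidence >= auto_approve_threshold:
        return "auto-approve"
    if confidence >= propose_threshold:
        return "propose"
    return "skip"
```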
## Search & RAG
```toml
[search]
default_mode = "hybrid" # "hybrid", "fts", or "semantic"
default_limit = 40 # Default results per query
semantic_min_score = 0.15 # Minimum embedding similarity
rag_top_k = 12 # Chunks to retrieve for RAG
rag_min_score = 0.3 # Minimum chunk score for RAG
reranker_enabled = true # Enable cross-encoder reranking
reranker_model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
```
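How `rag_top_k` and `rag_min_score` might interact when assembling context — a sketch, where `scored_chunks` is a hypothetical list of `(score, text)` pairs:

```python
def select_rag_context(scored_chunks, rag_top_k=12, rag_min_score=0.3):
    """Drop chunks below the score floor, then keep the best rag_top_k of the
    survivors; fewer than rag_top_k chunks may remain if the floor bites."""
    kept = [pair for pair in scored_chunks if pair[0] >= rag_min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:rag_top_k]
```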
## Background Jobs

```toml
[jobs]
max_concurrent = 2 # Max parallel jobs (1-8)
scan_batch_size = 50 # Files per scan batch
embed_batch_size = 32 # Chunks per embedding batch
```
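Batching as in `scan_batch_size` can be sketched as a simple chunking generator (illustrative, not FileMind's implementation):

```python
def batches(items, batch_size=50):
    """Yield consecutive batch_size-sized slices; the final batch may be short."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

list(batches(list(range(5)), batch_size=2))  # [[0, 1], [2, 3], [4]]
```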
## Service

```toml
[service]
host = "127.0.0.1" # Always localhost for security
port = 0 # 0 = auto-select free port
workers = 1 # Uvicorn workers
log_level = "info" # "debug", "info", "warning", "error"
```
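`port = 0` relies on standard socket behavior: binding to port 0 asks the operating system for any unused port. A quick demonstration:

```python
import socket

def pick_free_port(host="127.0.0.1"):
    """Bind to port 0 and report the port the OS actually assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```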
## Zotero

See Export & Integration for Zotero configuration.