Why Your LLM Shouldn't Name Your Files
A common approach for AI-powered file renaming is to ask an LLM to “read this PDF and suggest a filename.” It’s simple to implement and the results often look plausible.
But plausible isn’t good enough when you’re renaming real files.
The hallucination problem
LLMs are trained to produce fluent, coherent text. When asked to extract a paper title, a model will produce something that looks like a paper title — even if it’s wrong, partially fabricated, or just a confident guess based on the abstract’s vocabulary.
For a chatbot, a slightly wrong answer is a minor inconvenience. For a filename, a wrong answer corrupts your library. You end up with Smith_2019_SomeThingTheLLMInvented.pdf and you’ll never find that paper again by searching.
What FileMind does instead
FileMind uses a deterministic extraction pipeline:
- DOI detection — find the DOI in the PDF, look it up in CrossRef
- arXiv ID detection — if present, fetch from the arXiv API
- Heuristic parsing — title from the largest text on page 1, authors from the byline, year from the copyright notice
- LLM fallback — only for papers where all three methods fail, and only to extract structured metadata (never to invent it)
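The first two stages are the easiest to make deterministic, since both identifiers have well-known textual forms. A minimal sketch of that detection step, assuming hypothetical function and variable names (the real lookups would then query the CrossRef REST API at api.crossref.org and the arXiv API; FileMind's actual implementation may differ):

```python
import re

# Hypothetical sketch of the first two pipeline stages: pure regex
# detection of identifiers in the extracted text of page 1. No LLM
# involved -- either the identifier is present verbatim, or it isn't.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
ARXIV_RE = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)

def detect_identifiers(first_page_text: str) -> dict:
    """Return any DOI or arXiv ID found verbatim in the text, else None."""
    doi = DOI_RE.search(first_page_text)
    arxiv = ARXIV_RE.search(first_page_text)
    return {
        "doi": doi.group(0) if doi else None,
        "arxiv": arxiv.group(1) if arxiv else None,
    }
```

If either lookup succeeds, the metadata comes from the registry record, not from the PDF text at all, so there is nothing for a model to hallucinate.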
Every extracted field includes a confidence score and the exact text snippet it came from. You can verify every rename before it happens.
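The shape of such a field might look like the following sketch. The schema and field names here are assumptions for illustration, not FileMind's actual data model; the point is that every value carries its provenance:

```python
from dataclasses import dataclass

# Hypothetical record for one extracted metadata field. A value is
# never stored alone: it always travels with a confidence score, the
# snippet it was derived from, and the method that produced it.
@dataclass
class ExtractedField:
    value: str         # e.g. the paper title
    confidence: float  # 0.0 - 1.0, depends on the extraction method
    source: str        # exact text snippet the value came from
    method: str        # "doi", "arxiv", "heuristic", or "llm"

title = ExtractedField(
    value="Attention Is All You Need",
    confidence=0.99,
    source="doi:10.48550/arXiv.1706.03762",
    method="doi",
)
```

A review step can then sort pending renames by confidence and surface the source snippet next to each proposed filename.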
The template is boring on purpose
FileMind uses Author_Year_Title.pdf. No creativity. No summarization. No “improved” version of the title.
Boring filenames are findable filenames.
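Filling that template is deliberately mechanical. A minimal sketch, with sanitization rules that are my assumption rather than FileMind's documented behavior:

```python
import re

# Hypothetical sketch of the Author_Year_Title.pdf template. The
# fields are joined verbatim: no summarizing, no rewording, only
# stripping characters that are unsafe in filenames.
def build_filename(author: str, year: int, title: str) -> str:
    def clean(text: str) -> str:
        # Keep only letters and digits, camel-casing the words so the
        # original title stays recognizable and searchable.
        words = re.split(r"\W+", text)
        return "".join(w[:1].upper() + w[1:] for w in words if w)
    return f"{clean(author)}_{year}_{clean(title)}.pdf"
```

Because the transformation is a pure function of the extracted fields, the same metadata always yields the same filename, and a search for any word of the real title will find the file.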