A Quick Warning Before We Start
Before getting into the substance of this post, it’s worth being explicit about the environment this work was done in.
OpenClaw is not a secure system. I would not expose it to the internet, and I would not run it anywhere near a machine that held sensitive data. This experiment was conducted on an isolated Linux box that is more than ten years old, deliberately segmented away from anything that mattered. That isolation was intentional, and I would consider it a prerequisite rather than a nice-to-have.
With that caveat out of the way, here’s what I learned.
The more technical information in this post comes from AI allowing me to cosplay as someone with a much deeper background in this field. I haven’t coded since I was a teenager running a BBS on my Apple //GS. Everything described here was implemented by directing AI tools — primarily Claude — with research, validation, and conceptual framing done through ChatGPT.
This was not a case of me dusting off dormant engineering skills. It was an exercise in seeing how far careful prompting, iteration, and architecture could go without writing code myself.
I started with what seemed like a reasonable question: could an AI agent take an RPG PDF and convert it into a usable Fantasy Grounds VTT reference manual? Fantasy Grounds is the tool I use to run RPGs with my friends, and I regularly need to get adventures into the program.
The test case was a Mothership RPG adventure. Not particularly long, but representative of the kind of layout that makes RPG books pleasant to read and painful to process. Multi-column text, sidebars, boxed callouts, tables, and frequent typography changes all coexist on the same page. Humans have no trouble with this. Machines very much do.
The first thing that became obvious is that PDFs do not contain “text” in the way we usually think about it. They contain positioned glyphs. Reading order, paragraph structure, and emphasis are all emergent properties created by the human brain. When you extract text naively, you get all the words, but not the story they were meant to tell.
Standard PDF extraction tools did exactly what they are designed to do. They gave me the words. They just gave them to me in the wrong order. Columns were interleaved, paragraphs were broken every line, sidebars merged into body text, and tables disintegrated into streams of numbers and labels with no structure left intact.
At that point, the obvious temptation was to let the LLM “just read the PDF.” After all, large language models are very good at understanding text, right?
That approach failed in subtle but dangerous ways.
LLMs are quite good at repairing relationships when the underlying structure is mostly correct. They are far less reliable when asked to infer structure that was never presented to them in the first place. RPG books are full of ambiguous layout decisions, and when an LLM guesses, it does so confidently and silently. Sidebars get merged into rules text. Paragraphs are reordered to match genre expectations rather than author intent. The output looks clean, but it is wrong in ways that are difficult to detect later.
The approach that actually worked separated responsibilities very strictly.
First, the extraction phase focused entirely on facts. Using PyMuPDF, the system extracted every word along with its exact coordinates, font size, font face, and bounding box. The output was ugly and unreadable, but nothing was lost. Every signal a human reader subconsciously relies on was still present, just not interpreted yet.
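To make this concrete, here is a minimal sketch of that fact-extraction phase. PyMuPDF's `page.get_text("dict")` returns a nested blocks → lines → spans structure with full metrics; the `flatten_spans` helper below is hypothetical, but it shows the key idea: record position and typography for every span, interpret nothing.

```python
def flatten_spans(page_dict, page_no=0):
    """Flatten PyMuPDF's "dict" text output into raw span records.

    Nothing here guesses at layout; it only records what the PDF
    actually contains: text, bounding box, font face, and font size.
    """
    records = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                records.append({
                    "page": page_no,
                    "text": span["text"],
                    "bbox": tuple(span["bbox"]),  # (x0, y0, x1, y1)
                    "font": span["font"],
                    "size": span["size"],
                })
    return records


def extract_facts(pdf_path):
    """Extract every span from every page, losing no signal."""
    import fitz  # PyMuPDF; third-party: pip install pymupdf

    records = []
    with fitz.open(pdf_path) as doc:
        for page_no, page in enumerate(doc):
            records.extend(flatten_spans(page.get_text("dict"), page_no))
    return records
```

The output really is ugly: one record per span, thousands per book. But every signal a later stage might need — column position, heading-sized fonts, bold faces — survives.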
Second came layout reconstruction. This was where most of the complexity lived. By working from geometry instead of text flow, it became possible to detect column gutters, read entire columns top-to-bottom instead of left-to-right across the page, and reconstruct paragraphs based on vertical spacing rather than newline characters. Hyphenated words could be repaired deterministically. Headings could be inferred from typography rather than guessed from phrasing.
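A sketch of that geometric reconstruction, assuming the span records from the extraction phase. The function and thresholds here are illustrative (real gutter widths and paragraph gaps would be tuned per book), but the shape of the logic is the point: cluster by left edge into columns, then split paragraphs on vertical whitespace rather than newlines.

```python
def reconstruct_paragraphs(records, gutter=40, para_gap=0.8):
    """Rebuild reading order from geometry alone (illustrative sketch)."""
    # Step 1: cluster spans into columns by left edge. A horizontal jump
    # larger than the gutter width starts a new column; columns then
    # read left to right.
    columns = []
    for rec in sorted(records, key=lambda r: r["bbox"][0]):
        if columns and rec["bbox"][0] - columns[-1]["x0"] <= gutter:
            columns[-1]["spans"].append(rec)
        else:
            columns.append({"x0": rec["bbox"][0], "spans": [rec]})

    # Step 2: within each column, read top to bottom, starting a new
    # paragraph only when the vertical gap between lines exceeds
    # para_gap * font size -- spacing, not newline characters.
    paragraphs = []
    for col in columns:
        current, prev_bottom = [], None
        for rec in sorted(col["spans"], key=lambda r: r["bbox"][1]):
            top, bottom = rec["bbox"][1], rec["bbox"][3]
            if prev_bottom is not None and top - prev_bottom > para_gap * rec["size"]:
                paragraphs.append(" ".join(current))
                current = []
            current.append(rec["text"])
            prev_bottom = bottom
        if current:
            paragraphs.append(" ".join(current))
    return paragraphs
```

Heading inference works the same way: compare each span's font size and face against the body-text baseline, instead of asking a model to guess from phrasing.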
This step also addressed the most visible problem with PDF extraction: the explosion of extra line feeds. Those line breaks are not semantic. They are artifacts of line wrapping. Once reading order and paragraph boundaries are reconstructed using spacing and font metrics, most of those spurious line breaks disappear before an LLM ever gets involved.
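The wrapped-line repair itself is almost trivially deterministic once reading order is known. A naive sketch (a real version would consult a word list before deleting a hyphen, since some hyphens are legitimate):

```python
def join_wrapped_lines(lines):
    """Merge hard-wrapped lines into flowing text, repairing hyphenation.

    Naive sketch: treats every end-of-line hyphen as a wrap artifact.
    A production version would check a dictionary before joining, to
    avoid mangling genuinely hyphenated compounds.
    """
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out.endswith("-"):
            # End-of-line hyphen: rejoin the two word halves directly.
            out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out
```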
Only after that cleanup did the LLM enter the process, and even then its role was constrained. It was allowed to repair flow and normalize text, but not invent structure, reorder content, or generate XML. Markers for headings, emphasis, sidebars, and tables were preserved explicitly so the model could not “helpfully” smooth them away.
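One cheap guardrail in that spirit (a hypothetical example, not the pipeline's actual marker syntax): tag structural elements with explicit markers before the LLM pass, then reject any cleanup output in which the marker sequence has changed.

```python
import re

# Hypothetical marker syntax; the real pipeline's tags may differ.
MARKER = re.compile(r"«(HEADING|SIDEBAR|TABLE|EM)»")

def markers_preserved(before, after):
    """Reject an LLM cleanup pass that dropped or invented any marker."""
    return MARKER.findall(before) == MARKER.findall(after)
```

A check like this turns "the model smoothed away my sidebar" from a silent corruption into a hard failure you can retry.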
The final stage — generating Fantasy Grounds XML — was deliberately scripted and deterministic. Fantasy Grounds is unforgiving, and rightly so. IDs, tags, ordering, and escaping are not things you want a language model remembering across thousands of tokens. Once the content was clean and correctly ordered, turning it into XML was a mechanical problem, not an AI problem.
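The mechanical nature of that stage is easy to illustrate with the standard library. The element names below are illustrative rather than the exact Fantasy Grounds schema, but the principle holds: IDs, ordering, and escaping come from code, never from a model's memory.

```python
import xml.etree.ElementTree as ET

def build_reference_xml(chapters):
    """Deterministically emit reference XML from cleaned content.

    chapters: list of (title, paragraphs) pairs, already in final order.
    Element names are illustrative, not the exact FG schema.
    """
    root = ET.Element("root", version="4")
    ref = ET.SubElement(root, "referencemanual")
    for i, (title, paragraphs) in enumerate(chapters, start=1):
        # Deterministic zero-padded IDs: id-00001, id-00002, ...
        chap = ET.SubElement(ref, f"id-{i:05d}")
        ET.SubElement(chap, "name").text = title
        ET.SubElement(chap, "text").text = "\n".join(paragraphs)
    return ET.tostring(root, encoding="unicode")
```

Escaping of `&`, `<`, and `>` is handled by ElementTree itself, which is exactly the kind of invariant you do not want drifting across thousands of model-generated tokens.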
Tables turned out to be the most treacherous area. Early attempts to detect them aggressively led to false positives where multi-column prose or credits pages were misclassified as tables. The safer approach was to be conservative to the point of under-detection, preserving text as text unless the evidence was overwhelming. A less-perfect table is preferable to corrupted rules text.
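A conservative test in that spirit might look like this (a hypothetical heuristic, not the production code): only call a block a table when several consecutive rows share the same set of aligned cell positions. Anything less stays prose.

```python
def looks_like_table(rows, min_rows=3, min_cols=2, tol=2.0):
    """Conservative table detector: demand overwhelming evidence.

    rows: one list of cell left-edge x coordinates per line of the block.
    Returns True only if enough rows have the same column count AND
    their cell edges line up within a small tolerance.
    """
    if len(rows) < min_rows or any(len(r) < min_cols for r in rows):
        return False
    first = rows[0]
    for row in rows[1:]:
        if len(row) != len(first):
            return False
        if any(abs(a - b) > tol for a, b in zip(first, row)):
            return False
    return True
```

Multi-column prose fails the alignment test (its "cells" wander), so it is preserved as text, which is exactly the failure mode you want.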
One of the more humbling realizations came late in the process. The PDF had no table of contents, but it did have bookmarks. Those bookmarks reflected the author’s actual organizational intent far better than anything inferred from layout alone. Once the pipeline followed them, chunking and navigation improved immediately. It was a reminder that many “AI problems” are really failures to leverage existing metadata.
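PyMuPDF exposes those bookmarks through `doc.get_toc()`, which returns `[level, title, page]` triples. Turning the top-level entries into chunk boundaries is a few lines; `toc_to_chunks` is a hypothetical helper shown for illustration.

```python
def toc_to_chunks(toc, page_count):
    """Map top-level bookmarks to (title, first_page, last_page) spans.

    toc: [level, title, page] triples as returned by PyMuPDF's
    doc.get_toc(). Each chapter runs to the page before the next
    top-level bookmark; the last runs to the end of the document.
    """
    tops = [(title, page) for level, title, page in toc if level == 1]
    chunks = []
    for i, (title, start) in enumerate(tops):
        end = tops[i + 1][1] - 1 if i + 1 < len(tops) else page_count
        chunks.append((title, start, end))
    return chunks

# With PyMuPDF this would be driven by something like:
#   import fitz
#   doc = fitz.open("adventure.pdf")
#   chunks = toc_to_chunks(doc.get_toc(), doc.page_count)
```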
From a CFO perspective, the conclusion is straightforward.
This took longer than doing it manually. Considerably longer.
I spent over $100 running smaller, simpler test cases just to understand where tools failed, where token usage exploded, and how errors manifested. That spend did not produce output. It produced learning. For a single RPG module, this is not rational. Manual conversion would have been faster and cheaper.
Where this starts to make sense is repetition. Multiple modules. Consistent layouts. A reusable pipeline. At that point, the upfront investment begins to amortize.
There is also a broader organizational lesson here. Running this through a rough, developer-oriented agent on an isolated machine worked, but it was far from ideal. A more user-friendly agent, or involvement from an IT team, would have reduced iteration time, lowered token waste, and improved safety. There is real value in tooling and support, even — perhaps especially — when AI is involved.
This was a successful experiment, but not an efficient one. I would do it again only if I planned to do it many times. As with so many automation efforts, the real question is not whether it can be done, but whether you are willing to do it often enough for the investment to pay off.
Sometimes the most useful output isn’t the finished product. It’s understanding where the break-even point actually lies. And what you learn along the way.
I ran OpenClaw on a very old HP ProLiant MicroServer (Gen 7) with a Turion II processor. I blocked incoming access using standard Linux hardening, isolated it on its own VLAN, and did not install any additional skills or allow the agent to browse the internet. That was intentional, given the known security risks around prompt injection attacks and malware delivered through unscreened skills.
All files used by OpenClaw lived only on that machine, and it had no access to my other systems. OpenClaw itself can run on fairly low-end hardware, since the heavy lifting is done in the cloud. If you want a more robust — and still relatively inexpensive — platform to run it on, with the added benefit of access to the Apple ecosystem, many people use Mac minis.
If you want to find OpenClaw (the internet moves fast, and months from now this may not be the hot new tool), look here:
OpenClaw AI Bot


