


I was recently quoted at the Generative AI and Agentic AI Summit in San Jose saying that "the biggest problem that we're working with in AI right now" is the misguided idea that everything needs to be processed by a large language model. I want to expand on what I meant, because this isn't an anti-AI take. It's a pro-engineering one.
The AI industry is in a phase where LLMs are being treated like a universal solvent. Got a document? Throw it at an LLM. Need to extract data? LLM. Want to classify something? LLM. Need to move data between steps in a pipeline? Believe it or not, LLM.
The result? You hand all of your tokens and all of your money to a bot that burns through millions of them on work it was never the right tool for. I've seen it. We've all seen it. And at Meibel, we've deliberately built our platform to avoid it.
At Meibel, our document processing pipeline doesn't start with an LLM. It starts with document conversion, turning raw documents into structured representations using purpose-built layout detection models, table structure recognition, OCR engines, and reading order algorithms. These are specialized models designed for their specific tasks. A layout detection model doesn't need to "reason." It needs to identify regions on a page quickly and accurately. OCR doesn't need chain-of-thought prompting. It needs to recognize characters.
When we encounter images within documents, we use vision models to classify them. Is this a chart, a table, a photograph, a flowchart? Each classification triggers a different extraction pathway. OCR plus vision models for plots. OCR plus LLM for table schema inference. Semantic image models for generating descriptions of photographs. The LLM enters the picture only where generative reasoning actually adds value, like inferring a schema from messy table data or producing a structured extraction from parsed content.
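To make that routing concrete, here's a rough sketch of what classification-driven dispatch can look like. The labels, extractor functions, and classifier interface are illustrative placeholders, not Meibel's actual pipeline code.

```python
# Illustrative sketch of classification-driven routing for images found inside
# a document. Each pathway uses the cheapest model that can do the job; the LLM
# appears only where generative reasoning is needed.

def extract_plot(image):
    """OCR plus a vision model: recover axis labels and series values."""
    ...

def extract_table(image):
    """OCR plus an LLM call to infer a schema from messy cells."""
    ...

def describe_photo(image):
    """Semantic image model: produce a caption that can be embedded like text."""
    ...

def derive_flowchart_graph(image):
    """LLM derives a node-and-edge representation of the diagram."""
    ...

EXTRACTORS = {
    "chart": extract_plot,
    "table": extract_table,
    "photograph": describe_photo,
    "flowchart": derive_flowchart_graph,
}

def route_image(image, classify):
    """classify() is a small vision classifier, not a frontier LLM."""
    label = classify(image)
    return EXTRACTORS.get(label, describe_photo)(image)
```

The dispatch table is the whole point: the expensive generative model is just one pathway among several, invoked only when the content calls for it.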
This isn't a philosophical preference. It's an engineering decision backed by data. We actively identify where cheaper, faster models suffice: fields with 99%+ accuracy on smaller models, document types that don't require advanced reasoning, and batch versus real-time processing tradeoffs. Why burn tokens on a frontier model when a purpose-built classifier does it better for a fraction of the cost?
Here's where the industry's LLM-for-everything mindset creates the most waste.
Most AI platforms pick one retrieval strategy and force all data through it. Usually that means RAG: Retrieval Augmented Generation. You take documents, chunk them into text, embed those chunks as vectors, store them in a vector database, and when someone asks a question, you find the most semantically similar chunks and feed them to an LLM for synthesis.
RAG is powerful for unstructured text. Contracts, reports, policy documents, research papers. It's the right tool when you're asking questions like "What does this document say about X?" or "Summarize the risk factors across these filings."
But here's the problem: not all data is unstructured text, and not all questions are about semantic similarity.
A huge amount of enterprise data lives in tables. Spreadsheets, CSVs, database exports, structured reports. And documents don't exist in isolation. They cite each other, reference each other, and form networks of relationships that matter for understanding the full picture.
The industry's instinct is to throw all of it into RAG. Embed everything. Vectorize everything. Hope the semantic similarity math works out. It doesn't. Not well, anyway.
This is why we built STAG: Structure Augmented Generation. STAG is not a replacement for RAG. It's the framework that contains RAG and goes beyond it. STAG routes each type of data to the retrieval system built for that data's access pattern. It has three pillars:
RAG (Retrieval Augmented Generation) handles unstructured text and image-derived content. Text gets chunked, embedded as vectors, and stored for semantic search. Images of photographs get converted to text descriptions and embedded the same way. When a user asks a question about what documents say, RAG finds the most relevant passages and feeds them to an LLM for synthesis. This is the right tool for meaning, topics, and concepts.
TAG (Table Augmented Generation) handles structured data. Instead of embedding structured data as text chunks, TAG transforms diverse data sources into clean, queryable database tables with preserved schemas, column types, and relationships. The output isn't vectors in a search index. It's actual structured tables in a real analytical database. When a customer asks "What were our top 10 products by revenue last quarter?" that's not a semantic similarity question. That's a SQL query. RAG would try to find text chunks that are semantically close to "top 10 products by revenue" and hope the right numbers happen to be in those chunks. TAG writes a precise SQL query against a real table and gets the exact answer. The LLM's role in TAG is to translate the natural language question into a formal query, not to be the database.
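To make the contrast concrete, here's a minimal sketch of the TAG pattern using SQLite. The table, the sample rows, and the `llm_to_sql` helper are illustrative assumptions; in a real system the translation step would be an LLM call that sees the schema and column descriptions.

```python
import sqlite3

# Hypothetical helper: stands in for an LLM call that sees the schema and
# returns SQL. Hard-coded here purely for illustration.
def llm_to_sql(question: str, schema: str) -> str:
    return ("SELECT product, SUM(revenue) AS total_revenue "
            "FROM sales WHERE quarter = '2024-Q4' "
            "GROUP BY product ORDER BY total_revenue DESC LIMIT 10")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Widget A", "2024-Q4", 120.0), ("Widget B", "2024-Q4", 340.0)])

sql = llm_to_sql("What were our top 10 products by revenue last quarter?",
                 "sales(product TEXT, quarter TEXT, revenue REAL)")
rows = conn.execute(sql).fetchall()    # the database, not the LLM, does the math
```

The LLM contributes one small string of SQL. The database does the filtering, grouping, and arithmetic, and it does them exactly.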
Graph Augmented Generation handles relationships and provenance. Documents cite other documents. Reports reference figures and tables. Sections cross-reference other sections. A flat list of text chunks can't represent any of that. Graph augmented generation builds a reference and citation graph where documents are nodes and references are edges. During retrieval, the system doesn't just find semantically similar chunks. It also traverses the graph to find documents that are cited by or related to those chunks, expanding the context with material the user might not have thought to ask about but that is directly relevant. The graph also tracks provenance: where every piece of extracted data came from, which source document it was part of, and how it was derived.
The key insight is that these three systems work together. A single PDF might contain natural language, tables, and images. STAG decomposes that document into its component parts. The natural language flows into RAG. The tables flow into TAG. The relationships between all of it get tracked in the graph. Every component maintains full lineage back to its source document.
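In code, that routing step can be as plain as a dispatch on component type, with lineage recorded as a graph edge no matter where the content lands. The component shape and sink functions below are assumptions for illustration, not a real API.

```python
# Minimal sketch of routing a decomposed document's parts to the three pillars.
def route_component(component, rag_ingest, tag_ingest, graph_add_edge):
    kind = component["kind"]              # "text", "table", "image", ...
    if kind == "text":
        rag_ingest(component)             # chunk, embed, store for semantic search
    elif kind == "table":
        tag_ingest(component)             # load into an analytical database
    elif kind == "image":
        rag_ingest(component)             # text description embedded like any chunk
    # regardless of type, lineage back to the parent document is recorded
    graph_add_edge(component["id"], component["parent_id"], relation="derived_from")
```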
The TAG pillar of STAG deserves special attention because it's where the industry burns the most tokens for the least value.
LLMs are good at understanding intent and bridging the gap between human language and formal systems. They are not good at storing data, searching data, aggregating data, or doing math. When you use RAG for structured data, you're asking the LLM to do all of those things. And it's mediocre at every one of them.
TAG processes data through a pipeline: preparation, logical group discovery, parallel ingestion, and database consolidation. It handles multiple file formats (CSV, JSON, Parquet) with automatic detection. Schema detection identifies column names, data types, key relationships, and data distributions. Files with matching schemas get automatically grouped and consolidated into unified tables.
The system uses ML-based similarity analysis and selective LLM validation to recognize when files with different naming conventions, like sales_2024_01.csv and january_revenue_2024.json, actually contain the same logical data. It understands that "sales" and "revenue" might represent the same business concept. But that understanding comes from embeddings and similarity math, with the LLM brought in only to validate grouping decisions, not to do the grouping itself.
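Here's a simplified sketch of that selective-validation pattern. The string-similarity measure stands in for embedding-based schema similarity, `llm_confirms_match` is a hypothetical hook for the validation call, and the thresholds are made up for illustration.

```python
from difflib import SequenceMatcher

def column_similarity(cols_a, cols_b):
    """Crude stand-in for embedding-based schema similarity."""
    scores = [max(SequenceMatcher(None, a, b).ratio() for b in cols_b) for a in cols_a]
    return sum(scores) / len(scores)

def same_logical_table(schema_a, schema_b, llm_confirms_match):
    score = column_similarity(schema_a, schema_b)
    if score > 0.9:
        return True                       # obvious match: no LLM call at all
    if score < 0.5:
        return False                      # obvious mismatch: no LLM call either
    return llm_confirms_match(schema_a, schema_b)   # only the gray zone reaches the LLM

# e.g. sales_2024_01.csv vs january_revenue_2024.json
print(same_logical_table(["date", "product", "sales"],
                         ["date", "product", "revenue"],
                         llm_confirms_match=lambda a, b: True))
```

High-confidence matches and obvious mismatches never touch the LLM; only the ambiguous middle does.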
The graph pillar fills a gap that neither RAG nor TAG can address: the relationships between documents.
During ingestion, the system analyzes documents for in-text citations, bibliography sections, cross-references to other documents or figures or tables, and hyperlinks. Each document generates a set of possible ways it might be referenced by other documents. Then the system matches extracted references against those possible citation forms to build edges in the graph.
At retrieval time, the graph powers a document expansion process. The current RAG implementation embeds the user query as a vector, performs a nearest-neighbor search within the corpus, and feeds those neighbor chunks into the context window. The graph wraps that process with an additional search that also finds documents connected by reference relationships. If a user asks about a topic and the best-matching chunk cites three other documents, those cited documents get pulled into the context too.
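A sketch of that expansion step, assuming the citation edges were already built at ingestion time and a vector search function exists. The signatures and graph shape are illustrative, not Meibel's interface.

```python
# Graph-expanded retrieval: vector search finds the seed chunks, then a citation
# graph pulls in the documents those chunks reference.
def retrieve_with_expansion(query_vector, vector_search, citation_graph, hops=1):
    seeds = vector_search(query_vector, k=5)            # standard nearest-neighbor RAG
    expanded = {chunk["doc_id"] for chunk in seeds}
    frontier = set(expanded)
    for _ in range(hops):
        neighbors = set()
        for doc_id in frontier:
            neighbors |= set(citation_graph.get(doc_id, []))   # cites / cited-by edges
        frontier = neighbors - expanded
        expanded |= neighbors
    return seeds, expanded       # seed chunks plus the related documents to pull in
```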
The graph also tracks document provenance. When a PDF gets decomposed into text, tables, and images, each derivative file maintains edges back to its parent. When you retrieve a chunk, you can trace it all the way back to the original source document, the page it came from, and the extraction method used. This makes the system auditable in ways that flat vector search simply cannot be.
STAG's power goes beyond querying. The real unlock is what happens to the structure itself.
When Meibel processes documents, we don't just extract information. We export and preserve the structure. A PDF containing natural language, tables, and images gets decomposed into its component parts. Every file, regardless of type, enters a decomposition pipeline. A file is atomic if and only if the decomposition process produces zero derivative files. A PDF with text, tables, and images produces derivatives. A plain CSV produces none. The pattern is consistent: every file enters the pipeline, and the difference is just what the decomposer finds.
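The recursion is simple to express. In the sketch below, `decompose` stands in for the real layout, OCR, and extraction pipeline, and `ingest` stands in for routing into RAG, TAG, and the graph; both are illustrative placeholders.

```python
# Recursive decomposition: a file is atomic exactly when it yields no derivatives.
def process(file, decompose, ingest, parent=None):
    derivatives = decompose(file)        # [] for a plain CSV, many parts for a PDF
    if not derivatives:                  # atomic: decomposition produced nothing
        ingest(file, parent=parent)
        return
    for part in derivatives:             # composite: recurse, preserving lineage
        process(part, decompose, ingest, parent=file)
```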
The structured data that comes out of this process isn't a one-time extraction. It becomes a reusable, compounding asset. Each new document enriches the existing tables, the existing vector store, and the existing graph. Augmentation layers can add sentiment analysis, topic modeling, and feature generation on top of the raw data. But these augmentations run as batch processes on structured tables, not as LLM calls on individual documents. The difference in cost and reliability is orders of magnitude.
Consider a company with millions of documents in a backlog. With a naive LLM-for-everything approach, every question about those documents requires re-reading and re-processing them. With STAG, those documents get extracted once into structured tables, vector stores, and relationship graphs. Every subsequent question is a database query, a vector search, or a graph traversal. The extraction costs real money upfront, but every query after that is essentially free. Compare that to burning tokens on every single question, forever.
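The arithmetic is easy to sketch with purely illustrative numbers (these are assumptions, not real prices, token counts, or corpus sizes):

```python
# Purely illustrative assumptions.
tokens_per_doc = 5_000
dollars_per_million_tokens = 1.00

# Naive approach: a question that touches 1,000 documents re-reads them every time.
per_question = 1_000 * tokens_per_doc / 1e6 * dollars_per_million_tokens      # ~$5 per question

# STAG approach: extract the whole backlog once, then answer with queries.
one_time_extraction = 1_000_000 * tokens_per_doc / 1e6 * dollars_per_million_tokens  # ~$5,000, once
```

Under these assumed numbers, the naive approach has spent as much as the entire one-time extraction after roughly a thousand questions, and it keeps spending.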
Let me be precise about where LLMs add value in this kind of architecture, because this is the key point:
Schema inference. When structured data arrives in messy or inconsistent formats, the LLM helps infer and harmonize schemas. It understands that "revenue," "sales," and "income" might mean the same thing in context.
Natural language to query translation. When a user asks a question in plain English, the LLM writes the database query. It understands the schema, the column descriptions, the data types, and generates precise, executable queries.
Result interpretation. After a query executes and returns raw data, the LLM interprets the results and presents them in a human-readable format: summaries, insights, recommendations.
Metadata generation. During ingestion, the LLM generates human-readable descriptions of tables and columns that improve future query understanding.
Validation. LLMs validate automated grouping and classification decisions, confirming that what the ML models flagged as related actually belongs together.
Graph construction. When a document image depicts a flowchart, the LLM derives a graph representation. When the system needs to generate canonical citation forms for reference matching, the LLM produces the possible ways a document might be cited.
Notice what the LLM is not doing: it's not storing data, it's not searching data, it's not aggregating data, it's not doing arithmetic, and it's not traversing graphs. It's reasoning about intent and translating between human language and formal systems. That's what LLMs are genuinely great at.
The actual data processing (ingestion, schema detection, consolidation, querying, aggregation, graph traversal) is handled by purpose-built infrastructure. Each component optimized for its specific job. Vector search for semantic similarity. Analytical databases for structured queries. Graph stores for relationships and provenance.
Here's something most people building with LLMs learn the hard way: LLMs are generative, not precise copiers. When you need to move data between steps in a pipeline, say, passing the output of a database query to an anomaly detection tool, routing that data through an LLM introduces errors.
Small critical values like IDs, codes, or reference numbers? The LLM might transcribe a character wrong. A wrong customer ID silently returns someone else's data. Large result sets, hundreds of rows of query results? The LLM has to reproduce all of that data precisely, and more data means more transcription errors. And the token cost of reproducing data in a tool call is pure waste when that data already exists in memory.
At Meibel, we've architected our systems so that data flows between processing steps through deterministic infrastructure, not through generative text completion. The LLM still sees the data it needs for reasoning. It has to decide what to do next. But the actual data transfer happens outside the model. The LLM decides what to analyze; the infrastructure handles which data to analyze. No transcription. No token waste. No fidelity loss.
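One way to implement that separation: results live in a store keyed by handle, the LLM sees the handle and a small preview, and downstream tools dereference the handle directly. The registry and function names here are illustrative, not a description of Meibel's production internals.

```python
# "The LLM decides what to analyze; the infrastructure moves the data."
RESULTS = {}   # deterministic result store keyed by handle, lives outside the model

def run_query(sql, execute):
    """Execute SQL; return a handle plus a small preview for the LLM to reason over."""
    handle = f"result:{len(RESULTS)}"
    RESULTS[handle] = execute(sql)        # full result set never round-trips the LLM
    return handle, RESULTS[handle][:5]    # a few rows are enough for the next decision

def detect_anomalies(handle, detector):
    """Downstream tool dereferences the handle; no generative transcription in between."""
    return detector(RESULTS[handle])      # exact rows, exact IDs, zero extra tokens
```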
This brings me to what might be the most misunderstood point in the current AI discourse: automation is not dead. Using LLMs to write automation is fundamentally different from LLMs being the automation.
Think about a multi-step business process: validating inventory, calculating pricing, applying discounts, processing payment, routing exceptions for human approval. That's automation. It has defined states, transitions, and rules. The agents within each step might use LLMs for reasoning and decision-making, but the orchestration itself is deterministic, durable, and auditable.
The same principle applies across all three STAG pillars. The LLM writes the SQL query for TAG. The LLM helps select relevant context for RAG. The LLM assists in building citation forms for the graph. Those are the intelligent parts. But the query execution, the vector search, the graph traversal, the result retrieval, the cost tracking, and the error handling? That's automation. Deterministic, reliable, auditable automation. The LLM helped create the automation (by translating intent to a formal query or generating a citation form), but it isn't being the automation (executing the query, managing connections, handling retries).
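The shape of "LLM within automation" is easy to see in code. In this sketch the step sequence, retries, and error handling are plain deterministic Python, and the LLM appears only in the two places that need judgment. All helpers are illustrative stand-ins, not a real framework.

```python
def answer_structured_question(question, schema, llm_to_sql, execute, summarize):
    sql = llm_to_sql(question, schema)        # LLM helps *create* the automation
    for attempt in range(3):                  # retries are plain code, not prompts
        try:
            rows = execute(sql)               # the database *is* the automation
            break
        except TimeoutError:
            continue
    else:
        raise RuntimeError("query failed after 3 attempts")
    return summarize(question, rows)          # LLM interprets; the infrastructure did the work
```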
When you're processing millions of items, the difference between "LLM as automation" and "LLM within automation" isn't academic. It's the difference between a system with predictable costs and one that burns through tokens with no ceiling.
Real-world document processing is inherently multimodal. A single PDF might contain natural language, tables, images of charts, embedded flowcharts, and attached files. The idea that a single LLM call should handle all of that is not just expensive. It's architecturally unsound.
Our approach decomposes complex documents into their atomic components through a recursive pipeline. Every file enters the decomposition process. Composite files produce derivatives: text blocks become text files, tables become structured data files, images get classified and extracted, embedded attachments get pulled out and processed on their own. Each derivative file maintains provenance edges back to its parent. Atomic files pass straight through to ingestion.
Once decomposed, each component flows to the system optimized for it. Text flows into RAG pipelines optimized for semantic search. Structured data flows into TAG pipelines optimized for precise queries. Images get classified and routed to specialized extraction models. Relationships and provenance get tracked in graph structures. Each data type is stored in the system optimized for its access pattern.
This is what STAG looks like in practice. It's not one model to rule them all. It's an ecosystem of specialized capabilities (custom OCR, vision models, traditional machine learning, NLP, deterministic validation, and yes, LLMs where they genuinely add value) all coordinated by durable, observable, cost-controlled orchestration.
The companies that will win with AI are not the ones that throw the most tokens at every problem. They're the ones that are deliberate about which tasks are best suited for which tools.
An LLM is extraordinary at reasoning, generating structured output from unstructured input, translating between human language and formal systems, and making nuanced judgments. It is mediocre at copying data precisely, recognizing characters in images, detecting page layouts, executing database queries, doing arithmetic, and doing things that specialized models and purpose-built infrastructure have been doing well for years.
The question isn't "Can an LLM do this?" The answer is almost always yes. The question is "Should an LLM do this?" And the answer is often no, because there's a faster, cheaper, more reliable tool that already exists.
RAG for unstructured text. TAG for structured data. Graph augmented generation for relationships and provenance. And STAG, Structure Augmented Generation, as the framework that orchestrates all three, routing each data type to the system built for its access pattern. OCR for character recognition. Vision models for image classification. Database queries for data retrieval. Deterministic workflows for orchestration. And LLMs for the reasoning that ties it all together.
If you have a hammer, not everything is a nail. And if you have an LLM, not everything needs a prompt.
Ready to start your AI journey? Contact us to learn how Meibel can help your organization harness the power of AI, regardless of your technical expertise or resource constraints.



Kevin McGrath is the Co-Founder and CEO of Meibel, bringing over 20 years of experience in cloud infrastructure and platform engineering. He previously served as Vice President and General Manager at Spot by NetApp (2022-2024), where he led a global organization of over 1,000 people, and held roles as Chief Technology Officer and VP of Architecture at Spot (2017-2022). He holds the AWS Certified Solutions Architect (Professional) and AWS Certified DevOps Engineer (Professional) certifications, and earned his BA in Economics and Master's in Computer/Information Technology Administration from the University of Maryland.


