Scaling AI With Context, Control and Confidence | Meibel at the Generative AI Summit, NY Tech Week 2026

Key takeaways:

AI is an enabler, not the goal. Companies that start with "I need to process X more things per day" succeed. Companies that start with "we need an AI strategy" don't.
Copilot vs. Automation. There are two buckets of AI in your organization: AI that sits next to a human (copilots), and AI that runs processes without constant human oversight. Both matter. This talk focuses on the latter, and why it's harder than it looks.
The engineering trap is real. What starts as a quick win (throw data at an LLM, get good output) turns into prompt engineering, then context engineering, then a vector database, then a graph database, then a full infrastructure stack your team is managing at 2 AM. The gap between a demo and production is enormous.
Build vs. Buy. There are strong reasons to build your own AI in-house: you need to own the logic, control the data, integrate deeply with how your business operates, and differentiate from competitors. If everyone uses the same off-the-shelf agent, where's the competitive advantage?

Two customer stories:

SpecBooks (Construction)

A construction marketplace that processes specification books for wholesalers, distributors, and contractors. They started on AWS with Textract and Bedrock, hit the limits of text extraction across wildly inconsistent document formats (PDFs, Excel, Word, photos from the field), and fell into the engineering trap. Today, they process tens of thousands of specification books a month and are building new revenue streams on top of their platform.

Toffler Associates (Strategic Planning)

A consulting firm working with Space Force, Navy, and commercial organizations on foresight and red teaming. They tried to productize 30 years of consulting knowledge using the Microsoft stack, but couldn't get consistent outputs that sounded like Toffler. Today, they've released a product that their customers can use alongside Toffler's consulting services.

What's fascinating: these two customers need opposite things from confidence scoring. SpecBooks needs exact-match accuracy (did the extracted value match the source document?). Toffler needs creative divergence (is the red teaming output surfacing unexpected scenarios?). There is no one-size-fits-all way to measure AI quality.
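To make the contrast concrete, here is a minimal sketch of two opposite scorers, assuming plain-text inputs. The function names, the substring-based exact match, and the token-overlap (Jaccard) divergence measure are all illustrative assumptions, not Meibel's scoring method. A real system would more likely use embedding-based similarity for the divergence side.

```python
import re

# Illustrative only: these functions are hypothetical and are not Meibel's
# scoring method. They show how the same "quality" question can need
# opposite measurements depending on the use case.


def normalize(value: str) -> str:
    """Collapse whitespace, trim edge punctuation, and lowercase."""
    return re.sub(r"\s+", " ", value).strip().strip(".,;:").lower()


def exact_match_score(extracted: str, source_text: str) -> float:
    """Extraction use case (SpecBooks-style): 1.0 only if the normalized
    extracted value appears verbatim in the normalized source text."""
    return 1.0 if normalize(extracted) in normalize(source_text) else 0.0


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used as a crude bag-of-words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def divergence_score(output: str, known_scenarios: list[str]) -> float:
    """Red-teaming use case (Toffler-style): higher when the output is
    less similar to every scenario already on file."""
    if not known_scenarios:
        return 1.0
    out = tokens(output)
    return 1.0 - max(jaccard(out, tokens(s)) for s in known_scenarios)
```

A high exact_match_score is what an extraction workflow wants, while a red-teaming workflow rewards a high divergence_score. Swapping one metric in for the other would mark good output as bad.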

The three things every organization needs to get right:

Context — How you prepare your data matters more than how much data you throw at the model. Larger context windows are not the answer. Narrow your dataset to the smallest set that solves the problem. Understand the modality of your data (text, tables, images, handwriting). Build a metadata layer. Not everything needs to go to an LLM.
Control — What data do you show to the LLM at a given time? The more data you expose, the more chance the model picks up something you didn't want it to see. Kevin uses the movie Memento as an analogy: the protagonist has no short-term memory and reconstructs his story from tattoos on his body each morning. If he doesn't see a tattoo, he doesn't know the context. LLMs work the same way. Control what they see, and you'll understand why the output is what it is.
Confidence — You can't deploy what you can't measure. If you're putting your finger in the air and saying "I think it's good," you'll never get past having a human review every output. Measurement needs to be tuned to your use case and should use multiple scoring methods (semantic matching between inputs and outputs, grounding, correctness, completeness), not just a single LLM judge.
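The Context and Control points can be sketched together as a filter that runs before any LLM call. This sketch assumes each chunk already carries metadata such as project and modality (the "metadata layer" above). Chunk, select_context, and the character budget are hypothetical names. A production system would likely budget in tokens rather than characters.

```python
from dataclasses import dataclass

# Hypothetical sketch: Chunk, select_context, and the character budget are
# illustrative assumptions, not a Meibel API.


@dataclass
class Chunk:
    doc_id: str
    text: str
    metadata: dict[str, str]  # e.g. {"project": "A", "modality": "table"}


def select_context(
    chunks: list[Chunk],
    required: dict[str, str],
    allowed_modalities: set[str],
    max_chars: int,
) -> list[Chunk]:
    """Filter chunks by metadata, then cap how much text the model is shown.

    The returned list is exactly what would go into the prompt, so logging it
    next to the model's output explains what the model could and could not see.
    """
    # Context: narrow the dataset with metadata before any LLM call.
    candidates = [
        c
        for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
        and c.metadata.get("modality") in allowed_modalities
    ]
    # Control: stop as soon as the next chunk would exceed the budget.
    selected: list[Chunk] = []
    used = 0
    for c in candidates:
        if used + len(c.text) > max_chars:
            break
        selected.append(c)
        used += len(c.text)
    return selected
```

In the spirit of the Memento analogy, the selected list plays the role of the tattoos. If a fact isn't in it, the model never saw it, and that makes an unexpected answer much easier to trace.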

On data cleanliness (audience Q&A):

Do not let anyone tell you the first step to using AI is to clean all your data. You'll never use AI. Data has never been clean since we started storing it on hard drives. The real requirement is understanding how your data influences your outputs, not perfecting the data first. If you can see which documents are influencing your answers, you can start making targeted improvements.
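One way to see which documents influence your answers is to record the sources behind each answer and compare them against reviewer feedback. This is a minimal sketch that assumes each answer is stored as a dict with a "sources" list and a boolean "flagged" field. Both the record shape and the influence_report function are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: the answer record shape and influence_report are
# illustrative assumptions, not a Meibel feature.


def influence_report(answers: list[dict]) -> list[tuple[str, int, float]]:
    """Rank source documents by how often they fed answers that were flagged.

    Each answer is assumed to look like:
        {"sources": ["doc-a", "doc-b"], "flagged": True}
    Returns (doc_id, answers_used_in, flagged_rate) tuples.
    """
    used: Counter = Counter()
    flagged: Counter = Counter()
    for answer in answers:
        for doc_id in set(answer["sources"]):  # count each doc once per answer
            used[doc_id] += 1
            if answer["flagged"]:
                flagged[doc_id] += 1
    report = [(doc, n, flagged[doc] / n) for doc, n in used.items()]
    # Documents behind the highest share of flagged answers come first:
    # these are the targeted cleanup candidates.
    report.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return report
```

This approach only cleans the documents that are actually hurting outputs, rather than cleaning all the data before starting.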

On noisy document extraction (audience Q&A):

Extraction is not a single-pass process. A PDF can contain text, headers, images, tables, pictures of tables, embedded files. You need multiple passes that understand the probability of what each element is (header, graphic, subsection), how elements relate to each other, reading order, and whether a legend on the page corresponds to a nearby graphic. You also need to track the uncertainty at each step and carry that forward into your confidence scoring.
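Carrying uncertainty forward can be sketched by giving each page element a running confidence that every pass updates. This sketch multiplies the per-pass confidences, which assumes the passes fail independently. That is a simplifying assumption, not Meibel's method. Element, record, and the review threshold are hypothetical names and values.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: Element, record(), and the multiply-the-steps rule are
# illustrative assumptions, not a description of Meibel's extraction pipeline.


@dataclass
class Element:
    element_id: str
    # Pass 1: probability of each candidate type for this page element.
    type_probs: dict[str, float]
    # Later passes (reading order, legend linking, value extraction, ...).
    steps: list[tuple[str, float]] = field(default_factory=list)

    @property
    def label(self) -> str:
        """Most likely element type from the first pass."""
        return max(self.type_probs, key=lambda t: self.type_probs[t])

    def record(self, step: str, confidence: float) -> None:
        """Log a later pass and how sure it was, instead of discarding it."""
        self.steps.append((step, confidence))

    @property
    def confidence(self) -> float:
        """Chain the uncertainty of every pass, assuming independent errors."""
        chained = self.type_probs[self.label]
        for _, step_confidence in self.steps:
            chained *= step_confidence
        return chained


fig = Element("page-3-element-2", {"table": 0.7, "image": 0.3})
fig.record("reading_order", 0.9)
fig.record("legend_link", 0.8)
needs_review = fig.confidence < 0.6  # hypothetical review threshold
```

In this example, each pass looks reasonably confident on its own. However, the element that was only 70% likely to be a table ends up at roughly 0.5 overall and is routed to a reviewer.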

Timestamps:

00:00 – The real problem: why companies have a throughput problem, not an AI problem
01:20 – Copilot vs. automation: two ways to use AI in your organization
04:30 – The engineering trap: how "quick wins" become infrastructure debt
08:15 – Case study: SpecBooks and document processing at construction scale
10:20 – Case study: Toffler Associates and AI-powered strategic foresight
12:00 – Context engineering: how to narrow your data set before touching an LLM
15:45 – Control: governing what your model sees (and what it doesn't)
18:10 – Confidence scoring: how to measure AI outputs for your specific use case
21:50 – AI in production: outcomes, scale, and what actually changed

Kevin McGrath

Founder & CEO

Kevin McGrath is the Co-Founder and CEO of Meibel, bringing over 20 years of experience in cloud infrastructure and platform engineering. He previously served as Vice President and General Manager at Spot by NetApp (2022–2024), where he led a global organization of over 1,000 people. Before that, he held roles as Chief Technology Officer and VP of Architecture at Spot (2017–2022). He holds the AWS Certified Solutions Architect (Professional) and AWS Certified DevOps Engineer (Professional) certifications, and earned his BA in Economics and a Master's in Computer/Information Technology Administration from the University of Maryland.