You've decided you need an agency to build AI automation across your business. Maybe you've already talked to a few. They all sound great on the call... confident, full of jargon, demos that sparkle. But Gartner predicts over 40% of agentic AI projects will be canceled by end of 2027, and RAND Corporation research puts the broader AI project failure rate above 80%.
Those numbers aren't random. They trace back to bad partnerships. Wrong team, wrong approach, wrong incentives. The sales process for AI agencies is designed to make you feel confident. Your job is to get past the confidence and find the substance.
Here are the 10 questions that will tell you whether an agency can actually deliver... or whether they're just good at selling.
1. "Show Me Something Running in Production Right Now"
Not a demo. Not a prototype. Not a proof of concept that wowed the CEO but never saw real data. You want to see a system that's been running with real users, real edge cases, and real 3 AM failures for at least six months.
The gap between demo and production is where most AI projects die. A demo doesn't need to handle authentication, error states, concurrent users, data privacy, or cost controls. Production needs all of that. RAND identified five root causes of AI project failure, and several of them... inadequate data infrastructure, technology-first thinking, deployment pipeline gaps... only surface when you move past the demo stage.
What a good answer sounds like: They pull up a dashboard showing uptime, error rates, and cost-per-transaction for a system they built months ago. They tell you what broke last month and how they fixed it. They have war stories, not just slide decks.
What a bad answer sounds like: They show you a polished video, pivot to a different demo, or say "we can't share client work due to NDAs" without offering any alternative proof.
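To make the gap concrete, here's a minimal sketch of the kind of plumbing a demo never includes but production can't live without. It's hypothetical... `client.complete()` stands in for whatever provider SDK is actually in use... but if an agency's real code has nothing like this wrapped around its model calls, they haven't been to 3 AM yet.

```python
import time

class ModelCallError(Exception):
    """Raised when a call is refused or all retries are exhausted."""

def call_with_guardrails(client, prompt, *, max_retries=3, timeout_s=30,
                         est_cost_usd=0.02, budget_left_usd=1.00):
    """Retries with exponential backoff, a per-call timeout, and a hard
    budget check before the call... the unglamorous production layer."""
    if est_cost_usd > budget_left_usd:
        raise ModelCallError("cost ceiling reached; refusing the call")
    for attempt in range(max_retries):
        try:
            # placeholder for a real SDK call (OpenAI, Anthropic, etc.)
            return client.complete(prompt, timeout=timeout_s)
        except TimeoutError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    raise ModelCallError(f"failed after {max_retries} attempts")
```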
2. "What's Your Process When You Discover Our Data Is a Mess?"
Data quality kills more AI projects than bad models. A Gartner survey of data management leaders found that 63% of organizations either don't have or aren't sure they have the right data management practices for AI. RAND explicitly calls out lacking necessary data as one of the five root causes of project failure.
Every company thinks their data is better than it is. A good agency knows this and has a plan for it. They should be asking hard questions about your data in the first conversation... what format, what quality, what volume, where it lives, who owns it.
What a good answer sounds like: "We do a data assessment in the first two weeks. Here's what we look for. Here's what we've found at companies like yours. Here's how we handle it when the data isn't ready." They should be able to describe a project where data issues changed the scope or timeline.
What a bad answer sounds like: "We can work with whatever you have." Or worse: no questions about your data at all during the sales process.
3. "Walk Me Through a Project That Went Sideways"
Every agency that's done real work has had projects go wrong. Models that hallucinated in front of customers. Integrations that took three times longer than scoped. Stakeholders who changed requirements mid-build. Data that turned out to be unusable.
If an agency presents an unblemished track record, they're either lying or they haven't done enough work to have been tested. You want to hear about failure because the quality of the answer tells you more than any case study on their website.
What a good answer sounds like: A specific story with real detail. What went wrong, when they discovered it, what they did about it, what they learned. Bonus points if they explain how they changed their process to prevent it from happening again.
What a bad answer sounds like: Vague platitudes about "challenges" and "learning opportunities" with no specifics. Or deflecting: "Our projects don't fail because of our rigorous process."
4. "Who Specifically Would Work on Our Project?"
You want names, not roles on a slide deck. Building production AI automation requires deep specialization across data engineering, AI architecture, prompt engineering, backend development, DevOps, and security. If the agency's "AI team" is three full-stack developers who picked up the OpenAI API last quarter, that's not an AI team.
Deloitte's State of AI in the Enterprise report identifies insufficient worker skills as the single biggest barrier to integrating AI into workflows. Fewer than half of organizations say they are strategically prepared for AI. That gap applies to agencies too... many of them are staffing AI projects with people who were doing something else six months ago.
What a good answer sounds like: "Here's the team. Sarah is our AI architect; she's been building production AI systems for three years. Mike handles data engineering; he built the pipeline for [specific project]. They've worked together on four previous engagements."
What a bad answer sounds like: "We'll assign the right team based on your needs." Or a slide with generic role titles and no names.
5. "If We Need to Switch from OpenAI to Anthropic Next Year, What Does That Look Like?"
The AI landscape is moving faster than anything in tech history. The model you use today might be obsolete in six months. The API you depend on might change pricing, capabilities, or terms of service. If your agency builds everything tightly coupled to a single provider, you're locked in... and in this market, lock-in is expensive.
This question tests whether the agency thinks architecturally or just knows how to call an API. A production-ready team builds abstraction layers that make model swaps a configuration change, not a rewrite.
What a good answer sounds like: "Our architecture abstracts the model layer. We've actually done this migration before... here's what it looked like, here's how long it took, here's what broke." They should be able to draw the architecture on a whiteboard and show you where the model layer sits.
What a bad answer sounds like: "We're partnered with [single vendor] and they're the best." Or a blank stare that tells you they've never considered the question.
6. "What Does Month 6 Look Like After Launch?"
AI systems aren't websites. You don't build them, launch them, and walk away. Models drift. Data patterns shift. Providers update their models and silently change behavior. Prompt strategies that worked in January may fail by June. Costs creep up as usage grows.
If an agency's engagement model is "build, handoff, goodbye," that's a problem. You need to understand their approach to model monitoring, cost management, incident response, and ongoing optimization. As we've written about in our post on red flags when hiring an AI agency, the absence of a post-deployment story is one of the clearest warning signs.
What a good answer sounds like: "Here's our support model. We monitor [specific metrics]. We do monthly optimization reviews. When a model provider ships an update, here's our evaluation process. Here's what our incident response looks like."
What a bad answer sounds like: "We'll hand off the documentation and your team can maintain it." Or a support contract that's clearly an afterthought bolted onto the proposal.
7. "How Do You Scope and Price This Work?"
Be wary of two extremes. Flat-rate pricing for AI projects ("We'll automate your customer service for $50K") ignores the enormous variability in scope, data quality, and integration complexity between businesses. Pure time-and-materials with no caps or milestones gives the agency no incentive to be efficient and gives you no predictability.
McKinsey's State of AI survey found that while organizations are broadly adopting gen AI, only about one-third report successfully scaling it across the organization. Part of that gap is financial... projects that seemed affordable in the pilot phase become unsustainable at scale because nobody modeled the real costs.
What a good answer sounds like: "We start with a paid discovery phase... two to three weeks, defined deliverable, fixed price. That gives us both enough information to scope the build accurately. The build itself is phase-gated with go/no-go decisions at each milestone. Here's how we break down what drives cost."
What a bad answer sounds like: Specific ROI promises before they've seen your data. Or a flat-rate quote with no discovery phase and no explanation of assumptions.
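The cost trap is easy to see with arithmetic. Here's a back-of-envelope model... every number in it is a made-up assumption you'd replace with your own traffic and your provider's current pricing:

```python
def monthly_cost_usd(requests_per_day, tokens_in, tokens_out,
                     price_in_per_1k, price_out_per_1k):
    """Naive inference-cost model: (tokens x price) x volume x 30 days."""
    per_request = ((tokens_in / 1000) * price_in_per_1k
                   + (tokens_out / 1000) * price_out_per_1k)
    return per_request * requests_per_day * 30

# Hypothetical prices and traffic: the pilot looks cheap...
print(monthly_cost_usd(200, 2_000, 500, 0.005, 0.015))     # ~$105/month
# ...the same workload at production volume is a different conversation.
print(monthly_cost_usd(20_000, 2_000, 500, 0.005, 0.015))  # ~$10,500/month
```

Same unit economics, 100x the volume, 100x the bill. A proposal that doesn't model this is a proposal that hasn't thought about month six.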
8. "How Do You Handle Security, Compliance, and Data Privacy?"
AI automation across your business means AI touching sensitive data... customer records, financial information, proprietary business logic, internal communications. The compliance implications are real, and they vary by industry.
This isn't a "nice to have" conversation. It's table stakes. If the agency can't articulate their approach to data handling, audit trails, access controls, and regulatory compliance in the context of AI systems, they're not ready for enterprise work.
What a good answer sounds like: Specific policies... where data goes, which models see it, whether it's used for training, how they handle PII. They should know whether you need SOC 2, HIPAA, GDPR, or whatever applies to your industry... and have built systems under those constraints before. Every AI decision that affects customers or operations should be logged and traceable.
What a bad answer sounds like: "We follow best practices." With no specifics. Or the topic never comes up in their sales process at all.
9. "What Would You Tell Us NOT to Automate?"
This question separates the agencies optimizing for your success from the ones optimizing for their revenue. A good agency will push back. They'll tell you when a process isn't a good candidate for AI automation... maybe the data isn't there, maybe the process is too ambiguous, maybe a simpler rule-based approach would work better and cost less.
The RAND study found that a primary root cause of AI failure is organizations focusing more on using the latest technology than on solving real problems. If an agency mirrors this behavior... if they say "yes" to everything you suggest... expect them to mirror it during delivery too, until the budget runs out and nothing works.
What a good answer sounds like: "Based on what you've described, I'd start with [specific workflow] because the data is structured and the ROI is clear. I'd hold off on [other workflow] until we've validated [specific assumption]. And honestly, [third thing] might be better solved with a simple rule engine, not AI."
What a bad answer sounds like: "We can automate all of that." Especially if they say it on the first call before understanding your data or processes.
10. "Can We Talk to Your Last Three Clients?"
Not their best client. Not a hand-picked reference from two years ago. Their last three. This is the single most revealing question you can ask, and how they respond tells you everything.
An agency with a strong track record will connect you without hesitation. They'll let you ask whatever you want. The clients will describe real experiences... what went well, what was hard, how the agency handled problems.
What a good answer sounds like: "Absolutely. Here are three contacts. Call them directly; you don't need to go through us." Bonus points if they proactively say: "Client two had a rough patch with us around month three; they'll tell you about it. Ask them how we handled it."
What a bad answer sounds like: They stall, they offer a single cherry-picked reference, they say NDAs prevent it (NDAs rarely prevent a client from confirming they worked with someone and sharing their experience), or they redirect you to case studies on their website instead.
The Question Behind the Questions
All 10 of these questions are really testing one thing: has this agency done this before, for real, in production, with the scars to prove it?
The AI agency market is in a hype cycle. Gartner estimates that only about 130 of the thousands of agentic AI vendors are legitimate... the rest are "agent washing," rebranding chatbots and RPA tools as agentic AI. The same dynamic applies to agencies. Everyone added "AI" to their website. The question is whether they added it to their capabilities too.
| Question | Tests For |
|---|---|
| Show me production systems | Real delivery experience vs. demo-ware |
| What if our data is messy? | Data maturity and honest assessment |
| Tell me about a failure | Depth of experience and learning culture |
| Who works on our project? | Team depth vs. subcontractor patchwork |
| What if we switch models? | Architectural thinking vs. vendor lock-in |
| What does month 6 look like? | Post-launch operational maturity |
| How do you scope and price? | Financial transparency and risk sharing |
| How do you handle security? | Enterprise readiness and compliance depth |
| What should we NOT automate? | Honesty and problem-first thinking |
| Can we talk to recent clients? | Confidence in their own track record |
Key Takeaways
- Demand production evidence. Demos prove nothing. You need to see systems running with real users, real data, and real failure recovery.
- Interrogate the data story early. If they're not asking about your data in the first conversation, they're not serious about delivery.
- Listen for honesty, not confidence. The best agencies tell you what's hard, push back on unrealistic scope, and recommend against AI when it's not the right tool.
- Know the team, not the brand. The people doing the work matter more than the logo on the proposal. Get names, backgrounds, and tenure.
- Plan for the long game. AI systems require ongoing attention. An agency with no post-launch story is setting you up for an orphaned system.
- Trust the reference check. Talking to recent clients is the single highest-signal evaluation step. If an agency resists it, that tells you everything.
The agency that welcomes these questions... that answers them with specifics instead of generalities... that pushes back where you need pushing back... that's the one worth trusting with your AI automation strategy.
Sources
- Gartner -- "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027" (2025)
- RAND Corporation -- "The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed" (2024)
- Gartner -- "Lack of AI-Ready Data Puts AI Projects at Risk" (2025)
- Deloitte -- "The State of AI in the Enterprise" (2026)
- McKinsey -- "The State of AI: How Organizations Are Rewiring to Capture Value" (2025)