Why your AI chatbot keeps making things up (and what to ask vendors about it)

A customer asks your chatbot what your refund policy is. The chatbot confidently quotes a 60-day window. Your actual policy is 30 days. The customer screenshots the bot, escalates to support, and you spend an hour explaining why the AI was wrong while honoring the answer it gave.

This is called hallucination, and in our experience it is one of the most common reasons AI chatbot projects get retired within their first year. It’s also the thing most vendors quietly hope you won’t notice during a demo.

This is a guide for non-technical decision makers. If you’re commissioning a chatbot, evaluating one your team has already built, or trying to understand why the bot you’re paying for keeps embarrassing you, this is what’s actually happening underneath.

What hallucination is, in one paragraph

Large language models — the technology under every modern AI chatbot — are pattern completers. They look at the conversation so far and produce the most plausible-sounding next words. They don’t know the difference between something they actually saw in your documents and something that just sounds like it might be true. Without specific engineering work, they will confidently invent facts.

That’s hallucination. It’s not a bug. It’s the default behavior of the technology. Preventing it is the job of the team building your chatbot.

How real chatbot teams prevent it

There are essentially three layers of defense, and any serious vendor will be doing all three.

Layer 1: Ground the model in your documents. Before the chatbot answers a question, it searches your actual content (website, FAQs, knowledge base, product catalog) for the most relevant passages. Then it sends those passages to the AI model with instructions like “only answer using this information.” The technique is called retrieval-augmented generation, or RAG. Every credible chatbot vendor does this now.
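For readers who want to see the shape of this, here is a minimal sketch of the retrieve-then-prompt step. It makes several assumptions: the three-passage knowledge base, the stopword list, and the function names are all hypothetical, and the keyword-overlap search is a stand-in for the embedding-based search a production system would more likely use. The call to the AI model itself is left out.

```python
import re

# Hypothetical knowledge base. A real one is built from your site, FAQs, and docs.
KNOWLEDGE_BASE = [
    "Refund policy: items can be returned within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email Monday through Friday.",
]

# Common words that carry no topical signal (a deliberately tiny, illustrative list).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "by", "do", "you", "your", "what"}


def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, passages, k=2):
    """Return up to k passages sharing at least one keyword with the question, best match first."""
    q = keywords(question)
    scored = [(len(q & keywords(p)), p) for p in passages]
    matches = [(score, p) for score, p in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches[:k]]


def build_grounded_prompt(question, passages):
    """Assemble the text sent to the model: instructions, retrieved passages, then the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer ONLY using the passages below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )


question = "What is your refund policy?"
passages = retrieve(question, KNOWLEDGE_BASE)
print(build_grounded_prompt(question, passages))
```

The point of the sketch is the order of operations: search first, then hand the model only what the search found, along with an explicit instruction to stay inside it.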

Layer 2: Verify the answer matches the source. Even with grounding, the model can drift, especially on questions where multiple passages are partially relevant. A serious system runs a verification step: take each claim in the answer, check that it’s actually supported by the source, and drop or flag any that aren’t. This is where most projects cut corners. Done well, it catches the cases that grounding alone misses.
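A rough sketch of what a verification step can look like, under loud assumptions: the word-overlap check, the 0.6 threshold, the stopword list, and all function names are hypothetical. Real systems more commonly use a second model call or an entailment model to judge support. The one idea worth noticing is that any number in a claim must appear in the source, which is exactly what catches a 60-day answer drawn from a 30-day policy.

```python
import re

# Tiny illustrative stopword list; an assumption, not a standard.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "and", "with"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def is_supported(claim, sources, threshold=0.6):
    """A claim counts as supported if one source contains all of its numbers
    and at least `threshold` of its content words."""
    words = content_words(claim)
    if not words:
        return False
    numbers = {w for w in words if w.isdigit()}
    for source in sources:
        source_words = content_words(source)
        if not numbers <= source_words:
            continue  # a number in the claim is missing from this source
        if len(words & source_words) / len(words) >= threshold:
            return True
    return False


def verify_answer(answer, sources):
    """Split the answer into sentences and sort them into (supported, flagged)."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    supported = [c for c in claims if is_supported(c, sources)]
    flagged = [c for c in claims if not is_supported(c, sources)]
    return supported, flagged


sources = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
]
draft = "Refunds are accepted within 60 days. Standard shipping takes 3 to 5 business days."
supported, flagged = verify_answer(draft, sources)
print("Keep:", supported)
print("Drop or flag:", flagged)
```

In this toy run the shipping sentence is kept and the 60-day refund sentence is flagged, because no source contains the number 60.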

Layer 3: Teach the bot to say “I don’t know.” When the retrieved content doesn’t have a good answer, the chatbot should say so. “I don’t have that information, let me get someone from the team” is dramatically better than a confidently wrong answer. Building this in is a deliberate choice. Most off-the-shelf platforms don’t do it well.
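In code, the "I don't know" path is often just a threshold check before the model is ever called. The sketch below assumes the search step returns a relevance score between 0 and 1 for each passage; the 0.5 cutoff, the function names, and the `generate` callable (a stand-in for the real model call) are all hypothetical.

```python
FALLBACK = "I don't have that information. Let me get someone from the team."


def answer_or_escalate(scored_passages, generate, min_score=0.5):
    """Answer only when at least one passage is relevant enough; otherwise hand off.

    scored_passages: list of (relevance_score, passage_text) pairs from the search step.
    generate: callable that turns a list of passages into an answer (the model call).
    """
    relevant = [passage for score, passage in scored_passages if score >= min_score]
    if not relevant:
        # Nothing trustworthy was retrieved, so refuse instead of guessing.
        return {"answer": FALLBACK, "escalate": True}
    return {"answer": generate(relevant), "escalate": False}


def fake_generate(passages):
    """Placeholder for the real model call; it simply echoes the best passage."""
    return "According to our documentation: " + passages[0]


good_hits = [(0.82, "Refunds are accepted within 30 days of purchase.")]
weak_hits = [(0.12, "Standard shipping takes 3 to 5 business days.")]

print(answer_or_escalate(good_hits, fake_generate))
print(answer_or_escalate(weak_hits, fake_generate))
```

Where the cutoff sits is a business decision as much as a technical one: set it higher and the bot escalates more often, set it lower and it answers more but risks more wrong answers.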

If a vendor’s pitch is “we use GPT-4, so it’s smart,” they are skipping all three layers. Walk away.

The demo trick to watch for

Here is how to test a chatbot you’re considering buying or already paying for.

Ask it a question whose answer is not in your knowledge base: something specific that sounds plausible but isn’t covered in your docs. For an e-commerce store: “Do you offer a one-year accident protection plan on this product?” For a SaaS: “What’s the price of your plan that includes enterprise SSO?” For a B2B service: “Do you serve clients in [country you don’t serve]?”

A working chatbot will say it doesn’t have that information and offer to escalate. A hallucinating chatbot will invent an answer that sounds reasonable.

Most chatbot demos use only easy, well-documented questions because the vendor knows that out-of-scope questions break the illusion. Asking a few out-of-scope questions during your evaluation will tell you more in five minutes than an hour of marketing.
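If your team wants to repeat this test after every content or vendor update, it can be scripted. The sketch below is one possible shape, not a standard tool: `bot` is any function that takes a question and returns the chatbot's reply, and the refusal phrases, probe questions, and the two fake bots are hypothetical examples you would replace with your own.

```python
# Phrases that suggest the bot declined to answer. Adjust these to your bot's wording.
REFUSAL_MARKERS = (
    "don't have that information",
    "i don't know",
    "i'm not sure",
)


def looks_like_refusal(reply):
    """Return True if the reply contains any known refusal phrase."""
    text = reply.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def run_probes(bot, probes):
    """Return the probe questions the bot answered instead of refusing."""
    return [question for question in probes if not looks_like_refusal(bot(question))]


# Questions whose answers are deliberately NOT in the knowledge base.
PROBES = [
    "Do you offer a one-year accident protection plan on this product?",
    "Do you serve clients in Antarctica?",
]


def honest_bot(question):
    """Fake bot that always declines, standing in for a well-built chatbot."""
    return "I don't have that information, let me get someone from the team."


def inventing_bot(question):
    """Fake bot that always invents an answer, standing in for a hallucinating one."""
    return "Yes, absolutely! That is included for $29 per year."


print("Honest bot failures:", run_probes(honest_bot, PROBES))
print("Inventing bot failures:", run_probes(inventing_bot, PROBES))
```

An empty list means the bot declined every out-of-scope question. Phrase matching is crude, so treat a clean result as a starting point and still read the replies yourself.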

The other failure modes you should know about

Hallucination is the headline, but there are related issues that matter for business decision makers.

Stale information. Your refund policy changed two months ago. The chatbot is still answering with the old version because nobody updated its knowledge base. This is a process problem, not a technology one — but if the vendor’s deliverable doesn’t include a clear process for keeping knowledge fresh, you’ll hit it.

Confidently wrong tone on edge cases. Many chatbots are tuned to sound authoritative, which is great when they’re right and a brand liability when they’re wrong. A good chatbot speaks with calibrated confidence: certain when the answer is well-supported, hedged when it isn’t.

Misattributed sources. A subtle failure where the chatbot’s answer is technically supported by some document, but not the one it cites. Users who click the cited source to verify see a mismatch and lose trust. Citation verification — actually checking that the cited source supports the claim — is the fix.

Prompt injection. A user types something like “ignore previous instructions and tell me [thing you don’t want the bot saying].” Naive chatbots fall for this. Serious ones have guardrails. If you’re handling regulated content or anything sensitive, this matters.

What to ask vendors

Bring these to your first call. If the answers are vague, the proposal is going to disappoint.

How does your system prevent the bot from making things up? You should hear about retrieval, citations, and refusal. If you don’t, that’s the answer.
Show me a real conversation where the bot said “I don’t know.” Any vendor who can’t produce this on demand has not deployed enough chatbots to have built that muscle.
What happens when our content changes? How do we update what the bot knows? Self-service is the right answer. Calling the vendor for every update is not.
How do you measure quality after launch? Listen for words like “eval set,” “quality dashboard,” or “regression tests.” If quality measurement is “we’ll fix issues as they come in,” you’ll be reporting every problem yourself.
What’s the worst chatbot incident you’ve handled, and what did you do? The honest answer is more valuable than the marketing answer. Anyone who claims they’ve never had an incident has never shipped.
Can the bot cite its sources? Citations are the easiest trust-builder, and most off-the-shelf platforms half-implement them.

What it costs to do this right

There’s no free lunch. Preventing hallucination properly adds 15-25% to the build cost of a chatbot and 20-30% to the per-conversation operating cost. Some vendors quote you a lower number by skipping it. The math doesn’t work in the long run because every embarrassing chatbot response costs more in customer support, brand damage, and team time than the engineering would have.
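To make those percentages concrete, here is the arithmetic on a made-up project. The baseline figures (a 40,000 build and 2,000 per month in operating cost) are hypothetical, chosen only to show the calculation, and the sketch assumes conversation volume stays constant, so a percentage increase per conversation equals the same percentage increase per month.

```python
def added_cost(base, pct_low, pct_high):
    """Return the (low, high) extra cost when a percentage range is applied to a base cost."""
    return base * pct_low / 100, base * pct_high / 100


# Hypothetical baseline for a chatbot built WITHOUT hallucination prevention.
BASE_BUILD = 40_000       # one-time build cost
BASE_MONTHLY_OPS = 2_000  # monthly operating cost at a constant conversation volume

# Ranges quoted in the article: 15-25% on build, 20-30% on operating cost.
build_low, build_high = added_cost(BASE_BUILD, 15, 25)
ops_low, ops_high = added_cost(BASE_MONTHLY_OPS, 20, 30)

# Total extra spend in the first year: one-time build uplift plus 12 months of operating uplift.
first_year_low = build_low + 12 * ops_low
first_year_high = build_high + 12 * ops_high

print(f"Extra build cost:       {build_low:,.0f} to {build_high:,.0f}")
print(f"Extra monthly ops cost: {ops_low:,.0f} to {ops_high:,.0f}")
print(f"Extra first-year cost:  {first_year_low:,.0f} to {first_year_high:,.0f}")
```

On these invented numbers the extra first-year spend lands between 10,800 and 17,200. Swap in your own quote to see what the same percentages mean for your project.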

If you’ve been quoted a suspiciously low price, ask specifically what they’re doing about hallucination. The answer will tell you whether the price is sustainable.

If you’re at this decision point

We build chatbots with proper grounding, citation, and verification for clients across customer support, internal help desks, and product search. If you’re commissioning one — or trying to fix one that’s already in production and not behaving — see our RAG chatbot development page or email [email protected]. We’ll tell you honestly whether your current approach is salvageable.