Grice's Maxims as conversational design heuristics

When I designed the conversational prototype for my thesis, an AI interface to support people with IBS, I needed a rubric to evaluate whether the agent’s responses were good. The checklists out there (tone, brevity, clarity) are useful but vague. I ended up using something older and more solid: Grice’s Maxims.

H. P. Grice published them in 1975, in the essay “Logic and Conversation”. He didn’t have AI in mind. He had in mind what makes two people in conversation actually understand each other. The four maxims became a standard reference in linguistics, and they work as conversational design heuristics, especially in health.

The cooperative principle

Before the maxims, Grice sets out the principle that grounds them: in a conversation, both parties cooperate. Each contributes, at the right moment, what serves the shared purpose of the exchange.

It’s a simple, radical principle. It implies conversation isn’t just transmitting information. It’s a joint effort where each turn is judged by how it helps the other’s next turn.

Applied to chatbots: the agent isn’t “responding to a prompt”. It’s participating in a cooperative conversation where the user also has work to do. This perspective changes what you design.

The four maxims

Grice splits the principle into four categories:

1. Quantity

Give the right amount of information. No more, no less.

In a health chatbot, this is the maxim I see violated most. Someone asks “I’ve had pain for days, is it IBS?” and the bot responds with three paragraphs on IBS subtypes, risk factors, and when to see a doctor. Too much. The user wanted validation, some context, and a next action.

The right version is shorter: “Possibly, but I can’t diagnose. What kind of pain is it? And how long?” The right amount, with a follow-up question.

2. Quality

Be truthful. Don’t say what you don’t have evidence for.

This is vital in health. The bot has to know what it knows and what it doesn’t. If there’s no evidence for a claim, say “there’s no clear evidence”. If unsure, say “I’m not sure”.

Most generative AI chatbots have a serious problem here: they hallucinate confidently. They invent studies, doses, brand names. In health this isn’t “sometimes wrong”, it’s dangerous.

Two ways to mitigate:

Explicit boundaries in the agent’s prompt: “if you don’t have enough info, say so clearly”.
Source attribution: “this comes from guideline X (2021)”.
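
As a minimal sketch of both mitigations, the boundaries and the attribution rule can live as explicit fragments that get assembled into the agent’s system prompt. Everything here is hypothetical: the function name `build_system_prompt`, the rule wording, and the role text are illustrations, not a tested prompt.

```python
# Hypothetical prompt fragments; the wording is illustrative, not a tested prompt.
BOUNDARY_RULES = [
    "If you don't have enough information, say so clearly.",
    "Never invent studies, doses, or brand names.",
    "You cannot diagnose; say so when asked for a diagnosis.",
]

ATTRIBUTION_RULE = (
    "When you state a clinical fact, name the guideline and year it comes from. "
    "If you cannot name a source, say there is no clear evidence."
)


def build_system_prompt(role: str) -> str:
    """Assemble a system prompt with explicit boundaries and source attribution."""
    lines = [role, "", "Boundaries:"]
    lines += [f"- {rule}" for rule in BOUNDARY_RULES]
    lines += ["", "Sources:", f"- {ATTRIBUTION_RULE}"]
    return "\n".join(lines)


prompt = build_system_prompt("You support people living with IBS.")
print(prompt)
```

Keeping the rules in a list rather than buried in one long string makes it easier to review each boundary on its own and to see which one a failing response broke.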

I cover prompt design in more detail in Prompt engineering as design work.

3. Relation

Be relevant. Stay on topic.

Sounds obvious until you see bots responding to “I want to adjust my medication” with low-FODMAP diet suggestions. The information is IBS-related, but it doesn’t answer the question.

For a health agent, staying relevant means:

Recognising the current topic of conversation.
Not introducing new topics unprompted.
Respecting the user’s change of topic (not dragging them back).

In multi-agent setups, this maxim becomes a routing criterion: which agent answers this query? Covered in Multi-agent orchestration for designers.
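
To make the routing idea concrete, here is a deliberately naive sketch: each agent owns a set of topic keywords, and the query goes to the agent with the most overlap. The agent names, the keyword sets, and the `route` function are all assumptions for illustration. A real setup would more likely use a classifier or the model itself, and keyword matching only stands in for that step.

```python
# Hypothetical agents and keywords, for illustration only.
AGENT_KEYWORDS = {
    "medication": {"medication", "dose", "pill", "prescription"},
    "diet": {"diet", "fodmap", "food", "meal"},
}
FALLBACK_AGENT = "general"


def route(query: str) -> str:
    """Pick the agent whose keywords overlap most with the query."""
    # Lowercase and strip basic punctuation so "medication?" still matches.
    words = {w.strip(".,!?").lower() for w in query.split()}
    best_agent, best_score = FALLBACK_AGENT, 0
    for agent, keywords in AGENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent


print(route("I want to adjust my medication"))  # medication
```

With this routing, the medication question from the example above never reaches the diet agent, which is the Relation maxim expressed as a dispatch rule.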

4. Manner

Avoid obscurity and ambiguity. Be brief and orderly.

Manner is how you say it, not what. In health, that translates to:

Plain language instead of clinical jargon (“cramps” instead of “smooth-muscle spasms in the bowel”).
Short sentences. The user may be in pain, anxious, or tired.
Predictable structure. If the bot gives a suggestion, always use the same format (suggestion + reason + next step, for example).
No temporal ambiguity. “Take the medication if you feel pain” is less clear than “Take the medication if you feel pain for more than 30 minutes”.
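
The predictable-structure point can be sketched as a small data type that forces every suggestion into the same three parts. The class name `Suggestion`, the labels in `render`, and the example text are assumptions made for illustration. The example text is placeholder copy, not clinical advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One suggestion in the fixed format: suggestion + reason + next step."""
    suggestion: str
    reason: str
    next_step: str

    def render(self) -> str:
        # Always the same three parts, in the same order.
        return (
            f"Suggestion: {self.suggestion}\n"
            f"Why: {self.reason}\n"
            f"Next step: {self.next_step}"
        )


example = Suggestion(
    suggestion="Keep a short note of when the cramps start.",
    reason="It makes patterns easier to spot.",
    next_step="Tell me what you noticed after a few days.",
)
print(example.render())
```

Because all three fields are required, a suggestion without a reason or a next step cannot be constructed at all, so the format stays consistent across responses.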

This overlaps with Inclusive language in design: clear language is inclusive language.

How to use it as a rubric

To evaluate an agent response, I ask four questions:

Quantity: does the amount of information serve the current question? If the question is simple, is the response proportionally short?
Quality: is what’s being asserted supported? Are markers of uncertainty present where applicable?
Relation: is the response about what the user asked? Does it avoid introducing unsolicited topics?
Manner: is the language clear, jargon-free, with predictable structure?

If a response fails any of the four, it’s a candidate for prompt refinement.
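
One way to record the rubric, as a sketch: a reviewer answers the four questions with yes or no, and any “no” flags the response for prompt refinement. The names `GriceReview`, `failed_maxims`, and `needs_prompt_refinement` are hypothetical, and the judgments themselves still come from a human reviewer. The code only stores and summarises them.

```python
from dataclasses import dataclass, fields


@dataclass
class GriceReview:
    """A reviewer's yes/no answers to the four rubric questions."""
    quantity: bool
    quality: bool
    relation: bool
    manner: bool

    def failed_maxims(self) -> list[str]:
        # Names of the maxims the response did not satisfy, in rubric order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def needs_prompt_refinement(self) -> bool:
        # Failing any one of the four is enough.
        return bool(self.failed_maxims())


review = GriceReview(quantity=False, quality=True, relation=True, manner=True)
print(review.failed_maxims())            # ['quantity']
print(review.needs_prompt_refinement())  # True
```

Logging which maxim failed, not just that something failed, shows over time where the prompt is weakest.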

Where Grice falls short (and what comes next)

Grice doesn’t cover everything. In particular:

Emotional tone. In health, there are emotional weights that ask for more than precision. Empathy, calm, presence. Grice is silent on this.
Accessibility. The maxims assume both parties can participate fully. Someone with dyslexia or low literacy, or someone in crisis, may not be able to.
Culture. What counts as “right amount” varies by culture.

So I use Grice as a baseline and add layers: tone-of-voice guidelines, accessibility checks, cultural sensitivity reviews. But without Grice the rest sits on nothing.

More on the health background in the Design for Health guide. On the mental models users apply to AI, see Mental models for AI design. On how to surface conversational responses with observability, see Observability in agentic UX.