Composable Intelligence Patterns: Beyond Single-Model Prompts

Most AI integrations send a single prompt to a single model. That works for many tasks, but some problems are better solved by combining multiple models: routing to specialists, cross-validating outputs, or reasoning in multiple stages.

Neureus exposes these patterns via the /composite/execute endpoint. This post explains the 7 patterns, when each makes sense, and how to call them.

Why composite patterns?

Single-model prompts have well-known failure modes:

Overconfidence: A model such as GPT-4o can give a wrong answer with high confidence
Task mismatch: Using a premium general model for tasks a cheaper specialized model handles better
Single point of failure: One model’s biases or gaps determine the outcome

Composite patterns address these by using multiple models — either in sequence or in parallel — and synthesizing their outputs.

The 7 patterns

1. Ensemble voting

Run the same prompt through multiple models. Return the response that the majority agrees on (or the consensus synthesis).

Use when: High-stakes factual questions where you want cross-validation. A “yes” from GPT-4o, Claude, and DeepSeek is more trustworthy than a “yes” from one of them.

curl -X POST https://app.neureus.ai/composite/execute \
  -H "Authorization: Bearer nr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pattern": "ensemble",
    "input": "What is the capital of Australia?",
    "models": ["claude-sonnet-4-6", "gpt-4o", "deepseek-r1"],
    "profile": "general"
  }'

The response includes the consensus answer and the number of models that agreed. Divergence among the models signals ambiguity or low confidence, which is a useful signal in itself.
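If you want a feel for what majority voting involves, here is a minimal local sketch. It is not the Neureus implementation: the ModelAnswer and VoteResult shapes, the normalize helper, and the strict-majority rule are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; not the Neureus response format.
interface ModelAnswer {
  model: string;
  answer: string;
}

interface VoteResult {
  answer: string | null; // null when no answer reaches a strict majority
  agreed: number;        // how many models gave the most common answer
  total: number;
}

interface Tally {
  original: string;
  count: number;
}

// Normalize so trivial differences ("Canberra." vs "canberra") count as agreement.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/[.!?]+$/, "");
}

function majorityVote(answers: ModelAnswer[]): VoteResult {
  const counts = new Map<string, Tally>();
  for (const a of answers) {
    const key = normalize(a.answer);
    const tally = counts.get(key);
    if (tally) {
      tally.count += 1;
    } else {
      counts.set(key, { original: a.answer, count: 1 });
    }
  }

  // Find the most common normalized answer.
  let best: Tally | null = null;
  for (const tally of Array.from(counts.values())) {
    if (best === null || tally.count > best.count) {
      best = tally;
    }
  }

  const total = answers.length;
  // Require more than half of the models to agree; otherwise report divergence.
  if (best === null || best.count * 2 <= total) {
    return { answer: null, agreed: best ? best.count : 0, total };
  }
  return { answer: best.original, agreed: best.count, total };
}
```

Exact-match voting like this only works for short factual answers. Longer free-text responses need some form of synthesis, which is one reason to let the server handle it.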

2. Expert routing

Classify the input, then route to the model best suited for that category.

Use when: Your users ask questions across multiple domains — some need a reasoning model, some need a code model, some need a general model. Routing to the right specialist improves quality while reducing cost (cheap model for simple tasks, premium model for complex ones).

curl -X POST https://app.neureus.ai/composite/execute \
  -H "Authorization: Bearer nr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pattern": "expert_routing",
    "input": "Write a binary search in TypeScript",
    "routing_model": "@wai/llama-3.3-70b",
    "experts": {
      "code": "deepseek-r1",
      "math": "deepseek-r1",
      "general": "claude-haiku-4-5",
      "creative": "claude-sonnet-4-6"
    }
  }'

The routing model classifies the task (free, via Workers AI). The specialist model handles the response (priced at actual usage). In many workloads, roughly 60–70% of requests are “general” tasks that can use cheap or free models.
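The classify-then-dispatch flow can be sketched in a few lines. This is an illustration only: the Classifier type stands in for the routing model, and falling back to the “general” expert on an unrecognized label is an assumption, not documented Neureus behavior.

```typescript
type Category = "code" | "math" | "general" | "creative";

// Same expert map as the request above.
const experts: Record<Category, string> = {
  code: "deepseek-r1",
  math: "deepseek-r1",
  general: "claude-haiku-4-5",
  creative: "claude-sonnet-4-6",
};

// Hypothetical stand-in for the routing model: returns a free-text label.
type Classifier = (input: string) => Promise<string>;

function isCategory(label: string): label is Category {
  return label === "code" || label === "math" || label === "general" || label === "creative";
}

async function pickExpert(
  input: string,
  classify: Classifier
): Promise<{ category: Category; model: string }> {
  const label = (await classify(input)).trim().toLowerCase();
  // Assumption: anything the classifier cannot place goes to the cheap general model.
  const category: Category = isCategory(label) ? label : "general";
  return { category, model: experts[category] };
}
```

The fallback matters more than it looks: classifiers occasionally return labels outside your map, and you want those requests to land on the cheap model rather than fail.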

3. Step-back reasoning

Ask the model to identify the underlying principle before answering the specific question.

Use when: Questions where the specific phrasing might mislead, or where domain knowledge needs to be surfaced before problem-solving. Classic example: math word problems where the model jumps to computation before understanding the problem structure.

{
  "pattern": "step_back",
  "input": "If I invest $10,000 at 7% compound interest for 20 years, how much do I have?",
  "model": "claude-sonnet-4-6"
}

Under the hood, this makes two calls. The first asks the model to state the relevant principles (compound interest formula, time value of money). The second uses those principles as context for the actual calculation. This reduces arithmetic errors by ~30% on financial and scientific problems.
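Here is a rough sketch of that two-call flow. The Complete type is a hypothetical stand-in for any function that sends a prompt to a model, and the prompt wording is an assumption; the server-side prompts are not published in this post.

```typescript
// Hypothetical: any function that sends a prompt to a model and returns its text.
type Complete = (prompt: string) => Promise<string>;

async function stepBack(
  question: string,
  complete: Complete
): Promise<{ principles: string; answer: string }> {
  // Call 1: step back and surface the general principles, without solving.
  const principles = await complete(
    "State the general principles or formulas needed to answer the question below. " +
      "Do not solve it yet.\n\nQuestion: " + question
  );

  // Call 2: answer the original question with those principles as context.
  const answer = await complete(
    "Principles:\n" + principles +
      "\n\nUsing these principles, answer the question step by step.\n\nQuestion: " + question
  );

  return { principles, answer };
}
```

Returning the principles alongside the answer is a design choice: it lets you log or inspect what the model thought was relevant when an answer looks off.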

4. Least-to-most decomposition

Break a complex task into subtasks, solve each in order, and build up to the final answer.

Use when: Multi-step problems where solving the whole thing at once leads to incomplete or incorrect answers. Software debugging, multi-step planning, complex research questions.

{
  "pattern": "least_to_most",
  "input": "Design a database schema for a multi-tenant SaaS app with billing and RBAC",
  "model": "claude-sonnet-4-6",
  "max_subtasks": 5
}

The model generates a decomposition plan, then executes the steps in order, feeding each subtask’s response into the next as context. This is effective for problems where the answer to step 3 depends on the answer to step 2.
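A minimal sketch of the plan-then-execute loop looks like this. Complete is a hypothetical stand-in for a model call, the one-subtask-per-line plan format is an assumption, and this version passes all earlier results forward rather than only the previous one.

```typescript
// Hypothetical: any function that sends a prompt to a model and returns its text.
type Complete = (prompt: string) => Promise<string>;

async function leastToMost(
  task: string,
  complete: Complete,
  maxSubtasks: number
): Promise<string[]> {
  // Step 1: ask for a decomposition plan, one subtask per line (assumed format).
  const plan = await complete(
    "Break the task into at most " + maxSubtasks +
      " ordered subtasks, one per line.\n\nTask: " + task
  );
  const subtasks = plan
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .slice(0, maxSubtasks);

  // Step 2: solve subtasks in order, carrying earlier results forward as context.
  const results: string[] = [];
  for (const subtask of subtasks) {
    const context = results
      .map((r, i) => "Result of step " + (i + 1) + ": " + r)
      .join("\n");
    const prompt =
      "Task: " + task + "\n" +
      (context ? context + "\n" : "") +
      "Now solve: " + subtask;
    results.push(await complete(prompt));
  }
  return results;
}
```

The calls are sequential by design, so latency grows with the number of subtasks. That is the trade-off behind capping max_subtasks.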

5. Maieutic prompting

Iteratively ask “why” until the model provides a fully grounded explanation.

Use when: You want explanations that don’t rely on assumed background knowledge. Educational content, compliance documentation, onboarding materials.

{
  "pattern": "maieutic",
  "input": "Why does garbage collection cause pauses in application performance?",
  "model": "claude-sonnet-4-6",
  "depth": 3
}

With depth set to 3, three levels of “why” questions are unpacked into a fully grounded explanation. Good for transforming expert knowledge into beginner-friendly content.
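The iteration can be pictured as a simple loop. This sketch is an assumption about the general technique, not the server's actual logic: Complete is a hypothetical model call, the follow-up prompt wording is made up, and a real implementation would likely add a final step that merges the layers into one explanation.

```typescript
// Hypothetical: any function that sends a prompt to a model and returns its text.
type Complete = (prompt: string) => Promise<string>;

async function maieutic(
  question: string,
  complete: Complete,
  depth: number
): Promise<string[]> {
  const layers: string[] = [];
  let prompt = question;
  for (let level = 0; level < depth; level++) {
    const explanation = await complete(prompt);
    layers.push(explanation);
    // Each round asks "why" about the previous explanation.
    prompt =
      "Why is the following true? Explain without assuming background knowledge.\n\n" +
      explanation;
  }
  return layers; // layers[0] answers the question; later entries dig deeper
}
```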

6. Contrastive chain-of-thought

Ask the model to show its work on both the correct AND incorrect approach.

Use when: You want reasoning traces that are more reliable. With standard chain-of-thought prompting, models sometimes reason toward the wrong answer and then construct post-hoc justifications. Contrastive CoT forces the model to work through why alternatives are wrong.

{
  "pattern": "contrastive_cot",
  "input": "Which sorting algorithm is best for a nearly-sorted array of 10,000 items?",
  "model": "gpt-4o"
}

The response includes: a chain-of-thought for the incorrect approach (e.g., quicksort), the errors in that reasoning, and the correct approach (insertion sort or timsort for nearly-sorted data). More expensive (2–3× calls), but significantly higher factual reliability.
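To ground the example answer: insertion sort suits nearly-sorted data because the work it does scales with the number of out-of-order pairs. The sketch below counts element shifts so you can see that for yourself. It is a standalone illustration, unrelated to the Neureus API, and the function name is made up.

```typescript
// Insertion sort that also reports how many element shifts it performed.
// The shift count equals the number of inversions (out-of-order pairs) in the input.
function insertionSortShifts(input: number[]): { sorted: number[]; shifts: number } {
  const a = input.slice(); // copy so the caller's array is untouched
  let shifts = 0;
  for (let i = 1; i < a.length; i++) {
    const value = a[i];
    let j = i - 1;
    // Move larger elements one slot right until value's position is found.
    while (j >= 0 && a[j] > value) {
      a[j + 1] = a[j];
      j--;
      shifts++;
    }
    a[j + 1] = value;
  }
  return { sorted: a, shifts };
}

// One swapped pair costs a single shift; a fully reversed array costs n(n-1)/2.
const nearlySorted = insertionSortShifts([1, 2, 4, 3, 5]);
const reversed = insertionSortShifts([5, 4, 3, 2, 1]);
console.log(nearlySorted.shifts, reversed.shifts);
```

This is the kind of check a contrastive trace should survive: the rejected approach fails for a reason you can state and measure, not just by assertion.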

7. Industry profile execution

Apply a pre-configured pattern designed for a specific industry with compliance guardrails.

Use when: Healthcare, financial, legal, or support use cases where industry conventions, terminology, and compliance requirements should be baked in rather than re-specified in every prompt.

{
  "pattern": "profile",
  "profile": "financial",
  "input": "Summarize the key risks in this loan agreement",
  "model": "claude-sonnet-4-6"
}

The financial profile automatically:

Rejects SSNs, card numbers, and account numbers (422 error before any LLM call)
Adds a “not financial advice” disclaimer to outputs
Sets response format to structured financial terminology
Returns regulated: true in the response for your audit log

The healthcare profile applies equivalent guardrails for PHI/PII detection.
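Since a rejected request comes back as a 422, you may want a client-side pre-check so users get feedback before anything is sent. The sketch below is illustrative only: the regexes and the Luhn checksum are common heuristics, not the rules Neureus applies, the function names are made up, and account numbers are not covered at all.

```typescript
// Luhn checksum: payment card numbers are built to pass it, random digit runs mostly do not.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Returns the kinds of sensitive data found; an empty array means nothing was detected.
function findSensitiveData(input: string): string[] {
  const findings: string[] = [];

  // US SSN written as 123-45-6789 (assumed format; unformatted SSNs are not caught).
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(input)) {
    findings.push("ssn");
  }

  // 13 to 19 digits, optionally separated by spaces or dashes, that pass Luhn.
  const candidates = input.match(/\b\d(?:[ -]?\d){12,18}\b/g) || [];
  for (const candidate of candidates) {
    if (passesLuhn(candidate.replace(/[ -]/g, ""))) {
      findings.push("card_number");
      break;
    }
  }
  return findings;
}
```

Treat this as a convenience for users, not a replacement for the server-side check. Heuristics like these miss unusual formats and can flag harmless digit runs.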

TypeScript SDK

import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

// Ensemble: get consensus across 3 models
const consensus = await client.composite.execute({
  pattern: 'ensemble',
  input: 'Is this API endpoint RESTful?',
  models: ['claude-sonnet-4-6', 'gpt-4o', '@wai/llama-3.3-70b'],
});
console.log(consensus.answer, consensus.confidence);

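// Expert routing: let a free model decide which paid model to use
// (userQuestion stands in for whatever input your application receives)
const userQuestion = 'Write a binary search in TypeScript';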
const routed = await client.composite.execute({
  pattern: 'expert_routing',
  input: userQuestion,
  routingModel: '@wai/llama-3.3-70b',
  experts: {
    code: 'deepseek-r1',
    math: 'deepseek-r1',
    general: 'claude-haiku-4-5',
    creative: 'claude-sonnet-4-6',
  },
});
console.log(routed.answer, routed.expert_used, routed.costUsd);

Choosing a pattern

Pattern	Best for	Cost multiplier
Ensemble	High-stakes factual	3× (3 model calls)
Expert routing	Mixed-domain chatbots	1.1× (routing is free)
Step-back	Math, science, planning	2×
Least-to-most	Complex multi-step tasks	3–5×
Maieutic	Educational content	3×
Contrastive CoT	Reliable reasoning	2–3×
Industry profile	Compliance use cases	1.5× + guardrails
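For budgeting, the multipliers in the table can be turned into a quick estimate. This sketch assumes the multiplier is relative to the cost of one single-model call, which the table implies but does not state outright, and the estimateCostRange helper is hypothetical.

```typescript
type Pattern =
  | "ensemble"
  | "expert_routing"
  | "step_back"
  | "least_to_most"
  | "maieutic"
  | "contrastive_cot"
  | "profile";

// [low, high] multipliers copied from the table above.
const multipliers: Record<Pattern, [number, number]> = {
  ensemble: [3, 3],
  expert_routing: [1.1, 1.1],
  step_back: [2, 2],
  least_to_most: [3, 5],
  maieutic: [3, 3],
  contrastive_cot: [2, 3],
  profile: [1.5, 1.5],
};

// Assumption: cost scales linearly with the cost of one single-model call.
function estimateCostRange(
  pattern: Pattern,
  singleCallUsd: number
): { low: number; high: number } {
  const range = multipliers[pattern];
  return { low: singleCallUsd * range[0], high: singleCallUsd * range[1] };
}
```

For example, if a single call costs about $0.01, least-to-most would land somewhere between roughly $0.03 and $0.05 under these assumptions.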

For most general-purpose chatbots, expert routing is the best default: it improves quality while reducing cost, since cheap models handle most requests.

For high-stakes decisions (medical triage, financial analysis), ensemble voting with 3 models gives meaningful cross-validation.

What you’re not doing with patterns

These patterns are handled server-side by Neureus. You’re not:

Making 3 parallel API calls and writing merge logic
Running a routing classifier yourself
Building step-back prompting infrastructure

One API call, one billing line item, structured response. The complexity is managed for you.

Available on all plans. See app.neureus.ai/onboard to get started.