# How AI Chat Works: Optimization & Cost
This guide explains what happens behind the scenes when you use AI chat in Weavestream, how the app optimizes for speed and cost, and how you can make the most of it.
## The Problem: APIs Generate a Lot of Data
When you connect multiple APIs, you can easily accumulate thousands of items — each with dozens of fields. Sending all of that raw data to an AI model for every question would be:
- Expensive — AI models charge by the token (roughly 1 token per 4 characters). A single item's JSON might be 500–5,000 tokens. Multiply that by hundreds or thousands of items and costs add up fast.
- Slow — More data means longer processing time.
- Less accurate — Models perform better with focused, relevant context. Flooding them with irrelevant data can dilute the quality of the answer.
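The rough arithmetic above can be sketched in a few lines. The 4-characters-per-token rule is an approximation; real tokenizers vary by model and content:

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb from above: roughly 1 token per 4 characters.
    return len(text) // 4

# A hypothetical 2,000-character item JSON is ~500 tokens.
item_json = "x" * 2000
per_item = estimate_tokens(item_json)   # ~500 tokens
total = per_item * 1000                 # 1,000 such items: ~500,000 tokens
```

At typical per-token prices, half a million tokens per question is exactly the cost blowup the tool-based approach avoids.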
Weavestream solves this with an agent-based tool system — the AI doesn't receive your data upfront. Instead, it actively searches and filters to pull in only what it needs to answer your question.
## The Agent Loop
When you ask a question, the AI acts as an agent with access to a set of tools. It decides what data it needs, fetches it, analyzes the results, and may fetch more — all in a loop until it has enough to answer.
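A minimal sketch of this loop, assuming hypothetical `call_model` and `run_tool` helpers (the actual tool set and model API are Weavestream internals):

```python
MAX_ITERATIONS = 20  # default before the agent is asked to summarize

def run_agent(question, call_model, run_tool):
    # The running conversation context grows with every tool result.
    context = [{"role": "user", "content": question}]
    for _ in range(MAX_ITERATIONS):
        reply = call_model(context)        # AI decides its next step
        if reply["type"] == "answer":
            return reply["text"]           # enough info: final answer
        # Otherwise the AI asked for a tool; run it locally and feed
        # the result back as context for the next iteration.
        result = run_tool(reply["tool"], reply["args"])
        context.append({"role": "tool", "content": result})
    # Iteration cap reached: ask for a summary with whatever it has.
    context.append({"role": "user", "content": "Summarize with what you have."})
    return call_model(context)["text"]
```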
### Flowchart
```
┌──────────────────────────┐
│    You ask a question    │
└───────────┬──────────────┘
            ▼
┌──────────────────────────┐
│ System prompt sent with  │
│ source/endpoint summary  │
│ (names & field lists,    │
│ NOT your actual data)    │
└───────────┬──────────────┘
            ▼
┌─ AGENT LOOP ──────────────────┐
│  ┌─────────────────┐          │
│  │ AI thinks about │◀─────┐   │
│  │ what it needs   │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Calls a tool    │      │   │
│  │ (search, filter,│      │   │
│  │ count, etc.)    │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Tool runs       │      │   │
│  │ locally against │      │   │
│  │ your database   │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Result sent     │      │   │
│  │ back to AI      │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│    ┌─────────────┐  Yes   │   │
│    │ Need more   │────────┘   │
│    │ data?       │            │
│    └──────┬──────┘            │
│        No │                   │
└───────────┼───────────────────┘
            ▼
   ┌─────────────────┐
   │ AI writes its   │
   │ final answer    │
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │ Response shown  │
   │ with tool call  │
   │ activity log    │
   └─────────────────┘
```
## What Happens Step by Step
1. **You ask a question** — Your message is sent to the AI along with a system prompt. The system prompt includes the names of your sources and endpoints, their field names, and item counts — but not the actual data.
2. **The AI calls tools** — Based on your question, the AI decides which tools to use. For example, if you ask "Show me critical alerts from the last week," it might call `filter_items` with conditions for status and date.
3. **Tools run locally** — Every tool executes against your local database. The results are sent back to the AI as context for its next step.
4. **The loop continues** — The AI might call additional tools to refine its search, cross-reference data across endpoints, or get full details on specific items. Each iteration adds to the conversation context.
5. **The AI responds** — Once it has enough information, the AI writes its final answer. You see both the response and a collapsible log of every tool call it made along the way.
The loop runs for up to 20 iterations by default before the AI is asked to summarize with whatever it has. In practice, most questions resolve in 2–5 tool calls.
## The Tools
The AI agent has six tools at its disposal, including search, filter, and count operations.
Each tool returns only the data requested. Results are capped at 16,000 characters per tool call to keep context manageable. Search results are limited to 500 items, filters to 1,000, and counts to 5,000.
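A sketch of how these caps might be applied, using the limits above (function and variable names are illustrative, not Weavestream's actual code):

```python
import json

TOOL_RESULT_MAX_CHARS = 16_000  # default cap per tool call
ITEM_LIMITS = {"search": 500, "filter": 1_000, "count": 5_000}

def cap_result(tool: str, items: list, serialize) -> str:
    # First apply the per-tool item limit, then the character cap,
    # so one oversized result can't flood the conversation context.
    items = items[: ITEM_LIMITS.get(tool, len(items))]
    return serialize(items)[:TOOL_RESULT_MAX_CHARS]

# 2,000 matches on a filter: only the first 1,000 reach the AI.
capped = cap_result("filter", ["alert"] * 2000, json.dumps)
```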
## How Scoping Keeps Things Efficient
What you select in the sidebar before asking a question directly controls what the agent's tools can access.
### Selecting a Specific Endpoint
The agent's tools can only query items from that endpoint. This is the most efficient option — searches and filters run against a focused dataset, and the AI gets relevant results quickly with fewer tool calls.
### Selecting a Smart Filter
The agent's tools are scoped to endpoints included in the filter, and only items matching the filter's conditions are visible. This is highly efficient and ideal for recurring analysis — the data is already pre-narrowed before the agent even starts.
### Selecting a Source
All endpoints in the source are in scope. The agent can search and filter across all of them. This gives it more to work with but may require more tool calls to narrow things down.
### No Selection (All Items)
Everything is in scope. The agent can discover and query all sources and endpoints. This gives it maximum flexibility but uses the most tokens, since it may need several tool calls just to orient itself before finding relevant data.
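The four scoping modes above can be sketched as a single resolution step (the selection shape and field names are hypothetical, not Weavestream's actual data model):

```python
def endpoints_in_scope(selection, all_sources):
    # No selection: every endpoint of every source is queryable.
    if selection is None:
        return [e for s in all_sources for e in s["endpoints"]]
    # A single endpoint: the most focused scope.
    if selection["kind"] == "endpoint":
        return [selection["endpoint"]]
    # A Smart Filter or a source: the endpoints it covers.
    # (A Smart Filter additionally pre-narrows which items are visible.)
    if selection["kind"] in ("smart_filter", "source"):
        return selection["endpoints"]
    raise ValueError(f"unknown selection kind: {selection['kind']}")
```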
## Summary
## Understanding Token Costs
Every message you send, every tool call, and every tool result adds to the conversation's token count. The AI provider charges based on the total tokens processed.
### How Tokens Accumulate
Unlike a simple send-and-receive, the agent loop means tokens grow with each iteration:
- Iteration 1: Your question + system prompt + tool definitions (~1,500–2,500 tokens)
- Iteration 2: All of the above + tool call + tool result + AI's next request
- Iteration 3: All of the above + another tool call + result
- ...and so on
Each tool result adds to the running context. A question that takes 3 tool calls will use more tokens than one that resolves in a single call. This is why scoping matters — a focused scope means the agent finds what it needs faster, with fewer iterations.
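Because each iteration resends the whole running context, input tokens grow roughly quadratically with the number of tool calls. A small sketch of the arithmetic:

```python
def cumulative_tokens(base: int, tool_results: list) -> int:
    # Each iteration reprocesses everything accumulated so far.
    total, context = 0, base
    for result in tool_results:
        total += context    # input tokens processed this iteration
        context += result   # the tool result joins the running context
    total += context        # final iteration writes the answer
    return total

# Example: a 2,000-token base prompt and three 1,000-token tool results
# process 2,000 + 3,000 + 4,000 + 5,000 = 14,000 input tokens in total,
# far more than the 5,000 tokens visible in the final context.
```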
### Prompt Caching (Claude)
Claude supports prompt caching — the system prompt and tool definitions are cached after the first iteration. For iterations 2 through 6, these cached portions cost only 10% of the normal input rate. This significantly reduces the cost of multi-step agent interactions.
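A rough cost model for this discount (the structure is illustrative; check your provider's pricing page for actual rates and caching rules):

```python
def input_cost(tokens_by_iteration, rate_per_token, cached_tokens,
               cache_discount=0.10, cached_iterations=range(2, 7)):
    # cached_tokens covers the system prompt + tool definitions, which
    # bill at 10% of the input rate on iterations 2-6 (per the above).
    cost = 0.0
    for i, tokens in enumerate(tokens_by_iteration, start=1):
        if i in cached_iterations:
            cost += cached_tokens * rate_per_token * cache_discount
            cost += (tokens - cached_tokens) * rate_per_token
        else:
            cost += tokens * rate_per_token
    return cost
```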
### Provider Pricing
A typical focused question (endpoint selected, 2–3 tool calls) might cost $0.005–$0.03 with Claude. Broad exploratory questions with many tool calls can cost more.
### Ollama Considerations
Ollama is free but has a smaller context window (default: 8,192 tokens, configurable). Since the agent loop accumulates context with each tool call, complex questions with many iterations may exceed the context limit. For best results with Ollama, select a specific endpoint and ask focused questions.
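A quick way to reason about whether a conversation will fit, assuming the default window (token counts here are rough estimates, not exact tokenizer output):

```python
OLLAMA_CONTEXT_WINDOW = 8_192  # default; configurable in Ollama

def fits_in_context(base_tokens: int, tool_result_tokens: list) -> bool:
    # The agent loop keeps every tool result in context, so the
    # running total is what must stay under the window.
    return base_tokens + sum(tool_result_tokens) <= OLLAMA_CONTEXT_WINDOW

# A 2,000-token base with two 2,000-token results fits; four do not.
```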
## Tips for Efficient AI Usage
### 1. Select the Narrowest Scope Possible
This is the single most impactful thing you can do. Selecting a specific endpoint or Smart Filter means the agent's tools return focused results immediately, reducing the number of iterations needed.
### 2. Use Smart Filters for Recurring Analysis
If you regularly ask the same types of questions about the same data, create a Smart Filter with conditions that pre-narrow items to what's relevant. The agent starts with a smaller, focused dataset and resolves your question faster with fewer tokens. See Smart Filters for how to set one up.
### 3. Use Endpoint Joins Instead of Broad Selections
If your question spans data from two endpoints (e.g., "Show me alerts with their device names"), don't select the whole source. Instead, create a Smart Filter with an Endpoint Join that combines the two endpoints. The joined data is pre-assembled, so the agent gets what it needs without making multiple cross-referencing tool calls.
### 4. Ask Specific Questions
Vague questions like "Tell me about my data" force the agent to make many exploratory tool calls. Specific questions like "Which critical alerts were created in the last 48 hours?" let it go directly to a targeted filter, often resolving in a single tool call.
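As an illustration, a question like the one above might resolve to a single tool call shaped roughly like this (the argument names are hypothetical, not Weavestream's actual tool schema):

```python
# Hypothetical single filter_items call for:
# "Which critical alerts were created in the last 48 hours?"
tool_call = {
    "tool": "filter_items",
    "args": {
        "endpoint": "alerts",
        "conditions": [
            {"field": "severity",   "op": "equals",      "value": "critical"},
            {"field": "created_at", "op": "within_last", "value": "48h"},
        ],
    },
}
```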
### 5. Watch the Activity Log
As the AI works, you can see each tool call in real time — what it searched for, what it filtered, how many results came back. If you notice it's making many calls to find something, that's a sign you could scope your selection more tightly or create a Smart Filter for that query.
### 6. Consider Ollama for Exploratory Questions
If you're just poking around your data or asking casual questions, Ollama is free and keeps everything local. Save Claude or Gemini for complex analysis where quality and speed matter most.
### 7. Tune Advanced Settings If Needed
In Settings → Intelligence → Advanced, you can adjust agent limits:
- Max tool iterations — How many loops before the agent is forced to summarize (default: 20)
- Tool result max characters — How much data each tool call can return (default: 16,000)
- Search / Filter / Count limits — Maximum items fetched per tool call
The defaults work well for most use cases. Lower the limits if you want to reduce token usage; raise them if the agent isn't finding enough data.
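The defaults described above, collected as an illustrative settings sketch (the keys are hypothetical; use the Settings → Intelligence → Advanced UI to make actual changes):

```python
# Default agent limits, per the list above.
DEFAULT_AGENT_SETTINGS = {
    "max_tool_iterations": 20,        # loops before a forced summary
    "tool_result_max_chars": 16_000,  # per tool call
    "search_limit": 500,              # max items per search call
    "filter_limit": 1_000,            # max items per filter call
    "count_limit": 5_000,             # max items per count call
}
```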