# How AI Chat Works: Optimization & Cost
This guide explains what happens behind the scenes when you use AI chat in Weavestream, how the app optimizes for speed and cost, and how you can make the most of it.
## The Problem: APIs Generate a Lot of Data
When you connect multiple APIs, you can easily accumulate thousands of items — each with dozens of fields. Sending all of that raw data to an AI model for every question would be:
- Expensive — AI models charge by the token (roughly 1 token per 4 characters). A single item's JSON might be 500–5,000 tokens. Multiply that by hundreds or thousands of items and costs add up fast.
- Slow — More data means longer processing time.
- Less accurate — Models perform better with focused, relevant context. Flooding them with irrelevant data can dilute the quality of the answer.
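The rough arithmetic above can be sketched in a few lines. The 4-characters-per-token rule is an approximation; real tokenizers vary by model and content:

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb from above: roughly 1 token per 4 characters.
    return len(text) // 4

# A hypothetical 2,000-character item JSON is ~500 tokens.
item_json = "x" * 2000
per_item = estimate_tokens(item_json)   # ~500 tokens
total = per_item * 1000                 # 1,000 such items: ~500,000 tokens
```

At typical per-token prices, half a million tokens per question is exactly the cost blowup the tool-based approach avoids.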
Weavestream solves this with an agent-based tool system — the AI doesn't receive your data upfront. Instead, it actively searches and filters to pull in only what it needs to answer your question.
## The Agent Loop
When you ask a question, the AI acts as an agent with access to a set of tools. It decides what data it needs, fetches it, analyzes the results, and may fetch more — all in a loop until it has enough to answer.
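A minimal sketch of this loop, assuming hypothetical `call_model` and `run_tool` helpers (the actual tool set and model API are Weavestream internals):

```python
MAX_ITERATIONS = 20  # default before the agent is asked to summarize

def run_agent(question, call_model, run_tool):
    # The running conversation context grows with every tool result.
    context = [{"role": "user", "content": question}]
    for _ in range(MAX_ITERATIONS):
        reply = call_model(context)        # AI decides its next step
        if reply["type"] == "answer":
            return reply["text"]           # enough info: final answer
        # Otherwise the AI asked for a tool; run it locally and feed
        # the result back as context for the next iteration.
        result = run_tool(reply["tool"], reply["args"])
        context.append({"role": "tool", "content": result})
    # Iteration cap reached: ask for a summary with whatever it has.
    context.append({"role": "user", "content": "Summarize with what you have."})
    return call_model(context)["text"]
```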
### Flowchart
```
┌──────────────────────────┐
│    You ask a question    │
└───────────┬──────────────┘
            ▼
┌──────────────────────────┐
│ System prompt sent with  │
│ source/endpoint summary  │
│ (names & field lists,    │
│ NOT your actual data)    │
└───────────┬──────────────┘
            ▼
┌─ AGENT LOOP ──────────────────┐
│  ┌─────────────────┐          │
│  │ AI thinks about │◀─────┐   │
│  │ what it needs   │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Calls a tool    │      │   │
│  │ (search, filter,│      │   │
│  │ count, etc.)    │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Tool runs       │      │   │
│  │ locally against │      │   │
│  │ your database   │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│  ┌─────────────────┐      │   │
│  │ Result sent     │      │   │
│  │ back to AI      │      │   │
│  └────────┬────────┘      │   │
│           ▼               │   │
│    ┌─────────────┐  Yes   │   │
│    │ Need more   │────────┘   │
│    │ data?       │            │
│    └──────┬──────┘            │
│        No │                   │
└───────────┼───────────────────┘
            ▼
   ┌─────────────────┐
   │ AI writes its   │
   │ final answer    │
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │ Response shown  │
   │ with tool call  │
   │ activity log    │
   └─────────────────┘
```
## What Happens Step by Step
1. **You ask a question** — Your message is sent to the AI along with a system prompt. The system prompt includes the names of your sources and endpoints, their field names, and item counts — but not the actual data.
2. **The AI calls tools** — Based on your question, the AI decides which tools to use. For example, if you ask "Show me critical alerts from the last week," it might call `filter_items` with conditions for status and date.
3. **Tools run locally** — Every tool executes against your local database. The results are sent back to the AI as context for its next step.
4. **The loop continues** — The AI might call additional tools to refine its search, cross-reference data across endpoints, or get full details on specific items. Each iteration adds to the conversation context.
5. **The AI responds** — Once it has enough information, the AI writes its final answer. You see both the response and a collapsible log of every tool call it made along the way.
The loop runs for up to 20 iterations by default before the AI is asked to summarize with whatever it has. In practice, most questions resolve in 2–5 tool calls.
## The Tools
The AI agent has six tools at its disposal, including search, filter, and count operations.
Each tool returns only the data requested. Results are capped at 16,000 characters per tool call to keep context manageable. Search results are limited to 500 items, filters to 1,000, and counts to 5,000.
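A sketch of how these caps might be applied, using the limits above (function and variable names are illustrative, not Weavestream's actual code):

```python
import json

TOOL_RESULT_MAX_CHARS = 16_000  # default cap per tool call
ITEM_LIMITS = {"search": 500, "filter": 1_000, "count": 5_000}

def cap_result(tool: str, items: list, serialize) -> str:
    # First apply the per-tool item limit, then the character cap,
    # so one oversized result can't flood the conversation context.
    items = items[: ITEM_LIMITS.get(tool, len(items))]
    return serialize(items)[:TOOL_RESULT_MAX_CHARS]

# 2,000 matches on a filter: only the first 1,000 reach the AI.
capped = cap_result("filter", ["alert"] * 2000, json.dumps)
```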
## How Scoping Keeps Things Efficient
What you select in the sidebar before asking a question directly controls what the agent's tools can access.
### Selecting a Specific Endpoint
The agent's tools can only query items from that endpoint. This is the most efficient option — searches and filters run against a focused dataset, and the AI gets relevant results quickly with fewer tool calls.
### Selecting a Smart Filter
The agent's tools are scoped to endpoints included in the filter, and only items matching the filter's conditions are visible. This is highly efficient and ideal for recurring analysis — the data is already pre-narrowed before the agent even starts.
### Selecting a Source
All endpoints in the source are in scope. The agent can search and filter across all of them. This gives it more to work with but may require more tool calls to narrow things down.
### No Selection (All Items)
Everything is in scope. The agent can discover and query all sources and endpoints. This gives it maximum flexibility but uses the most tokens, since it may need several tool calls just to orient itself before finding relevant data.
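The four scoping modes above can be sketched as a single resolution step (the selection shape and field names are hypothetical, not Weavestream's actual data model):

```python
def endpoints_in_scope(selection, all_sources):
    # No selection: every endpoint of every source is queryable.
    if selection is None:
        return [e for s in all_sources for e in s["endpoints"]]
    # A single endpoint: the most focused scope.
    if selection["kind"] == "endpoint":
        return [selection["endpoint"]]
    # A Smart Filter or a source: the endpoints it covers.
    # (A Smart Filter additionally pre-narrows which items are visible.)
    if selection["kind"] in ("smart_filter", "source"):
        return selection["endpoints"]
    raise ValueError(f"unknown selection kind: {selection['kind']}")
```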
## Summary
## Understanding Token Costs
Every message you send, every tool call, and every tool result adds to the conversation's token count. The AI provider charges based on the total tokens processed.
### How Tokens Accumulate
Unlike a simple send-and-receive, the agent loop means tokens grow with each iteration:
- Iteration 1: Your question + system prompt + tool definitions (~1,500–2,500 tokens)
- Iteration 2: All of the above + tool call + tool result + AI's next request
- Iteration 3: All of the above + another tool call + result
- ...and so on
Each tool result adds to the running context. A question that takes 3 tool calls will use more tokens than one that resolves in a single call. This is why scoping matters — a focused scope means the agent finds what it needs faster, with fewer iterations.
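Because each iteration resends the whole running context, input tokens grow roughly quadratically with the number of tool calls. A small sketch of the arithmetic:

```python
def cumulative_tokens(base: int, tool_results: list) -> int:
    # Each iteration reprocesses everything accumulated so far.
    total, context = 0, base
    for result in tool_results:
        total += context    # input tokens processed this iteration
        context += result   # the tool result joins the running context
    total += context        # final iteration writes the answer
    return total

# Example: a 2,000-token base prompt and three 1,000-token tool results
# process 2,000 + 3,000 + 4,000 + 5,000 = 14,000 input tokens in total,
# far more than the 5,000 tokens visible in the final context.
```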
### Prompt Caching (Claude)
Claude supports prompt caching — the system prompt and tool definitions are cached after the first iteration. For iterations 2 through 6, these cached portions cost only 10% of the normal input rate. This significantly reduces the cost of multi-step agent interactions.
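A rough cost model for this discount (the structure is illustrative; check your provider's pricing page for actual rates and caching rules):

```python
def input_cost(tokens_by_iteration, rate_per_token, cached_tokens,
               cache_discount=0.10, cached_iterations=range(2, 7)):
    # cached_tokens covers the system prompt + tool definitions, which
    # bill at 10% of the input rate on iterations 2-6 (per the above).
    cost = 0.0
    for i, tokens in enumerate(tokens_by_iteration, start=1):
        if i in cached_iterations:
            cost += cached_tokens * rate_per_token * cache_discount
            cost += (tokens - cached_tokens) * rate_per_token
        else:
            cost += tokens * rate_per_token
    return cost
```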
### Provider Pricing
A typical focused question (endpoint selected, 2–3 tool calls) might cost $0.005–$0.03 with Claude. Broad exploratory questions with many tool calls can cost more.
### Ollama Considerations
Ollama is free but has a smaller context window (default: 8,192 tokens, configurable). Since the agent loop accumulates context with each tool call, complex questions with many iterations may exceed the context limit. For best results with Ollama, select a specific endpoint and ask focused questions.
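A quick way to reason about whether a conversation will fit, assuming the default window (token counts here are rough estimates, not exact tokenizer output):

```python
OLLAMA_CONTEXT_WINDOW = 8_192  # default; configurable in Ollama

def fits_in_context(base_tokens: int, tool_result_tokens: list) -> bool:
    # The agent loop keeps every tool result in context, so the
    # running total is what must stay under the window.
    return base_tokens + sum(tool_result_tokens) <= OLLAMA_CONTEXT_WINDOW

# A 2,000-token base with two 2,000-token results fits; four do not.
```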
## Tips for Efficient AI Usage
### 1. Select the Narrowest Scope Possible
This is the single most impactful thing you can do. Selecting a specific endpoint or Smart Filter means the agent's tools return focused results immediately, reducing the number of iterations needed.
### 2. Use Smart Filters for Recurring Analysis
If you regularly ask the same types of questions about the same data, create a Smart Filter with conditions that pre-narrow items to what's relevant. The agent starts with a smaller, focused dataset and resolves your question faster with fewer tokens. See Smart Filters for how to set one up.
### 3. Use Endpoint Joins Instead of Broad Selections
If your question spans data from two endpoints (e.g., "Show me alerts with their device names"), don't select the whole source. Instead, create a Smart Filter with an Endpoint Join that combines the two endpoints. The joined data is pre-assembled, so the agent gets what it needs without making multiple cross-referencing tool calls.
### 4. Ask Specific Questions
Vague questions like "Tell me about my data" force the agent to make many exploratory tool calls. Specific questions like "Which critical alerts were created in the last 48 hours?" let it go directly to a targeted filter, often resolving in a single tool call.
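As an illustration, a question like the one above might resolve to a single tool call shaped roughly like this (the argument names are hypothetical, not Weavestream's actual tool schema):

```python
# Hypothetical single filter_items call for:
# "Which critical alerts were created in the last 48 hours?"
tool_call = {
    "tool": "filter_items",
    "args": {
        "endpoint": "alerts",
        "conditions": [
            {"field": "severity",   "op": "equals",      "value": "critical"},
            {"field": "created_at", "op": "within_last", "value": "48h"},
        ],
    },
}
```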
### 5. Watch the Activity Log
As the AI works, you can see each tool call in real time — what it searched for, what it filtered, how many results came back. If you notice it's making many calls to find something, that's a sign you could scope your selection more tightly or create a Smart Filter for that query.
### 6. Consider Ollama for Exploratory Questions
If you're just poking around your data or asking casual questions, Ollama is free and keeps everything local. Save Claude or Gemini for complex analysis where quality and speed matter most.
### 7. Tune Advanced Settings If Needed
In Settings → Intelligence → Advanced, you can adjust agent limits:
- Max tool iterations — How many loops before the agent is forced to summarize (default: 20)
- Tool result max characters — How much data each tool call can return (default: 16,000)
- Search / Filter / Count limits — Maximum items fetched per tool call
The defaults work well for most use cases. Lower the limits if you want to reduce token usage; raise them if the agent isn't finding enough data.
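The defaults described above, collected as an illustrative settings sketch (the keys are hypothetical; use the Settings → Intelligence → Advanced UI to make actual changes):

```python
# Default agent limits, per the list above.
DEFAULT_AGENT_SETTINGS = {
    "max_tool_iterations": 20,        # loops before a forced summary
    "tool_result_max_chars": 16_000,  # per tool call
    "search_limit": 500,              # max items per search call
    "filter_limit": 1_000,            # max items per filter call
    "count_limit": 5_000,             # max items per count call
}
```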