How Token Tracking and Cost Calculation Work in Pi-AI: A Complete Technical Guide
Pi-AI tracks input, output, and cache tokens through a unified Usage interface, then calculates costs using per-model pricing rates defined in packages/ai/src/models.ts.
The badlogic/pi-mono repository implements a comprehensive token tracking and cost calculation system that normalizes provider-specific token data into a standardized format. This architecture enables accurate cost attribution across different AI providers like OpenAI, Anthropic, and Bedrock while maintaining a consistent interface for the Pi-AI runtime and UI components.
The Unified Usage Interface
At the core of Pi-AI's token tracking system is the Usage interface defined in packages/ai/src/types.ts. This contract ensures every assistant message carries standardized token metadata regardless of which provider generated the response.
// packages/ai/src/types.ts
export interface Usage {
  /** raw token counts */
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number; // convenience sum
  /** derived monetary amounts */
  cost: {
    input: number;
    output: number;
    cacheRead: number;
    cacheWrite: number;
    total: number;
  };
}
Every AssistantMessage in the same file includes a usage: Usage field, ensuring token data travels together with the conversation transcript through the entire pipeline.
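To make the shape concrete, here is a minimal sketch of building a zeroed Usage value and filling in token counts. The emptyUsage() helper is hypothetical (it is not taken from the repository), the interface is copied locally so the snippet stands alone, and treating totalTokens as the sum of all four counts is an assumption about what "convenience sum" means.

```typescript
// Local copy of the Usage shape so this sketch is self-contained.
interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
  cost: { input: number; output: number; cacheRead: number; cacheWrite: number; total: number };
}

// Hypothetical helper (not from the repository): returns a Usage with every field at zero.
function emptyUsage(): Usage {
  return {
    input: 0,
    output: 0,
    cacheRead: 0,
    cacheWrite: 0,
    totalTokens: 0,
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
  };
}

const sampleUsage = emptyUsage();
sampleUsage.input = 1200;
sampleUsage.output = 300;
// Assumption: totalTokens sums all four raw counts.
sampleUsage.totalTokens =
  sampleUsage.input + sampleUsage.output + sampleUsage.cacheRead + sampleUsage.cacheWrite;
```

Starting from a fully zeroed object means later code can add to any field without checking for undefined.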
Normalizing Provider Token Data
Provider adapters in packages/ai/src/providers/*.ts map raw API responses onto the unified Usage interface. Each adapter extracts provider-specific fields and normalizes them into the standard format before cost calculation occurs.
For example, the OpenAI Responses adapter handles field mapping as follows:
// packages/ai/src/providers/openai-responses.ts (excerpt)
output.usage.input = response.usage.input_tokens ?? 0;
output.usage.output = response.usage.output_tokens ?? 0;
output.usage.cacheRead = response.usage.input_tokens_details?.cached_tokens ?? 0;
...
// After the raw values are set, the shared logic below adds cost.
Similar mapping exists for Anthropic, Bedrock, Google, and other providers. After populating the raw token counts, each adapter calls the shared cost calculation logic to populate the monetary fields.
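As an illustration of the same pattern for another provider, the sketch below maps an Anthropic-style usage payload onto the four raw counts. The snake_case field names follow the public Anthropic Messages API; the function name and the TokenCounts type are hypothetical, and the repository's actual Anthropic adapter may be structured differently.

```typescript
// Usage payload in the style of the Anthropic Messages API (fields may be absent or null).
interface AnthropicStyleUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Hypothetical local type: the four raw counts from the unified Usage interface.
interface TokenCounts {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Hypothetical normalizer: missing or null fields fall back to 0, as in the OpenAI excerpt.
function normalizeAnthropicStyle(raw: AnthropicStyleUsage): TokenCounts {
  return {
    input: raw.input_tokens ?? 0,
    output: raw.output_tokens ?? 0,
    cacheRead: raw.cache_read_input_tokens ?? 0,
    cacheWrite: raw.cache_creation_input_tokens ?? 0,
  };
}

const normalized = normalizeAnthropicStyle({
  input_tokens: 900,
  output_tokens: 120,
  cache_read_input_tokens: 4000,
});
```

Defaulting to 0 keeps the later cost arithmetic free of NaN values when a provider omits a field.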
Converting Tokens to Dollars
The calculateCost() function in packages/ai/src/models.ts transforms token counts into monetary values using per-model pricing rates. These rates are defined as dollars-per-million-tokens in the generated model catalog.
// packages/ai/src/models.ts
export function calculateCost<TApi extends Api>(model: Model<TApi>, usage: Usage): Usage["cost"] {
  // model.cost.* are $/million-tokens values defined in the generated model catalog
  usage.cost.input = (model.cost.input / 1_000_000) * usage.input;
  usage.cost.output = (model.cost.output / 1_000_000) * usage.output;
  usage.cost.cacheRead = (model.cost.cacheRead / 1_000_000) * usage.cacheRead;
  usage.cost.cacheWrite = (model.cost.cacheWrite / 1_000_000) * usage.cacheWrite;
  usage.cost.total = usage.cost.input + usage.cost.output +
    usage.cost.cacheRead + usage.cost.cacheWrite;
  return usage.cost;
}
When a provider finishes building a Usage object, Pi-AI calls calculateCost() to populate the monetary fields. The resulting usage.cost.total is stored on the message and later summed across the entire conversation.
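A worked example may help make the arithmetic concrete. The sketch below applies the same per-field formula as calculateCost() in a pure, non-mutating form. The costFor() function and the rates are hypothetical and chosen only for illustration; they are not real prices for any model.

```typescript
// Rates are in dollars per million tokens; counts are raw token counts.
interface Rates { input: number; output: number; cacheRead: number; cacheWrite: number }
interface Counts { input: number; output: number; cacheRead: number; cacheWrite: number }

// Hypothetical pure variant of the formula: (rate / 1_000_000) * tokens, per field.
function costFor(rates: Rates, counts: Counts) {
  const perToken = (ratePerMillion: number) => ratePerMillion / 1_000_000;
  const input = perToken(rates.input) * counts.input;
  const output = perToken(rates.output) * counts.output;
  const cacheRead = perToken(rates.cacheRead) * counts.cacheRead;
  const cacheWrite = perToken(rates.cacheWrite) * counts.cacheWrite;
  return { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite };
}

// Illustrative rates only (assumed, not taken from the model catalog).
const exampleRates: Rates = { input: 0.15, output: 0.6, cacheRead: 0.075, cacheWrite: 0 };
const exampleCost = costFor(exampleRates, {
  input: 10_000,
  output: 2_000,
  cacheRead: 4_000,
  cacheWrite: 0,
});
console.log(exampleCost.total.toFixed(4));
```

With these assumed rates, 10,000 input tokens cost about $0.0015, 2,000 output tokens about $0.0012, and 4,000 cached-read tokens about $0.0003, for a total of roughly $0.0030.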
Runtime Aggregation and UI Display
The agent runtime aggregates token and cost fields across all messages in a run, exposing totals for both CLI tools and web interfaces.
In packages/mom/src/agent.ts, the runtime accumulates usage statistics:
// packages/mom/src/agent.ts (excerpt)
runState.totalUsage.input += assistantMsg.usage.input;
runState.totalUsage.output += assistantMsg.usage.output;
runState.totalUsage.cacheRead += assistantMsg.usage.cacheRead;
runState.totalUsage.cacheWrite += assistantMsg.usage.cacheWrite;
runState.totalUsage.cost.total += assistantMsg.usage.cost.total;
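The accumulation above can be sketched as a standalone function over a list of messages. The sumUsage() name and the trimmed-down UsageLike type are hypothetical; the real runtime updates runState.totalUsage incrementally rather than summing a finished list.

```typescript
// Trimmed-down usage shape with only the fields the excerpt accumulates.
interface UsageLike {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  cost: { total: number };
}

// Hypothetical helper: sums token counts and total cost over all messages in a run.
function sumUsage(messages: { usage: UsageLike }[]): UsageLike {
  const totals: UsageLike = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, cost: { total: 0 } };
  for (const { usage } of messages) {
    totals.input += usage.input;
    totals.output += usage.output;
    totals.cacheRead += usage.cacheRead;
    totals.cacheWrite += usage.cacheWrite;
    totals.cost.total += usage.cost.total;
  }
  return totals;
}

const runTotals = sumUsage([
  { usage: { input: 100, output: 40, cacheRead: 0, cacheWrite: 50, cost: { total: 0.25 } } },
  { usage: { input: 200, output: 60, cacheRead: 50, cacheWrite: 0, cost: { total: 0.5 } } },
]);
```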
The coding-agent UI footer displays the running cost total in real time:
// packages/coding-agent/src/modes/interactive/components/footer.ts
let totalCost = 0;
for (const entry of entries) {
  totalCost += entry.message.usage.cost.total;
}
For human-readable display, packages/web-ui/src/utils/format.ts provides formatting utilities:
// packages/web-ui/src/utils/format.ts
export function formatUsage(usage: Usage) {
  if (!usage) return "";
  const parts = [];
  if (usage.input) parts.push(`↑${formatTokenCount(usage.input)}`);
  if (usage.output) parts.push(`↓${formatTokenCount(usage.output)}`);
  if (usage.cacheRead) parts.push(`R${formatTokenCount(usage.cacheRead)}`);
  if (usage.cacheWrite) parts.push(`W${formatTokenCount(usage.cacheWrite)}`);
  if (usage.cost?.total) parts.push(formatCost(usage.cost.total));
  return parts.join(" ");
}
The UI shows token markers (↑ for input, ↓ for output, R for cache read, W for cache write) alongside the dollar amount for each message, and omits any field whose value is zero.
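The excerpt does not show formatTokenCount or formatCost, so the sketch below substitutes two hypothetical stand-ins (sketchTokenCount and sketchCost) to show what a formatted line could look like. Their rounding and "k" suffix rules are assumptions and may not match the real helpers.

```typescript
// Hypothetical stand-in for formatTokenCount: abbreviates thousands with a "k" suffix.
function sketchTokenCount(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(n);
}

// Hypothetical stand-in for formatCost: dollars with four decimal places.
function sketchCost(dollars: number): string {
  return `$${dollars.toFixed(4)}`;
}

interface DisplayUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  cost?: { total: number };
}

// Same structure as formatUsage above, wired to the stand-in helpers.
function sketchFormatUsage(usage: DisplayUsage): string {
  const parts: string[] = [];
  if (usage.input) parts.push(`↑${sketchTokenCount(usage.input)}`);
  if (usage.output) parts.push(`↓${sketchTokenCount(usage.output)}`);
  if (usage.cacheRead) parts.push(`R${sketchTokenCount(usage.cacheRead)}`);
  if (usage.cacheWrite) parts.push(`W${sketchTokenCount(usage.cacheWrite)}`);
  if (usage.cost?.total) parts.push(sketchCost(usage.cost.total));
  return parts.join(" ");
}

const usageLine = sketchFormatUsage({
  input: 1200,
  output: 300,
  cacheRead: 0,
  cacheWrite: 0,
  cost: { total: 0.0027 },
});
console.log(usageLine); // ↑1.2k ↓300 $0.0027
```

Because each field is guarded by a truthiness check, the zero-valued cache counts do not appear in the output.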
CLI Cost Reporting
The scripts/cost.ts CLI tool reads session files and generates per-day, per-provider cost summaries:
// scripts/cost.ts (excerpt)
stats[day][provider].total += cost.total || 0;
stats[day][provider].input += cost.input || 0;
stats[day][provider].output += cost.output || 0;
...
// Finally prints a table with $‑values per provider and day.
This script aggregates the usage.cost fields stored in conversation history files, enabling users to track spending across different providers and time periods.
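The grouping logic can be sketched as follows. The CostEntry shape, the aggregateByDay() name, and the choice to derive the day from the first ten characters of an ISO 8601 timestamp are all assumptions made for illustration; the actual session file format is not shown in the excerpt.

```typescript
// Hypothetical record: one assistant message's cost, with when and where it was produced.
interface CostEntry {
  timestamp: string; // assumed ISO 8601, e.g. "2025-01-15T09:30:00Z"
  provider: string;
  cost: { input: number; output: number; total: number };
}

type Bucket = { input: number; output: number; total: number };
type DayStats = Record<string, Record<string, Bucket>>;

// Groups costs by day, then by provider, mirroring stats[day][provider] in the excerpt.
function aggregateByDay(entries: CostEntry[]): DayStats {
  const stats: DayStats = {};
  for (const entry of entries) {
    const day = entry.timestamp.slice(0, 10); // "YYYY-MM-DD" prefix of the timestamp
    if (!stats[day]) stats[day] = {};
    if (!stats[day][entry.provider]) stats[day][entry.provider] = { input: 0, output: 0, total: 0 };
    const bucket = stats[day][entry.provider];
    bucket.input += entry.cost.input || 0;
    bucket.output += entry.cost.output || 0;
    bucket.total += entry.cost.total || 0;
  }
  return stats;
}

const dayStats = aggregateByDay([
  { timestamp: "2025-01-15T09:30:00Z", provider: "openai", cost: { input: 0.25, output: 0.25, total: 0.5 } },
  { timestamp: "2025-01-15T18:00:00Z", provider: "openai", cost: { input: 0.125, output: 0.125, total: 0.25 } },
  { timestamp: "2025-01-16T08:00:00Z", provider: "anthropic", cost: { input: 0.5, output: 0.5, total: 1 } },
]);
```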
Handling Token Overflow
Pi-AI also monitors totalTokens against model context windows to detect silent truncation. The overflow detector in packages/ai/src/utils/overflow.ts compares usage.input against model.contextWindow.
When the provider silently truncates input, the overflow detector still reports the actual token count, letting the UI warn the user that part of the context was dropped.
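The comparison described above can be sketched in a few lines. The exceedsContextWindow() name, the ContextModel type, and the strict greater-than test are assumptions; the real detector in overflow.ts may apply additional checks.

```typescript
// Hypothetical minimal model shape: only the context window size matters here.
interface ContextModel {
  contextWindow: number;
}

// Hypothetical check: true when the reported input token count exceeds the model's window.
function exceedsContextWindow(inputTokens: number, model: ContextModel): boolean {
  return inputTokens > model.contextWindow;
}

const smallModel: ContextModel = { contextWindow: 128_000 };
const overflowed = exceedsContextWindow(130_500, smallModel);
const withinLimit = exceedsContextWindow(90_000, smallModel);
```

A caller could use a true result to surface a warning next to the affected message.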
Practical Example: Calculating Cost for a Single Response
To calculate the cost of a single assistant message programmatically:
import type { AssistantMessage } from "packages/ai/src/types.js";
import { calculateCost, getModel } from "packages/ai/src/models.js";

// `response` is an AssistantMessage returned from a provider;
// its usage token counts were already filled in by the adapter.
function example(response: AssistantMessage) {
  const model = getModel("openai", "gpt-4o-mini");
  const { usage } = response;
  // calculateCost writes the amounts into usage.cost and returns that object
  const cost = calculateCost(model, usage);
  console.log(`This reply cost $${cost.total.toFixed(4)}`);
}
The usage object already contains token counts populated by the provider adapter; calculateCost adds the monetary values based on the selected model's pricing.
Summary
- Unified Interface: The Usage interface in packages/ai/src/types.ts standardizes input, output, cache read, and cache write tokens across all providers.
- Provider Normalization: Adapters in packages/ai/src/providers/*.ts map raw API responses to the Usage structure before cost calculation.
- Cost Calculation: The calculateCost() function in packages/ai/src/models.ts converts token counts to dollars using per-model rates defined as $/million-tokens.
- Runtime Aggregation: Agent runtimes in packages/mom/src/agent.ts and packages/coding-agent/src/core/agent-session.ts sum usage across all messages to provide session totals.
- Human-Readable Output: The scripts/cost.ts CLI and packages/web-ui/src/utils/format.ts format token counts and costs for display, including per-day cost breakdowns.
Frequently Asked Questions
How does Pi-AI handle different token counting methods across providers?
Pi-AI uses provider-specific adapters located in packages/ai/src/providers/*.ts to normalize raw API responses into the unified Usage interface. Each adapter extracts provider-specific fields—such as OpenAI's input_tokens or Anthropic's cache read/write statistics—and maps them to the standard input, output, cacheRead, and cacheWrite properties before cost calculation occurs.
Where are the per-model pricing rates defined in the codebase?
The pricing rates are defined as dollars-per-million-tokens values within the generated model catalog accessed via packages/ai/src/models.ts. The calculateCost() function retrieves these rates from the model.cost object—which includes input, output, cacheRead, and cacheWrite pricing—and applies them to the token counts stored in the Usage object.
How can I view total token costs for a conversation session?
You can view aggregated costs through multiple interfaces: the scripts/cost.ts CLI tool reads session files and prints per-day, per-provider cost tables; the web UI displays formatted costs via packages/web-ui/src/utils/format.ts next to each message; and the coding-agent interactive mode shows running totals in the footer component located at packages/coding-agent/src/modes/interactive/components/footer.ts.
Does Pi-AI detect when input exceeds the model's context window?
Yes, Pi-AI monitors token overflow through packages/ai/src/utils/overflow.ts, which compares the usage.input count against the model.contextWindow limit. This detection runs independently of the provider's truncation behavior, ensuring the UI can warn users when silent input truncation occurs, even if the API response indicates success.