How AI Token Pricing Works

AI token pricing usually separates input tokens from output tokens. Input tokens are the prompt and context you send. Output tokens are what the model generates back.

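Output tokens usually cost more per token because they are generated one at a time, which ties up more compute than reading the prompt. Cached input and batch pricing can reduce the total bill when your workload meets the provider's rules, for example a prompt prefix that repeats often enough to be cached, or jobs that can wait for delayed batch results.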
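
As a rough illustration of how these pieces combine, the sketch below prices one request from its token counts. The rates are made up, and two rules are assumptions that vary by provider: cached tokens are treated as a subset of the input count, and the batch discount is applied to the whole bill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token rates in USD. The values used below are hypothetical."""
    input_per_m: float           # uncached input tokens
    cached_input_per_m: float    # input tokens served from the prompt cache
    output_per_m: float          # generated tokens
    batch_discount: float = 0.0  # assumed fraction taken off the total for batch jobs


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, batch: bool = False) -> float:
    """Estimate one request's cost in USD.

    Assumes cached_tokens is part of input_tokens and that the batch
    discount applies to the whole bill.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cost = (uncached * p.input_per_m
            + cached_tokens * p.cached_input_per_m
            + output_tokens * p.output_per_m) / 1_000_000
    if batch:
        cost *= 1 - p.batch_discount
    return cost


rates = Pricing(input_per_m=2.0, cached_input_per_m=0.5,
                output_per_m=8.0, batch_discount=0.5)
print(request_cost(rates, 10_000, 1_000))                       # plain request
print(request_cost(rates, 10_000, 1_000, cached_tokens=8_000))  # mostly cached prompt
print(request_cost(rates, 10_000, 1_000, batch=True))           # batch job
```

With these made-up rates the three calls print 0.028, 0.016 and 0.014. Note how the 1,000 output tokens cost as much as 4,000 uncached input tokens.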

For the most accurate estimate, copy the usage numbers from your provider's response and paste them into the calculator's advanced JSON box.

How AI token billing actually works

Most LLM APIs bill in two main buckets: tokens you send to the model and tokens the model generates back. The prompt, instructions, pasted documents, chat history, tool definitions, and system messages can all count as input. The answer counts as output, and so can reasoning tokens, which some providers bill as output even when they do not show them. Citations, web search, and other tool calls can add further output tokens or separate add-on charges.

A token is not exactly a word. For plain English, 1,000 tokens is often around 750 words, but code, tables, JSON, emojis, and multilingual text can shift that ratio noticeably. This is why PromptPrice gives a practical estimate first, then lets advanced users paste provider usage JSON when they need billing-grade accuracy.
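
A practical first estimate can come from simple rules of thumb. The sketch below assumes the commonly quoted English ratios of about 4 characters per token and about 0.75 words per token, and keeps the larger of the two so the estimate leans high. It is a heuristic only; an exact count needs the provider's own tokenizer (for OpenAI models, the `tiktoken` library is one option).

```python
def estimate_tokens(text: str) -> int:
    """Rough token count for plain English text.

    Assumes ~4 characters per token and ~0.75 words per token, keeping the
    larger result. Code, JSON, tables, emojis and non-English text can differ a lot.
    """
    if not text.strip():
        return 0
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return max(1, round(max(by_chars, by_words)))


def words_to_tokens(words: int) -> int:
    """Approximate tokens for a word count, assuming ~0.75 words per token."""
    return round(words / 0.75)


print(words_to_tokens(750))  # 1000
print(estimate_tokens("How much will this prompt cost?"))
```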

Why the same prompt can have different prices

Model pricing is not just one number. A cheap model may have low input prices but higher output prices, while a premium reasoning model may add separate reasoning or research fees. Some providers also price cached input, batch jobs, priority routing, web search, image generation, or long-context requests differently.

The safest way to compare models is to estimate the actual shape of your request: how much context you send, how long the answer should be, whether the model uses tools, and whether the same context repeats often enough for cache pricing to matter.
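
To make "the shape of your request" concrete, the sketch below prices the same two request shapes on two hypothetical models. The model names, rates, and cache-hit rates are invented for illustration, and cached context is assumed to bill at a separate cached rate. The point is that the cheaper model can flip depending on whether context or output dominates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str             # hypothetical model label
    input_per_m: float    # USD per 1M uncached input tokens
    cached_per_m: float   # USD per 1M cached input tokens
    output_per_m: float   # USD per 1M output tokens


@dataclass(frozen=True)
class RequestShape:
    context_tokens: int    # documents, chat history, system prompt, tool definitions
    question_tokens: int   # the new question sent with that context
    answer_tokens: int     # expected answer length
    cache_hit_rate: float  # share of context_tokens expected to bill at the cached rate


def shape_cost(m: ModelPrice, s: RequestShape) -> float:
    """USD cost of one request with this shape on this model."""
    cached = s.context_tokens * s.cache_hit_rate
    uncached = s.context_tokens - cached + s.question_tokens
    return (uncached * m.input_per_m
            + cached * m.cached_per_m
            + s.answer_tokens * m.output_per_m) / 1_000_000


def rank_models(models: list[ModelPrice], shape: RequestShape) -> list[tuple[str, float]]:
    """Return (name, cost) pairs, cheapest first."""
    return sorted(((m.name, shape_cost(m, shape)) for m in models),
                  key=lambda pair: pair[1])


models = [
    ModelPrice("model-a", input_per_m=1.0, cached_per_m=0.25, output_per_m=4.0),
    ModelPrice("model-b", input_per_m=0.5, cached_per_m=0.5, output_per_m=3.0),
]
rag_chat = RequestShape(context_tokens=50_000, question_tokens=200,
                        answer_tokens=500, cache_hit_rate=0.8)
long_writeup = RequestShape(context_tokens=1_000, question_tokens=200,
                            answer_tokens=4_000, cache_hit_rate=0.0)
print(rank_models(models, rag_chat))      # model-a first: large, mostly cached context
print(rank_models(models, long_writeup))  # model-b first: output dominates
```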

Common mistakes that make AI bills higher

  • Sending a full document or chat history every time when only a short excerpt is needed.
  • Using a high-end model for simple formatting, extraction, or classification tasks.
  • Asking for long answers by default instead of setting a concise answer target.
  • Ignoring output token pricing, even though output tokens usually cost more per token than input tokens.
  • Forgetting tool, search, citation, image, audio, or regional add-on fees.
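
The first mistake compounds over a conversation, because each new turn re-bills everything sent before it. The sketch below counts total input tokens across a chat under a simplified, hypothetical model: every turn sends the system prompt and a new question, and optionally all earlier questions and answers. The token sizes are made up, and real apps often need some history; the "trimmed" case is the extreme where none is resent.

```python
def conversation_input_tokens(turns: int, system_tokens: int, question_tokens: int,
                              answer_tokens: int, resend_history: bool) -> int:
    """Total input tokens billed across a conversation (simplified model).

    With resend_history=True, turn n also resends the n earlier questions and
    answers, so the total grows roughly with the square of the turn count.
    """
    total = 0
    for turn in range(turns):
        history = turn * (question_tokens + answer_tokens) if resend_history else 0
        total += system_tokens + question_tokens + history
    return total


full = conversation_input_tokens(10, 500, 200, 300, resend_history=True)
trimmed = conversation_input_tokens(10, 500, 200, 300, resend_history=False)
print(full, trimmed)  # 29500 7000
```

Over ten turns, resending the full history bills about four times as many input tokens as sending only the system prompt and the new question.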

A practical workflow before you send an expensive prompt

  1. Pick the provider and model you plan to use.
  2. Paste the exact question, prompt, or document chunk you intend to send.
  3. Choose a short, medium, or long expected answer length.
  4. Open expert controls only if you have exact token counts, cache tokens, tool calls, or API usage JSON.
  5. Compare cheaper alternatives before sending the final request to your API.
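
Steps 3 and 5 can be sketched as a preset answer length plus a comparison against other models. Everything here is hypothetical: the preset token counts are not PromptPrice's actual values, and the model names and rates are invented. A cheaper model is only a real alternative if its answers are good enough for the task, which this arithmetic cannot judge.

```python
# Hypothetical answer-length presets, in output tokens.
ANSWER_PRESETS = {"short": 150, "medium": 500, "long": 1_500}

# Hypothetical USD rates per 1M tokens: (input, output).
PRICES = {
    "premium-model": (5.0, 25.0),
    "mid-model": (1.0, 4.0),
    "small-model": (0.1, 0.4),
}


def estimate(prompt_tokens: int, length: str, model: str) -> float:
    """Cost of one request: prompt size plus a preset answer length."""
    input_rate, output_rate = PRICES[model]
    return (prompt_tokens * input_rate
            + ANSWER_PRESETS[length] * output_rate) / 1_000_000


def cheaper_alternatives(prompt_tokens: int, length: str,
                         chosen: str) -> list[tuple[str, float]]:
    """Models that would cost less than the chosen one, cheapest first."""
    baseline = estimate(prompt_tokens, length, chosen)
    options = [(m, estimate(prompt_tokens, length, m))
               for m in PRICES if m != chosen]
    return sorted((o for o in options if o[1] < baseline), key=lambda o: o[1])


print(estimate(2_000, "medium", "premium-model"))
print(cheaper_alternatives(2_000, "medium", "premium-model"))
```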

When exact usage JSON matters

Estimates are useful before a request is sent, but once a request has run, the provider's usage JSON is the closest thing to a source of truth. If your app is already calling an API, copy the usage object from the response and paste it into PromptPrice expert mode. That avoids guessing token counts and gives the calculator the same kind of numbers the provider uses for billing.
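
For example, the sketch below turns a usage object into a cost. It assumes the field layout of OpenAI's Chat Completions usage object (`prompt_tokens`, `completion_tokens`, and `prompt_tokens_details.cached_tokens`), where cached tokens are counted inside `prompt_tokens` and reasoning tokens inside `completion_tokens`. Other providers name and group these fields differently, so the lookups would need adapting. The rates passed in are hypothetical.

```python
import json


def cost_from_usage(usage_json: str, input_per_m: float, cached_per_m: float,
                    output_per_m: float) -> float:
    """USD cost from an OpenAI Chat Completions-style usage object.

    Assumes cached_tokens is included in prompt_tokens and that
    completion_tokens already includes any reasoning tokens.
    """
    usage = json.loads(usage_json)
    prompt = usage["prompt_tokens"]
    details = usage.get("prompt_tokens_details") or {}  # may be missing or null
    cached = details.get("cached_tokens") or 0
    completion = usage["completion_tokens"]
    return ((prompt - cached) * input_per_m
            + cached * cached_per_m
            + completion * output_per_m) / 1_000_000


sample = """{
  "prompt_tokens": 12000,
  "completion_tokens": 800,
  "total_tokens": 12800,
  "prompt_tokens_details": {"cached_tokens": 10000},
  "completion_tokens_details": {"reasoning_tokens": 300}
}"""
# Hypothetical rates: $2 input, $0.50 cached input, $8 output per 1M tokens.
print(cost_from_usage(sample, 2.0, 0.5, 8.0))
```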
