If you’ve used ChatGPT or any large language model (LLM), you’ve probably encountered the term “token”. Understanding what tokens are—and how to manage them—is key to using these models effectively. In this post, we’ll cover everything you need to know about tokens, including what they are, how they’re counted, how they split text, model-specific limits, and practical tips for reducing token usage.
1. What Is a Token?
A token is the unit of text that language models like ChatGPT process. They are not the same as words. A token can be:
- A single word ("apple")
- Part of a word ("un", "believable")
- A punctuation mark (".")
- A space-prefixed word (" the")
- Even a single character or byte, depending on the model
Example: the sentence "ChatGPT is amazing." might be tokenized as ["Chat", "G", "PT", " is", " amazing", "."] → 6 tokens.
The exact breakdown depends on the tokenizer used (e.g., Byte Pair Encoding for GPT models).
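To make the BPE idea concrete, here is a minimal sketch of a single BPE training step: start from individual characters, count adjacent pairs, and merge the most frequent pair into a new token. This is a toy illustration of the principle, not any model's actual tokenizer.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of the pair with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# BPE training starts from individual characters.
tokens = list("banana")            # ['b', 'a', 'n', 'a', 'n', 'a']
pair = most_frequent_pair(tokens)  # ('a', 'n') occurs twice
tokens = merge_pair(tokens, pair)  # ['b', 'an', 'an', 'a']
```

Real tokenizers apply thousands of such learned merges in order, which is why frequent words end up as single tokens while rare words stay split.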
2. How to Count Tokens
The number of tokens in your prompt + the number of tokens in the model’s response = total tokens used.
To count tokens:
- Use OpenAI’s Tokenizer Tool
- API users can check token usage in responses via the "usage" field
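For API users, the chat completions response includes a "usage" object with prompt, completion, and total token counts. Here is a sketch that reads those fields from a parsed response; the response dict below is a hard-coded stand-in for an actual API call, with made-up numbers.

```python
# Stand-in for a parsed API response (field names follow OpenAI's
# chat completions API; the values here are illustrative).
response = {
    "choices": [{"message": {"content": "Hello!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

usage = response["usage"]
print(f"prompt: {usage['prompt_tokens']}, "
      f"completion: {usage['completion_tokens']}, "
      f"total: {usage['total_tokens']}")
```

Logging these numbers per request is the simplest way to track spend, since billing is based on exactly these counts.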
3. Tokenization Example
Let’s look at a classic sentence:
Text: "The quick brown fox jumps over the lazy dog."
Tokenized (approx.): ["The", " quick", " brown", " fox", " jumps", " over", " the", " lazy", " dog", "."]
→ 10 tokens
Notes:
- Leading spaces may be tokenized separately
- Common phrases may be encoded as a single token
- Rare or compound words may be split into multiple tokens
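The space-prefixed behavior can be imitated with a rough regex that keeps each word's leading space attached and splits off punctuation. This is only a visual approximation of the pattern above, not a real BPE tokenizer.

```python
import re

def rough_tokens(text):
    """Very rough illustration: attach the leading space to each word
    and split off punctuation. NOT a real BPE tokenizer."""
    return re.findall(r" ?\w+|[^\w\s]", text)

print(rough_tokens("The quick brown fox jumps over the lazy dog."))
# ['The', ' quick', ' brown', ' fox', ' jumps', ' over', ' the', ' lazy', ' dog', '.']
```

A real tokenizer would additionally split rare words into sub-word pieces, so actual counts can differ from this approximation.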
4. Token Limits by Model
Each ChatGPT model has a maximum context length, i.e., the total number of tokens it can handle at once (prompt + response). Here’s a comparison of the most commonly used models:
| Model | Max Tokens (Context Length) | Notes |
| --- | --- | --- |
| GPT-4o | 128,000 | Fastest GPT-4 model; supports text, image, and audio |
| GPT-4-turbo | 128,000 | Used in ChatGPT Plus; optimized for speed and cost |
| GPT-4 (standard) | 8,192 | Available via API only |
| GPT-3.5-turbo | 16,385 | Used for free-tier ChatGPT; also available via API |
| GPT-3.5-turbo (legacy) | 4,096 | Deprecated; not recommended for new projects |
Important:
The token limit includes:
- The system prompt
- Conversation history
- Your latest message
- The model’s reply
If your input consumes too many tokens, the model will have less room to generate output—or might even return an error.
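A simple way to avoid hitting the limit is to budget-check the prompt before sending it. The sketch below uses the common rule of thumb of roughly 4 characters per English token; the function names and the reserved-reply figure are illustrative assumptions, and real counts must come from the model's tokenizer.

```python
def estimate_tokens(text):
    """Rough rule of thumb: ~4 characters per token for English text.
    Real counts come from the model's tokenizer."""
    return max(1, len(text) // 4)

def fits_in_context(prompt, max_context=8192, reserved_for_reply=1024):
    """Check whether the prompt leaves enough room for the reply."""
    return estimate_tokens(prompt) + reserved_for_reply <= max_context

print(fits_in_context("Explain tokenization in one paragraph."))  # True
```

For anything billing- or limit-critical, replace the heuristic with an exact count from the model's own tokenizer library.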
5. Tips for Reducing Token Usage
To get the most out of your prompts (and avoid getting cut off), consider the following best practices:
a. Be concise
- Avoid verbose phrasing
- Instead of: “Please explain in great detail…”
- Try: “Explain…”
b. Eliminate redundancy
- Don’t restate the same idea multiple times
- Remove repeated qualifiers or disclaimers
c. Avoid unnecessary formatting
- Skip unnecessary whitespace, headers, or markup (unless formatting is the task)
d. Compress large inputs
- If you’re feeding in documents, summarize them before input
- Use bullet points or key phrases over full paragraphs
e. Stick to consistent vocabulary
- BPE tokenizers merge frequent character sequences, so common, consistent phrasing compresses into fewer tokens
f. Watch for contractions and compounds
- "doesn't" may tokenize as "doesn" + "'t", not one token
- Some complex or rare words may be split more than expected
g. Start a new session
- In long conversations, previous messages stay in the context window
→ Create a new session if the conversation gets too long
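When you cannot start a new session, e.g. in an API application, the same idea can be applied programmatically: keep the system prompt, drop the oldest messages, and keep the newest ones that fit the budget. This is a sketch under the same rough 4-characters-per-token assumption; the message format mirrors the OpenAI chat API, but the trimming policy is just one simple choice.

```python
def estimate_tokens(text):
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def trim_history(messages, budget):
    """Keep the system prompt (first message) and as many of the most
    recent messages as fit within the token budget."""
    system, rest = messages[0], messages[1:]
    used = estimate_tokens(system["content"])
    kept = []
    for msg in reversed(rest):           # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]         # restore chronological order
```

More sophisticated variants summarize the dropped messages instead of discarding them, trading a few tokens for retained context.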
Conclusion
Tokens are the currency of language models. Every input and output is counted in tokens, and understanding how they work helps you write better prompts, control costs, and prevent output from being cut off.
By mastering token usage—how to count them, how they split, and how to reduce them—you’re well on your way to making the most of ChatGPT or any other LLM-based tool.