We keep hearing the word “token” whenever people talk about AI models. But what exactly is a token?
Computers and LLMs (Large Language Models) can’t actually read text the way we do. To help them process our speech and writing, we have to break the text down into smaller, bite-sized pieces. These pieces are called tokens.
For example, a simple sentence like “I love you” would be broken down into three tokens: “I,” “love,” and “you.” In practice, tokenizers often split text into subword pieces rather than whole words, so a longer word like “tokenization” may become several tokens.
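To make the idea concrete, here is a toy sketch. Real LLM tokenizers use learned subword schemes such as byte-pair encoding, but a simple whitespace split is enough to show how “I love you” maps to three tokens (the function name `toy_tokenize` is just an illustration, not a real library API):

```python
def toy_tokenize(text: str) -> list[str]:
    """A deliberately naive tokenizer: split on whitespace.

    Real tokenizers (e.g. BPE) break text into subword units
    instead, but the principle is the same: text in, tokens out.
    """
    return text.split()

tokens = toy_tokenize("I love you")
print(tokens)       # ['I', 'love', 'you']
print(len(tokens))  # 3
```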
AI models process these tokens to understand meaning and perform tasks like generating answers, translating languages, or writing essays. When we evaluate an AI’s performance, “tokens per second” is a key metric. Think of it like a person’s reading speed—the more tokens a model can handle per second, the faster it can read prompts and generate responses.
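The metric itself is just a simple ratio. A minimal sketch, with made-up numbers for illustration:

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput: how many tokens were processed per second."""
    return num_tokens / seconds

# e.g. a model that generated 500 tokens in 4 seconds
print(tokens_per_second(500, 4.0))  # 125.0
```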
You can think of tokens as the “Lego bricks” of AI language. The more efficiently the text is broken down, the more accurate and cost-effective the model becomes. For developers, tokens are also the unit of currency. When using an LLM API, you are billed by the token—for instance, a model might cost $2.50 per million input tokens and $10.00 per million output tokens.
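Using the example rates above (which are illustrative, not a quote of any provider’s current pricing), the cost of a single API call can be estimated like this:

```python
# Illustrative per-million-token rates from the example above
INPUT_RATE = 2.50 / 1_000_000    # dollars per input token
OUTPUT_RATE = 10.00 / 1_000_000  # dollars per output token

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A request with 1,200 prompt tokens and 300 generated tokens:
print(round(api_cost(1200, 300), 4))  # 0.006
```

Because output tokens are typically priced higher than input tokens, a short prompt that triggers a long answer can cost more than a long prompt with a brief reply.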
