🧩 Tokens: The Building Blocks of AI Thought
Understanding how AI models break text into tokens is crucial: tokens are the units a model actually processes, API costs are measured in them, and getting tokenization wrong can lead to some hilariously bad outputs. Here are some interesting resources about tokenization that I dove into today:
OpenAI Tokenizer. Playing with it highlighted some interesting takeaways:
- On average, a single token works out to about four characters of English text.
- Many symbols each consume a full token, at least in the visualizer. Is that why system prompts seem to shy away from symbols? (Except when they don't.) See the quick check after this list.
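Both takeaways are easy to verify locally with tiktoken (OpenAI's open-source tokenizer library, mentioned below). A minimal sketch, assuming `pip install tiktoken` and the `cl100k_base` encoding; the sample text and symbols are my own picks:

```python
# Minimal sketch: checking the ~4 chars/token rule of thumb and per-symbol
# token costs with tiktoken. Assumes `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "Understanding how AI models break text into tokens is crucial."
token_ids = enc.encode(text)
print(f"{len(text)} chars / {len(token_ids)} tokens "
      f"= {len(text) / len(token_ids):.1f} chars per token")

# Standalone symbols often cost a full token each (sometimes more):
for symbol in ["{", "→", "§", "✓"]:
    print(repr(symbol), "->", len(enc.encode(symbol)), "token(s)")
```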
tiktoken - Byte Pair Encoding (BPE) came up in a conversation about "compressing" tokens / prompts. The connection makes sense: BPE builds its vocabulary by repeatedly merging the most frequent adjacent pair of tokens, which is itself a form of compression.
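To make that "compression" intuition concrete, here is a toy sketch of the core BPE training loop (count adjacent pairs, merge the most frequent one, repeat). This is illustrative only, not tiktoken's actual implementation, and the tiny corpus is made up:

```python
# Toy BPE: repeatedly merge the most frequent adjacent pair of tokens.
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent pair in the token sequence."""
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")  # start from individual characters
for _ in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge(tokens, pair, "".join(pair))
    print(tokens)
```

Each merge adds one vocabulary entry and shortens the sequence, which is why frequent substrings like "low" end up as single tokens.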