Ryan Wersal - Learnings

🧩 Tokens: The Building Blocks of AI Thought

Understanding how AI models break down text into tokens is crucial - it’s literally how they think, processing costs are measured in tokens, and getting tokenization wrong can lead to some hilariously bad outputs. Here are some interesting resources about tokenization that I dove into today:

The OpenAI Tokenizer highlighted some interesting takeaways:

  • On average, a single token is about four characters.
  • Many symbols appear to consume a whole token each (at least in the tokenizer’s visualization). Is that why system prompts seem to shy away from symbols? (Except when they don’t.) See the quick check after this list.
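
Both takeaways are easy to sanity-check with tiktoken (more on it below). A minimal sketch, assuming the cl100k_base encoding; the sample text and symbols are just illustrative:

```python
import tiktoken

# cl100k_base is the encoding used by the GPT-4 / GPT-3.5-turbo family
enc = tiktoken.get_encoding("cl100k_base")

text = "Understanding how AI models break down text into tokens is crucial."
tokens = enc.encode(text)

# Check the ~4 characters-per-token rule of thumb
print(f"{len(text)} chars / {len(tokens)} tokens "
      f"= {len(text) / len(tokens):.1f} chars per token")

# See how many tokens individual symbols cost
for symbol in ["->", "→", "###", "🧩"]:
    print(f"{symbol!r}: {len(enc.encode(symbol))} token(s)")
```

English prose tends to land close to the four-character average, while symbol- and emoji-heavy text tends to land well below it.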

tiktoken - OpenAI’s open-source BPE tokenizer library. Byte Pair Encoding (BPE) came up in a conversation about “compressing” tokens / prompts; a toy sketch of the idea is below.
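
For intuition on what BPE actually does, here’s a toy sketch of the merge loop - not tiktoken’s real implementation (which operates on bytes with a pre-trained merge table): start from individual characters and repeatedly merge the most frequent adjacent pair into a new symbol.

```python
from collections import Counter

def bpe_toy(text: str, num_merges: int) -> list[str]:
    """Toy BPE: start from single characters and greedily merge
    the most frequent adjacent pair into one symbol."""
    symbols = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats; merging would not compress anything
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (left, right):
                merged.append(left + right)  # replace the pair with one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_toy("low lower lowest", num_merges=4))
# After a couple of merges "low" becomes a single symbol -
# which is exactly how common substrings get cheap.
```

The “compressing” angle falls out naturally: frequent substrings collapse into single symbols, so common words cost fewer tokens than rare ones.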