AI Dictionary

Token

Definition

A basic unit of text (like a word or character) that an LLM processes.

Deep Dive

In the context of large language models (LLMs) and natural language processing (NLP), a token is the most fundamental unit of text that the model processes. This unit can vary significantly depending on the tokenization strategy employed; it might represent an entire word, a subword (like "un-" in "unhappy"), a single character, or even a punctuation mark. The process of breaking down raw text into these discrete tokens, known as tokenization, is a critical preprocessing step before any text can be fed into an LLM for analysis or generation.
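As a rough illustration of the idea (not how production LLM tokenizers work — those typically use learned subword vocabularies such as BPE), a minimal word-and-punctuation splitter can be sketched like this; `simple_tokenize` is a hypothetical helper for this example:

```python
import re

def simple_tokenize(text: str) -> list[str]:
    # Naive tokenizer: splits into runs of word characters or single
    # punctuation marks. Real LLM tokenizers instead map text onto a
    # learned subword vocabulary, so their output differs.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Even this toy version shows the key point: the model never sees raw text, only the sequence of discrete units the tokenizer produces.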

Examples & Use Cases

  • The sentence "Hello, world!" might be tokenized into ["Hello", ",", "world", "!"].
  • A model processing a long document will count the total number of tokens to estimate processing cost and time.
  • Identifying common subword tokens like "un-" or "-ing" to represent variations of words more efficiently.
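The cost-estimation use case above can be sketched as follows. Both the characters-per-token heuristic and the per-1k-token price are illustrative assumptions: exact counts depend on the model's tokenizer, and real prices vary by provider and model.

```python
def rough_token_count(text: str) -> int:
    # Rule-of-thumb heuristic: roughly 4 characters per token for
    # English text. An exact count requires the model's own tokenizer.
    return max(1, len(text) // 4)

def estimate_cost(num_tokens: int, price_per_1k_tokens: float = 0.002) -> float:
    # price_per_1k_tokens is a hypothetical rate used for illustration.
    return num_tokens * price_per_1k_tokens / 1000

document = "word " * 2000          # stand-in for a long document
n = rough_token_count(document)    # ~2500 tokens under the heuristic
print(n, estimate_cost(n))
```

In practice you would replace `rough_token_count` with the tokenizer that matches the model you are billing against, since different vocabularies yield different counts for the same text.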

Related Terms

Tokenization, Embedding, Vocabulary

Part of the hmu.ai extensive business and technology library.