Skip to main content

What is a large language model, really?

Time: 9:30 AM to 10:05 AM
Before you connect your robot to an LLM, you need to understand what an LLM actually is and how it works. No math, no jargon. Think of it as autocomplete on steroids.

Tokens

Everything starts with tokens. When you type a sentence, the model does not read words. It splits your text into tokens, which are fragments of words, whole words, or punctuation. You can visualize this using OpenAI’s tokenizer tool.

Training

A model learns by reading billions of tokens and adjusting billions of numbers (called weights) to get better at one task: predicting the next token. That is the entire training loop.

Inference

When you send a message, the model predicts the most likely next token, then the next, then the next. It does not “know” the answer. It generates one token at a time based on probability.

System prompts

A system prompt is context prepended before your message that shapes what the model generates. It defines the model’s role, personality, and constraints.

Key terms

TermMeaning
Parameters/weightsThe numbers the model learned during training
Context windowThe maximum amount of text the model can consider at once
TemperatureControls randomness: low = predictable, high = creative
TokensThe units of text the model reads and generates
PromptThe input you send to the model
CompletionThe output the model generates

What LLMs cannot do by default

LLMs have three major limitations out of the box:
  • No memory between conversations - each session starts fresh
  • No real-time information - the model only knows what it learned during training
  • No ability to take actions - unless you give it tools (which is exactly what you will do next)