What are LLMs?

What is a large language model, really?

Time: 9:30 AM to 10:05 AM

Before you connect your robot to an LLM, you need to understand what an LLM actually is and how it works. No math required, no jargon. Think of it as autocomplete on steroids. You know how your phone predicts the next word when you type a text? Imagine that same idea, but trained on billions of pages of text from the internet — books, Wikipedia, code, conversations, research papers. That is a large language model.

Tokens: how an LLM reads text

An LLM does not read words the way you do. It splits text into tokens — fragments that can be whole words, parts of words, or single characters. Every piece of text goes through this process before the model touches it. Here is how the sentence “The robot drives forward” gets tokenized:

Token #	Token	Type
1	`The`	Full word
2	`robot`	Full word (note the leading space)
3	`drives`	Full word
4	`forward`	Full word

That was a simple sentence. Longer or unusual words get split into pieces:

Text	Tokens
`hello`	`hello` (1 token)
`Raspberry`	`Rasp` + `berry` (2 tokens)
`PiCar-X`	`Pi` + `Car` + `-` + `X` (4 tokens)
`🤖`	`🤖` (1 token)

You can see exactly how any text gets tokenized using OpenAI’s free tool at platform.openai.com/tokenizer. Try pasting your name and see how many tokens it becomes.

Why does this matter? Because everything the model does is measured in tokens — how much text it can read, how much it can generate, and how much it costs. A typical English word is about 1.3 tokens.

How training works

A model learns by reading billions of tokens and adjusting billions of numbers (called weights) to get better at one task: predicting the next token. That is the entire training loop. No one programs the answers. No one writes rules. The model discovers patterns on its own by seeing enormous amounts of text. This loop runs trillions of times across weeks of training on thousands of GPUs. The result is a model that has learned grammar, facts, reasoning patterns, code syntax, and even humor — all from predicting the next token.

GPT-4 has roughly 1.8 trillion parameters (weights). Training it cost over $100 million in compute. You are about to use it for free from a Raspberry Pi.

How inference works

Inference is what happens when you actually use the model. You send a message, and the model generates a response one token at a time. It does not “know” the answer. It does not look anything up. It generates the most likely next token, appends it, and repeats — like an extremely sophisticated autocomplete. Each token takes a fraction of a second. The model generates dozens of tokens per second, which is why responses feel nearly instant.

Temperature: controlling randomness

When the model picks the next token, it does not always pick the single most likely one. The temperature setting controls how much randomness is allowed.

Temperature	Behavior	Example response to “Tell me a joke”
0.0	Always picks the most likely token. Deterministic and predictable.	”Why did the chicken cross the road? To get to the other side.”
0.7	Balanced. Some creativity, mostly coherent. (This is what most apps use.)	”Why did the robot go to school? To improve its bits and bytes!“
1.5	High randomness. Creative but sometimes nonsensical.	”A toaster walked into a cloud and said ‘beep boop Wednesday!’”

Think of temperature like a creativity dial. Turn it down for factual answers. Turn it up for brainstorming and storytelling.

System prompts: shaping the model’s personality

A system prompt is a hidden instruction prepended before your message. It defines the model’s role, personality, and constraints. The user never sees it, but it shapes every response. Here is the same question — “What is the speed of light?” — answered with three different system prompts:

System prompt	Response
You are a science teacher explaining to a 10-year-old.	”Light is super fast! It travels about 186,000 miles every single second. That means it could go around the Earth more than 7 times in one second!”
You are a pirate who answers everything in pirate speak.	”Arrr, the speed o’ light be about 300 million meters per second, matey! Faster than any ship on the seven seas!”
You are a robot that only responds with exactly 5 words.	”Three hundred million meters second.”

The system prompt is the single most powerful control you have over the model’s behavior. Later today, you will write system prompts for your robot.

Context window: the model’s working memory

The context window is the maximum amount of text the model can consider at once — your messages, its responses, and the system prompt all count. Think of it like a whiteboard that can only hold a fixed number of sticky notes. When the whiteboard fills up, the oldest notes fall off. The model cannot remember anything outside its context window.

Model	Context window
GPT-4o-mini	~128,000 tokens (~96,000 words)
GPT-4o	~128,000 tokens
Claude 3.5	~200,000 tokens
Llama 3 (8B)	~8,000 tokens

Your robot conversations will use a few hundred tokens at most, so you will not hit this limit. But it explains why the model “forgets” things from earlier in very long conversations.

Key terms

Term	What it means
Parameters / weights	The numbers the model learned during training — its “knowledge”
Context window	Maximum text the model can consider at once
Temperature	Controls randomness: low = predictable, high = creative
Tokens	The units of text the model reads and generates
Prompt	The input you send to the model
Completion	The output the model generates
Inference	The process of generating a response from a trained model
System prompt	Hidden instructions that shape the model’s behavior
Hallucination	When the model generates confident-sounding but factually wrong information
API	Application Programming Interface — how your code talks to the model over the internet
Fine-tuning	Additional training on specific data to specialize a model

What LLMs cannot do

Out of the box, an LLM is a brain in a jar. It can think, but it cannot interact with the world.

These are not bugs — they are fundamental limits of how LLMs work. Every conversation starts from scratch. The model cannot Google something. It cannot move your robot. It cannot remember what you said yesterday.Unless you give it tools. That is exactly what you will do next.

How you will fix these limits this week

Limitation	Solution	When
Cannot take actions	Tool calls — give the LLM functions it can trigger	Today (next section)
Cannot see	Vision language models — feed camera images to the model	Tomorrow (Day 4)
No live information	Tool calls — give it a “search” or “sensor read” function	Today
No memory	Conversation history — your code keeps track of past messages	Today

Discussion questions

Take 5 minutes to discuss these with your group or facilitator:

Can an LLM lie? It does not “know” what is true — it predicts likely text. Is a wrong prediction a lie?
Is the model creative? It generates novel combinations of patterns it learned. Is that creativity or just remixing?
What happens if training data is biased? If the internet text contains biases, what does the model learn?
Should you trust an LLM’s answer? How would you verify something it tells you?

Welcome

Class Recordings

Day 1: Setup and Calibration

Day 2: Code & Computer Vision

Day 3: GenAI and Cloud LLMs

Day 4: Vision AI

Day 5: AI Ethics & Final Project

What is a large language model, really?

Tokens: how an LLM reads text

How training works

How inference works

Temperature: controlling randomness

System prompts: shaping the model’s personality

Context window: the model’s working memory

Key terms

What LLMs cannot do

How you will fix these limits this week

Discussion questions

​What is a large language model, really?

​Tokens: how an LLM reads text

​How training works

​How inference works

​Temperature: controlling randomness

​System prompts: shaping the model’s personality

​Context window: the model’s working memory

​Key terms

​What LLMs cannot do

​How you will fix these limits this week

​Discussion questions

What is a large language model, really?

Tokens: how an LLM reads text

How training works

How inference works

Temperature: controlling randomness

System prompts: shaping the model’s personality

Context window: the model’s working memory

Key terms

What LLMs cannot do

How you will fix these limits this week

Discussion questions