Build a voice chatbot

Connect to OpenAI and build a voice chatbot

Time: 10:35 AM to 11:25 AM

In this section, you will connect your robot to a real large language model and have a spoken conversation with it. By the end, the full loop will be running: you speak, the robot transcribes, GPT responds, and the robot speaks the answer out loud.

The architecture

Here is what you are building:

Microphone → Vosk (speech-to-text) → GPT-4o-mini (OpenAI API) → espeak (text-to-speech) → speaker

Every component runs on your Raspberry Pi except the OpenAI API, which runs in the cloud. Vosk and espeak both run locally, so neither needs an internet connection.
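The listen, respond, speak loop can be sketched as one function that takes the three stages as parameters. The names run_turn, fake_listen, fake_think, and fake_speak are hypothetical stand-ins, not part of the camp code; the real programs later in this section plug Vosk, the OpenAI client, and espeak into the same shape.

```python
def run_turn(listen, think, speak):
    """Run one conversation turn: hear text, get a reply, say it."""
    heard = listen()            # speech-to-text (Vosk on the Pi)
    if not heard:
        return None             # nothing was said, skip this turn
    reply = think(heard)        # language model (OpenAI API in the cloud)
    speak(reply)                # text-to-speech (espeak on the Pi)
    return reply


# Stand-in stages so the sketch runs without a mic, network, or speaker.
spoken = []

def fake_listen():
    return "what is a robot"

def fake_think(text):
    return f"You asked: {text}"

def fake_speak(text):
    spoken.append(text)

reply = run_turn(fake_listen, fake_think, fake_speak)
print(reply)
```

Only the middle stage needs the internet, which is why a network problem stops the replies but not the transcription.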

Set up the API key

The facilitator will provide an OpenAI API key. Store it on your robot:

nano ~/camp/secret.py

Paste the key between the quotes:

OPENAI_KEY = 'sk-...'

Save with Ctrl+O, Enter, Ctrl+X.

If espeak is not producing sound, run bash ~/camp/troubleshoot.sh and select the audio check.

Program 1: Test the API connection

Before adding voice, confirm that your robot can talk to GPT over the internet.

Step 1 — Import and connect

Load the API key from your secret file and create an OpenAI client:

#!/usr/bin/env python3
import sys
import os

sys.path.insert(0, os.path.expanduser("~/camp"))
from secret import OPENAI_KEY
from openai import OpenAI

client = OpenAI(api_key=OPENAI_KEY)

sys.path.insert tells Python where to find secret.py. The OpenAI client handles all communication with the API.
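To see what sys.path.insert does, here is a small self-contained sketch. It uses a temporary folder and a made-up module called demo_secret (both are illustrations, not part of the camp files) in place of ~/camp and secret.py.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder containing a tiny module, standing in for ~/camp.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "demo_secret.py"), "w") as f:
    f.write("DEMO_KEY = 'sk-not-a-real-key'\n")

# Python only searches the folders listed in sys.path when importing.
# Putting the folder at position 0 makes it the first place Python looks.
sys.path.insert(0, folder)
importlib.invalidate_caches()

demo_secret = importlib.import_module("demo_secret")
print(demo_secret.DEMO_KEY)
```

Without the sys.path.insert line, the import would normally fail with ModuleNotFoundError because the folder is not on Python's search path.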

Step 2 — Send a single prompt

The API uses a messages list. Each message has a role and content:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Keep answers short."},
        {"role": "user", "content": "In exactly one sentence, explain what a robot is to a teenager."},
    ],
    max_tokens=100,
)

answer = response.choices[0].message.content.strip()
print(f"Response: {answer}")

There are three roles:

system — hidden instructions that shape the model’s behavior
user — what you (the human) say
assistant — what the model says back

The model reads the entire list and generates the next assistant message.

Step 3 — Interactive chat loop

To have a back-and-forth conversation, keep a growing messages list. Append each user message and assistant response so the model remembers the conversation:

messages = [
    {"role": "system", "content": "You are a friendly robot assistant."},
]

while True:
    user_input = input("  You: ").strip()
    if user_input.lower() in ("quit", "exit", "q"):
        break

    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=150,
    )
    answer = response.choices[0].message.content.strip()
    messages.append({"role": "assistant", "content": answer})
    print(f"  GPT: {answer}")

Each time through the loop, the model sees the entire conversation history. This is how you give an LLM “memory” within a single session.
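The append pattern can be tried offline. In this sketch, fake_model is a hypothetical stand-in for the API call; it simply reports what it was shown, which makes it visible that the whole list is sent on every turn.

```python
def fake_model(messages):
    """Stand-in for the API call: reports what it was shown."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return (f"I can see {len(user_turns)} user message(s); "
            f"the first was {user_turns[0]!r}.")


messages = [{"role": "system", "content": "You are a friendly robot assistant."}]

for user_input in ["my name is Sam", "what is my name?"]:
    messages.append({"role": "user", "content": user_input})
    answer = fake_model(messages)          # the whole list is sent every time
    messages.append({"role": "assistant", "content": answer})
    print(answer)

print(len(messages))
```

On the second turn the stand-in still sees the first message. If you created a fresh list on every turn instead, the model would have no way to know the name.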

Run it

python3 ~/camp/day3/test_openai.py

You should see:

==================================================
  Test OpenAI API Connection
==================================================

  Connecting to OpenAI...
  Prompt: "In exactly one sentence, explain what a robot is to a teenager."
  Waiting for response...

  Model:    gpt-4o-mini-2024-07-18
  Tokens:   57
  Response: A robot is a programmable machine that can perform tasks
            automatically, often mimicking human actions.

  API connection works!

  Try sending your own messages (type 'quit' to exit):

  You: how are you?
  GPT: I'm just a robot, but I'm here and ready to help you!

If you get an authentication error, double-check the API key in ~/camp/secret.py. Make sure there are no extra spaces or missing quotes.
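A quick way to spot the usual mistakes is to inspect the key string before using it. The helper check_key below is a hypothetical sketch, and it assumes keys begin with "sk-" as in the placeholder shown earlier; it cannot tell whether a key is actually valid.

```python
def check_key(key):
    """Return a list of likely problems with an API key string."""
    if not isinstance(key, str) or not key:
        return ["key is empty or not a string"]
    problems = []
    if key != key.strip():
        problems.append("key has leading or trailing spaces")
    if any(ch.isspace() for ch in key.strip()):
        problems.append("key contains whitespace in the middle")
    # Assumption: keys begin with "sk-", as in the placeholder shown earlier.
    if not key.strip().startswith("sk-"):
        problems.append("key does not start with 'sk-'")
    if key.strip() == "sk-...":
        problems.append("key is still the placeholder text")
    return problems


print(check_key(" sk-abc123 "))
```

An empty list means none of these common problems were found. The API may still reject the key for other reasons.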

The complete test_openai.py program:

#!/usr/bin/env python3
"""
Test OpenAI API Connection — sends a prompt to GPT-4o-mini,
prints the response, then lets you chat interactively.
"""

import sys
import os

sys.path.insert(0, os.path.expanduser("~/camp"))

from secret import OPENAI_KEY
from openai import OpenAI


def main():
    print("=" * 50)
    print("  Test OpenAI API Connection")
    print("=" * 50)

    print("\n  Connecting to OpenAI...")
    client = OpenAI(api_key=OPENAI_KEY)

    prompt = "In exactly one sentence, explain what a robot is to a teenager."
    print(f"  Prompt: \"{prompt}\"")
    print("  Waiting for response...\n")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Keep answers short."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=100,
    )

    answer = response.choices[0].message.content.strip()
    model = response.model
    tokens = response.usage.total_tokens

    print(f"  Model:    {model}")
    print(f"  Tokens:   {tokens}")
    print(f"  Response: {answer}")
    print("\n  API connection works!\n")

    print("  Try sending your own messages (type 'quit' to exit):\n")
    messages = [
        {"role": "system", "content": "You are a friendly robot assistant. Keep answers under 2 sentences."},
    ]

    while True:
        try:
            user_input = input("  You: ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if user_input.lower() in ("quit", "exit", "q"):
            break
        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, max_tokens=150,
        )
        answer = response.choices[0].message.content.strip()
        messages.append({"role": "assistant", "content": answer})
        print(f"  GPT: {answer}\n")

    print("\n  Done!\n")


if __name__ == "__main__":
    main()

Program 2: Test speech-to-text

Now test the microphone and Vosk speech recognition. This program listens through your USB mic and prints what it hears in real time.

Step 1 — Suppress ALSA noise and import libraries

The Pi’s audio system prints dozens of harmless warnings. This block silences them before importing the audio libraries:

#!/usr/bin/env python3
import sys, os, json, ctypes

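try:
    asound = ctypes.cdll.LoadLibrary("libasound.so.2")
    c_handler = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_int,
                                  ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p)
    # Keep a reference to the callback so Python does not garbage-collect it
    # while ALSA still holds a pointer to it.
    alsa_error_handler = c_handler(lambda *_: None)
    asound.snd_lib_error_set_handler(alsa_error_handler)
except (OSError, AttributeError):
    # libasound is not available; the warnings simply stay visible.
    pass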

import vosk
import pyaudio

You do not need to understand the ctypes code — it just tells ALSA to be quiet. The important imports are vosk (speech recognition) and pyaudio (microphone access).

Step 2 — Find the USB microphone

The Pi may have multiple audio devices. This function scans for one with “usb” in its name:

def find_mic():
    pa = pyaudio.PyAudio()
    usb_idx = None
    any_idx = None

    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info["maxInputChannels"] < 1:
            continue
        if any_idx is None:
            any_idx = i
        if "usb" in info["name"].lower():
            usb_idx = i

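    idx = usb_idx if usb_idx is not None else any_idx
    if idx is None:
        # No device with input channels was found at all.
        pa.terminate()
        raise RuntimeError("No microphone found. Check the USB connection.")
    native_rate = int(pa.get_device_info_by_index(idx)["defaultSampleRate"])
    pa.terminate()
    return idx, native_rate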

It returns the device index and the mic’s native sample rate (usually 44100 Hz for USB mics). Using the native rate avoids “Invalid sample rate” errors.
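The selection rule can be tried without any audio hardware. The sketch below applies the same logic to a plain list of device dictionaries; pick_input_device and the sample device list are hypothetical, while the dictionary keys are the ones find_mic reads.

```python
def pick_input_device(devices):
    """Choose (index, sample_rate) from a list of device-info dicts.

    Mirrors find_mic: prefer a device with "usb" in its name, otherwise
    fall back to the first device that has at least one input channel.
    """
    usb_idx = None
    any_idx = None
    for i, info in enumerate(devices):
        if info["maxInputChannels"] < 1:
            continue                      # output-only device, skip it
        if any_idx is None:
            any_idx = i                   # remember the first usable input
        if "usb" in info["name"].lower():
            usb_idx = i                   # a later USB match replaces an earlier one
    idx = usb_idx if usb_idx is not None else any_idx
    if idx is None:
        raise RuntimeError("no input device found")
    return idx, int(devices[idx]["defaultSampleRate"])


# Made-up device list for illustration only.
sample_devices = [
    {"name": "bcm2835 Headphones", "maxInputChannels": 0, "defaultSampleRate": 44100.0},
    {"name": "USB PnP Sound Device", "maxInputChannels": 1, "defaultSampleRate": 44100.0},
]
print(pick_input_device(sample_devices))
```

Note that if two USB input devices are attached, this rule picks the one with the higher index.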

Step 3 — Load Vosk and open the mic stream

Create the speech recognition model and open a live audio stream from the mic:

mic_idx, mic_rate = find_mic()
chunk = mic_rate // 4

vosk.SetLogLevel(-1)
model = vosk.Model(os.path.expanduser("~/camp/vosk-model"))
recognizer = vosk.KaldiRecognizer(model, mic_rate)

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16, channels=1, rate=mic_rate,
    input=True, input_device_index=mic_idx,
    frames_per_buffer=chunk,
)

chunk is how many audio samples to read at a time — one quarter of a second’s worth. The recognizer processes these chunks and detects when you finish a sentence.
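The arithmetic behind chunk, worked through for the 44100 Hz example rate. The byte count assumes 16-bit mono audio, which is what paInt16 and channels=1 request.

```python
mic_rate = 44100                 # samples per second (example value from the text)
chunk = mic_rate // 4            # samples read per call
bytes_per_sample = 2             # paInt16 means 16-bit (2-byte) samples
channels = 1                     # mono, as in the pa.open(...) call

chunk_seconds = chunk / mic_rate
chunk_bytes = chunk * bytes_per_sample * channels

print(chunk, chunk_seconds, chunk_bytes)
```

So each stream.read(chunk) call returns 11025 samples, which is 0.25 seconds of audio in 22050 bytes. A smaller chunk gives more frequent partial results at the cost of more loop iterations.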

Step 4 — The transcription loop

Read audio chunks in a loop. When Vosk detects a complete sentence, print it:

count = 0
while True:
    data = stream.read(chunk, exception_on_overflow=False)

    if recognizer.AcceptWaveform(data):
        result = json.loads(recognizer.Result())
        text = result.get("text", "").strip()
        if text:
            count += 1
            print(f"  [{count:>3}] {text}")

AcceptWaveform returns True when the recognizer decides an utterance has ended, which usually means it heard a pause after a word or sentence. Result() then returns a JSON string containing the transcribed text.
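Here is a small sketch of the JSON handling on its own. The sample strings follow the shape that Result() and PartialResult() return (a "text" key for finished utterances, a "partial" key while you are still speaking), but they are written by hand for illustration, and extract_text is a hypothetical helper.

```python
import json

def extract_text(raw_json):
    """Pull the transcript out of a Vosk JSON string ("text" or "partial")."""
    data = json.loads(raw_json)
    return (data.get("text") or data.get("partial") or "").strip()

# Hand-written strings in the shape Vosk returns; not captured from a real run.
final_json = '{"text": "move forward"}'
partial_json = '{"partial": "move for"}'
silence_json = '{"text": ""}'

print(extract_text(final_json))
print(extract_text(partial_json))
print(repr(extract_text(silence_json)))
```

The empty-string case matters: Vosk can report the end of an utterance that contains no recognized words, which is why the loop checks "if text" before printing.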

Run it

python3 ~/camp/day3/test_stt_vosk.py

Speak clearly into the USB microphone:

==================================================
  Test Speech-to-Text (Vosk)
==================================================

  Loading model from /home/car/camp/vosk-model...
  Model loaded.
  Microphone: USB PnP Sound Device (device 0)
  Sample rate: 44100 Hz

  Listening! Speak into the microphone.
  Press Ctrl-C to stop.

  [  1] what is the meaning of life
  [  2] move forward
  [  3] celebrate

Press Ctrl-C to stop.

Tips for better recognition:

Speak slowly and clearly — about half your normal speed
Keep the microphone 6-12 inches from your mouth
Minimize background noise (close windows, turn off fans)
Short, distinct phrases work better than long sentences

The complete test_stt_vosk.py program:

#!/usr/bin/env python3
"""
Test Speech-to-Text — records from the USB mic and transcribes
in real time using Vosk offline speech recognition.
"""

import sys, os, json, ctypes

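try:
    asound = ctypes.cdll.LoadLibrary("libasound.so.2")
    c_handler = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_int,
                                  ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p)
    # Keep a reference so the callback is not garbage-collected.
    alsa_error_handler = c_handler(lambda *_: None)
    asound.snd_lib_error_set_handler(alsa_error_handler)
except (OSError, AttributeError):
    pass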

import vosk, pyaudio

MODEL_PATH = os.path.expanduser("~/camp/vosk-model")

def find_mic():
    pa = pyaudio.PyAudio()
    usb_idx = any_idx = None
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info["maxInputChannels"] < 1: continue
        if any_idx is None: any_idx = i
        if "usb" in info["name"].lower(): usb_idx = i
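    idx = usb_idx if usb_idx is not None else any_idx
    if idx is None:
        pa.terminate()
        raise RuntimeError("No microphone found. Check the USB connection.")
    rate = int(pa.get_device_info_by_index(idx)["defaultSampleRate"])
    pa.terminate()
    return idx, rate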

def main():
    print("=" * 50)
    print("  Test Speech-to-Text (Vosk)")
    print("=" * 50)

    mic_idx, mic_rate = find_mic()
    chunk = mic_rate // 4

    print(f"\n  Loading model from {MODEL_PATH}...")
    vosk.SetLogLevel(-1)
    model = vosk.Model(MODEL_PATH)
    recognizer = vosk.KaldiRecognizer(model, mic_rate)
    print("  Model loaded.")

    pa = pyaudio.PyAudio()
    mic_name = pa.get_device_info_by_index(mic_idx)["name"]
    print(f"  Microphone: {mic_name} (device {mic_idx})")
    print(f"  Sample rate: {mic_rate} Hz")

    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=mic_rate,
                     input=True, input_device_index=mic_idx,
                     frames_per_buffer=chunk)

    print("\n  Listening! Speak into the microphone.")
    print("  Press Ctrl-C to stop.\n")

    CLEAR = "\033[2K\r"
    count = 0
    try:
        while True:
            data = stream.read(chunk, exception_on_overflow=False)
            if recognizer.AcceptWaveform(data):
                result = json.loads(recognizer.Result())
                text = result.get("text", "").strip()
                if text:
                    count += 1
                    sys.stdout.write(CLEAR)
                    print(f"  [{count:>3}] {text}")
            else:
                partial = json.loads(recognizer.PartialResult())
                partial_text = partial.get("partial", "").strip()
                if partial_text:
                    sys.stdout.write(f"{CLEAR}  ...  {partial_text}")
                    sys.stdout.flush()
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

    print(f"\n\n  Done! Transcribed {count} sentences.\n")

if __name__ == "__main__":
    main()

Program 3: The voice chatbot

Now combine everything into a full conversation loop.

Step 1 — The speak function

Use espeak to convert text to speech:

import os

def speak(text, speed=160):
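    # Escape single quotes so the text stays inside the shell's '...' string.
    # Double quotes need no escaping inside single quotes.
    safe = text.replace("'", "'\\''")
    os.system(f"espeak -s {speed} '{safe}' 2>/dev/null")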

os.system blocks until espeak finishes, so the program does not listen while the robot is talking and is less likely to transcribe the robot’s own voice.
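An alternative worth knowing about is to skip the shell entirely and pass the text to espeak as its own argument, so no quote escaping is needed. This is a sketch rather than the camp's implementation; build_espeak_command and speak_safely are hypothetical names, and the only espeak option used is -s (speed), the same one as above.

```python
import shutil
import subprocess

def build_espeak_command(text, speed=160):
    """Return the argument list for espeak; no shell quoting is needed."""
    return ["espeak", "-s", str(int(speed)), text]

def speak_safely(text, speed=160):
    """Speak text with espeak if it is installed; return True if it ran."""
    if shutil.which("espeak") is None:
        return False                      # espeak is not installed on this machine
    # run() waits for espeak to exit, so this blocks just like os.system.
    subprocess.run(build_espeak_command(text, speed),
                   stderr=subprocess.DEVNULL, check=False)
    return True

print(build_espeak_command("It's a \"robot\"!"))
```

Because the text travels as a single list element, quotes, dollar signs, and semicolons in a model reply are never interpreted by a shell.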

Step 2 — The listen function

Wrap the Vosk transcription loop in a function with a timeout. This listens for a single sentence:

import json, time

def listen_once(recognizer, stream, chunk, timeout=8):
    start = time.time()
    while time.time() - start < timeout:
        data = stream.read(chunk, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            result = json.loads(recognizer.Result())
            text = result.get("text", "").strip()
            if text:
                return text
    final = json.loads(recognizer.FinalResult())
    return final.get("text", "").strip()

If no speech is detected within timeout seconds, it returns whatever partial text it has (or an empty string).
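You can check the timeout behavior without a microphone by swapping in stand-ins for the stream and the recognizer. FakeStream and FakeRecognizer below are hypothetical test doubles that offer only the methods listen_once calls; the function body repeats the logic shown above.

```python
import json
import time

def listen_once(recognizer, stream, chunk, timeout=8):
    """Same logic as the listen function above."""
    start = time.time()
    while time.time() - start < timeout:
        data = stream.read(chunk, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "").strip()
            if text:
                return text
    return json.loads(recognizer.FinalResult()).get("text", "").strip()


class FakeStream:
    """Stand-in for a PyAudio stream: always returns silence."""
    def read(self, chunk, exception_on_overflow=False):
        return b"\x00\x00" * chunk


class FakeRecognizer:
    """Stand-in for KaldiRecognizer: 'hears' a phrase on the Nth chunk."""
    def __init__(self, phrase, after_chunks):
        self.phrase = phrase
        self.after_chunks = after_chunks
        self.seen = 0

    def AcceptWaveform(self, data):
        self.seen += 1
        return self.seen >= self.after_chunks

    def Result(self):
        return json.dumps({"text": self.phrase})

    def FinalResult(self):
        return json.dumps({"text": ""})


heard = listen_once(FakeRecognizer("move forward", 3), FakeStream(), chunk=4)
silent = listen_once(FakeRecognizer("", 10**9), FakeStream(), chunk=4, timeout=0.05)
print(heard)
print(repr(silent))
```

The first call returns as soon as the stand-in reports a finished utterance; the second runs out its short timeout and falls through to FinalResult, which is the empty-string case.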

Step 3 — The system prompt

Define the robot’s personality. This hidden instruction shapes every response:

SYSTEM_PROMPT = """You are a friendly robot assistant built by a student at a STEM camp.
You are running on a Raspberry Pi inside a PiCar-X robot car.
Keep your answers to 1-3 sentences so they sound natural when spoken aloud.
Be enthusiastic and encouraging. You love science and technology."""

Short answers work best because espeak reads them aloud. A three-paragraph response could take 30 seconds or more to speak.
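A rough way to estimate speaking time is to count words and divide by the speaking rate. The sketch assumes the espeak -s value is words per minute and uses the default speed=160 from the speak function; estimate_speech_seconds is a hypothetical helper and real timing varies with punctuation and word length.

```python
def estimate_speech_seconds(text, words_per_minute=160):
    """Rough speaking time, treating the espeak -s value as words per minute."""
    words = len(text.split())
    return words * 60 / words_per_minute

short_answer = "Robots are machines that sense, think, and act."
long_answer = " ".join(["word"] * 80)     # stand-in for a multi-paragraph reply

print(estimate_speech_seconds(short_answer))
print(estimate_speech_seconds(long_answer))
```

An 8-word sentence takes about 3 seconds, while an 80-word reply takes about 30, which is a long time to wait before you can ask the next question.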

Step 4 — The main conversation loop

Tie everything together. Listen, send to GPT, speak the response, repeat:

from openai import OpenAI

client = OpenAI(api_key=OPENAI_KEY)
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

speak("Hello! I am your robot. Ask me anything.")

while True:
    text = listen_once(recognizer, stream, chunk, timeout=10)
    if not text:
        continue

    if text.lower() in ("goodbye", "bye", "quit"):
        speak("Goodbye!")
        break

    messages.append({"role": "user", "content": text})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=200,
    )
    answer = response.choices[0].message.content.strip()
    messages.append({"role": "assistant", "content": answer})

    print(f"  Robot: {answer}")
    speak(answer)

The messages list grows with every exchange. The model sees the full history, so it can reference things you said earlier in the conversation.
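Because the whole list is sent on every request, a very long conversation sends more and more text each turn. One possible way to bound that, sketched here as an optional extra and not something the camp program does, is to keep the system message plus only the most recent exchanges. The name trim_history and the default of 10 exchanges are assumptions.

```python
def trim_history(messages, max_exchanges=10):
    """Keep the system message plus the most recent user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = 2 * max_exchanges              # one user + one assistant per exchange
    recent = rest[-keep:] if keep > 0 else []
    return system + recent


history = [{"role": "system", "content": "You are a friendly robot assistant."}]
for n in range(1, 6):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

trimmed = trim_history(history, max_exchanges=2)
print(len(history), len(trimmed))
print(trimmed[1]["content"])
```

The trade-off is that the model forgets anything that was trimmed, so a small limit makes the robot lose track of earlier parts of the conversation.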

Run it

python3 ~/camp/day3/voice_chatbot.py

Or use keyboard mode if your mic is giving trouble:

python3 ~/camp/day3/voice_chatbot.py --type

You should hear the robot greet you and then listen for your questions:

==================================================
  PiCar-X  —  Voice Chatbot
==================================================

  Robot: Hello! I am your robot. Ask me anything.

  🎤  Listening...
  You:   what is the meaning of life
  Robot: Great question! Many people believe it's about
         finding your purpose and making connections.
         What do you think?

Say “goodbye” to end the conversation.

The complete voice_chatbot.py program:

#!/usr/bin/env python3
"""
Voice Chatbot — full conversation loop:
  Vosk (STT) → GPT-4o-mini → espeak (TTS)

Usage:
    python3 voice_chatbot.py          # voice mode
    python3 voice_chatbot.py --type   # keyboard mode
"""

import sys, os, json, time, ctypes

try:
    asound = ctypes.cdll.LoadLibrary("libasound.so.2")
    c_handler = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_int,
                                  ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p)
    # Keep a reference so the callback is not garbage-collected.
    alsa_error_handler = c_handler(lambda *_: None)
    asound.snd_lib_error_set_handler(alsa_error_handler)
except (OSError, AttributeError):
    pass

sys.path.insert(0, os.path.expanduser("~/camp"))
from secret import OPENAI_KEY
from openai import OpenAI
import vosk, pyaudio

MODEL_PATH = os.path.expanduser("~/camp/vosk-model")

SYSTEM_PROMPT = """You are a friendly robot assistant built by a student at a STEM camp.
You are running on a Raspberry Pi inside a PiCar-X robot car.
Keep your answers to 1-3 sentences so they sound natural when spoken aloud.
Be enthusiastic and encouraging. You love science and technology."""

def speak(text, speed=160):
    # Escape single quotes so the text stays inside the shell's '...' string.
    safe = text.replace("'", "'\\''")
    os.system(f"espeak -s {speed} '{safe}' 2>/dev/null")

def find_mic():
    pa = pyaudio.PyAudio()
    usb_idx = any_idx = None
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info["maxInputChannels"] < 1: continue
        if any_idx is None: any_idx = i
        if "usb" in info["name"].lower(): usb_idx = i
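    idx = usb_idx if usb_idx is not None else any_idx
    if idx is None:
        pa.terminate()
        raise RuntimeError("No microphone found. Check the USB connection.")
    rate = int(pa.get_device_info_by_index(idx)["defaultSampleRate"])
    pa.terminate()
    return idx, rate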

def listen_once(recognizer, stream, chunk, timeout=8):
    start = time.time()
    while time.time() - start < timeout:
        data = stream.read(chunk, exception_on_overflow=False)
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "").strip()
            if text: return text
    return json.loads(recognizer.FinalResult()).get("text", "").strip()

def main():
    type_mode = "--type" in sys.argv or "-t" in sys.argv

    print("=" * 50)
    print("  PiCar-X  —  Voice Chatbot")
    print("=" * 50)

    stream = pa = recognizer = None
    chunk = 0

    if not type_mode:
        mic_idx, mic_rate = find_mic()
        chunk = mic_rate // 4
        vosk.SetLogLevel(-1)
        recognizer = vosk.KaldiRecognizer(vosk.Model(MODEL_PATH), mic_rate)
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=mic_rate,
                         input=True, input_device_index=mic_idx,
                         frames_per_buffer=chunk)

    client = OpenAI(api_key=OPENAI_KEY)
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    speak("Hello! I am your robot. Ask me anything.")
    print("\n  Say 'goodbye' to quit. Press Ctrl-C to stop.\n")

    try:
        while True:
            if type_mode:
                text = input("  You: ").strip()
            else:
                sys.stdout.write("  🎤  Listening... ")
                sys.stdout.flush()
                text = listen_once(recognizer, stream, chunk, timeout=10)
                if not text:
                    sys.stdout.write("(nothing heard)\n")
                    continue
                print(f"\n  You:   {text}")

            if not text: continue
            if text.lower() in ("goodbye", "bye", "quit", "exit", "q"):
                speak("Goodbye!"); break

            messages.append({"role": "user", "content": text})
            answer = client.chat.completions.create(
                model="gpt-4o-mini", messages=messages, max_tokens=200
            ).choices[0].message.content.strip()
            messages.append({"role": "assistant", "content": answer})
            print(f"  Robot: {answer}\n")
            speak(answer)
    except KeyboardInterrupt:
        print("\n  Interrupted.")
    finally:
        if stream: stream.close()
        if pa: pa.terminate()

if __name__ == "__main__":
    main()

Customize the personality

The system prompt defines who the robot thinks it is. Try changing the SYSTEM_PROMPT variable in voice_chatbot.py to give the robot a different personality:

Persona	System prompt
Space tour guide	You are a tour guide who only talks about the solar system. Every answer must reference a planet, moon, or star.
Grumpy mechanic	You are a grumpy old robot mechanic. You answer questions but always complain about how hard your job is.
Rhyming poet	You are a poet. Every response must rhyme. Keep responses to 2 lines.
Game show host	You are an enthusiastic game show host. Turn every answer into a trivia question format.

Edit the file:

nano ~/camp/day3/voice_chatbot.py

Find the SYSTEM_PROMPT variable near the top and replace the text between the triple quotes.

The system prompt is the single most powerful control you have over the model’s behavior. There is no wrong answer here — experiment freely. The weirder the persona, the more fun it is.
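If you want to switch personas without retyping the prompt each time, one option is to keep them in a dictionary. This is a sketch: PERSONAS and start_conversation are hypothetical names, and the prompt text is copied from the table above.

```python
PERSONAS = {
    "space tour guide": (
        "You are a tour guide who only talks about the solar system. "
        "Every answer must reference a planet, moon, or star."
    ),
    "grumpy mechanic": (
        "You are a grumpy old robot mechanic. You answer questions but "
        "always complain about how hard your job is."
    ),
    "rhyming poet": (
        "You are a poet. Every response must rhyme. Keep responses to 2 lines."
    ),
    "game show host": (
        "You are an enthusiastic game show host. Turn every answer into a "
        "trivia question format."
    ),
}

def start_conversation(persona_name):
    """Return a fresh messages list whose system prompt is the chosen persona."""
    prompt = PERSONAS[persona_name.lower()]
    return [{"role": "system", "content": prompt}]

messages = start_conversation("Rhyming poet")
print(messages[0]["content"])
```

Starting a fresh list matters when you change persona: an old history full of replies in the previous character tends to pull the model back toward it.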

Challenge

Create a persona that refuses to answer questions about anything except robots. If someone asks about the weather, it should steer the conversation back to robots. Test it with 5 different questions and see if it stays in character.
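To keep score during the challenge, you could run your questions through a small checker. The sketch below is deliberately crude: stays_on_topic only looks for the word "robot" in the reply, and fake_ask is a hypothetical stand-in for the model so the example runs offline. In practice, ask would be a function that wraps the chat call from voice_chatbot.py.

```python
def stays_on_topic(answer, keyword="robot"):
    """Crude check: does the reply mention the keyword at all?"""
    return keyword in answer.lower()


def run_persona_test(ask, questions):
    """Ask every question and record whether each reply stayed on topic.

    ask is any function that takes a question string and returns a reply string.
    """
    return {question: stays_on_topic(ask(question)) for question in questions}


def fake_ask(question):
    """Stand-in for the model so the sketch runs offline."""
    if "weather" in question:
        return "I only know about robots, like ones that measure rainfall."
    return "Good question. Let's talk about something else."


questions = ["what is the weather today", "who won the game"]
results = run_persona_test(fake_ask, questions)
print(results)
```

A keyword match does not prove the persona held, so read the replies yourself as well; the checker is only a quick first pass.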
