Setting Up Your Local Agentic AI Environment
In this tutorial series, all examples are written in Python so the material can stay focused and go deeper.
We strongly recommend running everything locally with Ollama.
Cloud APIs (even free tiers like Gemini) quickly hit rate limits when you experiment, debug, or run many iterations. Ollama gives you unlimited local inference, zero cost, and full privacy — ideal for learning agentic AI by trial and error.
Note: Local Ollama will be significantly slower than cloud APIs, especially on older hardware. If you need more speed, you can switch to a cloud API (like Gemini) at any time — just be mindful of rate limits and costs.
1. Programming Language Setup
We use Python as the default language throughout the tutorials.
Official install links:
- Python: https://www.python.org/downloads/
2. Install Ollama (Local LLM Runtime)
Ollama lets you run powerful local models (like Qwen 3.5) directly on your laptop.
macOS (Recommended: Homebrew)
```shell
# Install Ollama via Homebrew (easiest way on macOS)
brew install ollama

# Start the Ollama service
brew services start ollama
```

Windows / Linux / Other macOS methods
Visit the official download page → https://ollama.com/download
Best multi-platform video (Mac + Windows + Linux) – “Install Ollama on Mac, Windows & Linux (Step-by-Step)” (Dec 2025)
Additional excellent videos:
- Installing Ollama is EASY Everywhere (macOS, Windows, Linux)
- How to Install Ollama on macOS (Apple Silicon) – great for M-series Macs
- Ollama on Windows in 5 minutes
3. Pull the Model We Will Use
```shell
# Pull Qwen 3.5 (9B) — excellent balance of speed and reasoning
ollama pull qwen3.5:9b
```

This model runs well on most recent laptops (8GB+ RAM recommended).
4. Quick Test – Is Ollama Working?
Run the interactive chat:
```shell
ollama run qwen3.5:9b
```

Then type:

```
Explain how a ReAct agent works in one paragraph.
```

You should get a coherent answer within a few seconds (longer on older hardware).
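Under the hood, `ollama run` talks to a local REST server on port 11434, so you can also test the setup programmatically. A minimal sketch using only the standard library (the endpoint and response fields follow Ollama's chat API; the Ollama service must be running when you actually call the function):

```python
import json
import urllib.request


def ollama_chat(prompt: str, model: str = "qwen3.5:9b") -> str:
    """Send one non-streaming chat request to the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete JSON response instead of chunks
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example (requires the Ollama service to be running):
# print(ollama_chat("Explain how a ReAct agent works in one paragraph."))
```

If this call fails with a connection error, the Ollama service is not running — start it (e.g. `brew services start ollama` or `ollama serve`) and retry.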
5. Verifying Your Setup (Ollama + Python)
Install the Python packages used by the examples:
```shell
python3 -m pip install ollama pydantic
```

Then run a minimal chat script:

```python
from ollama import chat

response = chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Say hello from Ollama!"}],
)

print(response.message.content)
```

When to Use What
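The `pydantic` package installed above becomes useful for structured outputs: the Ollama Python client accepts a JSON schema via its `format` parameter, and Pydantic can both produce that schema and validate the reply. A sketch (the `CityInfo` model and `ask_city` helper are illustrative names, not part of any library):

```python
from pydantic import BaseModel


class CityInfo(BaseModel):
    """Illustrative schema we want the model's answer to match."""
    name: str
    country: str
    population: int


def ask_city(city: str) -> CityInfo:
    # Imported here so the schema/validation parts work even without Ollama installed.
    from ollama import chat

    response = chat(
        model="qwen3.5:9b",
        messages=[{"role": "user", "content": f"Give basic facts about {city}."}],
        format=CityInfo.model_json_schema(),  # constrain output to this JSON schema
    )
    # Parse and validate the raw JSON string into a typed object.
    return CityInfo.model_validate_json(response.message.content)


# Example (requires the Ollama service to be running):
# info = ask_city("Paris")
# print(info.name, info.population)
```

Typed, validated responses like this make agent code much easier to wire together than free-form text.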
| Use Case | Recommended Setup | Reason |
|---|---|---|
| Learning / experimentation | Ollama (local) | Unlimited trials, no cost |
| Production / high-scale | OpenAI / Gemini | Higher speed & scale |
| Offline / privacy-first | Ollama | Runs 100% locally |
Optional: Cloud Model Providers
This series uses Ollama by default. If you prefer a hosted model, you can use any provider that supports chat/completion APIs and tool calling.
- Gemini: https://aistudio.google.com/
- OpenAI: https://platform.openai.com/
- Anthropic: https://console.anthropic.com/
Cloud providers are useful for speed and stronger frontier models, but they introduce API keys, costs, rate limits, and data-sharing considerations.
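One low-friction way to keep local and hosted models interchangeable is to use an OpenAI-compatible client: Ollama serves an OpenAI-compatible API under `/v1`, and Gemini offers a compatible endpoint as well. A sketch, assuming the `openai` package is installed and a `GEMINI_API_KEY` environment variable for the hosted case (verify endpoints and model names against each provider's docs):

```python
def make_client(provider: str):
    """Return an OpenAI-compatible client pointed at the chosen provider."""
    if provider == "ollama":
        from openai import OpenAI

        # The api_key is required by the client library but ignored by Ollama.
        return OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    if provider == "gemini":
        import os
        from openai import OpenAI

        return OpenAI(
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
            api_key=os.environ["GEMINI_API_KEY"],
        )
    raise ValueError(f"unknown provider: {provider!r}")


# Usage (uncomment once the chosen provider is reachable):
# client = make_client("ollama")
# reply = client.chat.completions.create(
#     model="qwen3.5:9b",
#     messages=[{"role": "user", "content": "Say hello!"}],
# )
# print(reply.choices[0].message.content)
```

Because only `base_url`, `api_key`, and the model name change, the rest of your agent code stays identical when you switch providers.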
You’re now fully set up!
→ Next: What is an Agent?