# Building with Claude — Companion Notebooks

Companion lab notebooks for the *Building with Claude — A Practitioner's Guide to the Anthropic API* series, published by [DataMy](https://datamy.co).

This is the runnable-code half of the series. Each notebook corresponds to one article and runs top-to-bottom from a fresh kernel given an `ANTHROPIC_API_KEY` (plus a `VOYAGE_API_KEY` for the RAG notebooks) and the `data/` folder.

> **Attribution.** Concepts in this series are adapted from Anthropic's "Building with the Claude API" course (Coursera) and public API documentation at [docs.anthropic.com](https://docs.anthropic.com). All code and example datasets in this repository are original work © 2026 DataMy. **Not affiliated with Anthropic.**

---

## Quick start

```bash
# 1. Clone or download this folder
cd claude-api-practitioner-guide

# 2. Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies (pick one)
pip install -r requirements.txt
# or, for a much faster install:
# uv pip install -r requirements.txt

# 4. Configure your API keys
cp .env.example .env
# then edit .env and paste your real ANTHROPIC_API_KEY (and VOYAGE_API_KEY for RAG notebooks)

# 5. Generate the shared datasets (creates / refreshes everything under data/)
python scripts/generate_data.py

# 6. Launch Jupyter from the repo root, pointed at the notebooks/ directory
#    (so `from llm_client import ClaudeClient` resolves cleanly)
jupyter notebook notebooks/
```
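
Before launching Jupyter, it can help to confirm that the keys from step 4 are visible to Python. The sketch below assumes the variables have already been loaded into the process environment (for example by a dotenv loader, which this README does not specify). `missing_keys` is a hypothetical helper, not part of the repository.

```python
import os
from collections.abc import Iterable, Mapping


def missing_keys(env: Mapping[str, str], required: Iterable[str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# ANTHROPIC_API_KEY is needed everywhere; VOYAGE_API_KEY only by the RAG notebooks.
absent = missing_keys(os.environ, ["ANTHROPIC_API_KEY", "VOYAGE_API_KEY"])
if absent:
    print("Missing keys:", ", ".join(absent))
else:
    print("All keys found.")
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.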

---

## Notebook map

The table below maps each notebook to its companion article and to what you build in it.

| Notebook | Companion article | What you'll build |
|---|---|---|
| `A2_setup_and_first_call.ipynb` | A2. Setup and your first robust API call | A reusable client wrapper with retries, streaming, and cost logging |
| `B1_system_prompts_output.ipynb` | B1. System prompts, roles, and output control | An analytics-copilot persona that returns structured JSON |
| `B2_multimodal_images_pdf.ipynb` | B2. Multimodal inputs: images and PDFs | A dashboard-screenshot reader + a QBR-report Q&A bot |
| `B3_caching_and_thinking.ipynb` | B3. Augmenting model reasoning | Prompt caching cost model + an extended-thinking root-cause analyzer |
| `B4_rag_essentials.ipynb` | B4. RAG essentials | Chunking, embedding, and hybrid retrieval over a runbook corpus |
| `B5_rag_advanced.ipynb` | B5. RAG advanced | Reranking + Anthropic-style contextual retrieval on the same corpus |
| `C1_builtin_tools.ipynb` | C1. Built-in tools | Code execution over a warehouse-usage CSV; web-search guardrails |
| `C2_custom_tools.ipynb` | C2. Custom tools / function calling | A custom tool suite: metrics query, job status, alert dispatch |
| `D1_mcp_client.ipynb` | D1. MCP concepts and using an MCP server as a client | Connecting Claude to an existing filesystem MCP server |
| `D2_mcp_server.ipynb` | D2. Building your own MCP server | Wrapping the C2 custom tools as a standalone MCP server |

> No notebook for A1 — that article is the series introduction and contains no code.

---

## Project structure & file roles

```
claude-api-practitioner-guide/
├── README.md              ← this file
├── requirements.txt       ← Python dependencies (minimum versions) for all 10 notebooks
├── .env.example           ← template for your API keys (copy to .env)
├── .gitignore             ← keeps secrets, venvs, and checkpoints out of version control
│
├── data/                  ← shared datasets read by the notebooks (generated, not hand-written)
│   ├── saas_metrics.csv
│   ├── pipeline_jobs.csv         (added when C2 ships)
│   ├── warehouse_usage.csv       (added when C1 ships)
│   ├── qbr_q3_2025.md / .pdf     (added when B2 ships)
│   ├── runbook_*.md              (added when B3/B4 ship)
│   └── dashboard_screenshot.png  (added when B2 ships)
│
├── scripts/
│   ├── generate_data.py   ← (re)builds everything in data/ from synthetic generators
│   └── data_monitor_cli.py ← standalone CLI agent backed by the MCP server (D2 capstone)
│
└── notebooks/
    ├── llm_client.py      ← the ClaudeClient wrapper (built in A2, imported by B1 onward)
    ├── A2_setup_and_first_call.ipynb
    ├── B1_system_prompts_output.ipynb
    └── ...                ← one .ipynb per article from B2 to D2
```

### Why each non-notebook file exists

| File | What it is | When you touch it |
|---|---|---|
| `requirements.txt` | A flat list of Python packages with minimum versions. Covers every notebook in the series so a single `pip install` is enough. | Once at setup. Revisit if you ever see an import error. |
| `.env.example` | A template showing which environment variables are needed (`ANTHROPIC_API_KEY`, `VOYAGE_API_KEY` for the RAG notebooks). It is committed; the real `.env` is not. | Copy once to `.env` and fill in your real keys. |
| `.gitignore` | Excludes `.env`, the virtualenv, Jupyter checkpoints, and OS noise from version control. | Generally don't touch. |
| `scripts/generate_data.py` | A pure-Python script that creates every CSV / Markdown / PDF / PNG under `data/`. Idempotent (re-running overwrites with the same deterministic output thanks to a fixed random seed). | Run once after install. Re-run if you change a generator function or need to restore the files under `data/`. |
| `scripts/data_monitor_cli.py` | The D2 capstone: a standalone CLI agent that connects to `notebooks/mcp_data_server.py` via MCP and answers monitoring questions using Claude. Run with `--list-tools` to inspect the server, or pass a natural-language question directly. | Reference implementation for D2; also useful as a working CLI template for your own MCP-backed agents. |
| `data/*` | The synthetic datasets that the notebooks read from. **Do not edit by hand** — anything you change here gets overwritten the next time `generate_data.py` runs. | Generally read-only. |
| `notebooks/llm_client.py` | The reusable `ClaudeClient` wrapper class plus its retry helper and cost helper. Originally walked through cell-by-cell in the A2 notebook, then **extracted into this file** so every later notebook can `from llm_client import ClaudeClient` instead of redefining the wrapper. | Edit when you want to change wrapper behaviour for *all* downstream notebooks at once (e.g. add structured logging, change the default model). |

### How A2 and B1 onward fit together

This is the one architectural transition worth understanding before you read the code.

- **A2** is a *teaching notebook*. It builds the `ClaudeClient` class from scratch, cell by cell, so a reader sees every piece (retry loop, streaming, cost helper, the final dataclass). The class is defined inline.
- **`llm_client.py`** is the *production form of that same class*. It contains the finished `ClaudeClient` with no narration, no exposition, no intermediate variables — just the working code.
- **B1 through D2** all start with `from llm_client import ClaudeClient` and never re-implement it. They focus entirely on their topic (system prompts, RAG, MCP, etc.) instead of rebuilding boilerplate every notebook.

You can think of `llm_client.py` as "the result of A2, frozen and made importable." If you change something in `llm_client.py`, you're changing the shared spine that B1 through D2 all sit on top of. If you want to play with retry / logging behaviour just inside one notebook, the safer move is to subclass `ClaudeClient` locally in that notebook rather than edit the shared file.
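
The local-subclass approach might look like the sketch below. Because this README does not show the interface of `ClaudeClient`, the base class here is a stand-in with a hypothetical `max_retries` attribute and `send` method. In a notebook you would import the real class from `llm_client` and override whichever of its methods apply.

```python
class ClaudeClient:
    """Stand-in for llm_client.ClaudeClient; attribute and method names are assumed."""

    max_retries = 3

    def send(self, prompt: str) -> str:
        # The real wrapper would call the Anthropic API here.
        return f"response to: {prompt}"


class LoggingClient(ClaudeClient):
    """Notebook-local variant: fewer retries and a call log, shared file untouched."""

    max_retries = 1

    def __init__(self) -> None:
        super().__init__()
        self.calls: list[str] = []

    def send(self, prompt: str) -> str:
        self.calls.append(prompt)      # record the prompt before delegating
        return super().send(prompt)    # reuse the shared behaviour unchanged


client = LoggingClient()
reply = client.send("Summarise last month's churn.")
print(client.max_retries, len(client.calls), reply)
```

The override lives only in the notebook that defines it, so every other notebook keeps the behaviour of the shared module.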

### Dependency graph at a glance

```
                    requirements.txt
                          │
                          ▼
                  (virtualenv / uv)
                          │
                          ▼
   .env  ──────►  llm_client.py  ──────►  notebooks/B1, B2, B3, ... .ipynb
                          ▲                        ▲
                          │                        │
                  A2 notebook                      │
                  (builds it from scratch)         │
                                                   │
                  scripts/generate_data.py ──► data/*  (CSV, MD, PDF, PNG)
                                                   │
                                                   ▼
                                          notebooks read from data/
```

Two simple rules follow from the diagram:

1. **Run `scripts/generate_data.py` once** before opening any notebook that uses a dataset — every notebook from B1 onward checks for its dataset and tells you if it's missing.
2. **Always open the notebooks with `notebooks/` as Jupyter's root directory** (`jupyter notebook notebooks/`, run from the repository root) so the `from llm_client import ClaudeClient` import resolves without `sys.path` gymnastics.
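
The dataset check in rule 1 can be sketched as follows. This is an illustration rather than the notebooks' actual code: `require_dataset` is a hypothetical helper, and the `../data` default assumes the kernel's working directory is `notebooks/`.

```python
from pathlib import Path


def require_dataset(name: str, data_dir: Path = Path("../data")) -> Path:
    """Return the path to a dataset, or fail with a hint on how to create it."""
    path = data_dir / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Run `python scripts/generate_data.py` "
            "from the repository root, then re-run this cell."
        )
    return path


try:
    csv_path = require_dataset("saas_metrics.csv")
    print("Found", csv_path)
except FileNotFoundError as err:
    print(err)
```

Failing early with the exact command to run is friendlier than letting a later read call raise a bare path error halfway through a notebook.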

---

## Shared datasets

All notebooks read from `data/`, which is generated by `scripts/generate_data.py`. Re-running the generator is idempotent and safe.

| File | Used in | What it represents |
|---|---|---|
| `data/saas_metrics.csv` | B1, B3, C2 | 12 months × 3 customer segments × MRR / new MRR / churn / expansion |
| `data/pipeline_jobs.csv` | C2, D2 | Daily ETL job runs with status, duration, and SLA flags |
| `data/warehouse_usage.csv` | C1, C2 | Per-user warehouse query counts and costs |
| `data/qbr_q3_2025.md` | B2, B4, B5 | Synthetic quarterly business review (RAG corpus + PDF source) |
| `data/runbook_data_quality.md` | B4, B5 | Synthetic data-quality incident runbook |
| `data/runbook_warehouse_cost.md` | B3, B4, B5 | Synthetic warehouse cost-control runbook (long doc — good caching target) |
| `data/qbr_q3_2025.pdf` | B2 | Generated from the markdown via `reportlab` |
| `data/dashboard_screenshot.png` | B2 | Generated chart PNG (matplotlib) |

All datasets are **synthetic**. No real client data is included.
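
The idempotence of the generator rests on seeding its random number generator. The sketch below shows the idea only and is not the contents of `generate_data.py`: the function name, column names, and value ranges are all hypothetical.

```python
import csv
import io
import random


def generate_rows(seed: int = 42, n: int = 3) -> str:
    """Build a small CSV as text from a private, seeded random generator."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["month", "mrr"])  # illustrative columns only
    for month in range(1, n + 1):
        writer.writerow([month, rng.randint(10_000, 20_000)])
    return buffer.getvalue()


# Same seed, same text: re-running would overwrite a file with identical content.
print(generate_rows() == generate_rows())
```

Using a dedicated `random.Random` instance rather than the module-level functions means other code that touches the global generator cannot change the output.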

---

## Conventions used across notebooks

- **Python 3.11+**, `anthropic>=0.40`.
- **A2 builds the wrapper, B1 onward imports it.** A2's notebook defines `ClaudeClient` inline as a teaching exercise. Every later notebook starts with `from llm_client import ClaudeClient` and never redefines it.
- **Three-section pattern** in every notebook: *Setup → Walkthrough → Practitioner Lab*.
- **No notebook depends on another notebook's state.** Notebooks may share datasets and the `llm_client` module; they never share Python variables.
- **No real API key in version control.** `.env` is gitignored; `.env.example` shows the shape.
- **Cost-conscious examples.** Notebooks print an estimated cost per call where relevant, once you have populated `PRICES_PER_M_TOKENS` in `llm_client.py` with current rates from [anthropic.com/pricing](https://www.anthropic.com/pricing).
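
The cost estimate mentioned in the last convention is simple arithmetic over token counts. The sketch below assumes `PRICES_PER_M_TOKENS` maps a model name to input and output prices in USD per million tokens. The actual structure in `llm_client.py` is not shown in this README, and the model name and rates here are placeholders, not real pricing.

```python
# Placeholder values only: replace with current rates from the pricing page.
PRICES_PER_M_TOKENS = {
    "example-model": {"input": 1.00, "output": 5.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from its token counts."""
    rates = PRICES_PER_M_TOKENS[model]
    return (
        input_tokens / 1_000_000 * rates["input"]
        + output_tokens / 1_000_000 * rates["output"]
    )


cost = estimate_cost("example-model", input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")  # prints $0.0045 with the placeholder rates above
```

In the Anthropic Python SDK, the token counts for a call are available on the Messages response as `usage.input_tokens` and `usage.output_tokens`.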

---

## License

Code in this repository: MIT, © 2026 DataMy.
Concepts attributed to Anthropic as cited inline. Not affiliated with Anthropic.
