{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A2 - Environment Setup & Your First Robust API Call\n",
    "\n",
    "Companion notebook for article **A2** in *Building with Claude - A Practitioner's Guide to the Anthropic API*.\n",
    "\n",
    "**Attribution.** Concepts adapted from Anthropic's \"Building with the Claude API\" course (Coursera) and public API documentation at [docs.anthropic.com](https://docs.anthropic.com). All code below is original work (c) 2026 DataMy. Not affiliated with Anthropic.\n",
    "\n",
    "---\n",
    "\n",
    "## What you'll build in this notebook\n",
    "\n",
    "By the last cell you will have a reusable `ClaudeClient` wrapper that:\n",
    "\n",
    "1. Loads credentials from `.env`\n",
    "2. Makes non-streaming (blocking) and streamed calls\n",
    "3. Retries transient errors with jittered exponential backoff\n",
    "4. Logs `model`, `input_tokens`, `output_tokens`, `estimated_cost_usd`, and `latency_ms` for every call\n",
    "5. Surfaces non-retriable errors immediately\n",
    "\n",
    "Every later notebook in this series imports this wrapper instead of rebuilding it.\n",
    "\n",
    "**Prerequisites:** `pip install -r requirements.txt` and a `.env` file with `ANTHROPIC_API_KEY` set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 1 - Setup\n",
    "\n",
    "Load environment variables, instantiate a client, and confirm the key is reachable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import json\n",
    "import random\n",
    "from dataclasses import dataclass, asdict\n",
    "from typing import Optional, Iterator\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "import anthropic\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "assert os.getenv(\"ANTHROPIC_API_KEY\"), (\n",
    "    \"ANTHROPIC_API_KEY not found. Copy .env.example to .env and add your key.\"\n",
    ")\n",
    "\n",
    "client = anthropic.Anthropic()\n",
    "print(\"Client ready. SDK version:\", anthropic.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 2 - Your first call\n\nThe minimal `messages.create()` call. Model availability changes over time; this series uses `claude-sonnet-4-5` throughout for consistency with the Coursera \"Building with the Claude API\" course. Before using in production, check the live model list at [docs.anthropic.com/en/docs/about-claude/models](https://docs.anthropic.com/en/docs/about-claude/models) and swap in the current recommended ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DEFAULT_MODEL = \"claude-sonnet-4-5\"\n",
    "\n",
    "response = client.messages.create(\n",
    "    model=DEFAULT_MODEL,\n",
    "    max_tokens=300,\n",
    "    messages=[\n",
    "        {\"role\": \"user\", \"content\": \"In one sentence, what is prompt caching?\"}\n",
    "    ],\n",
    ")\n",
    "\n",
    "print(\"Text:\", response.content[0].text)\n",
    "print()\n",
    "print(\"Stop reason:\", response.stop_reason)\n",
    "print(\"Usage:\", response.usage)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the `usage` object -- `input_tokens`, `output_tokens`, and (later) cache fields. We use this in Section 5 to log cost.\n",
    "\n",
    "Also check `stop_reason`. Common values: `end_turn` (model finished naturally), `max_tokens` (hit the limit -- response is truncated), `tool_use` (model wants to call a tool -- covered in C2). Production code should always branch on this."
   ]
  },
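  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to act on the `stop_reason` advice above, as a minimal offline sketch. `text_or_raise` is a hypothetical helper name (not part of the SDK), and the `SimpleNamespace` object is only a stand-in shaped like a real response so the code runs without an API call.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def text_or_raise(response):\n",
    "    \"\"\"Return the reply text, or raise if the response is not a finished text answer.\"\"\"\n",
    "    if response.stop_reason == \"max_tokens\":\n",
    "        # Truncated output: better to retry with a larger max_tokens than to use it.\n",
    "        raise RuntimeError(\"Response truncated at max_tokens\")\n",
    "    if response.stop_reason == \"tool_use\":\n",
    "        # Tool calls need their own handling path (covered in C2).\n",
    "        raise RuntimeError(\"Model requested a tool call\")\n",
    "    # end_turn (and any other value) falls through to plain text extraction.\n",
    "    return \"\".join(b.text for b in response.content if b.type == \"text\")\n",
    "\n",
    "\n",
    "# Stand-in object (an assumption for illustration), shaped like an API response.\n",
    "fake = SimpleNamespace(\n",
    "    stop_reason=\"end_turn\",\n",
    "    content=[SimpleNamespace(type=\"text\", text=\"OK\")],\n",
    ")\n",
    "print(text_or_raise(fake))\n",
    "```\n",
    "\n",
    "Raising on `max_tokens` is one design choice; another is to return the partial text together with a truncation flag and let the caller decide."
   ]
  },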
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 3 - Streaming\n",
    "\n",
    "Same call, streamed. Text appears incrementally. The context-manager form guarantees the connection closes even if downstream code raises."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with client.messages.stream(\n",
    "    model=DEFAULT_MODEL,\n",
    "    max_tokens=400,\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": \"Explain why backend services should stream LLM responses, in 3 short bullet points.\"\n",
    "        }\n",
    "    ],\n",
    ") as stream:\n",
    "    for chunk in stream.text_stream:\n",
    "        print(chunk, end=\"\", flush=True)\n",
    "    final = stream.get_final_message()\n",
    "\n",
    "print(\"\\n\\n---\")\n",
    "print(\"Final usage:\", final.usage)\n",
    "print(\"Stop reason:\", final.stop_reason)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 4 - Errors and retry with jittered backoff\n",
    "\n",
    "Three error classes are retriable: connection errors, rate limits, and 5xx (including 529 overloaded). Everything else -- bad request, auth failure, permission denied -- should fail loudly so you fix it.\n",
    "\n",
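    "We deliberately **do not catch generic `Exception`**. Silent error swallowing is a common source of hard-to-diagnose production incidents.\n",
    "\n",
    "The SDK client also retries these errors on its own (twice by default; configurable with `max_retries` on `anthropic.Anthropic(...)`), so the attempts made by the helper below come on top of the SDK's."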
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RETRIABLE = (\n",
    "    anthropic.APIConnectionError,\n",
    "    anthropic.RateLimitError,\n",
    "    anthropic.InternalServerError,\n",
    ")\n",
    "\n",
    "def call_with_retry(client, *, max_retries=4, base_delay=1.0, **kwargs):\n",
    "    \"\"\"Wrap client.messages.create() with jittered exponential backoff on transient errors.\n",
    "\n",
    "    max_retries is the total number of attempts, including the first one.\n",
    "    \"\"\"\n",
    "    if max_retries < 1:\n",
    "        raise ValueError(\"max_retries must be at least 1\")\n",
    "    for attempt in range(max_retries):\n",
    "        try:\n",
    "            return client.messages.create(**kwargs)\n",
    "        except RETRIABLE as e:\n",
    "            # Out of attempts: re-raise the last transient error unchanged.\n",
    "            if attempt == max_retries - 1:\n",
    "                raise\n",
    "            # Exponential backoff (base_delay * 2^attempt) plus up to 0.5s of jitter.\n",
    "            sleep_s = base_delay * (2 ** attempt) + random.uniform(0, 0.5)\n",
    "            print(f\"  [retry] {type(e).__name__} -- sleeping {sleep_s:.2f}s (attempt {attempt + 1}/{max_retries})\")\n",
    "            time.sleep(sleep_s)\n",
    "\n",
    "# Sanity check: should succeed on first try.\n",
    "resp = call_with_retry(\n",
    "    client,\n",
    "    model=DEFAULT_MODEL,\n",
    "    max_tokens=80,\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Reply with exactly: OK\"}],\n",
    ")\n",
    "print(\"Response:\", resp.content[0].text)"
   ]
  },
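  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see which delays the backoff formula in `call_with_retry` produces without touching the API, here is a small offline sketch. `backoff_delays` is a hypothetical helper written for this illustration; it only reuses the same formula and default values.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def backoff_delays(max_retries=4, base_delay=1.0, max_jitter=0.5, seed=None):\n",
    "    \"\"\"Return the sleep durations between attempts (one fewer than the attempts).\"\"\"\n",
    "    rng = random.Random(seed)  # seeded only to make the printed schedule repeatable\n",
    "    return [\n",
    "        base_delay * (2 ** attempt) + rng.uniform(0, max_jitter)\n",
    "        for attempt in range(max_retries - 1)\n",
    "    ]\n",
    "\n",
    "\n",
    "delays = backoff_delays(seed=0)\n",
    "for i, d in enumerate(delays, start=1):\n",
    "    print(f\"after failed attempt {i}: sleep {d:.2f}s\")\n",
    "print(f\"total wait before the final attempt: {sum(delays):.2f}s\")\n",
    "```\n",
    "\n",
    "With the defaults, the three sleeps start from 1, 2, and 4 seconds, so a call that fails every time waits between 7 and 8.5 seconds in total before the last error is raised."
   ]
  },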
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 5 - Cost tracking\n",
    "\n",
    "Authoritative references:\n",
    "\n",
    "- **Current pricing** (per-million-token USD, by model and by token type): [anthropic.com/pricing](https://www.anthropic.com/pricing)\n",
    "- **usage object reference**: [docs.anthropic.com/en/api/messages](https://docs.anthropic.com/en/api/messages)\n",
    "\n",
    "This notebook ships with an **empty** price table so it cannot drift out of date silently. Before running the cost helper, open the pricing page and paste in the rates for the models you use. The helper returns `0.0` for any unconfigured model -- honest behaviour rather than misleading guesswork."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Per-million-token USD. Populate from https://www.anthropic.com/pricing.\n",
    "# Refresh whenever Anthropic updates prices.\n",
    "PRICES_PER_M_TOKENS: dict = {\n",
    "    # Example shape -- fill in real values from the pricing page:\n",
    "    # \"claude-sonnet-4-5\": {\"input\": 0.00, \"output\": 0.00},\n",
    "    # \"claude-haiku-4-5\":  {\"input\": 0.00, \"output\": 0.00},\n",
    "    # \"claude-opus-4-8\":   {\"input\": 0.00, \"output\": 0.00},\n",
    "}\n",
    "\n",
    "def estimate_cost_usd(model: str, usage) -> float:\n",
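    "    \"\"\"Estimate USD cost from a response usage object.\n",
    "\n",
    "    Returns 0.0 for any model not present in PRICES_PER_M_TOKENS.\n",
    "    Deliberate honest default -- never guess prices.\n",
    "    Counts only input_tokens and output_tokens; cache read/write tokens\n",
    "    (billed at different rates) are not included.\n",
    "    \"\"\"\n",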
    "    p = PRICES_PER_M_TOKENS.get(model)\n",
    "    if not p:\n",
    "        return 0.0\n",
    "    return (\n",
    "        usage.input_tokens  * p[\"input\"]  / 1_000_000\n",
    "      + usage.output_tokens * p[\"output\"] / 1_000_000\n",
    "    )\n",
    "\n",
    "cost = estimate_cost_usd(DEFAULT_MODEL, resp.usage)\n",
    "if cost == 0.0 and DEFAULT_MODEL not in PRICES_PER_M_TOKENS:\n",
    "    print(f\"No price configured for {DEFAULT_MODEL}. Cost helper returned $0.0000.\")\n",
    "    print(\"Paste current rates from https://www.anthropic.com/pricing into PRICES_PER_M_TOKENS.\")\n",
    "else:\n",
    "    print(f\"That last call cost ~${cost:.6f}\")"
   ]
  },
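  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A worked example of the per-million-token arithmetic used by `estimate_cost_usd`. The model name `example-model` and both rates are made-up placeholders chosen for easy mental maths, not real prices, and the `SimpleNamespace` object merely stands in for `response.usage`.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "# Placeholder rates in USD per million tokens. NOT real prices.\n",
    "example_prices = {\"example-model\": {\"input\": 1.00, \"output\": 2.00}}\n",
    "\n",
    "# Stand-in for response.usage so the sketch runs offline.\n",
    "usage = SimpleNamespace(input_tokens=1_200, output_tokens=300)\n",
    "\n",
    "p = example_prices[\"example-model\"]\n",
    "example_cost = (\n",
    "    usage.input_tokens * p[\"input\"] / 1_000_000      # 1,200 tokens at $1.00/M\n",
    "    + usage.output_tokens * p[\"output\"] / 1_000_000  # 300 tokens at $2.00/M\n",
    ")\n",
    "print(f\"${example_cost:.6f}\")\n",
    "```\n",
    "\n",
    "The input side contributes 0.0012 USD and the output side 0.0006 USD, for a total of 0.0018 USD under these placeholder rates."
   ]
  },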
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 6 - The reusable ClaudeClient wrapper\n",
    "\n",
    "Assemble everything into one class. Every later notebook in this series imports this instead of rebuilding it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@dataclass\n",
    "class CallRecord:\n",
    "    model: str\n",
    "    input_tokens: int\n",
    "    output_tokens: int\n",
    "    estimated_cost_usd: float\n",
    "    latency_ms: int\n",
    "    streamed: bool\n",
    "    stop_reason: Optional[str]\n",
    "\n",
    "\n",
    "class ClaudeClient:\n",
    "    \"\"\"Thin wrapper around anthropic.Anthropic() with retries, streaming, and cost logging.\n",
    "\n",
    "    Usage:\n",
    "        cc = ClaudeClient(default_model=\"claude-sonnet-4-5\")\n",
    "        text = cc.complete(\"Say hi.\")\n",
    "        for chunk in cc.stream(\"Tell me a short story.\"):\n",
    "            print(chunk, end=\"\")\n",
    "        cc.print_summary()\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(\n",
    "        self,\n",
    "        *,\n",
    "        default_model: str = DEFAULT_MODEL,\n",
    "        max_retries: int = 4,\n",
    "        log_path: Optional[str] = None,\n",
    "    ):\n",
    "        self.client = anthropic.Anthropic()\n",
    "        self.default_model = default_model\n",
    "        self.max_retries = max_retries\n",
    "        self.log_path = log_path\n",
    "        self.records: list[CallRecord] = []\n",
    "\n",
    "    # --- non-streaming (blocking) ---\n",
    "    def complete(\n",
    "        self,\n",
    "        prompt: str,\n",
    "        *,\n",
    "        model: Optional[str] = None,\n",
    "        max_tokens: int = 1024,\n",
    "        system: Optional[str] = None,\n",
    "        temperature: float = 1.0,\n",
    "    ) -> str:\n",
    "        model = model or self.default_model\n",
    "        kwargs: dict = dict(\n",
    "            model=model,\n",
    "            max_tokens=max_tokens,\n",
    "            messages=[{\"role\": \"user\", \"content\": prompt}],\n",
    "            temperature=temperature,\n",
    "        )\n",
    "        if system is not None:\n",
    "            kwargs[\"system\"] = system\n",
    "        t0 = time.perf_counter()\n",
    "        resp = call_with_retry(self.client, max_retries=self.max_retries, **kwargs)\n",
    "        latency_ms = int((time.perf_counter() - t0) * 1000)\n",
    "        self._record(model, resp.usage, latency_ms, streamed=False, stop_reason=resp.stop_reason)\n",
    "        return resp.content[0].text\n",
    "\n",
    "    # --- streamed ---\n",
    "    def stream(\n",
    "        self,\n",
    "        prompt: str,\n",
    "        *,\n",
    "        model: Optional[str] = None,\n",
    "        max_tokens: int = 1024,\n",
    "        system: Optional[str] = None,\n",
    "        temperature: float = 1.0,\n",
    "    ) -> Iterator[str]:\n",
    "        model = model or self.default_model\n",
    "        kwargs: dict = dict(\n",
    "            model=model,\n",
    "            max_tokens=max_tokens,\n",
    "            messages=[{\"role\": \"user\", \"content\": prompt}],\n",
    "            temperature=temperature,\n",
    "        )\n",
    "        if system is not None:\n",
    "            kwargs[\"system\"] = system\n",
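    "        t0 = time.perf_counter()\n",
    "        # Not routed through call_with_retry: once chunks have been yielded, a retry\n",
    "        # would replay text to the caller, so the backoff loop does not cover streams.\n",
    "        with self.client.messages.stream(**kwargs) as s:\n",
    "            for chunk in s.text_stream:\n",
    "                yield chunk\n",
    "            final = s.get_final_message()\n",
    "        # Reached only when the caller consumes the generator to the end.\n",
    "        latency_ms = int((time.perf_counter() - t0) * 1000)\n",
    "        self._record(model, final.usage, latency_ms, streamed=True, stop_reason=final.stop_reason)\n",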
    "\n",
    "    # --- internals ---\n",
    "    def _record(self, model, usage, latency_ms, *, streamed, stop_reason):\n",
    "        rec = CallRecord(\n",
    "            model=model,\n",
    "            input_tokens=usage.input_tokens,\n",
    "            output_tokens=usage.output_tokens,\n",
    "            estimated_cost_usd=round(estimate_cost_usd(model, usage), 6),\n",
    "            latency_ms=latency_ms,\n",
    "            streamed=streamed,\n",
    "            stop_reason=stop_reason,\n",
    "        )\n",
    "        self.records.append(rec)\n",
    "        if self.log_path:\n",
    "            with open(self.log_path, \"a\") as f:\n",
    "                f.write(json.dumps(asdict(rec)) + \"\\n\")\n",
    "\n",
    "    def print_summary(self):\n",
    "        if not self.records:\n",
    "            print(\"No calls recorded yet.\")\n",
    "            return\n",
    "        total_cost = sum(r.estimated_cost_usd for r in self.records)\n",
    "        total_in   = sum(r.input_tokens for r in self.records)\n",
    "        total_out  = sum(r.output_tokens for r in self.records)\n",
    "        print(f\"Calls        : {len(self.records)}\")\n",
    "        print(f\"Input tokens : {total_in:>10,}\")\n",
    "        print(f\"Output tokens: {total_out:>10,}\")\n",
    "        print(f\"Est. cost    : ${total_cost:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Try it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc = ClaudeClient()\n",
    "\n",
    "# Non-streaming (blocking) call\n",
    "text = cc.complete(\n",
    "    \"Give me three reasons a data engineer should log token usage per API call.\",\n",
    "    max_tokens=400,\n",
    ")\n",
    "print(text)\n",
    "print()\n",
    "\n",
    "# Streamed call\n",
    "print(\"--- streamed ---\")\n",
    "for chunk in cc.stream(\n",
    "    \"In one short paragraph, when should I use claude-haiku instead of claude-sonnet?\",\n",
    "    max_tokens=300,\n",
    "):\n",
    "    print(chunk, end=\"\", flush=True)\n",
    "print(\"\\n\")\n",
    "\n",
    "cc.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df8d4f20",
   "source": "---\n\n### What happens to this class after A2\n\nThe `ClaudeClient` class above is the deliverable of this notebook. To avoid copy-pasting it into every later notebook in the series, we save the same class, unchanged, into a separate module at `notebooks/llm_client.py`.\n\nFrom B1 onward, each notebook simply does:\n\n```python\nfrom llm_client import ClaudeClient\n```\n\ninstead of redefining the wrapper. The teaching version (this notebook) and the production version (`llm_client.py`) are the same code, so they only stay in sync if you apply any change to both. If you want to change retry behaviour, logging, or pricing for the later notebooks, edit `llm_client.py`; that one edit reaches every notebook that imports it.\n\nFor the full picture of which file does what -- `requirements.txt`, `generate_data.py`, `data/`, `.env.example`, and how they relate -- see the **\"Project structure & file roles\"** section of the project [`README.md`](../README.md).",
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 7 - Practitioner Lab\n",
    "\n",
    "Open-ended extension. No reference solution provided -- the value is in the thinking.\n",
    "\n",
    "**Goal:** add a per-day spending cap to `ClaudeClient` that refuses calls once the day's estimated spend exceeds a configured threshold.\n",
    "\n",
    "**Constraints:**\n",
    "1. The cap is set at construction: `ClaudeClient(daily_budget_usd=5.00)`.\n",
    "2. Daily means local-time calendar days, not a rolling 24-hour window.\n",
    "3. When the cap is exceeded, raise a custom `BudgetExceededError`.\n",
    "4. If `log_path` is set, the budget persists across sessions: on init, replay the log and re-tally today's spend. `CallRecord` has no timestamp yet, so you will need to add one.\n",
    "\n",
    "**Stretch:** add a soft-cap mode that downgrades the model to Haiku once 80% of the daily budget is spent, rather than refusing outright.\n",
    "\n",
    "---\n",
    "\n",
    "*Companion article: A2 - Environment Setup and Your First Robust API Call.*\n",
    "*Next notebook: B1_system_prompts_output.ipynb*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}