{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# B1 - System Prompts, Roles, and Output Control\n",
    "\n",
    "Companion notebook for article **B1** in *Building with Claude - A Practitioner's Guide to the Anthropic API*.\n",
    "\n",
    "**Attribution.** Concepts adapted from Anthropic's \"Building with the Claude API\" course (Coursera) and public API documentation at [docs.anthropic.com](https://docs.anthropic.com). All code below is original work (c) 2026 DataMy. Not affiliated with Anthropic.\n",
    "\n",
    "---\n",
    "\n",
    "## What you'll build in this notebook\n",
    "\n",
    "An **analytics-copilot persona** that reads a synthetic SaaS metrics table and answers questions about it. We progressively layer:\n",
    "\n",
    "1. A system prompt that defines the analyst persona\n",
    "2. `temperature=0` for deterministic analytical answers\n",
    "3. Assistant-turn pre-fill + stop sequences to force clean JSON\n",
    "4. A tool schema that validates the output structure (the production-ready approach)\n",
    "\n",
    "By the last cell you have four working call patterns and can pick the right one for any future structured-output task.\n",
    "\n",
    "**Prerequisites:**\n",
    "- `pip install -r ../requirements.txt`\n",
    "- A `.env` file with `ANTHROPIC_API_KEY` set\n",
    "- The dataset built by `python ../scripts/generate_data.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Section 1 - Setup and load the dataset\n\n### What this notebook imports, and where it comes from\n\nThis notebook -- and every notebook from B1 onward in the series -- uses three things that live **outside** the notebook file itself:\n\n| Import / file | Lives in | Role |\n|---|---|---|\n| `from llm_client import ClaudeClient` | `notebooks/llm_client.py` | The reusable client wrapper (retries, streaming, cost logging, tool support). Built cell-by-cell in the A2 notebook, then frozen into `llm_client.py` so B1+ can import it instead of redefining it. |\n| `data/saas_metrics.csv` | `data/` (at the project root) | A synthetic 12-month x 3-segment SaaS metrics table. Created by running `python scripts/generate_data.py` once after setup. Do not edit by hand. |\n| `.env` (read implicitly via `python-dotenv`) | project root, gitignored | Holds `ANTHROPIC_API_KEY`. Copy `.env.example` to `.env` and paste your key. |\n\nFor the full project layout -- what each file does, why it exists, and how the pieces connect -- see the **\"Project structure & file roles\"** section of the project [`README.md`](../README.md). The short version: A2 builds the wrapper for teaching purposes, `llm_client.py` is the production form of the same class, and B1 onward focuses entirely on the topic at hand without rebuilding boilerplate.\n\n### Load the dataset"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from llm_client import ClaudeClient\n",
    "\n",
    "DATA_PATH = Path(\"..\") / \"data\" / \"saas_metrics.csv\"\n",
    "assert DATA_PATH.exists(), (\n",
    "    f\"Dataset not found at {DATA_PATH}. Run: python ../scripts/generate_data.py\"\n",
    ")\n",
    "\n",
    "df = pd.read_csv(DATA_PATH)\n",
    "print(f\"Loaded {len(df)} rows, {df['segment'].nunique()} segments, \"\n",
    "      f\"months {df['month'].min()} -> {df['month'].max()}\")\n",
    "df.head(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc = ClaudeClient()\n",
    "print(\"Client ready. Default model:\", cc.default_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 2 - The same question, with and without a system prompt\n",
    "\n",
    "We feed Claude the full table as CSV text and ask the same question twice. First with no system prompt, then with the analytics-copilot persona. Notice how the response *discipline* changes -- shorter, less hedging, numbers-first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "csv_text = df.to_csv(index=False)\n",
    "\n",
    "user_question = (\n",
    "    \"Given this SaaS metrics CSV, which segment showed the strongest \"\n",
    "    \"net new MRR growth across the year, and by how much in absolute USD?\\n\\n\"\n",
    "    f\"CSV:\\n{csv_text}\"\n",
    ")\n",
    "\n",
    "print(\"=== WITHOUT system prompt ===\\n\")\n",
    "naive = cc.complete_text(user_question, temperature=0, max_tokens=600)\n",
    "print(naive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SYSTEM_PROMPT = \"\"\"\n",
    "<role>\n",
    "You are a SaaS metrics analyst supporting a revenue operations team.\n",
    "</role>\n",
    "\n",
    "<constraints>\n",
    "- Only answer using the data provided in the user message. Never invent figures.\n",
    "- If a metric the user asks for is not in the data, say so explicitly.\n",
    "- Use standard SaaS terminology: MRR, ARR, NRR, gross retention, expansion MRR, churn MRR.\n",
    "</constraints>\n",
    "\n",
    "<style>\n",
    "- Direct. No hedging. No \"I hope this helps\".\n",
    "- Numbers first, explanation second.\n",
    "</style>\n",
    "\n",
    "<output_format>\n",
    "- Default to a short paragraph followed by a bullet list of supporting figures.\n",
    "- When the user asks for JSON, return ONLY a JSON object -- no preamble, no code fences.\n",
    "</output_format>\n",
    "\"\"\".strip()\n",
    "\n",
    "print(\"=== WITH system prompt ===\\n\")\n",
    "answer = cc.complete_text(\n",
    "    user_question,\n",
    "    system=SYSTEM_PROMPT,\n",
    "    temperature=0,\n",
    "    max_tokens=600,\n",
    ")\n",
    "print(answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 3 - Temperature: same prompt, same data, different answers\n",
    "\n",
    "We call the same question twice at `temperature=0` and twice at `temperature=1.0`. The deterministic-leaning calls should be nearly identical; the high-temperature calls should differ noticeably in wording (sometimes in framing).\n",
    "\n",
    "For analytical work this is why `temperature=0` is the default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "small_question = (\n",
    "    \"Looking only at the Enterprise segment in this data, summarize the trend \"\n",
    "    \"in expansion MRR across the year in one short sentence.\\n\\n\"\n",
    "    f\"CSV:\\n{csv_text}\"\n",
    ")\n",
    "\n",
    "print(\"--- temperature = 0 ---\")\n",
    "for i in range(2):\n",
    "    out = cc.complete_text(small_question, system=SYSTEM_PROMPT, temperature=0, max_tokens=200)\n",
    "    print(f\"[call {i+1}] {out}\\n\")\n",
    "\n",
    "print(\"--- temperature = 1.0 ---\")\n",
    "for i in range(2):\n",
    "    out = cc.complete_text(small_question, system=SYSTEM_PROMPT, temperature=1.0, max_tokens=200)\n",
    "    print(f\"[call {i+1}] {out}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 4 - Pre-fill + stop sequences: forcing clean JSON\n",
    "\n",
    "Trick: append an assistant turn that starts with `{`. Claude continues from there. Combined with `stop_sequences=[\"}\\n\\n\"]` (or similar), you get a clean JSON object with no preamble and no trailing prose.\n",
    "\n",
    "Watch the `stop_reason` -- `\"stop_sequence\"` means it stopped because of our stop string; `\"end_turn\"` means it finished naturally; `\"max_tokens\"` would mean truncation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "json_question = (\n",
    "    \"For the SMB segment in 2025-09, return a JSON object with keys: \"\n",
    "    \"month, segment, mrr_end, new_mrr, churn_mrr. Use the exact values from the data.\\n\\n\"\n",
    "    f\"CSV:\\n{csv_text}\"\n",
    ")\n",
    "\n",
    "resp = cc.complete(\n",
    "    json_question,\n",
    "    system=SYSTEM_PROMPT,\n",
    "    temperature=0,\n",
    "    max_tokens=400,\n",
    "    prefill=\"{\",\n",
    "    stop_sequences=[\"}\\n\\n\", \"```\"],\n",
    ")\n",
    "raw_continuation = resp.content[0].text\n",
    "raw_json = \"{\" + raw_continuation\n",
    "# If the stop sequence consumed the closing brace, add it back.\n",
    "if not raw_json.rstrip().endswith(\"}\"):\n",
    "    raw_json = raw_json.rstrip() + \"}\"\n",
    "\n",
    "print(\"Stop reason:\", resp.stop_reason)\n",
    "print(\"\\nRaw JSON returned:\\n\", raw_json)\n",
    "\n",
    "parsed = json.loads(raw_json)\n",
    "print(\"\\nParsed dict:\")\n",
    "for k, v in parsed.items():\n",
    "    print(f\"  {k}: {v}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pre-fill works, but you can see the failure modes:\n",
    "- We had to hand-stitch the opening `{` back onto the response.\n",
    "- We had to defensively re-add the closing `}` in case the stop sequence ate it.\n",
    "- A future model whose phrasing changes slightly could break this.\n",
    "\n",
    "For one-off scripts this is fine. For production, use a tool schema instead. That's next."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 5 - Tool schema: structured output, validated\n",
    "\n",
    "Define the desired output as a tool's `input_schema`. Force Claude to call that one tool with `tool_choice={\"type\": \"tool\", \"name\": ...}`. Read the `input` off the tool_use block -- it's already a parsed dict that matches the schema."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metrics_schema = {\n",
    "    \"name\": \"report_metrics\",\n",
    "    \"description\": \"Reports SaaS metrics extracted from a data table.\",\n",
    "    \"input_schema\": {\n",
    "        \"type\": \"object\",\n",
    "        \"properties\": {\n",
    "            \"month\":         {\"type\": \"string\", \"description\": \"ISO month, e.g. 2025-09\"},\n",
    "            \"segment\":       {\"type\": \"string\", \"enum\": [\"SMB\", \"Mid-Market\", \"Enterprise\"]},\n",
    "            \"mrr_end\":       {\"type\": \"number\"},\n",
    "            \"new_mrr\":       {\"type\": \"number\"},\n",
    "            \"churn_mrr\":     {\"type\": \"number\"},\n",
    "            \"commentary\":    {\"type\": \"string\", \"description\": \"One-sentence interpretation of the figures.\"},\n",
    "        },\n",
    "        \"required\": [\"month\", \"segment\", \"mrr_end\", \"new_mrr\", \"churn_mrr\"],\n",
    "    },\n",
    "}\n",
    "\n",
    "resp = cc.complete(\n",
    "    json_question,\n",
    "    system=SYSTEM_PROMPT,\n",
    "    temperature=0,\n",
    "    max_tokens=600,\n",
    "    tools=[metrics_schema],\n",
    "    tool_choice={\"type\": \"tool\", \"name\": \"report_metrics\"},\n",
    ")\n",
    "\n",
    "tool_use_block = next(b for b in resp.content if getattr(b, \"type\", None) == \"tool_use\")\n",
    "structured = tool_use_block.input\n",
    "\n",
    "print(\"Stop reason:\", resp.stop_reason)\n",
    "print(\"\\nStructured output (already a dict, no json.loads needed):\")\n",
    "for k, v in structured.items():\n",
    "    print(f\"  {k}: {v}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why the tool-schema approach wins in production:\n",
    "\n",
    "- **No json.loads.** The SDK hands you a dict.\n",
    "- **The schema is enforced.** Try changing the `enum` to `[\"SMB\", \"MID\", \"ENT\"]` and re-running -- the segment field comes back constrained to those values.\n",
    "- **Self-documenting.** The schema is the contract. Six months from now, the shape is obvious from the code.\n",
    "\n",
    "We pick up tool schemas as a way to take *actions* (not just enforce output) in C2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 6 - Practitioner Lab\n",
    "\n",
    "Open-ended extension. No reference solution.\n",
    "\n",
    "**Goal:** extend the analytics-copilot persona to produce a *monthly summary* row across all three segments for a user-specified month.\n",
    "\n",
    "**Constraints:**\n",
    "1. Build a single tool schema named `monthly_summary` with one property per segment, each containing the full set of metrics for that segment in the chosen month.\n",
    "2. The system prompt must instruct Claude to refuse if the requested month is not in the data.\n",
    "3. Validate the returned object against the schema with `jsonschema` (or any library you prefer) -- catch the schema breakage you cannot catch by reading the model output by eye.\n",
    "\n",
    "**Stretch:** add a second tool, `month_not_in_data`, with a single `reason` field. Pass both tools and `tool_choice=\"any\"`. Claude now picks between \"answer\" and \"refuse\" depending on the input. This is the canonical pattern for letting a model gracefully decline.\n",
    "\n",
    "Why this matters: every production analytics assistant eventually faces \"the user asked about a quarter we don't have data for.\" Modeling refusal as a structured tool call -- not as a free-form apology -- is what separates demo code from production code.\n",
    "\n",
    "---\n",
    "\n",
    "*Companion article: B1 - System Prompts, Roles, and Output Control.*\n",
    "*Next notebook: B2_multimodal_images_pdf.ipynb*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}