{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# C1 - Built-in Tools: Code Execution, Web Search, and the Tool Use Loop\n",
    "\n",
    "Companion notebook for article **C1** in *Building with Claude - A Practitioner's Guide to the Anthropic API*.\n",
    "\n",
    "**Attribution.** Concepts adapted from Anthropic's \"Building with the Claude API\" course (Coursera) and public API documentation at [docs.anthropic.com](https://docs.anthropic.com). All code below is original work (c) 2026 DataMy. Not affiliated with Anthropic.\n",
    "\n",
    "---\n",
    "\n",
    "## What you'll build in this notebook\n",
    "\n",
    "A working tool use loop plus two concrete tools over the Snowflake warehouse usage dataset:\n",
    "\n",
    "1. **Tool use anatomy** -- inspect a raw `tool_use` response object; understand `stop_reason`, `ToolUseBlock`, and `tool_use_id`.\n",
    "2. **The execution loop** -- a reusable `run_tool_loop()` that cycles until `end_turn` or a max-turn ceiling.\n",
    "3. **Code execution tool** -- a `run_python` tool backed by `exec()` + stdout capture; Claude analyses `warehouse_usage.csv` by writing and running its own pandas code.\n",
    "4. **Web search tool** -- the server-side `web_search_20250305` built-in; no execution loop needed.\n",
    "5. **Multi-turn code agent** -- a full session where Claude calls `run_python` multiple times to work toward a diagnostic conclusion.\n",
    "\n",
    "**Prerequisites:**\n",
    "- `pip install -r ../requirements.txt`\n",
    "- A `.env` file with `ANTHROPIC_API_KEY` set\n",
    "- Dataset: `python ../scripts/generate_data.py` (creates `warehouse_usage.csv`)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-1",
   "metadata": {},
   "source": [
    "## Section 1 - Setup\n",
    "\n",
    "This notebook uses the raw `anthropic` client directly for most calls so the tool use\n",
    "machinery is visible. `ClaudeClient` from `llm_client.py` is still imported for the\n",
    "`print_summary()` call at the end.\n",
    "\n",
    "The `warehouse_usage.csv` dataset covers 90 days (2025-07-01 to 2025-09-28) of daily\n",
    "Snowflake credit consumption across Acme's 7 warehouses. It contains two embedded anomalies\n",
    "from the Cost Runbook incidents -- Claude will find them without being told where to look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import sys\n",
    "import traceback\n",
    "from pathlib import Path\n",
    "\n",
    "import anthropic\n",
    "import pandas as pd\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "from llm_client import ClaudeClient\n",
    "\n",
    "load_dotenv(\"../.env\")\n",
    "\n",
    "DATA_DIR = Path(\"..\") / \"data\"\n",
    "USAGE_PATH = DATA_DIR / \"warehouse_usage.csv\"\n",
    "assert USAGE_PATH.exists(), f\"Missing: {USAGE_PATH}. Run python ../scripts/generate_data.py\"\n",
    "\n",
    "df = pd.read_csv(USAGE_PATH, parse_dates=[\"date\"])\n",
    "\n",
    "print(f\"Loaded warehouse_usage.csv: {len(df):,} rows\")\n",
    "print(f\"Date range : {df.date.min().date()} to {df.date.max().date()}\")\n",
    "print(f\"Warehouses : {sorted(df.warehouse_name.unique())}\")\n",
    "print()\n",
    "print(df.head(7).to_string(index=False))\n",
    "\n",
    "client = anthropic.Anthropic()\n",
    "cc = ClaudeClient()\n",
    "MODEL = cc.default_model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## Section 2 - Tool use anatomy: inspecting the raw response\n",
    "\n",
    "Before building the loop, let's look at a single raw tool-use exchange. We define one tool\n",
    "(`run_python`), ask a question that requires data analysis, and inspect every field of the\n",
    "response before doing anything with it.\n",
    "\n",
    "Key fields to notice:\n",
    "- `response.stop_reason` -- `\"tool_use\"` signals Claude wants to call a tool.\n",
    "- `response.content` -- a list that may mix `TextBlock` and `ToolUseBlock` objects.\n",
    "- `block.id` -- the unique ID you must echo back in the `tool_result`.\n",
    "- `block.input` -- a Python dict matching the tool's `input_schema`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-4",
   "metadata": {},
   "outputs": [],
   "source": [
    "RUN_PYTHON_TOOL = {\n",
    "    \"name\": \"run_python\",\n",
    "    \"description\": (\n",
    "        \"Execute Python code and return the printed output. \"\n",
    "        \"Use this to analyse data, compute statistics, or generate insights. \"\n",
    "        \"A pandas DataFrame `df` with columns [date, warehouse_name, credits_used, \"\n",
    "        \"query_count, avg_queue_time_s, avg_execution_time_s] is pre-loaded. \"\n",
    "        \"Always use print() to produce output.\"\n",
    "    ),\n",
    "    \"input_schema\": {\n",
    "        \"type\": \"object\",\n",
    "        \"properties\": {\n",
    "            \"code\": {\n",
    "                \"type\": \"string\",\n",
    "                \"description\": \"Valid Python code. Use print() to produce output.\",\n",
    "            }\n",
    "        },\n",
    "        \"required\": [\"code\"],\n",
    "    },\n",
    "}\n",
    "\n",
    "# Single API call -- Claude will decide to use the tool\n",
    "raw_response = client.messages.create(\n",
    "    model=MODEL,\n",
    "    max_tokens=2048,\n",
    "    tools=[RUN_PYTHON_TOOL],\n",
    "    messages=[{\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"What is the total credit spend per warehouse over the full dataset period?\",\n",
    "    }],\n",
    ")\n",
    "\n",
    "print(f\"stop_reason : {raw_response.stop_reason}\")\n",
    "print(f\"content blocks: {len(raw_response.content)}\")\n",
    "print()\n",
    "for i, block in enumerate(raw_response.content):\n",
    "    print(f\"Block {i}: type={block.type}\")\n",
    "    if block.type == \"text\":\n",
    "        print(f\"  text : {block.text[:120]} ...\")\n",
    "    elif block.type == \"tool_use\":\n",
    "        print(f\"  id   : {block.id}\")\n",
    "        print(f\"  name : {block.name}\")\n",
    "        code_preview = block.input.get(\"code\", \"\")[:200].replace(\"\\n\", \" | \")\n",
    "        print(f\"  code : {code_preview} ...\")"
   ]
  },
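  {
   "cell_type": "markdown",
   "id": "cell-4-sketch",
   "metadata": {},
   "source": [
    "The `block.id` echo is easiest to see as plain data. The offline sketch below (no API call) mocks a\n",
    "`ToolUseBlock` with `types.SimpleNamespace` and builds the user message that must follow it. The\n",
    "helper `build_tool_result_message`, the fake ID, and the placeholder output are hypothetical; only\n",
    "the dict keys (`type`, `tool_use_id`, `content`) follow the Messages API `tool_result` format.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "# Hypothetical stand-in for a ToolUseBlock taken from response.content\n",
    "fake_block = SimpleNamespace(\n",
    "    type=\"tool_use\",\n",
    "    id=\"toolu_example_123\",  # made-up ID; real IDs come from the API\n",
    "    name=\"run_python\",\n",
    "    input={\"code\": \"print(df.groupby('warehouse_name').credits_used.sum())\"},\n",
    ")\n",
    "\n",
    "\n",
    "def build_tool_result_message(blocks, outputs):\n",
    "    \"\"\"Pair each tool_use block with its output, echoing block.id as tool_use_id.\"\"\"\n",
    "    results = []\n",
    "    for block in blocks:\n",
    "        if block.type == \"tool_use\":\n",
    "            results.append({\n",
    "                \"type\": \"tool_result\",\n",
    "                \"tool_use_id\": block.id,  # must match the id Claude sent\n",
    "                \"content\": outputs[block.id],\n",
    "            })\n",
    "    # Tool results travel back to Claude in a user turn\n",
    "    return {\"role\": \"user\", \"content\": results}\n",
    "\n",
    "\n",
    "msg = build_tool_result_message([fake_block], {\"toolu_example_123\": \"(placeholder output)\"})\n",
    "print(msg[\"content\"][0][\"tool_use_id\"])  # toolu_example_123\n",
    "```\n",
    "\n",
    "In a real exchange this user message comes right after the assistant turn that contained the\n",
    "`tool_use` block; the API returns an error if a `tool_result` has no matching `tool_use` in that turn."
   ]
  },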
  {
   "cell_type": "markdown",
   "id": "cell-5",
   "metadata": {},
   "source": [
    "## Section 3 - The execution loop\n",
    "\n",
    "Now we build the loop. `run_tool_loop()` takes the initial messages, a dispatch function\n",
    "(`execute_tool_fn`), and cycles until `stop_reason==\"end_turn\"` or `max_turns` is reached.\n",
    "\n",
    "The dispatch function maps tool names to Python callables. For this notebook there is only one\n",
    "tool (`run_python`), but C2 will extend the same pattern to multiple tools."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def execute_python(code: str) -> str:\n    \"\"\"Execute code in a namespace containing df and pd. Capture stdout.\"\"\"\n    namespace = {\"pd\": pd, \"df\": df.copy()}\n    buf = io.StringIO()\n    old_stdout, sys.stdout = sys.stdout, buf\n    try:\n        exec(compile(code, \"<tool>\", \"exec\"), namespace)\n        output = buf.getvalue().strip()\n        return output if output else \"(no output -- add print() calls to see results)\"\n    except Exception:\n        return f\"EXECUTION ERROR:\\n{traceback.format_exc()}\"\n    finally:\n        sys.stdout = old_stdout\n\n\nTOOL_DISPATCH = {\n    \"run_python\": lambda inp: execute_python(inp[\"code\"]),\n}\n\n\ndef run_tool_loop(\n    messages: list[dict],\n    tools: list[dict],\n    *,\n    system: str | None = None,\n    max_turns: int = 10,\n    verbose: bool = True,\n):\n    \"\"\"Run Claude with tools until stop_reason='end_turn' or max_turns is reached.\"\"\"\n    for turn in range(1, max_turns + 1):\n        kwargs = dict(model=MODEL, max_tokens=4096, tools=tools, messages=messages)\n        if system:\n            kwargs[\"system\"] = system\n        response = client.messages.create(**kwargs)\n\n        if response.stop_reason == \"end_turn\":\n            if verbose:\n                print(f\"[turn {turn}] end_turn -- done.\")\n            return response\n\n        if response.stop_reason == \"tool_use\":\n            messages.append({\"role\": \"assistant\", \"content\": response.content})\n            tool_results = []\n            for block in response.content:\n                if block.type == \"tool_use\":\n                    if verbose:\n                        code_preview = block.input.get(\"code\", str(block.input))[:80]\n                        print(f\"[turn {turn}] calling {block.name}({code_preview!r} ...)\")\n                    result = TOOL_DISPATCH.get(block.name, lambda _: \"ERROR: unknown tool\")(block.input)\n                    if verbose:\n                        
print(f\"           -> {result[:120].replace(chr(10), ' | ')} ...\")\n                    tool_results.append({\n                        \"type\": \"tool_result\",\n                        \"tool_use_id\": block.id,\n                        \"content\": str(result),\n                    })\n            messages.append({\"role\": \"user\", \"content\": tool_results})\n\n    raise RuntimeError(f\"Tool loop hit max_turns={max_turns} without end_turn\")\n\n\n# First real run: total spend per warehouse\nmsgs = [{\"role\": \"user\", \"content\": \"What is the total credit spend per warehouse over the full dataset? Rank them highest to lowest.\"}]\nfinal = run_tool_loop(msgs, [RUN_PYTHON_TOOL])\nprint()\nprint(\"=== Claude's answer ===\")\nfor block in final.content:\n    if block.type == \"text\":\n        print(block.text)"
   ]
  },
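  {
   "cell_type": "markdown",
   "id": "cell-6-sketch",
   "metadata": {},
   "source": [
    "`TOOL_DISPATCH` is a plain dict, which is all a single tool needs. As the tool list grows, one common\n",
    "alternative is to register handlers with a decorator so each tool name sits next to its function. The\n",
    "sketch below is a hypothetical pattern (`register_tool`, `dispatch`, and `count_rows` are illustrative\n",
    "names, not part of this notebook or the Anthropic SDK). It turns unknown tools and handler exceptions\n",
    "into error strings so the failure reaches Claude as an ordinary tool result.\n",
    "\n",
    "```python\n",
    "import traceback\n",
    "from typing import Callable\n",
    "\n",
    "TOOL_REGISTRY: dict[str, Callable[[dict], str]] = {}\n",
    "\n",
    "\n",
    "def register_tool(name: str):\n",
    "    \"\"\"Decorator: store the handler under the tool name Claude will call.\"\"\"\n",
    "    def decorator(fn: Callable[[dict], str]) -> Callable[[dict], str]:\n",
    "        TOOL_REGISTRY[name] = fn\n",
    "        return fn\n",
    "    return decorator\n",
    "\n",
    "\n",
    "@register_tool(\"count_rows\")  # hypothetical example tool\n",
    "def count_rows(inp: dict) -> str:\n",
    "    return str(len(inp[\"rows\"]))\n",
    "\n",
    "\n",
    "def dispatch(name: str, inp: dict) -> tuple[str, bool]:\n",
    "    \"\"\"Run the named tool and return (content, is_error).\"\"\"\n",
    "    handler = TOOL_REGISTRY.get(name)\n",
    "    if handler is None:\n",
    "        return f\"ERROR: unknown tool {name!r}\", True\n",
    "    try:\n",
    "        return handler(inp), False\n",
    "    except Exception:\n",
    "        return f\"TOOL ERROR:\\n{traceback.format_exc()}\", True\n",
    "\n",
    "\n",
    "print(dispatch(\"count_rows\", {\"rows\": [1, 2, 3]}))  # ('3', False)\n",
    "print(dispatch(\"get_schema\", {}))  # (\"ERROR: unknown tool 'get_schema'\", True)\n",
    "```\n",
    "\n",
    "The boolean maps onto the optional `is_error` field of a `tool_result` block, which tells Claude\n",
    "the call failed."
   ]
  },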
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## Section 4 - Code execution: anomaly detection\n",
    "\n",
    "A harder question: find anomalous days across all warehouses without being told where to look.\n",
    "Claude will need at least two tool calls -- one to understand the typical spend baseline, one\n",
    "to flag outliers. Watch the turn log to see its reasoning path.\n",
    "\n",
    "This is the kind of task where code execution genuinely earns its keep: the answer requires\n",
    "real arithmetic over 630 rows that Claude cannot do reliably from its training knowledge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {},
   "outputs": [],
   "source": [
    "ANALYST_SYSTEM = (\n    \"You are a Snowflake cost analyst for Acme SaaS Co. \"\n    \"Use the run_python tool to analyse the warehouse_usage dataset. \"\n    \"Think step-by-step: first explore the data to establish baselines, \"\n    \"then apply a statistical threshold to identify anomalies. \"\n    \"Report findings with specific dates, warehouse names, and credit figures.\"\n)\n\nanomaly_msgs = [\n    {\n        \"role\": \"user\",\n        \"content\": (\n            \"Identify the top 5 most anomalous warehouse-day combinations by credit spend. \"\n            \"Use a statistical threshold (e.g. z-score or IQR) relative to each warehouse's \"\n            \"own baseline. Report the date, warehouse, actual credits, baseline, and deviation.\"\n        ),\n    }\n]\n\nfinal = run_tool_loop(\n    anomaly_msgs,\n    [RUN_PYTHON_TOOL],\n    system=ANALYST_SYSTEM,\n)\nprint()\nprint(\"=== Anomaly report ===\")\nfor block in final.content:\n    if block.type == \"text\":\n        print(block.text)"
   ]
  },
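  {
   "cell_type": "markdown",
   "id": "cell-8-sketch",
   "metadata": {},
   "source": [
    "For comparison, the kind of code Claude tends to write for this prompt looks roughly like the sketch\n",
    "below: a per-warehouse z-score built with `groupby(...).transform`. It runs on a tiny synthetic\n",
    "DataFrame (made-up numbers with one planted spike), not on `warehouse_usage.csv`, and z-score is only\n",
    "one of the two thresholds the prompt allows; Claude may choose IQR instead.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Synthetic stand-in for warehouse_usage.csv: 2 warehouses x 10 days, one planted spike\n",
    "dates = pd.date_range(\"2025-07-01\", periods=10, freq=\"D\")\n",
    "usage = pd.DataFrame({\n",
    "    \"date\": list(dates) * 2,\n",
    "    \"warehouse_name\": [\"WH_A\"] * 10 + [\"WH_B\"] * 10,\n",
    "    \"credits_used\": [10.0] * 9 + [40.0] + [50.0, 52.0, 48.0, 51.0, 49.0] * 2,\n",
    "})\n",
    "\n",
    "# Baseline and spread are computed within each warehouse, not across the whole fleet\n",
    "grp = usage.groupby(\"warehouse_name\")[\"credits_used\"]\n",
    "usage[\"baseline\"] = grp.transform(\"mean\")\n",
    "usage[\"z\"] = (usage[\"credits_used\"] - usage[\"baseline\"]) / grp.transform(\"std\")\n",
    "\n",
    "# Rank warehouse-days by how far above their own baseline they sit\n",
    "top = usage.sort_values(\"z\", ascending=False).head(5)\n",
    "print(top[[\"date\", \"warehouse_name\", \"credits_used\", \"baseline\", \"z\"]].to_string(index=False))\n",
    "```\n",
    "\n",
    "Because the spike itself inflates its warehouse's mean and standard deviation, a median-based\n",
    "baseline is more robust when histories are short."
   ]
  },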
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## Section 5 - Code execution: self-correcting on error\n",
    "\n",
    "Claude's code is not always correct on the first attempt. This cell uses a deliberately\n",
    "tricky question (referencing a column name that does not exist) to show the self-correction\n",
    "loop in action: Claude gets an execution error, reads the traceback, corrects the code, and\n",
    "produces the right answer on the second attempt.\n",
    "\n",
    "No special retry logic needed -- the execution error is just another tool result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10",
   "metadata": {},
   "outputs": [],
   "source": [
    "correction_msgs = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": (\n",
    "            \"What was the average credits_per_query ratio for WH_BI_M across July 2025? \"\n",
    "            \"Use the credits_per_query column.\"\n",
    "        ),\n",
    "    }\n",
    "]\n",
    "# Note: there is no 'credits_per_query' column -- Claude must discover this and derive it.\n",
    "\n",
    "final = run_tool_loop(correction_msgs, [RUN_PYTHON_TOOL])\n",
    "print()\n",
    "print(\"=== Answer after self-correction ===\")\n",
    "for block in final.content:\n",
    "    if block.type == \"text\":\n",
    "        print(block.text)"
   ]
  },
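  {
   "cell_type": "markdown",
   "id": "cell-10-sketch",
   "metadata": {},
   "source": [
    "The self-correction path relies only on the executor returning the traceback as text. The offline\n",
    "sketch below mimics `execute_python` on a tiny made-up frame (the helper `run_snippet`, the data, and\n",
    "the placeholder ID are hypothetical). The first snippet requests the missing `credits_per_query`\n",
    "column and yields a `KeyError` traceback; the second derives the ratio from `credits_used` and\n",
    "`query_count`, which is the fix Claude is expected to find. Setting `is_error` on the failed result is\n",
    "optional in the Messages API; the notebook's loop sends errors as plain content.\n",
    "\n",
    "```python\n",
    "import contextlib\n",
    "import io\n",
    "import traceback\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Tiny made-up frame using the same column names as warehouse_usage.csv\n",
    "df = pd.DataFrame({\n",
    "    \"warehouse_name\": [\"WH_BI_M\", \"WH_BI_M\"],\n",
    "    \"credits_used\": [5.0, 15.0],\n",
    "    \"query_count\": [40, 40],\n",
    "})\n",
    "\n",
    "\n",
    "def run_snippet(code: str) -> tuple[str, bool]:\n",
    "    \"\"\"Run code against a copy of df; return (stdout or traceback, is_error).\"\"\"\n",
    "    buf = io.StringIO()\n",
    "    try:\n",
    "        with contextlib.redirect_stdout(buf):\n",
    "            exec(code, {\"pd\": pd, \"df\": df.copy()})\n",
    "    except Exception:\n",
    "        return traceback.format_exc(), True\n",
    "    return buf.getvalue().strip(), False\n",
    "\n",
    "\n",
    "# Attempt 1: the column does not exist, so the traceback becomes the tool result\n",
    "out1, err1 = run_snippet(\"print(df['credits_per_query'].mean())\")\n",
    "first_result = {\n",
    "    \"type\": \"tool_result\",\n",
    "    \"tool_use_id\": \"toolu_placeholder\",  # hypothetical ID\n",
    "    \"content\": out1,\n",
    "    \"is_error\": err1,\n",
    "}\n",
    "\n",
    "# Attempt 2: derive the daily ratio from existing columns, then average it\n",
    "out2, err2 = run_snippet(\"print((df['credits_used'] / df['query_count']).mean())\")\n",
    "print(err1, \"KeyError\" in out1)  # True True\n",
    "print(err2, out2)  # False 0.25\n",
    "```"
   ]
  },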
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": [
    "## Section 6 - Built-in server-side tool: web search\n",
    "\n",
    "The `web_search_20250305` tool is executed by Anthropic's infrastructure, not your code.\n",
    "You declare it in `tools`, Claude calls it, and the response arrives with the search results\n",
    "already incorporated -- no execution loop needed.\n",
    "\n",
    "Contrast with the `run_python` tool above: there, `stop_reason=\"tool_use\"` required us to\n",
    "run code and return a result. For web search, a single `client.messages.create()` call\n",
    "produces the complete answer.\n",
    "\n",
    "Note: web search requires an active internet connection and counts toward your API usage.\n",
    "If the feature is not enabled on your API key, the call will raise an error -- that is expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    web_response = client.messages.create(\n",
    "        model=MODEL,\n",
    "        max_tokens=1024,\n",
    "        tools=[\n",
    "            {\n",
    "                \"type\": \"web_search_20250305\",\n",
    "                \"name\": \"web_search\",\n",
    "            }\n",
    "        ],\n",
    "        messages=[{\n",
    "            \"role\": \"user\",\n",
    "            \"content\": (\n",
    "                \"What is the current Snowflake enterprise credit pricing for AWS US East? \"\n",
    "                \"Find the official published rate.\"\n",
    "            ),\n",
    "        }],\n",
    "    )\n",
    "\n",
    "    print(f\"stop_reason: {web_response.stop_reason}\")\n",
    "    print(f\"Content blocks: {len(web_response.content)}\")\n",
    "    print()\n",
    "    for block in web_response.content:\n",
    "        if block.type == \"text\":\n",
    "            print(block.text)\n",
    "\n",
    "except anthropic.BadRequestError as e:\n",
    "    print(f\"Web search not available on this API key: {e}\")\n",
    "    print(\"This is expected if web search is not enabled for your account.\")\n",
    "except Exception as e:\n",
    "    print(f\"Unexpected error: {type(e).__name__}: {e}\")"
   ]
  },
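  {
   "cell_type": "markdown",
   "id": "cell-12-sketch",
   "metadata": {},
   "source": [
    "The \"no loop\" claim has one caveat worth knowing: for long-running server-tool turns, Anthropic's\n",
    "web search documentation describes a `stop_reason=\"pause_turn\"`, and says the paused response can be\n",
    "sent back as the assistant turn so Claude continues. A minimal sketch, assuming that documented\n",
    "behaviour (re-check it against the current docs): `create_fn` stands in for `client.messages.create`,\n",
    "and `FakeClient` with its canned responses is a hypothetical test double so the cell runs offline.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def create_until_done(create_fn, messages, max_continuations=3, **kwargs):\n",
    "    \"\"\"Call create_fn; while the turn is paused, resend it as the assistant turn.\"\"\"\n",
    "    response = create_fn(messages=messages, **kwargs)\n",
    "    for _ in range(max_continuations):\n",
    "        if response.stop_reason != \"pause_turn\":\n",
    "            break\n",
    "        # Pass the partial assistant content back unchanged so Claude can resume its turn\n",
    "        messages = messages + [{\"role\": \"assistant\", \"content\": response.content}]\n",
    "        response = create_fn(messages=messages, **kwargs)\n",
    "    return response\n",
    "\n",
    "\n",
    "class FakeClient:\n",
    "    \"\"\"Offline test double: pauses once, then finishes.\"\"\"\n",
    "\n",
    "    def __init__(self):\n",
    "        self.calls = 0\n",
    "\n",
    "    def create(self, messages, **kwargs):\n",
    "        self.calls += 1\n",
    "        reason = \"pause_turn\" if self.calls == 1 else \"end_turn\"\n",
    "        return SimpleNamespace(stop_reason=reason, content=[])\n",
    "\n",
    "\n",
    "fake = FakeClient()\n",
    "final_resp = create_until_done(fake.create, [{\"role\": \"user\", \"content\": \"hi\"}])\n",
    "print(fake.calls, final_resp.stop_reason)  # 2 end_turn\n",
    "```\n",
    "\n",
    "With the real client this would be called as `create_until_done(client.messages.create, msgs,\n",
    "model=MODEL, max_tokens=1024, tools=[...])`; `max_continuations` is an arbitrary safety cap."
   ]
  },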
  {
   "cell_type": "markdown",
   "id": "cell-13",
   "metadata": {},
   "source": [
    "## Section 7 - Multi-turn diagnostic agent\n",
    "\n",
    "A longer task: ask Claude to run a full cost spike diagnosis for one warehouse over the\n",
    "dataset period, following the five-step diagnosis playbook from the Cost Runbook.\n",
    "\n",
    "Claude will need several tool calls:\n",
    "1. Identify which warehouse had the most anomalous period.\n",
    "2. Narrow the time window.\n",
    "3. Compute baseline vs spike statistics.\n",
    "4. Summarise findings in a format that matches the runbook's output expectations.\n",
    "\n",
    "This shows how a multi-turn tool loop naturally produces structured, reasoned analysis\n",
    "without any explicit step-by-step orchestration in your code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-14",
   "metadata": {},
   "outputs": [],
   "source": [
    "diagnostic_msgs = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": (\n",
    "            \"Run a cost spike diagnosis for WH_BI_M over the Q3 2025 period. \"\n",
    "            \"Follow these steps:\\n\"\n",
    "            \"1. Compute the warehouse's 30-day rolling median as the baseline.\\n\"\n",
    "            \"2. Identify the window where spend exceeded 1.25x the rolling median.\\n\"\n",
    "            \"3. Calculate total excess credits during that window.\\n\"\n",
    "            \"4. Estimate the approximate USD cost if the enterprise rate is $4 per credit.\\n\"\n",
    "            \"5. Write a 3-sentence incident summary suitable for a P2 post-mortem.\"\n",
    "        ),\n",
    "    }\n",
    "]\n",
    "\n",
    "final = run_tool_loop(\n",
    "    diagnostic_msgs,\n",
    "    [RUN_PYTHON_TOOL],\n",
    "    max_turns=12,\n",
    ")\n",
    "print()\n",
    "print(\"=== Diagnostic report ===\")\n",
    "for block in final.content:\n",
    "    if block.type == \"text\":\n",
    "        print(block.text)"
   ]
  },
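  {
   "cell_type": "markdown",
   "id": "cell-14-sketch",
   "metadata": {},
   "source": [
    "To sanity-check Claude's numbers, steps 1 to 4 of the playbook can be reproduced in a few lines of\n",
    "pandas. The sketch below runs on synthetic data (a flat 100-credit baseline with a planted 5-day spike\n",
    "at 160), not on `warehouse_usage.csv`. It makes two assumptions the prompt leaves open: the baseline\n",
    "is the trailing 30-day median excluding the current day (`shift(1)`), so a spike day does not pull up\n",
    "its own baseline, and at least 7 days of history are required (`min_periods=7`, an arbitrary choice).\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "CREDIT_PRICE_USD = 4.0  # enterprise rate stated in the prompt\n",
    "\n",
    "# Synthetic daily credits: flat at 100 with a planted 5-day spike at 160\n",
    "credits = pd.Series(\n",
    "    [100.0] * 40 + [160.0] * 5 + [100.0] * 15,\n",
    "    index=pd.date_range(\"2025-07-01\", periods=60, freq=\"D\"),\n",
    ")\n",
    "\n",
    "# Step 1: trailing 30-day median, shifted so today's value is not in its own baseline\n",
    "baseline = credits.rolling(30, min_periods=7).median().shift(1)\n",
    "\n",
    "# Step 2: days where spend exceeded 1.25x the baseline (NaN baselines compare as False)\n",
    "spike = credits > 1.25 * baseline\n",
    "window = credits[spike].index\n",
    "\n",
    "# Steps 3 and 4: excess credits over baseline, converted to USD\n",
    "excess = (credits - baseline)[spike].sum()\n",
    "print(f\"Spike window : {window.min().date()} to {window.max().date()} ({spike.sum()} days)\")\n",
    "print(f\"Excess       : {excess:,.0f} credits (~${excess * CREDIT_PRICE_USD:,.0f})\")\n",
    "```\n",
    "\n",
    "Including the current day in the median (dropping `shift(1)`) is also a reasonable reading of the\n",
    "prompt; on short or dense spikes the two choices can give different windows."
   ]
  },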
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: this notebook routes all model calls through a standalone anthropic client\n# (client.messages.create), not through cc.complete(). cc.records will be empty;\n# use resp.usage on individual responses to inspect per-call token counts.\ncc.print_summary()"
   ]
  },
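  {
   "cell_type": "markdown",
   "id": "cell-15-sketch",
   "metadata": {},
   "source": [
    "Because `cc.records` stays empty in this notebook, one lightweight alternative is to tally\n",
    "`usage.input_tokens` and `usage.output_tokens` from each response yourself. The sketch below is\n",
    "hypothetical (`UsageTally` is not part of `llm_client.py` or the SDK) and is exercised with stand-in\n",
    "objects so it runs offline. Since `run_tool_loop` returns only the final response, you would call\n",
    "`tally.add(response)` right after each `client.messages.create()` inside the loop.\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class UsageTally:\n",
    "    \"\"\"Accumulate token counts across Messages API responses.\"\"\"\n",
    "    calls: int = 0\n",
    "    input_tokens: int = 0\n",
    "    output_tokens: int = 0\n",
    "\n",
    "    def add(self, response) -> None:\n",
    "        # Each Message carries a usage object with input and output token counts\n",
    "        self.calls += 1\n",
    "        self.input_tokens += response.usage.input_tokens\n",
    "        self.output_tokens += response.usage.output_tokens\n",
    "\n",
    "    def summary(self) -> str:\n",
    "        return (f\"{self.calls} calls, {self.input_tokens:,} input tokens, \"\n",
    "                f\"{self.output_tokens:,} output tokens\")\n",
    "\n",
    "\n",
    "tally = UsageTally()\n",
    "for inp, out in [(1200, 150), (1500, 90)]:  # made-up counts for the demo\n",
    "    tally.add(SimpleNamespace(usage=SimpleNamespace(input_tokens=inp, output_tokens=out)))\n",
    "print(tally.summary())  # 2 calls, 2,700 input tokens, 240 output tokens\n",
    "```"
   ]
  },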
  {
   "cell_type": "markdown",
   "id": "cell-16",
   "metadata": {},
   "source": [
    "## Section 8 - Practitioner Lab\n",
    "\n",
    "Open-ended extension. No reference solution.\n",
    "\n",
    "**Goal:** add a second tool -- `get_schema` -- and extend the dispatch loop.\n",
    "\n",
    "**Background:** The `run_python` tool works well when Claude already knows the column names\n",
    "and data types. In a real system, Claude may be given a new dataset it has never seen.\n",
    "A `get_schema` tool lets Claude inspect the DataFrame before writing analysis code.\n",
    "\n",
    "**Task:** implement the following tool and wire it into `TOOL_DISPATCH`:\n",
    "\n",
    "```python\n",
    "GET_SCHEMA_TOOL = {\n",
    "    \"name\": \"get_schema\",\n",
    "    \"description\": (\n",
    "        \"Return the schema of a named dataset: column names, dtypes, row count, \"\n",
    "        \"and the first 3 rows as a sample. Use this before run_python when you \"\n",
    "        \"are unsure about column names or data types.\"\n",
    "    ),\n",
    "    \"input_schema\": {\n",
    "        \"type\": \"object\",\n",
    "        \"properties\": {\n",
    "            \"dataset\": {\n",
    "                \"type\": \"string\",\n",
    "                \"description\": \"Dataset name. Available: 'warehouse_usage'.\",\n",
    "            }\n",
    "        },\n",
    "        \"required\": [\"dataset\"],\n",
    "    },\n",
    "}\n",
    "```\n",
    "\n",
    "**Test:** ask Claude a question without mentioning any column names -- e.g., \"Find the day\n",
    "with the worst queue congestion across all warehouses.\" Claude should call `get_schema` first,\n",
    "then `run_python` with the correct column names.\n",
    "\n",
    "**Stretch:** add a `save_result` tool that writes a Python dict (keys: `title`, `value`,\n",
    "`unit`) to a `results` list. After the loop, format the collected results as a simple\n",
    "markdown report. This is the beginning of a structured output pipeline from an agentic loop\n",
    "-- a pattern C2 extends further.\n",
    "\n",
    "Why this matters: real agentic systems almost always need a \"discovery\" tool and an\n",
    "\"execution\" tool as a pair. The discovery tool prevents Claude from hallucinating column\n",
    "names or API shapes it does not know; the execution tool acts on what it discovers. Getting\n",
    "the pairing right is one of the most impactful prompt engineering decisions in a tool-using\n",
    "agent.\n",
    "\n",
    "---\n",
    "\n",
    "*Companion article: C1 - Built-in Tools: Code Execution, Web Search, and the Tool Use Loop.*\n",
    "*Next notebook: C2_custom_tools.ipynb*"
   ]
  }
 ]
}