Concept · 2026-05-17
Function calling — how LLMs invoke tools
Function calling is how modern LLMs invoke external tools (APIs, databases, code execution). OpenAI launched it in June 2023; the pattern is now table stakes across every major vendor.
Definition
Function calling (also called "tool use" in Anthropic terminology) is an LLM capability where the model emits a structured tool invocation — function name + JSON-shape arguments — instead of free-text. The runtime executes the function and feeds the result back; the model continues with the result in context.
Concretely: you give the model a JSON schema of available tools. The model decides when to call which tool. It emits a message with a tool_calls field. You execute the function. You append the result as a tool message. You ask the model to continue. The model produces the final user-facing response (or calls another tool).
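Roughly, that exchange looks like the following message sequence (OpenAI Chat Completions shapes; the claim, IDs, and result values are illustrative):

```python
# One round trip, OpenAI Chat Completions style (illustrative values).
messages = [
    {"role": "user", "content": "Is the Eiffel Tower taller than 300 m?"},

    # 1. The model decides to call a tool instead of answering in free text.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "verify_claim",
                # Arguments arrive as a JSON string and must be parsed.
                "arguments": '{"claim": "The Eiffel Tower is taller than 300 m"}',
            },
        }],
    },

    # 2. The runtime executes verify_claim and appends the result,
    #    linked back to the request via tool_call_id.
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": '{"verified": true, "confidence": 0.97}',
    },

    # 3. The model is called again with this history and produces the
    #    final user-facing answer (or requests another tool).
]
```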
Why it matters
Function calling closed the loop between LLM and the deterministic world. Before it, you had to parse free-text responses to extract structured intent (fragile). With it, the model emits structured output that's directly machine-readable.
Every modern agent loop, RAG retriever, code-running assistant, and tool-augmented chatbot is built on function calling. It's the primitive.
A short history
- 2023-06 — OpenAI introduces function calling in the Chat Completions API (functions/function_call parameters, later renamed tools/tool_choice). Initial release with GPT-3.5 and GPT-4.
- 2023-11 — OpenAI adds parallel tool calling (the model can request multiple tool calls in a single turn).
- 2023-11 — Anthropic adds tool use to Claude 2.1 (later refined for Claude 3 family).
- 2024 — Google Gemini, Meta Llama 3.1+, Mistral, Cohere, and most open-weight frontier models add function calling.
- 2024-11 — Anthropic releases Model Context Protocol (MCP) — open standard for tool exposure across vendors. Adopted by OpenAI and most agent frameworks within ~6 months.
The JSON schema
Tools are specified via JSON Schema (in OpenAI's format) or a similar structure. Example:
```json
{
  "type": "function",
  "function": {
    "name": "verify_claim",
    "description": "Verify a natural-language factual claim about AI/ML.",
    "parameters": {
      "type": "object",
      "properties": {
        "claim": {
          "type": "string",
          "description": "Natural-language claim to verify"
        },
        "min_confidence": {
          "type": "number",
          "description": "Minimum confidence threshold (0.0-1.0)",
          "default": 0.85
        }
      },
      "required": ["claim"]
    }
  }
}
```

The model sees this schema, decides when to call the function, fills in the arguments per the schema, and emits a structured tool_calls block.
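A minimal sketch of passing this schema to the model and reading back the structured call, using the OpenAI Python SDK (the model name and the VERIFY_CLAIM_TOOL variable are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# VERIFY_CLAIM_TOOL is the schema above, loaded as a Python dict.
response = client.chat.completions.create(
    model="gpt-4o",  # any tool-capable model; name is illustrative
    messages=[{"role": "user", "content": "Is it true that GPT-4 launched in 2023?"}],
    tools=[VERIFY_CLAIM_TOOL],
)

msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name)       # "verify_claim"
    print(call.function.arguments)  # JSON string, e.g. '{"claim": "GPT-4 launched in 2023"}'
```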
The agent loop
1. Send messages + tool schemas to the model
2. If the model emits tool_calls: execute each tool, append each result as a tool message, loop back to step 1
3. If the model emits a final text message: done
Parallel tool calls let the runtime execute multiple tools concurrently before looping. Streaming tool calls let the model emit tool requests mid-stream (without waiting for the full response).
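A minimal sketch of that loop against the OpenAI Python SDK; VERIFY_CLAIM_TOOL and the verify_claim function are placeholders for your own schemas and executors:

```python
import json
from openai import OpenAI

client = OpenAI()
TOOLS = [VERIFY_CLAIM_TOOL]                  # tool schemas (see above)
EXECUTORS = {"verify_claim": verify_claim}   # tool name -> local Python callable

def run_agent(messages, model="gpt-4o", max_turns=10):
    """Call the model, execute any requested tools, feed results back, repeat."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content                   # final text answer: done
        messages.append(msg)                     # keep the tool_calls turn in history
        for call in msg.tool_calls:              # several entries if parallel calls
            args = json.loads(call.function.arguments)
            result = EXECUTORS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_turns")
```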
Vendor flavors
| Vendor | Field name | Multi-call | Streaming |
|---|---|---|---|
| OpenAI | tool_calls | Yes (parallel) | Yes |
| Anthropic | tool_use blocks | Yes | Yes |
| Google Gemini | functionCall | Yes | Yes |
| Mistral | tool_calls (OpenAI-compatible) | Yes | Yes |
| Llama 3.1+ | via tool-token format (Llama-Stack) | Yes | Yes |
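To make the field-name differences concrete, here are simplified sketches (as Python dicts) of the same call in three of these formats; fields are abridged and values illustrative:

```python
# OpenAI: assistant message with a tool_calls list; arguments are a JSON string.
openai_style = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "verify_claim", "arguments": '{"claim": "..."}'},
    }],
}

# Anthropic: assistant content contains a tool_use block; input is already parsed.
anthropic_style = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_1",
        "name": "verify_claim",
        "input": {"claim": "..."},
    }],
}

# Google Gemini: a content part carrying a functionCall with parsed args.
gemini_style = {
    "functionCall": {"name": "verify_claim", "args": {"claim": "..."}}
}
```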
Cross-vendor portability — MCP
Each vendor's tool-call format is slightly different, which has historically forced per-vendor adapter code. Anthropic's Model Context Protocol (November 2024) is the cross-vendor standard:
- Tool servers expose tools via MCP
- Clients (any LLM vendor) connect to MCP servers
- One tool definition works across OpenAI, Anthropic, Gemini, Mistral, etc.
As of 2025, MCP adoption is high enough that new agent frameworks default to it. The vendor-specific tool formats remain but are increasingly thin wrappers over MCP-compatible tool definitions.
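As a sketch of the server side, the official MCP Python SDK's FastMCP helper turns an annotated function into an MCP tool (the server name and tool logic here are illustrative):

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("claim-verifier")

@mcp.tool()
def verify_claim(claim: str, min_confidence: float = 0.85) -> dict:
    """Verify a natural-language factual claim about AI/ML."""
    # Placeholder logic; a real server would consult a knowledge base.
    return {"claim": claim, "verified": True, "confidence": 0.97}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so any MCP client can connect
```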
Common production patterns
Retrieval-then-cite
The model has a search_knowledge_base tool. When the user asks a question, the model calls the search tool, gets back relevant chunks, then composes an answer citing them. Compare with RAG vs VERITAS.
Verify-before-asserting
The model has a verify_claim tool. When the model is about to assert a factual claim, it calls verify first; only asserts confirmed claims. This is the AI agent grounding use case.
Code execution
The model has a run_python tool. For math / calculation / data-manipulation queries, the model writes code and runs it instead of computing in-head.
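One way to back such a tool is sketched below with a subprocess and a timeout; this is not a real sandbox, and production use needs proper isolation (containers, gVisor, a hosted interpreter, etc.):

```python
import subprocess
import sys

def run_python(code: str, timeout_s: float = 10.0) -> dict:
    """Run model-written Python in a subprocess and return stdout/stderr.
    NOT a security boundary; isolate properly before exposing to real users."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
    except subprocess.TimeoutExpired:
        return {"error": f"timed out after {timeout_s}s"}
```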
Multi-tool composition
The model has 10+ tools (search, calendar, email, database, calculator, etc.). It chains tool calls to accomplish complex tasks. The OpenAI Assistants API + Anthropic computer use target this shape.
Anti-patterns
- Tool descriptions too vague. The model decides when to call a tool based on the description. Vague description = mis-called or under-called tool. Write descriptions like API docs.
- Too many tools. 5-10 tools per agent is generally fine. 50+ tools degrades model performance — consider sub-agent decomposition or hierarchical tool routing.
- Trusting model arguments unconditionally. Models sometimes emit malformed or out-of-schema JSON arguments. Validate before executing: Pydantic, JSON Schema validators, Instructor, Pydantic AI (see the sketch after this list).
- No tool-call observability. Log every tool call + arguments + result. You'll need it when debugging why the agent did something weird.
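A minimal argument-validation sketch with Pydantic v2, mirroring the verify_claim schema above (the executor call is a placeholder):

```python
from pydantic import BaseModel, Field, ValidationError

class VerifyClaimArgs(BaseModel):
    claim: str
    min_confidence: float = Field(default=0.85, ge=0.0, le=1.0)

def safe_execute(tool_call):
    """Validate model-produced arguments before running the tool."""
    try:
        args = VerifyClaimArgs.model_validate_json(tool_call.function.arguments)
    except ValidationError as exc:
        # Return the error as the tool result so the model can correct itself.
        return {"error": f"invalid arguments: {exc}"}
    return verify_claim(args.claim, args.min_confidence)  # placeholder executor
```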
Related
- LLM grounding — function calling enables verify-before-asserting
- Agent frameworks topic hub
- AI agent grounding use case
- OpenAI tool-calls integration
- Anthropic SDK tool-use integration
- Pydantic AI — type-safe variant