Technical Reference
AI / ML Glossary
A comprehensive, searchable glossary of AI and machine learning terminology. Every definition is technically accurate, detailed, and written for engineers building AI-powered systems.
A
A2A (Agent-to-Agent Protocol) Protocols Google's open protocol enabling AI agents built on different frameworks to communicate, collaborate, and delegate tasks to each other.
AEO (Answer Engine Optimization) Data The strategy of optimizing content specifically to be selected as the source for direct answers provided by AI assistants and answer engines.
Agent Framework Agents Software libraries and platforms that provide the infrastructure for building AI agents, including tool management, memory, planning, and orchestration capabilities.
Agentic AI Agents AI systems that can autonomously plan, reason, use tools, and take multi-step actions to accomplish complex goals with minimal human intervention.
AI SEO Data The comprehensive practice of optimizing content and digital presence for both traditional search engines and AI-powered discovery and answer systems.
API Rate Limiting Deployment Controls imposed by AI API providers that restrict the number of requests, tokens, or concurrent connections a client can use within a given time period.
API vs Open Weight Deployment The strategic choice between accessing AI models through cloud APIs versus self-hosting open-weight models, each offering distinct trade-offs in cost, control, and capability.
Attention Mechanism Architecture The core computational operation in Transformers that allows each token in a sequence to dynamically attend to and weight information from all other tokens.
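The operation above can be sketched as scaled dot-product attention; a minimal single-head NumPy version with toy dimensions (the input values are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each row of Q attends over all rows of K and receives a
    weighted mixture of the rows of V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n_q, n_k) similarity scores
    # softmax over keys: attention weights sum to 1 for each query token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                  # 3 tokens, 4-dim head
out, w = scaled_dot_product_attention(X, X, X)   # self-attention: Q = K = V
```

In a real Transformer, Q, K, and V are separate learned projections of the input, and many heads run in parallel.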
B
Batch Processing Deployment Running multiple AI inference requests together as a batch, often at lower cost and lower priority, suitable for non-real-time workloads.
C
Chain of Thought Architecture A prompting technique that improves model reasoning by instructing or encouraging the model to show intermediate reasoning steps before providing a final answer.
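A minimal sketch of such a prompt; the wording and the question are illustrative, not a fixed template:

```python
# Chain-of-thought prompting: explicitly ask the model to show its
# intermediate reasoning before committing to a final answer.
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

cot_prompt = (
    f"Question: {question}\n"
    "Think through the problem step by step, showing each intermediate\n"
    "calculation, then give the final answer on a line starting with 'Answer:'."
)
```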
Claude Inference A family of large language models developed by Anthropic, known for strong reasoning, instruction following, safety alignment, and long context capabilities.
Closed Source Models Deployment Proprietary AI models accessible only through vendor APIs, where the model weights, architecture details, and training data are not publicly shared.
Computer Use Agents The ability of AI models to interact with computer interfaces by viewing screenshots and executing mouse clicks, keyboard input, and other desktop actions.
Constitutional AI Safety An alignment approach developed by Anthropic where AI models are trained to follow a set of principles (a constitution) that guide their behavior toward being helpful, harmless, and honest.
Context Engineering Architecture The emerging discipline of systematically designing and optimizing the information provided to AI models to maximize output quality and reliability.
Context Protocol Protocols A general term for standardized protocols that define how AI models receive, process, and interact with contextual information from external systems.
Context Window Architecture The maximum amount of text (measured in tokens) that a language model can process in a single request, including both input and output.
Cosine Similarity Data A mathematical metric that measures the angular similarity between two vectors, widely used to compare embeddings and determine semantic relatedness.
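The metric itself is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||):
    1 = same direction, 0 = orthogonal, -1 = opposite."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cosine_similarity([1, 0], [2, 0])   # 1.0 (same direction, magnitude ignored)
cosine_similarity([1, 0], [0, 3])   # 0.0 (orthogonal)
```

Because only the angle matters, embeddings of different magnitudes can still be compared meaningfully.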
D
DPO (Direct Preference Optimization) Training A simplified alternative to RLHF that directly optimizes language models on human preference data without needing a separate reward model.
E
Embeddings Data Dense vector representations of text, images, or other data that capture semantic meaning in a numerical format suitable for similarity search and machine learning.
F
Few-shot Learning Training A technique where a model learns to perform a task from a small number of examples provided in the prompt, without any parameter updates.
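A minimal sketch of a few-shot prompt; the reviews and labels are made up for illustration:

```python
# Few-shot prompting: demonstrate the task with labeled examples inside
# the prompt itself. No model parameters are updated.
examples = [
    ("I loved this movie!", "positive"),
    ("Terrible, would not recommend.", "negative"),
]

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += "Review: The plot was dull but the acting saved it.\nSentiment:"
```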
Fine-tuning Training The process of further training a pre-trained model on a specific dataset to specialize its behavior for particular tasks or domains.
Function Calling Protocols A capability that allows LLMs to generate structured JSON arguments for predefined functions, enabling models to interact with external systems and APIs.
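A minimal sketch of the pattern, assuming a JSON-Schema-style tool definition; the `get_weather` tool and the model's reply payload are hypothetical:

```python
import json

# A tool definition in the JSON-Schema style most function-calling APIs use.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# The model replies with structured arguments rather than prose; the
# application parses them and invokes the real function.
model_reply = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
call = json.loads(model_reply)
args = call["arguments"]
```

The function's return value is then fed back to the model so it can compose a final answer.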
G
Gemini Inference Google's family of natively multimodal AI models designed to understand and reason across text, images, audio, video, and code simultaneously.
GEO (Generative Engine Optimization) Data The practice of optimizing content to be effectively surfaced, cited, and synthesized by AI-powered generative search engines and answer systems.
GPT Inference Generative Pre-trained Transformer, OpenAI's series of large language models including GPT-4, GPT-4o, and the o-series reasoning models.
Grounding Data The process of anchoring AI model outputs in verifiable, factual information from specific sources to reduce hallucination and improve accuracy.
Guardrails Safety Safety mechanisms and validation layers that constrain AI model behavior, prevent harmful outputs, and ensure responses meet quality and policy requirements.
H
Hallucination Safety When an AI model generates plausible-sounding but factually incorrect, fabricated, or unsupported information in its outputs.
I
In-context Learning Training The ability of LLMs to learn and adapt their behavior based on information provided in the prompt context, without modifying model parameters.
Inference Inference The process of running a trained AI model on input data to generate predictions or outputs, as opposed to the training phase where the model learns its parameters.
J
JSON Mode Inference A model configuration that ensures outputs are valid JSON, a simpler alternative to full structured output when only JSON validity (not schema conformance) is needed.
JSON-RPC Protocols A lightweight remote procedure call protocol using JSON for data encoding, used as the communication format in protocols like MCP.
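A minimal request/response pair in the JSON-RPC 2.0 shape; the `tools/list` method name is borrowed from MCP, and the empty result is illustrative:

```python
import json

# A JSON-RPC 2.0 request: the "id" lets the client match the response.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
wire = json.dumps(request)

# A conforming response echoes the id and carries either "result" or "error".
response = json.loads('{"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}')
```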
K
Knowledge Graph Data A structured representation of real-world entities and their relationships, enabling AI systems to reason over connected information and answer complex queries.
L
Latency Deployment The time delay between sending a request to an AI model and receiving the response, a critical performance metric for real-time AI applications.
Llama Inference Meta's family of open-weight large language models that have become foundational to the open-source AI ecosystem.
LoRA (Low-Rank Adaptation) Training A parameter-efficient fine-tuning method that trains small, low-rank matrices alongside frozen model weights, drastically reducing the resources needed for model customization.
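A minimal NumPy sketch of the idea, with toy dimensions; real implementations apply this per projection matrix inside the model:

```python
import numpy as np

# LoRA learns a low-rank update B @ A to a frozen weight W, so only
# (d*r + r*k) parameters are trained instead of d*k.
d, k, r = 64, 64, 4        # weight dims and LoRA rank (illustrative)
alpha = 8                  # scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trained
B = np.zeros((d, r))                 # trained; zero init -> no change at start

def lora_forward(x):
    # effective weight: W + (alpha / r) * B @ A
    return x @ (W + (alpha / r) * (B @ A)).T
```

With B initialized to zero, the adapted model starts out exactly equal to the base model, and training only moves it as far as the low-rank update allows.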
M
MCP (Model Context Protocol) Protocols An open protocol by Anthropic that standardizes how AI models connect to external tools, data sources, and services through a unified client-server architecture.
Model Distillation Training A technique for training a smaller, faster student model to replicate the behavior of a larger, more capable teacher model, preserving quality while reducing resource requirements.
Multi-agent Systems Agents Architectures where multiple specialized AI agents collaborate, delegate tasks, and communicate to solve problems that are too complex for a single agent.
Multimodal Inference AI models and systems that can process, understand, and generate content across multiple data types including text, images, audio, and video.
O
Ontology Data A formal specification of concepts, categories, and relationships within a domain, providing a shared vocabulary and structure for organizing knowledge.
Open Source Models Deployment AI models whose weights are publicly released, allowing anyone to download, use, modify, and deploy them without per-query API costs.
P
Pre-training Training The initial large-scale training phase where a language model learns general language understanding by predicting text on massive datasets.
Prompt Caching Deployment An optimization that caches the processed representation of frequently reused prompt prefixes, reducing latency and cost for requests with common system prompts or context.
Prompt Engineering Architecture The practice of designing and refining input prompts to effectively guide AI model behavior and improve the quality of generated outputs.
Q
Quantization Deployment A technique that reduces model size and speeds up inference by representing model weights with lower-precision numbers, such as 8-bit or 4-bit integers instead of 16-bit floats.
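A minimal sketch of symmetric per-tensor int8 quantization, one of several common schemes; the weight values are toys:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8: map floats onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)   # close to w; error bounded by half a step
```

Production schemes add refinements (per-channel scales, zero points, 4-bit group quantization), but the core trade of precision for size is the same.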
R
RAG (Retrieval-Augmented Generation) Architecture An architecture pattern that enhances LLM responses by retrieving relevant information from external knowledge bases before generating answers.
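A minimal end-to-end sketch, using a toy bag-of-words embedder in place of a real embedding model; the documents and question are illustrative:

```python
import numpy as np

# RAG in three steps: embed, retrieve by similarity, build a grounded prompt.
docs = [
    "The Transformer architecture was introduced in 2017.",
    "Quantization stores weights in low-precision integer formats.",
]

vocab = sorted({w for d in docs for w in d.lower().replace(".", "").split()})

def embed(text):
    """Toy stand-in for an embedding model: normalized word counts."""
    words = text.lower().replace("?", "").replace(".", "").split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(question):
    q = embed(question)
    scores = [float(q @ embed(d)) for d in docs]
    return docs[int(np.argmax(scores))]

question = "When was the Transformer introduced?"
context = retrieve(question)
prompt = (
    f"Answer using only the context below.\n\nContext: {context}\n\n"
    f"Question: {question}"
)
```

A production system swaps the toy embedder for a real embedding model and the list scan for a vector database, but the shape of the pipeline is the same.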
Retrieval Data The process of finding and fetching relevant information from external data sources to provide as context for AI model generation.
RLHF (Reinforcement Learning from Human Feedback) Training A training technique that uses human preference judgments to fine-tune language models, aligning their outputs with human values and expectations.
S
Semantic Search Data A search approach that finds results based on the meaning and intent of a query rather than exact keyword matching, typically powered by vector embeddings.
Speech-to-Text Inference AI technology that transcribes spoken audio into written text, enabling voice interfaces, meeting transcription, and audio content processing.
SSE (Server-Sent Events) Protocols A web standard for pushing real-time updates from server to client over HTTP, widely used for streaming AI model responses and protocol communications.
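A minimal sketch of parsing an SSE stream: events are separated by a blank line and each `data:` line carries a payload. The payloads and the `[DONE]` sentinel here are illustrative; some streaming APIs use such a sentinel, others signal completion differently:

```python
raw_stream = (
    "event: message\n"
    "data: Hello\n"
    "\n"
    "data: world\n"
    "\n"
    "data: [DONE]\n"
    "\n"
)

chunks = []
for block in raw_stream.strip().split("\n\n"):     # blank line ends an event
    for line in block.split("\n"):
        if line.startswith("data: "):
            payload = line[len("data: "):]
            if payload != "[DONE]":                # skip the end-of-stream sentinel
                chunks.append(payload)

text = " ".join(chunks)   # "Hello world"
```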
Streaming Deployment The technique of delivering AI model outputs incrementally as they are generated, rather than waiting for the complete response before sending anything to the client.
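The pattern can be sketched with a generator; the token list stands in for model output:

```python
import time

def generate_stream(tokens, delay=0.0):
    """Yield output chunks one at a time instead of buffering the
    whole completion; delay stands in for per-token generation time."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

received = []
for chunk in generate_stream(["Stream", "ing ", "works."]):
    received.append(chunk)   # a real client renders each chunk immediately

full_text = "".join(received)
```

Streaming does not change total generation time, but it cuts perceived latency to the time-to-first-token.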
Structured Output Inference A capability that constrains language model outputs to conform to a specific schema (like JSON Schema), guaranteeing parseable and type-safe responses.
System Prompt Architecture A special instruction provided at the beginning of a conversation that defines the AI model's behavior, role, constraints, and response format.
T
Temperature Inference A parameter that controls the randomness of model outputs, where lower values produce more focused responses and higher values increase creativity and diversity.
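A minimal NumPy sketch of how temperature reshapes the output distribution; the logits are illustrative:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: T < 1 sharpens the distribution,
    T > 1 flattens it toward uniform."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)   # peaked: top token dominates
hot = softmax_with_temperature(logits, 2.0)    # flatter: more diverse sampling
```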
Text-to-Speech Inference AI technology that converts written text into natural-sounding spoken audio, enabling voice interfaces and audio content generation.
Throughput Deployment The total number of tokens or requests an AI system can process per unit of time, measuring the system's capacity for handling concurrent workloads.
Tokenization Inference The process of converting text into a sequence of tokens (sub-word units) that a language model can process, using algorithms like BPE or SentencePiece.
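A toy sketch of the BPE merge step; the merge table here is hypothetical, and real tokenizers learn tens of thousands of merges from a large corpus:

```python
def bpe_tokenize(word, merges):
    """Apply a learned, ordered list of pair merges to a character sequence."""
    tokens = list(word)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]   # merge the pair in place
            else:
                i += 1
    return tokens

# Hypothetical merge table learned from frequency statistics.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
bpe_tokenize("lower", merges)   # ['low', 'er']
```

Frequent subwords end up as single tokens, while rare words fall back to smaller pieces, which is why token counts differ from word counts.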
Tokens per Second Deployment A performance metric measuring how many tokens an AI model can generate per second, directly impacting response speed and user experience.
Tool Use Protocols The broader capability of AI models to interact with external tools, APIs, and systems to accomplish tasks beyond text generation.
Top-k Sampling Inference A sampling strategy that restricts token selection to the k highest-probability tokens at each generation step, providing a simple way to control output diversity.
Top-p (Nucleus Sampling) Inference A sampling strategy that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p, balancing quality and diversity.
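Both sampling strategies above reduce to filtering a probability distribution and renormalizing; a minimal NumPy sketch with an illustrative distribution:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_p_filter(probs, p):
    """Keep the smallest prefix of tokens (by descending probability)
    whose cumulative mass reaches p, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1   # first index where cum >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
tk = top_k_filter(probs, 2)     # mass only on the two most likely tokens
tp = top_p_filter(probs, 0.9)   # keeps 0.5 + 0.3 + 0.15 = 0.95 >= 0.9
```

Top-k uses a fixed cutoff regardless of the distribution's shape, while top-p adapts: a peaked distribution keeps few tokens, a flat one keeps many.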
Transformer Architecture The neural network architecture underlying virtually all modern large language models, based on self-attention mechanisms that process sequences in parallel.
V
Vector Database Data A specialized database designed to store, index, and query high-dimensional vector embeddings efficiently, enabling fast similarity search at scale.
Vision Inference The capability of AI models to understand and analyze images, enabling tasks like image description, visual question answering, and document analysis.
W
Whisper Inference OpenAI's open-source speech recognition model that provides robust multilingual transcription and translation capabilities across diverse audio conditions.
Z
Zero-shot Learning Training The ability of a model to perform a task correctly based solely on instructions, without any task-specific examples provided in the prompt.
2026 Claude Glossary. All rights reserved.
claudeglossary.com