Azure AI, the practical way
An architecture-first reference for the Microsoft Azure AI stack as of June 2026. The platform formerly called Azure AI Foundry is now Microsoft Foundry - one surface for models, agents, evaluation, and governance. This portal covers Foundry, the model catalog, Foundry Agent Service, Copilot, and the silicon - trade-offs and risks, no marketing.
ai-foundry / foundry). Azure OpenAI models now live inside Foundry as part of Foundry Models. Agent 365 is the Microsoft-365-side governance layer for agents. Same lineage, new packaging.Azure's 2026 AI story has three pillars. Microsoft Foundry is the unified build platform: Foundry Models (OpenAI GPT-5.x, o-series, plus Llama, Mistral, DeepSeek, Phi, Nemotron, and partner catalogs), Foundry Agent Service (GA - Responses-API runtime, MCP + A2A, connected multi-agent), a model router, evaluations, and observability. Copilot is the distribution engine - Microsoft 365 Copilot, Copilot Studio, GitHub/Security Copilot, governed by Agent 365. Underneath sit Azure's data services (AI Search, Cosmos/SQL/PostgreSQL vectors, Fabric) and custom silicon (Maia, Cobalt) alongside NVIDIA GPUs. If you are a Microsoft shop with OpenAI ambitions and M365 reach, this stack is the default - the cost is keeping up with the fastest-moving naming in the industry.
The Azure AI mental model
What sets Azure apart in 2026
| Differentiator | What it means in practice |
|---|---|
| First access to OpenAI frontier | GPT-5.x (5.4, 5.4 Mini, 5.5) and o-series land on Foundry with enterprise SLAs, private networking, and quota tiers - often the cleanest enterprise path to OpenAI's newest models. |
| Open standards in the runtime | Foundry Agent Service speaks MCP, A2A, and OpenAPI natively - connected agents, tool reuse, and cross-vendor interop without protocol lock-in. |
| M365 distribution | One-click publish from Foundry to Microsoft 365 Copilot and Teams; Agent 365 gives a single registry and guardrails across every agent in the tenant. |
| Model breadth + router | OpenAI, Anthropic, Meta, Mistral, DeepSeek, Microsoft Phi, NVIDIA Nemotron, Fireworks-hosted open models - and a model router that auto-picks the cheapest model that clears quality. |
| Entra identity + governance | Identity, Content Safety, Foundry Control Plane, Observability (tracing, evals, continuous red-teaming) are first-class, not bolt-ons. |
Where Azure is weaker (be honest)
How to read this portal
Each service tab follows the same shape: Overview, Architecture, Capabilities, Pricing, Risks, When to use. If you only read one sub-tab, read Risks & gotchas. The others tell you what something does; Risks tells you what bites you in production.
What's New - Ignite 2025 through June 2026
Material changes that affect architecture, cost, or risk. Curated, not a press-release dump.
Three threads dominate. One: the platform rebrand - Azure AI Foundry to Microsoft Foundry, with Foundry Agent Service reaching GA (Responses-API runtime, private networking, MCP OAuth passthrough) and Observability GA. Two: the GPT-5 cadence - 5.4, 5.4 Mini, then 5.5 - plus a model router, Priority Processing, Phi-4 vision/reasoning, GPT-image-2, and Fireworks-hosted open models (DeepSeek V3.2, gpt-oss, Kimi, MiniMax). Three: agent governance across the tenant - Agent 365, the Agent registry, and Foundry Control Plane.
| Date | Release | Why it matters |
|---|---|---|
| Nov 2025 | Ignite 2025: Foundry Agent Service, Foundry Control Plane (preview), Agent 365 | Production agent runtime + one-click publish to M365/Teams; a single place to govern any agent (Foundry, Copilot Studio, third-party) with guardrails on inputs/outputs/tool calls. |
| Nov 2025 | Observability GA; Microsoft Agent Framework (AutoGen + Semantic Kernel lineage) | Evals + OpenTelemetry tracing + continuous red-teaming + Azure Monitor; a code-first agent framework supporting AG-UI and ChatKit front-ends. |
| Dec 2025 | Azure AI Foundry to Microsoft Foundry rebrand underway | One platform brand; Azure OpenAI becomes part of Foundry Models. Watch SDK/role/URL changes. |
| Jan 2026 | Model router; SDK 2.0 GA | Auto-select the optimal model per prompt to cut cost while holding quality; stable SDK surface. |
| Mar 2026 | GPT-5.4 (GA), GPT-5.4 Mini, Phi-4 Vision, Priority Processing, new evaluations | 5.4 targets agent reliability (task-drift, mid-workflow failures, tool-call consistency); Mini for cheap classify/extract; Priority Processing reserves low-latency compute lanes. |
| Mar 2026 | Foundry Agent Service GA runtime: Responses API, end-to-end private networking, MCP OAuth passthrough | Production-ready agent hosting with private networking and standardized tool auth. Migrate 2025 agent pilots here. |
| Apr-May 2026 | GPT-5.5; GPT-image-2 (4K); Fireworks AI open models; Nemotron first-class | Latest frontier OpenAI tier; high-res image gen; DeepSeek V3.2 / gpt-oss-120b / Kimi K2.5 / MiniMax M2.5 hosted; broader open-model choice in one catalog. |
| 2026 | MCP + A2A first-class in Agent Service; connected (multi-)agents | Agents call agents as tools and interoperate across vendors via open standards - real multi-agent systems, with the governance burden that implies. |
Service Map
The Azure AI services worth knowing, grouped by what you do with them.
Formerly Azure AI Foundry. Models, agents, router, evaluations, observability, content safety - one build platform.
OpenAI GPT-5.x & o-series, plus Llama, Mistral, DeepSeek, Phi-4, Nemotron, Fireworks-hosted open models.
GA. Responses-API runtime, MCP + A2A, connected multi-agent, private networking, observability.
Microsoft 365 Copilot, Copilot Studio, GitHub/Security/Azure Copilot - governed by Agent 365.
Vision, Document Intelligence, Language, Speech, Translator, Content Understanding.
Azure AI Search (vector + hybrid + semantic ranker), Cosmos/SQL/PostgreSQL vectors, Microsoft Fabric.
Maia AI accelerators, Cobalt ARM CPUs, ND-series NVIDIA GPUs (GB200), AI infrastructure.
Content filters, Control Plane guardrails, Agent registry, evaluations, continuous red-teaming.
Azure AI Search retrieval, "On Your Data", Bing grounding, Fabric data agents.
Microsoft Foundry was Azure AI Foundry
The unified platform to choose models, build and govern agents, evaluate, and ship - the center of gravity for AI on Azure.
Foundry is the single pane for the whole AI lifecycle on Azure: a 1000+ model catalog (Foundry Models) with a router, a managed Agent Service, prompt flow, fine-tuning/distillation, evaluations, content safety, and observability - all under Entra identity, private networking, and Azure billing. You organize work in projects inside a Foundry resource/hub, and promote from experiment to production without leaving the platform.
What problem this solves
Enterprises don't want to wire together a model API, a vector store, a guardrail service, an eval harness, an agent orchestrator, and a monitoring stack from separate vendors - each with its own identity and billing. Foundry's offer is one governed surface where you swap models without rewriting the app, apply the same Content Safety policy across every model, and trace/evaluate agents in production. The trade-off is breadth: the platform is large and renaming fast, so onboarding has a real learning curve.
The building blocks
| Concept | What it is |
|---|---|
| Foundry resource / hub | The top-level Azure resource that holds shared config, connections, and security boundaries. |
| Project | A workspace for a use case - models, data connections, agents, evaluations, and deployments scoped together. |
| Foundry Models | The model catalog: OpenAI, Microsoft, and partner/open models, sold directly by Azure or via the marketplace. |
| Foundry Agent Service | The managed runtime for production agents (see its own tab). |
| Evaluations & Observability | Quality/safety evaluation, OpenTelemetry tracing, continuous red-teaming, Azure Monitor. |
Reference architecture
Network and identity
Foundry projects support private endpoints (Private Link) so model and agent traffic never traverses the public internet. Authentication is Entra ID; apps use managed identities and RBAC scoped to the Foundry resource, project, and deployment. Secrets and keys belong in Key Vault, and you can enforce customer-managed keys (CMK) for data at rest. For regulated workloads, combine private networking, CMK, no-public-egress NSG rules, and Defender for AI monitoring.
Where the data goes
Microsoft's stated position is that prompts and completions in Azure OpenAI / Foundry Models are not used to train the foundation models, and data stays within your Azure tenant and chosen region/data-zone. You control whether request/response logging is enabled. For data residency, use region- or data-zone-pinned deployments and confirm the specific model's availability there before designing around it.
Capability matrix (June 2026)
| Capability | Status | Notes |
|---|---|---|
| Model catalog + router | ● | 1000+ models; router auto-selects the cheapest model that clears quality. |
| Foundry Agent Service | ● | GA - Responses API runtime, MCP + A2A, connected multi-agent. |
| Evaluations | ● | Automated + LLM-judge quality/safety evals, including agent evals. |
| Observability | ● | GA - OpenTelemetry tracing, continuous red-teaming, Azure Monitor. |
| Content Safety | ● | In-line filters: hate/sexual/violence/self-harm, jailbreak/prompt-shield, groundedness, protected material. |
| Fine-tuning / distillation | ● | Supervised fine-tuning, distillation; reinforcement methods on select models. |
| Prompt Flow | ● | Author, test, and deploy prompt/orchestration flows. |
| Private networking | ● | Private Link / VNet integration end to end. |
| Provisioned Throughput (PTU) | ● | Reserved capacity for predictable latency/cost at volume. |
| Priority Processing | ◐ | Preview - dedicated low-latency compute lanes for real-time agents/chat. |
How Foundry bills
| Mode | How you pay | Best for |
|---|---|---|
| Standard (pay-as-you-go) | Per input/output token, per model. | Prototyping, variable/low volume, model comparison. |
| Provisioned Throughput (PTU) | Reserved throughput units (hourly/monthly/annual reservations). | Steady high volume needing predictable latency and cost. |
| Priority Processing | Premium for reserved low-latency lanes. | Customer-facing real-time chat / agents with strict latency. |
| Fine-tuning | Training tokens + hosting of the tuned deployment. | Narrow tasks where a tuned small model beats prompting a large one. |
| Agent Service / tools | Underlying model tokens x steps + tool/runtime charges. | Production agents - watch step count. |
- Use Foundry when you are on Azure and want one governed surface for models, agents, evals, and safety - which is almost every Azure GenAI workload.
- Lead with the model router + GPT-5.x Mini for cost; reserve PTU once volume is steady.
- Go straight to Agent Service for anything heading to production rather than hand-rolling an orchestrator.
- Drop to raw endpoints / custom stack only for a specific capability Foundry doesn't cover.
Foundry Models
The model catalog behind Foundry - OpenAI frontier, Microsoft Phi, and a broad partner/open selection, with a router to pick between them.
| Family | Examples (June 2026) | Use |
|---|---|---|
| OpenAI flagship | GPT-5.5, GPT-5.4, GPT-5.4 Mini, o-series | Hardest reasoning, agents, coding; Mini for cheap classify/extract/tool-calls. |
| OpenAI media | GPT-image-2 (4K), TTS / Realtime | Image generation/editing and voice. |
| Microsoft (own) | Phi-4, Phi-4 Vision, Phi-4 Reasoning Vision 15B | Small, efficient multimodal/reasoning models; on-prem/edge via Foundry Local. |
| Open / partner | Llama, Mistral, DeepSeek V3.2, NVIDIA Nemotron, Anthropic Claude | Open-weight customization, cost, or specific-vendor strengths. |
| Fireworks-hosted | gpt-oss-120b, Kimi K2.5, MiniMax M2.5 | High-performance open-model inference without standing up your own serving. |
Foundry Agent Service GA
The managed runtime for production agents on Azure - Responses-API based, with MCP, A2A, connected multi-agent, private networking, and observability.
Agent Service turns a model + instructions + tools + knowledge into a managed, stateful agent you don't have to host. The 2026 GA runtime is built on the Responses API with threads/state, end-to-end private networking, and standardized tool auth (MCP OAuth passthrough). It speaks MCP, A2A, and OpenAPI, and supports connected agents - agents calling other agents as tools - so you can compose specialists instead of building one monolith.
What problem this solves
Hand-built agent loops are easy to prototype and hard to operate: state, retries, tool auth, networking, tracing, and safety all become your problem. Agent Service makes those managed concerns and standardizes the integration surface (MCP/A2A/OpenAPI) so tools and other agents plug in without bespoke glue. You publish to Microsoft 365 Copilot and Teams in one click and govern everything through Agent 365 and the Foundry Control Plane.
Reference architecture
Tools & protocols
| Surface | What it gives you |
|---|---|
| MCP (Model Context Protocol) | Connect external MCP servers as governed tools, with OAuth passthrough for delegated auth. |
| A2A (Agent-to-Agent) | Call other agents - your own or third-party - as interoperable endpoints. |
| OpenAPI tools | Wrap any REST API as a tool from its spec. |
| Connected agents | Compose specialist agents; an orchestrator delegates subtasks. |
| Hosted tools | Bing grounding, file search, code interpreter, browser, Logic Apps / Functions. |
| Knowledge | Azure AI Search, "On Your Data", Cosmos/SQL/PostgreSQL vectors, Fabric data agents. |
- Use Agent Service for any agent heading to production - you get managed state, private networking, MCP/A2A, and observability for free.
- Use connected agents / A2A when the problem decomposes into specialists; keep a single monolith only for simple flows.
- Pair with the model router and Priority Processing for cost and latency control.
- Govern through Agent 365 before publishing to M365/Teams.
Governance & Safety
The controls that make agents and models safe to run in an enterprise tenant.
| Control | What it does |
|---|---|
| Azure AI Content Safety | In-line filters for hate/sexual/violence/self-harm, plus prompt shields (jailbreak), groundedness detection, and protected-material checks - applied to any model. |
| Foundry Control Plane | Govern every agent in one place (Foundry, Copilot Studio, third-party) with consistent guardrails across inputs, outputs, tool calls, and tool responses. |
| Agent 365 + Agent registry | Tenant-wide discovery, identity, and management of all agents, with admin guardrails and DLP. |
| Observability | Evaluations, OpenTelemetry tracing, continuous red-teaming, Azure Monitor insights. |
| Defender for AI / Purview | Threat protection for AI workloads and data governance/compliance across prompts and outputs. |
| Entra identity | Managed identities, RBAC, conditional access - the same identity plane as the rest of Azure. |
Azure vs AWS vs OCI vs GCP
A practitioner's quick read. Every cloud does the basics; the differences are in defaults, data gravity, and silicon.
| Dimension | Azure | AWS | OCI | GCP |
|---|---|---|---|---|
| Frontier model | OpenAI GPT-5.x | Nova (mid); Claude hosted | None (partners) | Gemini 3.x |
| Model breadth (managed) | Foundry Models (1000+) | Bedrock (widest) | Broad (OCI Gen AI) | Model Garden (200+) |
| Agents | Foundry Agent Service + MCP/A2A | AgentCore | Enterprise AI Agents | Agent Platform + A2A |
| Custom silicon | Maia (emerging) | Trainium/Inferentia | GPU (NVIDIA) | TPU (Ironwood/8th) |
| Data gravity | Fabric / OneLake | S3 / Redshift | Oracle DB 26ai (in-DB vectors) | BigQuery |
| Distribution | Microsoft 365 | Console / partners | Oracle apps / EBS | Workspace |
| Best when | Microsoft shop; want OpenAI frontier + M365 reach | Already on AWS; want model choice + silicon economics | Run Oracle DB/EBS; want in-DB vectors + sovereignty | BigQuery/Workspace central; want Gemini + TPU full stack |
Sources
Primary Microsoft material used for this portal (June 2026). Names and versions are mid-transition - confirm in current docs before designing.
- Microsoft Foundry (Azure AI Foundry) · Foundry docs
- Foundry Agent Service overview · MCP tools · A2A
- What's new in Microsoft Foundry - Mar 2026 (and Apr/May 2026 editions)
- GPT-5 in Azure AI Foundry · GPT-5.5 in Microsoft Foundry
- Foundry Agent Service at Ignite 2025
- Azure AI Content Safety
Model Router
One endpoint that auto-selects the cheapest model clearing your quality bar - per prompt.
Instead of hard-coding GPT-5.5 (expensive) or GPT-5.4 Mini (cheap) at every call site, you target the router. It classifies each request and dispatches to the model that meets the quality target at the lowest cost and latency, escalating to stronger models only when the prompt needs it. For mixed workloads - where most requests are easy and a few are genuinely hard - it is one of the simplest cost levers on the platform.
| Use the router when | Skip it when |
|---|---|
| Workload mixes easy and hard prompts; you want cost savings without re-engineering call sites. | You need a single fixed model version for reproducibility or a compliance attestation. |
| You can tolerate a small classification step before dispatch. | Latency budget is so tight the routing hop is unacceptable. |
Knowledge & RAG - Azure AI Search
Keep answers grounded in your data. Azure AI Search is the default retrieval engine; several databases can also serve vectors.
| Retrieval option | Best for |
|---|---|
| Azure AI Search | The default - vector + keyword + semantic ranking, integrated vectorization, security trimming. Most RAG starts here. |
| "On Your Data" | Fastest path - point a model at a data source and get grounded answers with minimal code. |
| Cosmos DB / Azure SQL / PostgreSQL vectors | When vectors must live beside operational data with transactional consistency. |
| Microsoft Fabric data agents | Grounding over the lakehouse / OneLake for analytics-centric estates. |
Copilot & Agent 365
Microsoft's distribution layer - buy the assistant inside the tools people already use, and govern every agent in the tenant.
| Product | What it is |
|---|---|
| Microsoft 365 Copilot | The assistant embedded in Word, Excel, Outlook, Teams - grounded in your Graph (mail, files, chats) with user permissions. |
| Copilot Studio | Low-code builder for custom agents/topics; uses OpenAI and Anthropic models; publishes to Teams, web, and M365 Copilot. |
| GitHub Copilot | Coding agent across the SDLC - completion, chat, agent mode, code review. |
| Security Copilot | SOC assistant for triage, hunting, and incident summarization across Defender/Sentinel. |
| Azure Copilot | Operations assistant for managing and troubleshooting Azure resources. |
| Agent 365 | Tenant-wide governance: the Agent registry discovers and manages every agent (Copilot Studio, Agent Builder, SharePoint, M365 Agent SDK, Foundry, third-party), with identity, DLP, and admin guardrails. |
Applied AI Services
Task-specific managed APIs in Azure AI Services - call them, no model selection required.
| Service | Task |
|---|---|
| Azure AI Vision | Image analysis, OCR, spatial analysis, image captioning/tags. |
| Document Intelligence | Extract text, tables, key-value pairs, and structure from documents (the former Form Recognizer). |
| Azure AI Language | Entity recognition, sentiment, PII detection, summarization, custom classification, question answering. |
| Azure AI Speech | Speech-to-text, text-to-speech (incl. custom/neural voices), translation, diarization. |
| Translator | Neural machine translation across many languages, document translation. |
| Content Understanding | Multimodal extraction across documents, images, audio, and video into structured output. |
Data & Vectors
Where embeddings and ground-truth live. Pick by where your data already is.
| Store | Best for |
|---|---|
| Azure AI Search (vector) | The default RAG index - hybrid (vector + keyword) search with a semantic ranker and security trimming. |
| Azure Cosmos DB (vector) | Vectors beside globally-distributed operational/app data, low latency, NoSQL. |
| Azure SQL / SQL DB (vector) | Vectors next to relational data with transactional consistency. |
| Azure Database for PostgreSQL (pgvector + DiskANN) | Open-source vector path beside Postgres data; DiskANN for scale. |
| Microsoft Fabric / OneLake | Lakehouse-scale data and Fabric data agents for analytics-centric grounding. |
Maia & Silicon
The compute under the stack - Microsoft's custom accelerators alongside NVIDIA GPUs.
| Silicon | Role |
|---|---|
| Maia AI accelerator | Microsoft's in-house AI chip for training/inference economics on first-party and hosted workloads; the emerging cost lever versus pure-GPU. |
| Cobalt (Arm CPU) | Microsoft's Arm-based general-purpose CPU - efficient serving and supporting workloads around AI. |
| ND-series GPU VMs (NVIDIA, incl. GB200) | Top-end GPU training/inference with full CUDA/framework compatibility. |
| Azure AI infrastructure / Maia clusters | Network-dense accelerator fabrics for large-scale training; reserved capacity options. |
Architecture Patterns
The shapes most Azure GenAI workloads fall into.
Foundry model + Azure AI Search (hybrid) + Content Safety, fronted by an app or Copilot Studio. The default knowledge assistant.
Foundry Agent Service + model router + MCP/OpenAPI tools + connected agents (A2A), governed by Agent 365 and Observability. Human-in-the-loop on high-impact actions.
Microsoft 365 Copilot or a Copilot Studio agent over Graph data - buy the assistant in the tools people already use.
Content Understanding or a Foundry multimodal model extracts from docs/images/audio/video into structured output feeding Search or a warehouse.
Fine-tune or host an open model (Phi, Llama) via Foundry; distill to cut run-cost once quality is proven.
Model router + GPT-5.x Mini for routine traffic, PTU for the steady tier, flagship models only on hard prompts.
Decision Matrix
Fast answers for design reviews.
| Question | Default answer |
|---|---|
| Which model? | Router by default; GPT-5.x Mini for routine, GPT-5.5/o-series for hard reasoning, Phi for small/edge, Claude/open when they win your eval. |
| Buy or build the assistant? | M365 Copilot / Copilot Studio first; build on Foundry/Agent Service for bespoke logic or UX. |
| Agent runtime? | Foundry Agent Service for anything production; connected agents + A2A when the problem decomposes into specialists. |
| RAG how? | Azure AI Search (hybrid + semantic ranker) by default; "On Your Data" for speed; DB-native vectors for locality. |
| Standard or PTU? | Standard for variable/low volume; PTU once traffic is steady and you need latency/cost predictability. |
| Where do vectors live? | AI Search default; Cosmos/SQL/PostgreSQL when beside operational data; Fabric for lakehouse scale. |
Pricing & Cost Control
Shape, not exact numbers - rates change and vary by model/region. Confirm on the Azure pricing pages.
| Lever | How it bills | Control |
|---|---|---|
| Standard (pay-go) | Per input/output token, per model. | Use the router; GPT-5.x Mini for routine; cap output tokens; cache where possible. |
| Provisioned Throughput (PTU) | Reserved throughput units (hourly + reservations). | For steady high volume needing predictable latency; commit after you know the load. |
| Priority Processing | Premium for low-latency lanes. | Only for strict real-time chat/agents. |
| AI Search | Service tier (search units) + storage. | Right-size the tier; prune stale docs; tune replicas/partitions to load. |
| Agents | Model tokens x steps + tool/runtime. | Cap loop length; route routine steps to Mini; budget per conversation. |
Risks & Gotchas
Read this one. What actually bites teams in production.