As on 26 June 2026
← expertoracle.com

AWS AI, the practical way

An architecture-first reference for the Amazon AI stack as of June 2026. From Amazon Bedrock and the Nova model family, to Bedrock AgentCore for production agents, to SageMaker for custom models, to the applied-AI services. Trade-offs, pricing shape, and risks. No marketing.

Refreshed June 2026Architecture-firstEnterprise focusVendor-neutral
TL;DR

AWS's AI story in 2026 has three layers. Amazon Bedrock is the managed gateway to dozens of foundation models (Anthropic Claude, Meta Llama, Mistral, DeepSeek, NVIDIA Nemotron, and Amazon's own Nova) behind one API, with Knowledge Bases, Guardrails, and customization. Bedrock AgentCore turned the 2025 agent preview into a managed runtime for production agents - memory, gateway/tools, identity, observability, web search, and (preview) payments. SageMaker - now repositioned as the unified center for data + analytics + AI - is where you train, fine-tune, and host custom models. If you already run on AWS, the data-gravity and IAM integration make this stack the path of least resistance.

The AWS AI mental model

Think of three layers. Most teams start at the top (consume a model via Bedrock) and only drop down when they need custom training.

LAYER 3 - ASSISTANTS & AGENTS (consume) Amazon Q Developer & Q Business - Bedrock Agents / AgentCore - Amazon Quick - Q in QuickSight / Connect Pre-built or low-code. Governed by IAM. You configure tools and knowledge, not model weights. LAYER 2 - PLATFORM AI (build) Amazon Bedrock: model catalog - Knowledge Bases - Guardrails - Flows - Evaluations - AgentCore SageMaker AI - Unified Studio - HyperPod - JumpStart - applied AI (Rekognition, Textract, Transcribe, Polly…) Consume hosted models, import/customize, build agents, govern responses, fine-tune, deploy, monitor. LAYER 1 - DATA & INFRASTRUCTURE (ground) S3 (incl. S3 Vectors) - OpenSearch / Aurora pgvector / Kendra - Bedrock Data Automation - Lake Formation Trainium2/3 - Inferentia2 - EC2 P5/P6 (Blackwell) / G7 - UltraClusters - SageMaker HyperPod Your data and vectors live here, next to the rest of your AWS estate and IAM.
Figure 1 - The AWS AI stack is layered. Start at Layer 3/2 (Bedrock); drop to Layer 1 only when you need custom training or chips.

What sets AWS apart in 2026

DifferentiatorWhat it means in practice
Widest managed model catalogBedrock fronts Anthropic Claude, Meta Llama, Mistral, Cohere, AI21, DeepSeek, NVIDIA Nemotron, Stability, and Amazon Nova behind one API and one bill. Switching models is a parameter change, not a re-architecture.
Anthropic relationship + TrainiumDeep Anthropic partnership (Project Rainier Trainium clusters) means frontier Claude models are first-class on Bedrock, often with strong price/perf on AWS silicon.
AgentCore as managed runtimeMemory, Gateway (tools/MCP), Identity, Observability, Browser, Code Interpreter, Web Search, and Payments (preview) - framework-agnostic (Strands, LangChain, OpenAI Agents SDK, Claude Agent SDK).
Data gravity + IAMIf your data is already in S3/Redshift/Aurora, RAG ground truth and access control are native. No new identity plane.
Custom silicon economicsTrainium/Inferentia give a cost lever for training and high-volume inference that pure-GPU clouds cannot match on price.

Where AWS is weaker (be honest)

Own frontier model
Amazon Nova is competitive on price/latency and improving fast, but it is not the model you reach for when you need the absolute top of the reasoning leaderboard - that is usually Claude (also on Bedrock) or a competitor's flagship. Amazon's bet is breadth, integration, and silicon economics, not owning the #1 model.
Surface area & sprawl
The catalog of overlapping services (Bedrock vs SageMaker vs Q vs applied-AI, three vector stores, two studios) is large. Picking the right primitive is itself an architecture decision - see the Decision Matrix tab.

How to read this portal

Each service tab follows the same shape: what it is, architecture, when to use, and risks. If you only read one tab, read Risks & Gotchas. The other tabs tell you what something does; Risks tells you what bites you in production.

What's New - late 2025 through June 2026

Material changes that affect architecture, cost, or risk. Curated, not a press-release dump.

TL;DR

The dominant 2026 theme is agents going to production: Bedrock AgentCore added managed Knowledge Bases, a managed agent harness, native Web Search, and (preview) autonomous Payments. Model breadth widened (NVIDIA Nemotron 3 on Bedrock, Nova Forge for Nova customization, Reinforcement Fine-Tuning). And SageMaker was repositioned as the unified data+AI center, with SageMaker Unified Studio now GA and Amazon Q Developer embedded throughout.

DateReleaseWhy it matters
Dec 2025Next-gen SageMaker + Unified Studio (re:Invent)SageMaker repositioned as the single center for data, analytics, and AI - Glue, EMR, Athena, Redshift, Bedrock, and SageMaker AI in one workspace with a lakehouse.
Dec 2025Trainium3 announcedNext-gen training/inference silicon; continues AWS's price/perf lever vs pure-GPU stacks. Confirm region/instance availability before designing around it.
Jan 2026SageMaker Unified Studio GA + Amazon Q Developer GA in StudioData professionals get GenAI assistance across the lifecycle; Bedrock and SageMaker AI usable from one IDE.
Feb 2026Reinforcement Fine-Tuning in BedrockTailor models to narrow tasks with reward signals - higher accuracy on domain workflows without full training.
Mar 2026NVIDIA Nemotron 3 Super on Bedrock; Nova Forge SDKOpen-weight frontier reasoning model available managed; Nova Forge lets enterprises customize Nova on their data and deploy inside Bedrock.
Apr 2026AgentCore Payments (preview)Agents can autonomously pay for APIs, MCP servers, web content, and other agents - built with Coinbase and Stripe. New control-plane and audit considerations.
May 2026Agent Toolkit for AWS; AgentCore managed harnessDeclare and run an agent in ~3 API calls, no orchestration code. Lowers time-to-first-agent dramatically.
Jun 2026AWS Summit NY: Managed Knowledge Bases (Smart Parsing, Agentic Retriever), Web Search on AgentCore (GA), Amazon Quick, S3 Annotations, EC2 G7 (RTX PRO Blackwell)Fully-managed RAG with multi-format parsing; grounded answers with zero data egress; mutable per-object context in S3; new inference GPU tier.
Practical read
If you piloted Bedrock Agents in 2025, plan a migration review to AgentCore: the managed Memory, Gateway, Identity, and Observability replace a lot of custom glue. If you run SageMaker Studio (classic), plan the move to Unified Studio.

Service Map

The AWS AI services worth knowing, grouped by what you do with them.

COREAmazon Bedrock

Managed multi-model API: catalog, Knowledge Bases, Guardrails, Flows, Evaluations, customization, AgentCore.

MODELSAmazon Nova

Amazon's own FM family: Micro, Lite, Pro, Premier, plus Canvas (image), Reel (video), Sonic (speech). Forge to customize.

AGENTSBedrock AgentCore

Runtime, Memory, Gateway, Identity, Observability, Browser, Code Interpreter, Web Search, Payments (preview).

BUILDSageMaker AI + Unified Studio

Train, fine-tune, host custom models; HyperPod for FM training; one studio over data+analytics+AI.

ASSISTAmazon Q

Q Developer (coding/ops agent), Q Business (enterprise RAG assistant), Q in QuickSight/Connect, Amazon Quick.

APPLIEDApplied AI

Rekognition, Textract, Comprehend, Transcribe, Polly, Translate, Lex, Kendra, Personalize.

DATAVectors & Data

S3 Vectors, OpenSearch vector, Aurora/RDS pgvector, MemoryDB, Kendra GenAI Index, Bedrock Data Automation.

SILICONChips & GPUs

Trainium2/3, Inferentia2, EC2 P5/P6 (Blackwell), G7, UltraClusters, Capacity Blocks.

GOVERNGuardrails

Content filters, denied topics, PII redaction, contextual grounding, Automated Reasoning checks.

Amazon Bedrock

The managed, serverless gateway to foundation models. One API, one IAM model, one bill, many vendors.

Official documentation ↗

Overview
Capabilities
When to use
Risks

Bedrock exposes many foundation models through a unified API. You never manage servers; you call InvokeModel / Converse and pay per token (on-demand) or reserve capacity (Provisioned Throughput). It is the default starting point for almost any GenAI workload on AWS.

ServerlessConverse APIStreamingCross-region inferenceBatchPrompt caching
CapabilityWhat it does
Model catalog & MarketplaceFirst-party and partner FMs, plus 100+ models via Bedrock Marketplace; import your own custom weights.
Knowledge BasesManaged RAG: ingest from S3 and connectors, chunk/embed, retrieve. 2026 adds Smart Parsing and an Agentic Retriever.
GuardrailsIndependent safety layer: content filters, denied topics, PII redaction, contextual grounding, Automated Reasoning checks.
FlowsVisual orchestration of prompts, models, KBs, and Lambda into a deployable workflow.
EvaluationsAutomatic and LLM-as-judge evaluation of model and RAG quality before you ship.
CustomizationFine-tuning, continued pre-training, distillation, and Reinforcement Fine-Tuning.
Prompt caching & cross-regionCut cost/latency on repeated context; route to capacity in other regions automatically.
  • You want model optionality without re-architecting - swap Claude / Llama / Nova with a parameter.
  • You need managed RAG, guardrails, and evaluation without standing up infrastructure.
  • Your data and identity already live in AWS.
Rule of thumb
Start in Bedrock. Drop to SageMaker only when you need custom training, exotic hosting, or a model not in the catalog.
Region/model availability
Not every model is in every region. Confirm the exact model+region before you design around it; cross-region inference helps but has data-residency implications.
Cost surprises
On-demand token pricing varies widely by model. A flagship model in a chatty agent loop can be 10-30x the cost of a small model. Set budgets, cache prompts, and right-size the model per task.

Foundation Model Catalog

Indicative view of model families on Bedrock in 2026. Exact versions and regions change frequently - confirm in the console.

Official documentation ↗

ProviderFamiliesTypical use
AnthropicClaude (Opus / Sonnet / Haiku tiers)Top-tier reasoning, agents, coding, long context. The frontier default on Bedrock.
AmazonNova Micro / Lite / Pro / Premier; Canvas, Reel, SonicCost/latency-optimized text and multimodal; image, video, and speech generation.
MetaLlama (open weights)Open-weight workloads, customization, on-prem parity.
MistralMistral / MixtralEfficient European open-weight options.
DeepSeekDeepSeek-R1 and successorsStrong open reasoning at low cost.
NVIDIANemotron 3 (Super)Open-weight frontier reasoning/agentic, hosted managed.
Cohere / AI21 / StabilityCommand / Embed / Rerank, Jamba, Stable Diffusion / ImageEmbeddings, reranking, long-context, image generation.
Embeddings + rerank
For RAG, pair an embedding model (Amazon Titan Text Embeddings, Cohere Embed) with a reranker (Cohere Rerank) for a quality lift at low engineering cost.

Amazon Nova

Amazon's own foundation-model family - optimized for price, latency, and AWS integration.

Official documentation ↗

ModelModalityBest for
Nova MicroTextCheapest, fastest text - classification, routing, simple extraction at scale.
Nova LiteMultimodal (text+image/video in)Low-cost multimodal understanding, high-volume workloads.
Nova ProMultimodalBalanced capability/cost for most enterprise tasks and agents.
Nova PremierMultimodal, most capableComplex reasoning; also the teacher model for distillation.
Nova CanvasImage generationStudio-quality images with content credentials/watermarking.
Nova ReelVideo generationShort-form video from text/image prompts.
Nova SonicSpeech-to-speechReal-time voice interactions with low latency.
Nova Forge (2026)
Forge SDK lets you customize Nova on domain data (fine-tune/distill) and deploy directly within Bedrock - useful when you want Nova's economics with your own task accuracy.
Positioning
Use Nova where cost and latency dominate and the task is well-scoped. For the hardest reasoning, A/B it against Claude on the same Bedrock API before committing.

Amazon Bedrock AgentCore

The managed runtime for production agents. Framework-agnostic - bring Strands, LangChain, OpenAI Agents SDK, or the Claude Agent SDK.

Official documentation ↗

Your agent (any framework) - Runtime / managed harness Memoryshort + long term Gatewaytools / MCP / APIs Identityscoped access Observabilitytraces / eval Browserheadless web Code Interpretersandboxed exec Web Searchgrounded, zero-egress Paymentspreview
Figure 2 - AgentCore modules. Mix and match; you don't have to adopt all of them.
ModuleWhat it gives youStatus
Runtime / HarnessManaged serverless execution; declare and run an agent in ~3 API calls, no orchestration code.GA
MemoryShort-term and long-term memory stores so agents retain context across turns and sessions.GA
GatewayTurn APIs, Lambda, and MCP servers into governed agent tools with auth and access control.GA
IdentityScoped, least-privilege access for agents; policies verified by Automated Reasoning (same tech as IAM/S3).GA
ObservabilityTraces of every step, tool call, and where the agent went off track; evaluation against real traffic.GA
Browser & Code InterpreterHeadless browsing and sandboxed code execution as managed tools.GA
Web SearchGrounded, cited answers from the live web with zero data egress from your AWS environment.GA
PaymentsAgents autonomously pay for APIs, MCP servers, content, and other agents (Coinbase/Stripe).Preview
Agentic payments = new risk class
An agent that can spend money needs hard budget caps, human-in-the-loop thresholds, and immutable audit. Treat AgentCore Payments as a controlled pilot, not a default.

Knowledge Bases & RAG

Managed retrieval-augmented generation - the most common enterprise GenAI pattern.

Official documentation ↗

Bedrock Knowledge Bases ingest from S3 and connectors, chunk and embed content, store vectors (OpenSearch Serverless, Aurora pgvector, S3 Vectors, and more), and retrieve relevant passages at query time. The 2026 fully managed version adds Smart Parsing (automatic multi-format prep: PDFs, tables, images) and an Agentic Retriever for multi-step queries.

Use Knowledge Bases when

You want managed RAG with minimal code, your corpus is mostly documents, and you value Smart Parsing and built-in retrieval quality.

Build your own when

You need fine control over chunking, hybrid search, custom rerankers, or a vector store you already operate (e.g. OpenSearch with bespoke pipelines).

Bedrock Data Automation
For multimodal corpora (documents, images, audio, video), BDA extracts structured output you can feed into a Knowledge Base - a cleaner pipeline than rolling your own parsers.

Guardrails & Governance

An independent safety layer you apply to any model - first-party or imported.

Official documentation ↗

ControlWhat it catches
Content filtersHate, insults, sexual, violence, misconduct, prompt attacks - tunable thresholds.
Denied topicsBlock subjects out of scope for your application.
Sensitive info / PIIDetect and redact or block PII and custom regex patterns.
Contextual groundingScore answers for grounding against source and relevance to the query - reduce hallucination.
Automated Reasoning checksMathematically verify outputs against encoded policies/rules - high-assurance domains.
Apply at the platform layer
Guardrails sit between the app and the model, so the same policy applies regardless of which model the agent picks. Validate prompts and responses here, not only in app code.

SageMaker AI

Where you train, fine-tune, and host models when Bedrock's managed path isn't enough.

Official documentation ↗

ComponentUse
JumpStartOne-click deploy/fine-tune of open and partner foundation models.
Training & InferenceManaged training jobs and real-time/serverless/async/batch endpoints with autoscaling.
HyperPodResilient, large-scale clusters for foundation-model pre-training and heavy fine-tuning (self-healing across thousands of accelerators).
Pipelines / Model RegistryMLOps: reproducible pipelines, lineage, approval gates, deployment.
Clarify / Model MonitorBias/explainability and drift detection in production.
Bedrock vs SageMaker
Bedrock = consume/customize managed models, fast. SageMaker = full control of training, hosting, and MLOps. Many teams use both: Bedrock for the app, SageMaker for the custom model behind it.

SageMaker Unified Studio

The single workspace over data, analytics, and AI - GA in 2026.

Official documentation ↗

Unified Studio brings EMR, Glue, Athena, Redshift, Bedrock, and SageMaker AI into one IDE on a lakehouse foundation, with Amazon Q Developer embedded for code, troubleshooting, and ETL. It replaces the older SageMaker Studio Classic experience and stitches the data and AI lifecycles together so the same governed data powers both analytics and model building.

LakehouseGlue / EMR / AthenaRedshiftBedrockQ DeveloperGovernance / catalog
Migration
If you run Studio Classic, plan the move to Unified Studio - newer Bedrock and governance features land here first.

Model Customization

Four ways to make a model better at your task, from cheapest to most involved.

Official documentation ↗

TechniqueWhenCost/effort
Prompt + RAGMost tasks - ground the model in your data without changing weights.Low
Fine-tuningConsistent style/format or narrow task accuracy from labeled examples.Medium
Reinforcement Fine-TuningOptimize toward a reward signal where correctness is checkable (2026).Medium-High
DistillationTeach a small, cheap model from a large one - keep quality, cut cost/latency.Medium
Continued pre-trainingInject large domain corpora; rarely needed for most enterprises.High
Order of operations
Exhaust prompt engineering and RAG first. Fine-tune only when you have evidence the base model can't hit your accuracy/format bar. Distill once a fine-tuned large model proves out, to cut run-cost.

Amazon Q

AWS's family of GenAI assistants for developers, businesses, and operations.

Official documentation ↗

ProductWhat it does
Q DeveloperAgentic coding and ops assistant - code generation, transformation/modernization, troubleshooting, and AWS console help. Embedded in IDEs and SageMaker Unified Studio.
Q BusinessEnterprise RAG assistant over your apps and documents (40+ connectors), with access controls inherited from the source systems.
Amazon Quick2026 evolution toward autonomous background agents with specialized expertise; an activity feed across email, messaging, calendar, and tasks.
Q in QuickSight / ConnectNatural-language BI and contact-center assistance embedded in those services.
Build vs buy
For internal knowledge assistants, pilot Q Business before building custom RAG - the connectors and permission inheritance save real engineering. Build custom on Bedrock when you need bespoke UX or logic Q can't express.

Applied AI Services

Task-specific managed APIs - no model selection, just call them.

Official documentation ↗

ServiceTask
RekognitionImage/video analysis: labels, faces, moderation, text-in-image.
TextractDocument extraction: text, forms, tables, queries from PDFs/images.
ComprehendNLP: entities, sentiment, key phrases, PII, custom classification.
TranscribeSpeech-to-text with diarization, custom vocabulary, call analytics.
PollyText-to-speech with neural and generative voices.
TranslateNeural machine translation across many languages.
LexConversational bots (the engine behind many IVR/chat flows).
KendraEnterprise search; the GenAI Index feeds RAG with permission-aware retrieval.
PersonalizeReal-time recommendations from your interaction data.
Trend
Several classic tasks (doc extraction, classification, summarization) are increasingly done with Bedrock + a multimodal model or Bedrock Data Automation. Use the applied service when it is cheaper, lower-latency, or compliance-certified for that exact task; reach for Bedrock when you need flexibility.

Vectors & Data

Where your embeddings and ground-truth live. AWS gives you several stores - pick by scale, latency, and what you already run.

Official documentation ↗

StoreBest for
S3 VectorsCost-optimized vector storage/query at massive scale directly in S3 (2026) - cheapest for large, less latency-sensitive corpora.
OpenSearch Serverless (vector)Low-latency hybrid (keyword + vector) search; the common Knowledge Base default.
Aurora / RDS PostgreSQL (pgvector)Vectors next to relational data with transactional consistency.
MemoryDB / DocumentDB / Neptune AnalyticsIn-memory vectors, document-store vectors, and graph+vector analytics respectively.
Kendra GenAI IndexManaged, permission-aware retrieval index purpose-built for RAG.
Default
Most teams start with a Bedrock Knowledge Base on OpenSearch Serverless. Move to S3 Vectors for cost at scale, or pgvector when vectors must sit beside operational rows.

Chips & GPUs

The silicon under the stack. AWS's custom chips are the cost lever; NVIDIA GPUs are the compatibility/flexibility lever.

Official documentation ↗

SiliconRole
Trainium2 / Trainium3AWS training (and increasingly inference) accelerators; Trn2 UltraServers and Project Rainier power large Anthropic/enterprise training at strong price/perf.
Inferentia2Cost-efficient high-volume inference.
EC2 P5 / P6 (NVIDIA Blackwell)Top-end GPU training/inference; maximum framework compatibility.
EC2 G7 (RTX PRO Blackwell)2026 graphics/inference tier for cost-effective serving and visual workloads.
UltraClusters / Capacity Blocks / HyperPodNetwork-dense GPU/Trainium fabrics; reserve capacity windows; resilient FM-training clusters.
Architect's lever
For high-volume inference, benchmark Inferentia2/Trainium against GPU instances - the price difference can dominate TCO. Keep GPUs where you need a specific CUDA/framework path.

Architecture Patterns

The handful of shapes most AWS GenAI workloads fall into.

1. Managed RAG assistant

Bedrock + Knowledge Base (OpenSearch/S3 Vectors) + Guardrails, fronted by API Gateway/Lambda or Q Business. The default enterprise knowledge assistant.

2. Production agent

AgentCore Runtime + Memory + Gateway (tools/MCP) + Identity + Observability. Add Web Search for grounding. Human-in-the-loop on high-impact actions.

3. Custom model service

SageMaker fine-tune/host (or import to Bedrock) behind a private endpoint; distill to cut cost once quality is proven.

4. Multimodal pipeline

Bedrock Data Automation extracts from docs/images/audio/video into structured output feeding a Knowledge Base or warehouse.

5. Batch enrichment

Bedrock batch inference over large datasets in S3 for classification, summarization, or embedding generation at lowest cost.

6. Embedded BI/ops assistant

Q in QuickSight/Connect, or Q Developer in the SDLC - buy the assistant rather than build it.

Decision Matrix

Fast answers to the questions that come up in every design review.

QuestionDefault answer
Consume a model or train one?Consume via Bedrock. Train/fine-tune in SageMaker only with evidence the base model can't meet the bar.
Which model?Claude for hardest reasoning/agents; Nova for cost/latency; Llama/Mistral/DeepSeek for open-weight/customization. A/B on the same Bedrock API.
Build RAG or use Knowledge Bases?Knowledge Bases unless you need bespoke chunking/hybrid/rerank control.
Bedrock Agents or AgentCore?AgentCore for anything heading to production - managed memory, tools, identity, observability.
Which vector store?OpenSearch Serverless default; S3 Vectors for cost at scale; pgvector for vectors beside relational data.
Buy an assistant or build?Q Business/Q Developer first; build on Bedrock when you need custom UX/logic.
GPU or AWS silicon?Trainium/Inferentia for cost at volume; NVIDIA for specific framework/CUDA needs.

Pricing & Cost Control

Shape, not exact numbers - rates change and vary by model/region. Always confirm in the AWS pricing pages.

LeverHow it billsControl
Bedrock on-demandPer input/output token, per model.Right-size model per task; cache prompts; cap output tokens.
Provisioned ThroughputReserved model units (hourly).For steady high volume; commit only after you know the load.
Batch inferenceDiscounted vs on-demand.Use for non-interactive enrichment jobs.
Knowledge Bases / OpenSearchStorage + query + embedding tokens.Tune chunk size; prune stale docs; pick S3 Vectors for cost.
SageMakerTraining + endpoint instance-hours.Serverless/async endpoints; autoscale to zero where possible.
AgentsModel tokens x steps + tool calls.Cap loop length; cheap model for routing, strong model only when needed.
The agent cost trap
Agent loops multiply token cost by the number of steps. A 10-step loop on a flagship model is the most common surprise bill. Budget per-conversation, log token usage, and route to small models for routine steps.

Risks & Gotchas

Read this one. What actually bites teams in production.

Model/region drift
Models and versions change and aren't uniform across regions. Pin model IDs, monitor deprecations, and test before auto-upgrading.
Runaway agent cost & actions
Unbounded loops and tool access cause both cost blowouts and unintended actions. Enforce step caps, budgets, least-privilege Identity, and human approval on high-impact tools. For AgentCore Payments, treat spend as a first-class control.
Data egress & residency
Cross-region inference and external tools (web search, third-party MCP) can move data. Confirm residency; prefer zero-egress options where compliance requires.
Service sprawl & lock-in
Mixing Bedrock, SageMaker, Q, and three vector stores creates operational complexity and AWS-specific coupling. Standardize on a few primitives; keep prompts/eval portable.
Hallucination in RAG
Retrieval quality, not the model, is usually the failure. Use contextual grounding guardrails, rerankers, and evaluation before blaming the LLM.
Quotas
Default account quotas (tokens/min, requests/min, concurrent training) throttle real workloads. Request increases early; design for backoff.

AWS vs OCI vs Azure vs GCP

A practitioner's quick read. Every cloud can do the basics; the differences are in defaults, data gravity, and silicon.

DimensionAWSOCIAzureGCP
Model breadth (managed)Widest (Bedrock)Broad (OCI Gen AI)OpenAI + catalogGemini + Model Garden
Frontier own modelNova (mid-tier); Claude hostedNone (partners)OpenAI partnershipGemini
AgentsAgentCoreEnterprise AI AgentsFoundry AgentsVertex Agent Builder
Custom siliconTrainium/InferentiaGPU (NVIDIA)Maia (emerging)TPU
Vectors in source DBpgvector/OpenSearch/S3In-DB 26aipgvector/AI SearchAlloyDB/Vertex
Best whenYou already run on AWS; want model choice + silicon economicsYou run Oracle DB/EBS; want in-DB vectors + sovereigntyYou're Microsoft-centric; want OpenAI + M365You want Gemini + BigQuery data gravity
Honest take
The cloud you already run is usually the right one for GenAI - data gravity and IAM beat a marginally better model. Pick by where your data and identity live, then choose the model per task.

Sources

Primary AWS material used for this portal (June 2026). Verify specifics against current docs before committing - this space moves weekly.

Accuracy note
Compiled by Brijesh Gogia for expertoracle.com. Independent and not affiliated with Amazon/AWS. Model names, availability, and pricing change frequently - treat this as orientation, confirm in the AWS console/docs before designing.