AI Services Across Cloud Providers

A neutral deep-dive comparison of AI, ML, GenAI, agents, vector search, RAG, document/vision/speech AI, MLOps, governance, and AI infrastructure across OCI, AWS, Azure, Google Cloud, and other AI platforms - a practical reference for enterprise architects and engineers choosing the right AI platform for real workloads.

4+ providers compared · Searchable AI matrix · Match, maturity & readiness · Governance & cost · No "best AI cloud" claims

Last reviewed: July 2026. AI services change very fast - verify with current vendor documentation before production use.

THE CORE PRINCIPLE: NEUTRAL & PRACTICAL

This portal does not favor any provider and makes no "best AI cloud" claim. Some AI services are close equivalents, some only partially similar, and some have no direct match - we say which, and we mark anything fast-moving as needing verification. AI is the fastest-changing area in cloud: model availability, regions, pricing, token/context limits, fine-tuning, agent features, and data-handling terms shift constantly. Treat concrete claims here as a starting point to verify, not a guarantee.

How to read this portal

The heart of the portal is the AI Service Comparison Matrix (section 1): a searchable, filterable table mapping each AI capability across providers with a match rating, maturity, enterprise-readiness, key difference, and risk. Deep-dive sections (2-11) go further by category; the decision sections (12-14) help you choose by workload and cost; and the reference sections (15-17) cover risk, troubleshooting, and learning paths.

Match types

Exact - functionally equivalent
Close - same purpose, minor differences
Partial - overlaps part of the capability
Conceptual - similar goal, different model
No direct - no real counterpart
Specific - provider-specific capability
Verify - fast-moving, confirm current state

Maturity levels

Mature · Strong · Evolving · Limited · Preview / region-dependent · Needs verification

Enterprise readiness

Production-ready · Production-ready with constraints · Good for experimentation · Requires strong governance · Not recommended without review

Providers compared

OCI

OCI Generative AI + Agents, AI Vector Search in Oracle Database 23ai, Data Science, OCI AI Services.

AWS

Bedrock (multi-model), SageMaker, Kendra/OpenSearch, Textract/Rekognition/Comprehend.

Azure

Azure OpenAI / AI Foundry, Azure AI Search, Azure ML, AI Document Intelligence/Vision/Speech.

Google Cloud

Vertex AI / Gemini, Vertex AI Search, Vector Search, Document AI, BigQuery ML.

Other platforms, where relevant

Where they add meaningful comparison, this portal also references IBM watsonx, Databricks Mosaic AI, Snowflake Cortex, Hugging Face, and specialist vector databases (Pinecone, Weaviate, Milvus, Qdrant). Providers are not forced into every row - if there is no meaningful equivalent, it is marked "No direct equivalent" or "Not a primary service in this category."

Reading the callouts

Architect note

Design-time trade-offs.

DBA note

Database and vector-search behavior.

Security note

Exposure, access, and private networking.

Cost note

AI cost drivers and control.

AI governance note

Auditability, guardrails, responsible AI.

Common mistake

Errors teams make, including false equivalence between services.

Accuracy, neutrality & independence

Independent educational resource, not affiliated with or endorsed by Oracle, Amazon, Microsoft, Google, IBM, or any AI vendor. Mappings are engineering judgments as of July 2026 and deliberately conservative. Because AI moves so quickly, verify model/region availability, pricing, limits, fine-tuning, agent features, and data-handling terms against each vendor's official documentation before any design or purchasing decision.

1. AI Service Comparison Matrix

A searchable, filterable map of AI capabilities across OCI, AWS, Azure, Google Cloud, and other platforms - with match rating, maturity, enterprise readiness, best-fit use, key difference, and risk.

Last reviewed: July 2026. AI mappings change fast - verify with current vendor docs.

AI Capability	OCI	AWS	Azure	Google Cloud	IBM / Other	Match

Match ratings are conservative and providers are not forced into every row. A "No direct" or "Conceptual" rating means the architecture changes when you move - read the key difference before assuming portability.

2. Generative AI Services Deep Dive

The main managed GenAI platforms compared neutrally - model access, RAG, agents, guardrails, private networking, enterprise controls, and best-fit use.

Last reviewed: July 2026. GenAI features/models change weekly - verify with current vendor docs.

TL;DR

Every major cloud has a managed GenAI platform: OCI Generative AI + Agents, Amazon Bedrock, Azure OpenAI / AI Foundry, Vertex AI / Gemini, plus IBM watsonx.ai, Databricks Mosaic AI, and Snowflake Cortex. They differ most in model catalog (Bedrock is multi-provider; Azure centers on OpenAI/GPT; Vertex on Gemini + open; OCI is curated) and in how you wire private data, agents, and guardrails. Choose by the models you need, region/quota, enterprise controls, and where your data already lives - not by benchmarks.

Platform comparison

Provider	GenAI platform	Model access	Agents	RAG	Guardrails	Private networking	Best-fit use	Key gotcha
OCI	Generative AI + GenAI Agents	Curated (Cohere, Llama, etc.) via API; dedicated clusters	GenAI Agents	Agent knowledge bases + DB Vector Search	Guardrails	Private endpoints	Oracle-data-centric RAG; enterprises on OCI	Verify model catalog + region availability
AWS	Amazon Bedrock (+ SageMaker JumpStart)	Many providers (Anthropic, Meta, Cohere, Amazon, etc.)	Bedrock Agents	Bedrock Knowledge Bases	Bedrock Guardrails	PrivateLink / VPC	Model choice + AWS-native governance	Model availability varies by region
Azure	Azure OpenAI / AI Foundry	OpenAI GPT family + catalog	AI Agent Service (Foundry)	Azure AI Search + OpenAI	AI Content Safety	Private Endpoints	OpenAI/GPT + Microsoft ecosystem	GPT model/region/quota gating
Google	Vertex AI / Gemini	Gemini + Model Garden (open + partner)	Vertex AI Agent Builder	Vertex AI Search / RAG Engine	Safety filters / Model Armor	Private Service Connect	Gemini + data/BigQuery integration	Feature/region availability
IBM/Other	watsonx.ai; Databricks Mosaic; Snowflake Cortex	Granite + open (IBM); open (Databricks); Cortex (Snowflake)	watsonx Orchestrate	watsonx Discovery; Cortex Search; Databricks	watsonx.governance	Varies	Governance focus (IBM); data-platform-native AI	Verify enterprise coverage per platform

How enterprise data is connected (common to all)

Across every platform, connecting private data safely follows the same shape: ingest → chunk → embed → store vectors → retrieve (entitlement-filtered) → ground the model → audit, all behind a governed serving layer. What differs is the managed convenience (Knowledge Bases, Vertex AI Search, AI Search integration) and where vectors live. See sections 5-6.

AI governance note - the enterprise controls that matter

Evaluate each platform on: IAM/identity integration (does it use the cloud's native identity?), private networking (private endpoints), prompt + output logging and retention, guardrails/content safety on input and output, data-use terms (is your data used to train the base model? Generally not for these enterprise services, but verify), and region availability. These, not raw model quality, usually decide enterprise fit.

Cost note

GenAI cost is driven by input + output tokens, embeddings, optional dedicated capacity/provisioned throughput, and fine-tuning. On-demand is simple; provisioned/dedicated gives predictable throughput at a fixed cost. Retrieve the smallest sufficient context, cache common responses, and use smaller models where they suffice (section 14).
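
The token arithmetic behind this note can be sketched in a few lines. This is a rough illustration only: the prices, request volumes, and the provisioned monthly fee below are hypothetical placeholders, not any vendor's rates, and real bills also include embeddings and fine-tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical on-demand prices in USD per 1,000 tokens (not vendor rates)."""
    input_per_1k: float
    output_per_1k: float


def cost_per_request(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """On-demand cost of one request; input and output tokens are priced separately."""
    return (input_tokens / 1000) * pricing.input_per_1k \
        + (output_tokens / 1000) * pricing.output_per_1k


def on_demand_monthly_cost(pricing: TokenPricing, requests: int,
                           input_tokens: int, output_tokens: int) -> float:
    """Monthly on-demand spend for a uniform request shape."""
    return requests * cost_per_request(pricing, input_tokens, output_tokens)


def break_even_requests(pricing: TokenPricing, input_tokens: int,
                        output_tokens: int, provisioned_monthly_usd: float) -> float:
    """Requests per month above which a fixed provisioned fee is cheaper."""
    return provisioned_monthly_usd / cost_per_request(pricing, input_tokens, output_tokens)


# Illustrative numbers only; replace with current vendor pricing.
pricing = TokenPricing(input_per_1k=0.003, output_per_1k=0.015)
full_context = on_demand_monthly_cost(pricing, 100_000, input_tokens=2_000, output_tokens=500)
trimmed_context = on_demand_monthly_cost(pricing, 100_000, input_tokens=800, output_tokens=500)
print(f"full context:    ${full_context:,.2f}/month")
print(f"trimmed context: ${trimmed_context:,.2f}/month")
print(f"break-even requests/month: {break_even_requests(pricing, 2_000, 500, 5_000):,.0f}")
```

Comparing the two monthly figures shows why retrieving the smallest sufficient context matters: the retrieved context is billed as input tokens on every request.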

Verify before choosing

The differentiators (available models, regions, quotas, agent/guardrail features, data-handling terms) move fastest here. Do not select a GenAI platform on an announcement or leaderboard - confirm the specific models and terms you need are available in your region today, and that the governance controls meet your requirements.

Vendor GenAI references: OCI GenAI, AWS Bedrock, Azure OpenAI/Foundry, Google Vertex AI, IBM watsonx

3. Foundation Model Comparison

How foundation-model access differs across providers - first-party, third-party, and open models, fine-tuning, and private deployment. Model availability changes constantly, so this is a shape, not a live catalog.

Last reviewed: July 2026. Model availability/pricing/limits change frequently - VERIFY before relying on any specific model.

Read this first

Model availability, pricing, token limits, context windows, fine-tuning support, and data-handling terms change frequently and vary by region. Everything below is a general shape to help you reason - treat specific model names as "verify with current vendor documentation." Do not design around a model without confirming it is available (and supported for your use) in your region today.

TL;DR

Providers differ in how open their model access is: Bedrock offers the broadest multi-vendor catalog (Anthropic, Meta, Cohere, Amazon, etc.); Azure centers on OpenAI GPT models; Google on Gemini plus Model Garden (open + partner); OCI offers a curated set (Cohere, Llama, etc.); Hugging Face and Databricks/others give access to a large open ecosystem. Abstract the model behind your own serving layer so you can switch as the landscape shifts.

Model access, side by side (verify current state)

Provider	Model platform	Example families (verify)	First-party	Third-party	Open-source	Fine-tuning	Private deploy	Notes
OCI	OCI Generative AI	Cohere, Meta Llama (verify)	Curated	Yes (partner)	Some	Some	Dedicated AI clusters	Curated catalog; verify current models
AWS	Amazon Bedrock	Anthropic Claude, Meta Llama, Cohere, Amazon Titan/Nova, Mistral (verify)	Titan/Nova	Broad (many vendors)	Yes (incl. via Bedrock/SageMaker)	Yes (varies by model)	VPC/PrivateLink	Broadest multi-vendor catalog
Azure	Azure OpenAI + Foundry catalog	OpenAI GPT family; catalog models (verify)	Microsoft/Phi	OpenAI + catalog	Via catalog	Yes (select models)	Private Endpoints	OpenAI-centric; region/quota gated
Google	Vertex AI + Model Garden	Gemini; open + partner models (verify)	Gemini/Gemma	Partner + open	Model Garden / open	Yes (select models)	PSC / private	Gemini + broad Garden
IBM / HF	watsonx.ai / Hugging Face	IBM Granite; large open ecosystem (verify)	Granite (IBM)	Some	Extensive (HF)	Yes	Varies	Open-model breadth (HF); governance (IBM)

Architect note - avoid model lock-in

Because the model landscape shifts monthly, put an abstraction layer (your serving API) between your application and the model. Standardize your prompts, retrieval, and evaluation around a provider-neutral interface so you can swap models/providers with a config change. Pick the platform for its enterprise controls and data locality, and keep the specific model a replaceable component.
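
One way to picture such an abstraction layer is a small provider-neutral interface with backends selected by configuration. The sketch below is an assumption-laden illustration: `ChatModel`, `EchoModel`, `REGISTRY`, and the `model_backend` config key are hypothetical names, and the stand-in backend only echoes the prompt where a real adapter would call a vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    """Provider-neutral interface the application codes against."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend; a real adapter would wrap a vendor SDK call here."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Hypothetical registry: config value -> factory for a backend adapter.
REGISTRY: Dict[str, Callable[[], ChatModel]] = {
    "provider_a": lambda: EchoModel("provider_a"),
    "provider_b": lambda: EchoModel("provider_b"),
}


def load_model(config: dict) -> ChatModel:
    """Pick the backend named in config; nothing else in the app changes."""
    key = config["model_backend"]
    if key not in REGISTRY:
        raise ValueError(f"unknown model backend: {key}")
    return REGISTRY[key]()


def answer(model: ChatModel, question: str, context: str) -> str:
    """Application logic depends only on the ChatModel interface."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return model.complete(prompt)


# Swapping providers is a config change, not a code change.
print(answer(load_model({"model_backend": "provider_a"}), "What is RAG?", "internal notes"))
print(answer(load_model({"model_backend": "provider_b"}), "What is RAG?", "internal notes"))
```

Keeping prompts, retrieval, and evaluation on the application side of this interface is what makes the model itself replaceable.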

Common mistake

Designing an architecture around one specific model's context window, pricing, or a benchmark result - then discovering it is not available in your region, its price/limits changed, or a better/cheaper model shipped. Verify availability per region, and don't hard-couple to a single model.

Vendor model catalogs (verify current): Bedrock, Azure OpenAI, Vertex Model Garden, OCI GenAI, Hugging Face

4. AI Agents Comparison

Managed agent platforms compared - tool/function calling, knowledge grounding, memory, guardrails, human approval, and the enterprise risks that apply to all of them.

Last reviewed: July 2026. Agent features are evolving fast - verify capabilities with current vendor docs.

TL;DR

An agent is an LLM that can call tools/functions, retrieve knowledge, keep memory, and take multi-step actions - beyond a chatbot. Every cloud has one (OCI GenAI Agents, Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder) plus enterprise-SaaS agents (IBM watsonx Orchestrate, Salesforce Agentforce, ServiceNow). They are evolving fast and share the same risks: over-permissioning, direct database access, and unsafe dynamic SQL. The non-negotiable rule: agents act through governed APIs with least privilege, human approval for consequential actions, and full audit - never raw production databases.

Chatbot vs workflow bot vs autonomous agent

Type	What it does	Autonomy	Risk
Chatbot	Answers questions (optionally grounded via RAG)	None - responds only	Low (wrong answers)
Workflow bot	Follows defined steps, calls known tools	Bounded - fixed flow	Medium (calls real systems)
Autonomous agent	Plans, chooses tools, takes multi-step actions	High - decides its own path	High (unpredictable actions)

Architect note

Match autonomy to risk. Most enterprise value today is in chatbots (RAG) and bounded workflow bots, which are far easier to govern. Reserve autonomous agents for low-risk, well-audited tasks with human approval gates. More autonomy requires more governance, monitoring, and blast-radius control.

Agent platforms, side by side

Provider	Agent service	Tool calling	Knowledge / RAG	Workflow integration	Human approval	Governance	Best-fit use	Main risk
OCI	OCI Generative AI Agents	Yes	Knowledge bases + DB Vector Search	OCI services / APIs	Design-dependent	IAM + audit	Grounded assistants over Oracle data	Verify current tool/action coverage
AWS	Bedrock Agents	Yes (action groups)	Bedrock Knowledge Bases	Lambda / API	Design-dependent	IAM + Guardrails + CloudTrail	Tool-using assistants on AWS	Over-permissioned action groups
Azure	AI Agent Service (Foundry)	Yes	Azure AI Search	Logic Apps / Functions / APIs	Design-dependent	Entra + Content Safety + logging	Microsoft-ecosystem agents	Grounding + identity scoping
Google	Vertex AI Agent Builder / Agentspace	Yes	Vertex AI Search	Cloud Functions / APIs	Design-dependent	IAM + safety + audit	Search-grounded agents on GCP	Data-access scoping
Enterprise SaaS	watsonx Orchestrate; Salesforce Agentforce; ServiceNow	Yes	Product knowledge + connectors	Native to the SaaS platform	Often built-in	Platform governance	Agents inside a SaaS (CRM/ITSM/HR)	Scope to the SaaS; verify data flows

What to evaluate in an agent platform

Tool / function calling - how tools are defined, scoped, and permissioned (least privilege per tool).
Knowledge grounding - RAG quality + citations; entitlement-filtered retrieval.
Memory - session vs long-term; where it is stored and secured.
Guardrails + human approval - blocking unsafe actions; approval gates for consequential steps.
Enterprise identity - does the agent act as a scoped identity (not a shared super-user)?
Audit logging + monitoring - every tool call, retrieval, and action logged and reviewable.

Strong enterprise warning - applies to every agent platform

AI agents must not directly connect to production OLTP databases or run uncontrolled dynamic SQL. Instead, give agents access only through governed APIs, curated datasets, semantic layers, read-only reporting replicas, or controlled serving layers with least-privilege identities, validated/parameterized queries, human approval for writes or consequential actions, and full audit logging. Treat an agent as an untrusted actor with a scoped, monitored identity - not as a trusted service account with broad access.
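
The controls in this warning can be sketched as a small tool gateway that sits between an agent and the data. This is a minimal illustration under stated assumptions: the tool names, the `orders` table, and the in-memory SQLite database (standing in for a read replica or serving layer) are all hypothetical, and a real deployment would enforce identity and approval through the platform's own IAM and workflow tooling.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical allow-list: tool name -> (parameterized SQL, is_write).
# The agent can only name a tool; it never supplies SQL text.
ALLOWED_TOOLS = {
    "get_order_status": ("SELECT status FROM orders WHERE order_id = ?", False),
    "cancel_order": ("UPDATE orders SET status = 'cancelled' WHERE order_id = ?", True),
}

AUDIT_LOG = []  # in a real system this would be durable, append-only storage


def run_tool(conn, agent_id, tool, params, approved_by=None):
    """Run an allow-listed tool for an agent; writes need a named human approver."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "tool": tool,
        "params": list(params),
        "approved_by": approved_by,
    }
    if tool not in ALLOWED_TOOLS:
        entry["outcome"] = "denied:unknown_tool"
        AUDIT_LOG.append(entry)
        raise PermissionError(f"tool not allowed: {tool}")
    sql, is_write = ALLOWED_TOOLS[tool]
    if is_write and approved_by is None:
        entry["outcome"] = "denied:approval_required"
        AUDIT_LOG.append(entry)
        raise PermissionError(f"{tool} requires human approval")
    rows = conn.execute(sql, params).fetchall()  # values are bound, never interpolated
    if is_write:
        conn.commit()
    entry["outcome"] = "ok"
    AUDIT_LOG.append(entry)
    return rows


def make_demo_db():
    """In-memory stand-in for a governed serving layer (not a production database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO orders VALUES (42, 'shipped')")
    conn.commit()
    return conn
```

Every call, allowed or denied, leaves an audit entry, and the write path cannot run without an approver recorded against it.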

AI governance note

Maintain an inventory of deployed agents (owner, purpose, tools, data access, approval gates), monitor their tool calls and actions, and require change review for new tools/permissions. An agent that gains a new tool has a new blast radius.

Vendor agent references: OCI GenAI Agents, Bedrock Agents, Azure AI Agent Service, Vertex Agent Builder

5. RAG and Knowledge Base Comparison

Retrieval-Augmented Generation - the model retrieves relevant enterprise data first, then answers grounded in it. Compared across providers, with architecture diagrams and the gotchas that decide answer quality.

Last reviewed: July 2026. RAG tooling evolves fast - verify managed features with current vendor docs.

TL;DR

RAG is the same pipeline everywhere: ingest → chunk → embed → store vectors → retrieve (entitlement-filtered) → rerank → ground the model → cite → audit. Managed options: Bedrock Knowledge Bases, Azure AI Search + OpenAI, Vertex AI Search / RAG Engine, OCI Agent knowledge bases + DB Vector Search, plus Databricks and Snowflake Cortex Search. The hard parts - chunking quality, retrieval relevance, security trimming, and index freshness - are the same on every platform and matter more than the model.

RAG options, side by side

Provider	Managed RAG	Vector store options	Reranking	Citations	Access control
OCI	GenAI Agents knowledge bases	Oracle DB 23ai AI Vector Search; OpenSearch; Object Storage	Design/model-dependent	Supported	DB/IAM entitlements
AWS	Bedrock Knowledge Bases	OpenSearch; Aurora pgvector; (others)	Rerank models	Supported	IAM + source ACLs
Azure	Azure AI Search (+ OpenAI)	AI Search vector; Cosmos DB; PostgreSQL pgvector	Semantic ranker	Supported	Security trimming in index
Google	Vertex AI Search / RAG Engine	Vertex Vector Search; AlloyDB/Cloud SQL pgvector; BigQuery	Ranking API	Supported	IAM + data-store ACLs
Other	Databricks; Snowflake Cortex Search; watsonx Discovery	Platform-native vector	Varies	Varies	Platform governance

RAG architectures

Enterprise RAG: offline ingest/embed/store; runtime query → governed serving layer → entitlement-filtered + reranked retrieval → grounded, cited answer; audit + optional human approval. Same shape on every cloud.
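
The shape described above can be sketched end to end in plain Python. This is a toy illustration, not a production design: the word-count "embedding" stands in for a real embedding model, the in-memory list stands in for a vector store, and the chunk ids and group names are hypothetical. The point it shows is the ordering: entitlement filtering happens before ranking, and the prompt carries source ids for citation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Offline step: embed each chunk and keep its entitlement metadata with it."""
    return [{**c, "vector": embed(c["text"])} for c in chunks]


def retrieve(index, query, user_groups, k=2):
    """Runtime step: drop chunks the user may not see, then rank what is left."""
    allowed = [c for c in index if c["groups"] & user_groups]
    q = embed(query)
    return sorted(allowed, key=lambda c: cosine(q, c["vector"]), reverse=True)[:k]


def grounded_prompt(query, hits):
    """Build a prompt that carries source ids so the answer can cite them."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in hits)
    return f"Answer using only the sources below and cite their ids.\n{context}\n\nQuestion: {query}"


index = build_index([
    {"id": "hr-1", "text": "Salary bands are reviewed every April.", "groups": {"hr"}},
    {"id": "kb-1", "text": "Password resets are handled through the self-service portal.", "groups": {"all"}},
    {"id": "kb-2", "text": "VPN access requires a hardware token.", "groups": {"all"}},
])
hits = retrieve(index, "How do I reset my password?", user_groups={"all"})
print(grounded_prompt("How do I reset my password?", hits))
```

A user outside the `hr` group never has `hr-1` scored at all, so the model cannot leak it.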

RAG with database-backed vectors

Vectors live in the operational DB (Oracle 23ai, AlloyDB, pgvector). Retrieval inherits DB IAM, backups, and row-level security - simplest path to entitlement-filtered retrieval. Best when data already lives in that DB.

RAG with object storage + search index

Docs in object storage, indexed by a search service (AI Search, Vertex AI Search, OpenSearch/Kendra). Best for unstructured document corpora and hybrid (keyword+vector) search with connectors.

RAG gotchas (identical on every platform)

Bad chunking creates bad answers - chunk size/overlap and document structure dominate quality.
Retrieval quality matters more than model hype - a great model on poor retrieval still hallucinates.
Security trimming is hard - and must happen before retrieval, not by filtering the answer. Enforce entitlements at index/retrieval time.
Stale indexes create wrong answers - automate re-indexing; track data freshness.
Vector search cost grows with corpus size and query rate - size indexes and monitor.
RAG does not eliminate hallucination - it reduces it; still validate outputs and require citations.
Access control before retrieval, not only after - never rely on the model to withhold data it was given.
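
To make the chunk size/overlap point from the first gotcha concrete, here is a minimal word-based chunker. It is a sketch under simple assumptions: it splits on whitespace and ignores document structure (headings, tables, sentences), which a production chunker would usually respect; the default sizes are arbitrary.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, each sharing `overlap`
    words with the previous chunk so context is not cut at a hard boundary."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

Varying `size` and `overlap` against a fixed set of test questions is a cheap way to see how strongly chunking affects retrieval quality on your own corpus.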

AI governance note

Log the retrieved context IDs alongside the prompt and answer so every response is traceable to its sources. Add a human-approval gate for any answer that triggers an action. Without this, you cannot explain or defend an answer to security or compliance.
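
A traceable audit record of this kind might look like the following sketch. The field names and the JSON-lines file format are assumptions chosen for illustration; retention, redaction of sensitive prompt content, and storage location would follow your own compliance requirements.

```python
import json
from datetime import datetime, timezone


def audit_record(user_id, prompt, answer, context_ids, model_id):
    """Bundle one RAG interaction with the ids of the chunks that grounded it."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model_id": model_id,
        "prompt": prompt,
        "answer": answer,
        "context_ids": list(context_ids),
    }


def append_jsonl(path, record):
    """Append one record per line so the log can be streamed and searched."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def answers_citing(records, context_id):
    """Trace back: which logged answers were grounded on a given source chunk?"""
    return [r for r in records if context_id in r["context_ids"]]


records = [
    audit_record("u1", "How do I reset my password?", "Use the portal [kb-1].", ["kb-1"], "model-x"),
    audit_record("u2", "How do I get VPN access?", "You need a token [kb-2].", ["kb-2", "kb-1"], "model-x"),
]
print(len(answers_citing(records, "kb-1")))
```

With records like these, a question such as "which answers relied on a document we later found to be wrong?" becomes a simple lookup.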

Vendor RAG references: Bedrock Knowledge Bases, Azure AI Search, Vertex AI Search, OCI GenAI Agents

6. Vector Search and AI Database Comparison

Where to store and search embeddings - in an operational database, a dedicated vector service, or a specialist vector DB - compared neutrally, with a decision guide by data location and scale.

Last reviewed: July 2026. Verify vector features (esp. Azure SQL / DynamoDB) with current vendor docs.

TL;DR

Two broad choices: vectors in an operational database (Oracle DB 23ai, pgvector on Aurora/AlloyDB/Cloud SQL/Azure PostgreSQL, Cosmos DB) - which inherit existing IAM, backups, and row-level security - or a dedicated vector/ANN service (Azure AI Search, Vertex Vector Search, OpenSearch, or specialist DBs like Pinecone/Weaviate/Milvus/Qdrant) for large-scale, low-latency semantic search. Keep vectors near the governed data when you can; go dedicated when scale/latency demands it.

Vector search options, side by side

Option	Where vectors live	Managed?	Hybrid search	Metadata filter	Security model	Best-fit
Oracle DB 23ai AI Vector Search	In Oracle Database	Yes (managed DB)	Yes (+ SQL)	Yes (SQL WHERE)	DB IAM + row/label security	Vectors next to Oracle relational data
Aurora/RDS pgvector	In PostgreSQL	Yes (managed DB)	Via SQL/extensions	Yes (SQL)	DB IAM	Existing Postgres on AWS
OpenSearch / Kendra	Dedicated index	Yes	Yes	Yes	Fine-grained + source ACLs	AWS-native large-scale search
Azure AI Search	Dedicated index	Yes	Yes (+ semantic)	Yes	Security trimming	Azure OpenAI RAG default
Cosmos DB / PG pgvector	In the database	Yes	Partial	Yes	DB RBAC	Vectors with operational data
Vertex Vector Search	Dedicated ANN	Yes	Filter-based	Yes	IAM	Very large-scale, low-latency
AlloyDB / Cloud SQL pgvector; BigQuery	In DB / warehouse	Yes	Via SQL	Yes	DB/warehouse security	Vectors with GCP data
Databricks / Snowflake Cortex Search	In the data platform	Yes	Yes	Yes	Platform governance	Vectors next to lakehouse/warehouse data
Pinecone / Weaviate / Milvus / Qdrant	Dedicated vector DB	Managed or self	Varies	Yes	Own model	Cloud-neutral / specialist scale

DBA note - in-DB vs dedicated

Keeping vectors in the operational database (Oracle 23ai, pgvector, AlloyDB) means retrieval inherits your existing IAM, backups, DR, and row/label-level security - the simplest path to entitlement-filtered retrieval - and you can combine vector distance with ordinary SQL filters in a single query. A dedicated ANN service (Vertex Vector Search, Pinecone) wins on very large scale and low latency but adds another data store to secure and keep in sync. Choose by scale, latency, and where the source data already lives.
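
The idea of combining a vector distance with ordinary SQL filters can be illustrated with Python's built-in `sqlite3` module. This is a stand-in only: SQLite has no native vector type, so the sketch stores embeddings as JSON text and registers a Python function as a SQL function. It is not the syntax of Oracle AI Vector Search or pgvector, and the `docs` table and `region` column are hypothetical.

```python
import json
import math
import sqlite3


def cosine_distance(a_json, b_json):
    """Cosine distance (1 - similarity) between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def make_db(rows):
    """Create an in-memory table of (id, region, embedding) rows."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine_distance", 2, cosine_distance)
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, region TEXT, embedding TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [(doc_id, region, json.dumps(vec)) for doc_id, region, vec in rows],
    )
    return conn


def search(conn, query_vec, region, k=2):
    """One query: relational filter in WHERE, vector ranking in ORDER BY."""
    sql = (
        "SELECT id FROM docs WHERE region = ? "
        "ORDER BY cosine_distance(embedding, ?) LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (region, json.dumps(query_vec), k))]


conn = make_db([
    ("a", "eu", [1.0, 0.0]),
    ("b", "eu", [0.0, 1.0]),
    ("c", "us", [1.0, 0.0]),
])
print(search(conn, [1.0, 0.1], region="eu"))
```

Row "c" is the closest vector overall but is excluded by the `WHERE` clause, which is the behavior you want from entitlement or tenancy filters. A database with native vector support would also use an index rather than scanning every row as this sketch does.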

Decision guide (neutral)

Situation	Neutral guidance
Data is in Oracle Database	Oracle DB 23ai AI Vector Search - vectors + relational data + governance in one place.
AWS-native RAG	OpenSearch or Aurora pgvector with Bedrock Knowledge Bases.
Azure OpenAI pattern	Azure AI Search (hybrid + semantic) is the common default.
GCP data + AI	Vertex Vector Search for scale; AlloyDB/BigQuery vectors to stay near the data.
Large-scale semantic search	Dedicated ANN (Vertex Vector Search, OpenSearch, or specialist DBs).
Existing PostgreSQL users	pgvector (any cloud) - lowest friction; verify performance at scale.
Data-warehouse-integrated search	BigQuery vector search, Snowflake Cortex Search, or Databricks Vector Search.
Cloud-neutral / specialist	Pinecone, Weaviate, Milvus, or Qdrant - portable across clouds.

Common mistake

Standing up a separate vector database when the data already lives in a database that supports vectors - now you have two stores to secure, back up, and keep in sync, plus a harder entitlement-filtering problem. Start in-DB unless scale/latency clearly requires a dedicated service.

Vendor vector references: Oracle AI Vector Search, OpenSearch, Azure AI Search, Vertex Vector Search

7. Machine Learning Platform Comparison

Full ML/MLOps platforms compared - training, deployment, pipelines, registry, monitoring, and governance - plus neutral guidance on when a managed platform is worth it.

Last reviewed: July 2026. Verify feature depth and pricing with current vendor docs.

TL;DR

The end-to-end ML platforms - SageMaker, Vertex AI, Azure ML, OCI Data Science, plus Databricks Mosaic AI and IBM watsonx.ai - cover notebooks, training, deployment, pipelines, registry, and monitoring. SageMaker and Vertex are generally the broadest; Databricks is a common cross-cloud choice. Use a managed platform when you need reproducible pipelines, governance, and scale; a plain notebook or in-database ML may be enough for smaller work.

ML platforms, side by side

Provider	Platform	Best strength	Training	Deployment	MLOps	Governance	Best-fit users	Main limitation
OCI	Data Science	Oracle-data integration; AI Quick Actions	Jobs	Model Deployment	Pipelines	IAM + audit	Oracle-centric teams	Smaller ecosystem than AWS/GCP
AWS	SageMaker	Breadth + ecosystem	Managed/distributed	Endpoints (real-time/batch/serverless)	Pipelines + Registry + Monitor	Clarify + IAM	Broad ML teams	Complexity/choice overload
Azure	Azure Machine Learning	Microsoft + Responsible AI tooling	Managed/distributed	Managed endpoints	Pipelines + Registry	Responsible AI dashboard	Microsoft-centric teams	Learning curve; naming
Google	Vertex AI	Data + AI integration; TPUs	Managed/distributed (TPU)	Endpoints	Pipelines + Registry + Monitoring	Explainable AI	Data/AI-led teams	Enterprise familiarity varies
Databricks / IBM	Mosaic AI / watsonx.ai	Lakehouse-native (Databricks); governance (IBM)	Yes	Model Serving	MLflow / Workflows	Unity Catalog / watsonx.governance	Lakehouse or governance-led teams	Verify multi-cloud coverage

When to use what (neutral)

Use a managed ML platform when you need reproducible pipelines, a model registry with approvals, managed endpoints, monitoring, and governance at team/enterprise scale.
A simple notebook is enough for exploration, one-off analysis, or a single small model with light serving needs.
Kubernetes-based ML (Kubeflow, KServe) fits teams who want portability and already run Kubernetes - at the cost of more ops.
Data-warehouse ML (BigQuery ML, Redshift ML, Oracle in-DB, Snowflake) fits when the data lives in the warehouse and SQL-based ML is sufficient - minimal data movement.
Do not build ML infrastructure at all when a prebuilt AI service (Document AI, Language, Vision) or a foundation model already solves the task - many "ML projects" are now API calls.

Architect note - reduce lock-in

Use open formats and standards (MLflow for registry/tracking, ONNX for models where practical, OpenTelemetry for monitoring, Kubeflow/containers for portable pipelines) so the ML platform is a productivity layer, not a cage. Deep native pipelines are fine when you are committed to one cloud; keep the portable option open if multi-cloud matters.

Cost note

ML cost is dominated by training compute (GPU/TPU hours) and always-on inference endpoints - and inference often exceeds training cost over a model's life. Right-size and reserve accelerators, use batch or scale-to-zero endpoints where latency allows, and shut down idle notebooks and endpoints (a very common source of waste).
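
The claim that inference often overtakes training cost is easy to check with rough arithmetic. The hourly rates and GPU-hours below are hypothetical placeholders, and the model ignores storage, data transfer, and autoscaling.

```python
HOURS_PER_MONTH = 730  # average hours in a month, used for always-on estimates


def training_cost(gpu_hours, gpu_hourly_rate):
    """One-off cost of a training run."""
    return gpu_hours * gpu_hourly_rate


def always_on_inference_cost(instances, hourly_rate, months):
    """Cost of endpoints that run 24x7 for the given number of months."""
    return instances * hourly_rate * HOURS_PER_MONTH * months


def months_until_inference_exceeds_training(train_cost, instances, hourly_rate):
    """How long an always-on endpoint takes to cost as much as the training run."""
    return train_cost / (instances * hourly_rate * HOURS_PER_MONTH)


# Illustrative numbers only.
train = training_cost(gpu_hours=500, gpu_hourly_rate=4.0)
year_of_serving = always_on_inference_cost(instances=1, hourly_rate=1.5, months=12)
crossover = months_until_inference_exceeds_training(train, instances=1, hourly_rate=1.5)
print(f"training: ${train:,.0f}")
print(f"12 months always-on serving: ${year_of_serving:,.0f}")
print(f"serving passes training after {crossover:.1f} months")
```

Under these assumed numbers a single modest endpoint overtakes the training run within a couple of months, which is why batch or scale-to-zero serving is worth evaluating early.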

Vendor ML references: OCI Data Science, SageMaker, Azure ML, Vertex AI, Databricks, watsonx.ai

8. AI Infrastructure and Accelerators

GPUs, custom accelerators (TPU, Trainium/Inferentia), Kubernetes for AI, and the infrastructure gotchas - quota, region availability, and idle-GPU cost - that decide whether an AI project ships.

Last reviewed: July 2026. Verify accelerator SKUs, quota, and region availability with current vendor docs.

TL;DR

NVIDIA GPUs are available on every cloud (portable); the differentiators are custom silicon - TPU (Google) and Trainium/Inferentia (AWS) - and bare-metal GPU breadth (OCI). All offer managed training/inference and Kubernetes for AI. The infrastructure realities that actually block projects are the same everywhere: GPU quota, region availability, data-pipeline bottlenecks, and idle-GPU cost. Verify accelerator availability in your region early.

AI infrastructure, side by side

Provider	Accelerator options	Managed training	Managed inference	Kubernetes for AI	Bare metal GPU	Best-fit workload	Main constraint
OCI	NVIDIA GPU shapes + bare metal; cluster networking (RDMA)	Data Science	Model Deployment	OKE	Broad	Large training on bare-metal GPU clusters	Verify GPU SKU by region
AWS	NVIDIA (P/G) + Trainium + Inferentia	SageMaker	SageMaker Endpoints	EKS	.metal (narrower)	Cost/perf at scale with custom silicon	Quota + custom-silicon lock-in
Azure	NVIDIA N-series (+ Maia)	Azure ML	Azure ML endpoints	AKS	Specialized	Microsoft-ecosystem AI + big GPU	Region/SKU availability
Google	NVIDIA GPU + TPU	Vertex Training	Vertex Endpoints	GKE	(Bare Metal Solution)	Large-scale training (TPU) + inference	TPU ties the serving stack
Other	NVIDIA (CoreWeave, Lambda, etc.)	Databricks	Model Serving	Kubernetes	Varies	GPU-focused / neutral	Verify integration + support

Architect note - portability vs cost

NVIDIA GPUs are the portable choice - the same model-serving stack runs across clouds. Custom silicon (TPU, Trainium/Inferentia) can offer better price/performance but ties your serving/training stack to that provider. If multi-cloud or portability matters, standardize on NVIDIA + open frameworks; if you are committed to one cloud and cost-sensitive at scale, evaluate custom silicon.

AI infrastructure gotchas (universal)

GPU quota can block projects - default quotas are low; request increases early on every cloud.
Region availability matters - the accelerator you want may not exist in your region; verify before designing.
The data pipeline can bottleneck GPUs - slow data loading starves expensive accelerators; size storage/network for the training I/O.
Network and storage can dominate performance - for distributed training, interconnect (RDMA/InfiniBand) and storage throughput often matter more than raw GPU count.
Idle GPUs are expensive - the most common AI-infra waste; auto-stop, schedule, or use spot for interruptible work.
Inference cost can exceed training cost over time - an always-on endpoint runs 24x7; batch or scale-to-zero where latency allows.
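
The idle-GPU gotcha above lends itself to a simple check over utilization samples. This sketch assumes you already export GPU utilization as a list of percentages at a fixed interval; the 5% threshold, 5-minute interval, and 30-minute auto-stop window are arbitrary illustrative values, and the actual stop action would go through your cloud's own scheduler or API.

```python
def idle_gpu_hours(samples, threshold_pct=5.0, sample_minutes=5):
    """Hours during which utilization was below the idle threshold."""
    idle_samples = sum(1 for u in samples if u < threshold_pct)
    return idle_samples * sample_minutes / 60


def idle_cost(samples, hourly_rate, threshold_pct=5.0, sample_minutes=5):
    """Money spent on hours where the GPU was effectively idle."""
    return idle_gpu_hours(samples, threshold_pct, sample_minutes) * hourly_rate


def should_auto_stop(samples, threshold_pct=5.0, idle_intervals=6):
    """True when the most recent `idle_intervals` samples are all idle."""
    recent = samples[-idle_intervals:]
    return len(recent) == idle_intervals and all(u < threshold_pct for u in recent)


utilization = [90, 80, 0, 0, 0, 0, 0, 0]  # percent, one sample every 5 minutes
print(idle_gpu_hours(utilization))
print(should_auto_stop(utilization))
```

Running a check like this on a schedule, and wiring `should_auto_stop` to a stop or scale-down action, targets idle notebooks and endpoints directly.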

Cost note

Reserve or commit accelerator capacity for steady training; use spot/preemptible for fault-tolerant training; use batch or scale-to-zero inference where possible; and monitor idle GPU hours ruthlessly. Managed foundation-model APIs often beat self-hosting a model on GPUs unless you have sustained high volume.

Vendor AI-infra references: OCI GPU, AWS Trainium/Inferentia, Azure GPU, Google TPU

9. Document, Vision, Speech, and Language AI

Prebuilt AI services for documents, images, audio, and text - close equivalents across clouds, but with differences in custom-model support, human review, language coverage, and privacy constraints.

Last reviewed: July 2026. Verify language coverage, features, and pricing with current vendor docs.

TL;DR

These prebuilt AI services are largely close equivalents across OCI/AWS/Azure/GCP - the same tasks with different names, differing in custom-model support, human-in-the-loop review, language coverage, and privacy handling. Increasingly, general-purpose LLMs overlap with some of these (extraction, classification, summarization) - choose the prebuilt service for accuracy/cost on well-defined tasks, and an LLM when flexibility matters. Verify language and feature coverage for your specific use.

Document AI (OCR, forms, tables)

	OCI	AWS	Azure	Google
Service	Document Understanding	Textract	AI Document Intelligence	Document AI
OCR / tables / forms	Yes	Yes	Yes	Yes
Custom models	Yes	Yes (custom queries/adapters)	Yes (custom extraction)	Yes (custom processors)
Human review	Design-dependent	A2I	Design-dependent	Human-in-the-loop
Best-fit	Oracle-integrated doc pipelines	AWS doc pipelines, invoices	Microsoft-ecosystem forms	High-volume document extraction

Operations note

All return confidence scores - route low-confidence extractions to a human review queue rather than trusting them blindly. This human-in-the-loop step is what makes document AI production-ready.
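
Confidence-based routing can be sketched as a small function over extracted fields. The field names and the 0.85 threshold are hypothetical; each service reports confidence in its own response format, so a real integration would first map that response into a structure like the one assumed here.

```python
def route_extraction(fields, threshold=0.85):
    """Split extracted fields into auto-accepted values and a human review queue.

    `fields` maps a field name to a (value, confidence) pair, confidence in 0..1.
    """
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


invoice = {
    "invoice_number": ("INV-1042", 0.99),
    "total": ("1,250.00", 0.97),
    "due_date": ("2026-08-01", 0.62),  # low confidence: send to a reviewer
}
accepted, needs_review = route_extraction(invoice)
print("accepted:", accepted)
print("needs review:", needs_review)
```

The threshold is a tuning knob: measure error rates on a labeled sample of your own documents before fixing it, since a higher value sends more work to reviewers and a lower one lets more errors through.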

Vision AI

	OCI	AWS	Azure	Google
Service	OCI Vision	Rekognition	Azure AI Vision	Vision AI
Classification / detection	Yes	Yes	Yes	Yes
Custom vision	Yes	Custom Labels	Custom Vision	AutoML Vision
Face / moderation	Limited	Yes	Yes (gated)	Yes (gated)

Security note

Face recognition and biometric analysis carry privacy and legal constraints (consent, retention, jurisdiction) and are increasingly gated by vendors. Confirm the legal basis and vendor policy before using face/biometric features; prefer non-biometric approaches where possible.

Speech AI

	OCI	AWS	Azure	Google
Speech-to-text	OCI Speech	Transcribe	Azure AI Speech	Speech-to-Text
Text-to-speech	(Speech)	Polly	Neural TTS	Text-to-Speech
Real-time / batch	Both	Both	Both	Both
Diarization	Yes	Yes	Yes	Yes

Common use - contact center

Speech-to-text + diarization + summarization (LLM) is the standard contact-center pattern on every cloud (see section 12). Verify language/accent coverage and real-time latency for your use, and mind compliance for recording/retaining calls.

Language AI

	OCI	AWS	Azure	Google
Service	OCI Language	Comprehend	Azure AI Language	Natural Language AI
Sentiment / entities / PII	Yes	Yes	Yes	Yes
Classification	Yes (custom)	Yes (custom)	Yes (custom)	Yes (AutoML)
Translation	Yes	Translate (separate)	Translator (separate)	Cloud Translation (separate)

Cost note

For well-defined NLP tasks (sentiment, entity/PII extraction, classification), the prebuilt language services are usually cheaper and more predictable than calling an LLM - and often more accurate on narrow tasks. Reserve LLMs for open-ended language work; use prebuilt NLP for structured extraction at volume.

Vendor applied-AI references: OCI AI Services, AWS Textract/Rekognition/Transcribe/Comprehend, Azure AI, Google AI →

10. AI and Enterprise Search

AI-powered enterprise search - keyword, semantic, and hybrid - with connectors, security trimming, and RAG integration, compared across providers.

Last reviewed: July 2026. Verify connector coverage and security-trimming features with current vendor docs.

TL;DR

Enterprise search now blends keyword + vector (semantic) + hybrid ranking, with connectors to enterprise systems and (critically) security trimming so users only see what they are entitled to. Turnkey options: Kendra/OpenSearch, Azure AI Search, Vertex AI Search / Agentspace, plus OCI Search / AI Vector Search and Elastic/Glean. These are also the retrieval layer for RAG. The hard part - source-level access control preserved in the index - is the same everywhere.

AI search, side by side

Provider	Search service	Keyword	Vector	Hybrid	Connectors	RAG integration	Access control	Best use	Gotcha
OCI	OCI Search (OpenSearch) / AI Vector Search	Yes	Yes	Yes	Custom	Via GenAI Agents	DB/IAM	Oracle-data + open-source search	Assemble connectors
AWS	Amazon Kendra / OpenSearch	Yes	Yes	Yes	Many (Kendra)	Bedrock KB	Token-based ACLs	Turnkey enterprise search (Kendra)	Kendra cost at scale
Azure	Azure AI Search	Yes	Yes	Yes (+ semantic ranker)	Indexers	Azure OpenAI	Security trimming	Azure OpenAI RAG default	Design security trimming carefully
Google	Vertex AI Search / Agentspace	Yes	Yes	Yes	Connectors	Native RAG	IAM + ACLs	Turnkey grounded search	Verify connector coverage
Other	Elastic; Glean; OpenSearch managed	Yes	Yes	Yes	Broad	Varies	Own model	Cloud-neutral / SaaS-wide search	Verify governance model

Security note - the search-specific trap

The defining challenge of enterprise AI search is security trimming: results (and the context fed to an LLM) must reflect each user's permissions across every connected source. Enforce this in the index / at retrieval (document-level ACLs mirrored from the source), not by filtering the final answer - the model must never receive documents the user cannot see. Turnkey services (Kendra, Vertex AI Search) help, but you still map source ACLs correctly.
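
A minimal sketch of trimming at retrieval, assuming each indexed chunk carries the groups allowed to read it (mirrored from the source ACL at index time). The `IndexedChunk` type, `trimmed_retrieve`, and the group names are hypothetical. In a managed search service the same filter would normally be pushed into the query itself, so that top-k is computed over permitted documents only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedChunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL mirrored from the source system at index time

def trimmed_retrieve(candidates, user_groups, top_k=3):
    """Drop chunks the user may not see, then take the top-k of what remains.

    `candidates` is assumed to be already ranked by relevance (best first).
    """
    user_groups = frozenset(user_groups)
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return visible[:top_k]

ranked = [
    IndexedChunk("hr-001", "Salary bands ...", frozenset({"hr"})),
    IndexedChunk("kb-007", "VPN setup ...", frozenset({"all-staff"})),
    IndexedChunk("fin-003", "Q3 forecast ...", frozenset({"finance"})),
]
# Only the returned chunks are ever placed in the model's context.
context = trimmed_retrieve(ranked, user_groups={"all-staff", "finance"})
```

The HR chunk never reaches the model for this user, even though it ranked first. Filtering happens before generation rather than on the final answer.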

Architect note

Enterprise search and RAG retrieval are the same capability viewed two ways - one returns links, the other feeds an LLM. Build the governed, security-trimmed search/retrieval layer once and reuse it for both. Prefer hybrid (keyword + vector) ranking; pure vector search misses exact-match and rare-term queries.
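
One common way to combine keyword and vector results is reciprocal rank fusion (RRF). This portal does not prescribe a fusion method, so treat the sketch below as one option under stated assumptions: the document IDs are made up, and `k=60` is the conventional default constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["doc-sku-4471", "doc-a", "doc-b"]  # exact match on a part number
vector_hits = ["doc-a", "doc-c", "doc-sku-4471"]   # semantic neighbours
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, while the exact-match hit that vector search ranked low still stays near the front. Several managed search services offer hybrid ranking natively; verify what each one uses before adding your own fusion step.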

Vendor search references: Amazon Kendra/OpenSearch, Azure AI Search, Vertex AI Search, OCI Search →

11. AI Governance, Security, and Responsible AI

The enterprise controls that make AI safe to run on real data - privacy, private networking, access control, logging, guardrails, responsible AI, and a production governance checklist that applies across providers.

Last reviewed: July 2026. AI governance features are new and evolving - verify with current vendor docs.

TL;DR

AI governance is mostly your configuration, not a product. The controls are the same across clouds - private endpoints, native IAM, prompt/output logging, guardrails/content safety, CMK encryption, data-retention terms, human approval, and responsible-AI checks - implemented with each cloud's tools (Bedrock Guardrails, Azure AI Content Safety + Entra + Private Link, Google Model Armor + VPC-SC, OCI IAM/Vault, IBM watsonx.governance). Treat AI as a new attack surface (prompt injection, data exfiltration) and govern it deliberately from day one.

Governance controls, side by side

Control	OCI	AWS	Azure	Google
Identity / access	IAM	IAM	Entra ID + RBAC	Cloud IAM
Private networking	Private endpoints	PrivateLink / VPC	Private Link / Private Endpoints	Private Service Connect
Guardrails / content safety	Guardrails	Bedrock Guardrails	AI Content Safety	Model Armor / safety filters
Key management	Vault (CMK)	KMS	Key Vault	Cloud KMS
Secrets	Vault	Secrets Manager	Key Vault	Secret Manager
Prompt / output logging	Audit + Logging	CloudTrail + Bedrock logs	Activity Log + Foundry logs	Audit Logs + Vertex logs
Data-exfil perimeter	(network + IAM)	(SCP + endpoints)	(Private Link + Policy)	VPC Service Controls
Sensitive-data discovery	Data Safe	Macie	Purview	Sensitive Data Protection
Responsible AI tooling	(guidance)	SageMaker Clarify	Responsible AI dashboard	Explainable AI
Model + use-case governance	(policy + inventory)	(SageMaker + Config)	(Purview + RAI)	(Registry + policy)

AI governance note - the new risks

Generative AI adds attack surface that traditional controls miss: prompt injection (untrusted content hijacking instructions), data leakage (the model revealing context it shouldn't), data exfiltration (an agent copying data out), and over-permissioned agents. Defend with content-safety on input and output, security-trimmed retrieval, least-privilege scoped identities, private networking, a data-exfiltration perimeter for sensitive data, and full prompt/output logging.

Production AI governance checklist (portable)

Approved use case - documented, with a business owner and a risk assessment.
Data classification - what data the AI touches, and its sensitivity level.
Model selection - approved model(s), with data-handling/retention terms verified.
Prompt logging policy - prompts + retrieved context IDs logged (per privacy rules).
Output review policy - outputs logged; human review for consequential answers.
Access control - least-privilege scoped identity; security-trimmed retrieval before generation.
Private networking - private endpoints for model, retrieval, and data services.
Data retention terms - confirmed the provider does not retain/train on your data (or terms accepted).
Human approval requirement - for any write/action or high-impact output.
Abuse / injection monitoring - content safety on input+output; prompt-injection detection.
Cost monitoring - token/GPU/vector spend tracked with budgets/alerts.
Incident response - a plan for a leaked prompt, bad answer, or compromised agent.
Legal / compliance review - completed for the use case and data.
Vendor documentation verified - model, region, features, and terms confirmed current.
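
The checklist can also be enforced as a simple release gate. The sketch below is an assumption about how a team might encode it, not a feature of any provider: the control keys and the `missing_controls` helper are hypothetical names that mirror the fourteen items above.

```python
REQUIRED_CONTROLS = [
    "approved_use_case", "data_classification", "model_selection",
    "prompt_logging_policy", "output_review_policy", "access_control",
    "private_networking", "data_retention_terms", "human_approval",
    "abuse_monitoring", "cost_monitoring", "incident_response",
    "legal_review", "vendor_docs_verified",
]

def missing_controls(signoffs):
    """Return checklist items that are absent or not signed off, in checklist order."""
    return [c for c in REQUIRED_CONTROLS if not signoffs.get(c, False)]

# Hypothetical sign-off record for one AI use case.
signoffs = {c: True for c in REQUIRED_CONTROLS}
signoffs["incident_response"] = False
del signoffs["legal_review"]
blockers = missing_controls(signoffs)
```

A deployment pipeline could refuse to promote the use case while `blockers` is non-empty, which keeps the checklist from becoming a one-time document.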

Security note

Keep an inventory of AI use cases, models, and agents with owners, data access, and approval status - shadow AI (ungoverned experiments touching real data) is the fastest-growing risk. Encrypt with customer-managed keys, keep secrets in the vault, use private endpoints, and log enough to explain and defend every answer and action.

Vendor governance references: Bedrock Guardrails, Azure Content Safety/Responsible AI, Google Model Armor, IBM watsonx.governance →

12. Enterprise AI Architecture Patterns

Common enterprise AI patterns, each mapped to OCI/AWS/Azure/GCP - the pattern shape is portable; the services differ. Every pattern shares the same governed-serving-layer backbone.

Last reviewed: July 2026. Service choices are examples - verify current best practices per vendor.

The portable backbone

Almost every enterprise AI pattern has the same shape: user → governed serving layer (authN/authZ + guardrails + logging) → security-trimmed retrieval → model → validated, audited output, over private networking. What changes per cloud is the model service, the retrieval/vector store, and the applied-AI services. Below, each pattern lists that mapping plus the risks that apply everywhere.

Pattern catalog

Pattern	OCI	AWS	Azure	Google	Key risk
Chat with documents	GenAI Agents + Vector Search	Bedrock KB + OpenSearch	Azure OpenAI + AI Search	Gemini + Vertex AI Search	Chunking/freshness; security trimming
Chat with database	Select AI / DB 23ai	Bedrock + curated views	OpenAI + curated views	Gemini + curated views	Never raw prod OLTP; use serving layer
Natural language to SQL	Select AI (Autonomous DB)	Bedrock + QuickSight Q	Fabric/Copilot + OpenAI	BigQuery + Gemini	Validate/parameterize; read-only
AI assistant for IT ops	Ops Insights + GenAI	DevOps Guru + Bedrock	Azure Monitor + Copilot	Active Assist + Gemini	Human approval before actions
AI assistant for business users	GenAI Agents + curated data	Bedrock + governed data	Copilot + governed data	Agentspace + governed data	Answer only from curated data
Customer support chatbot	Digital Assistant + GenAI	Lex + Bedrock	Bot Service + OpenAI	Dialogflow + Gemini	Grounding + human escalation
Contact center transcription/summary	Speech + GenAI	Connect + Contact Lens + Bedrock	Speech + OpenAI	CCAI + Gemini	Recording compliance; real-time
Invoice / document processing	Document Understanding	Textract + Bedrock	Document Intelligence + OpenAI	Document AI + Gemini	Human review of low-confidence
Enterprise knowledge search	OCI Search / Vector Search	Kendra / OpenSearch	Azure AI Search	Vertex AI Search / Agentspace	Source-level security trimming
RAG over object storage	Object Storage + Vector Search	S3 + Bedrock KB	Blob + AI Search	Cloud Storage + Vertex Search	Index freshness + ACLs
RAG over database	DB 23ai Vector Search	Aurora pgvector	Cosmos / PG pgvector	AlloyDB pgvector	Row-level entitlements
RAG over data warehouse	ADW + Select AI	Redshift ML + Bedrock	Fabric + OpenAI	BigQuery vector + Gemini	Query cost + column security
AI over Oracle EBS / ERP data	Read-only reporting layer + GenAI	Extract to lake + Bedrock	Extract + OpenAI	Extract + Gemini	Never live ERP; performance + governance
AI over CRM data	Governed API + GenAI	Bedrock + governed API	OpenAI + Dataverse/API	Gemini + governed API	PII handling; entitlements
AI code assistant	(GenAI + code models)	Amazon Q Developer	GitHub Copilot / Foundry	Gemini Code Assist	IP / secret leakage in prompts
MLOps train/deploy pipeline	Data Science pipelines	SageMaker Pipelines	Azure ML pipelines	Vertex Pipelines	Reproducibility; drift monitoring
Real-time recommendations	ML + serving	SageMaker + feature store	Azure ML + serving	Vertex + feature store	Latency; feature skew
Forecasting / anomaly detection	Anomaly Detection / ML	SageMaker / Lookout	Azure ML	BigQuery ML / Vertex	Verify current managed service
AI governance & audit	IAM + Audit + inventory	Guardrails + CloudTrail	Content Safety + Purview	Model Armor + audit	Shadow AI; missing audit trail

Featured pattern: governed enterprise RAG

Chat with enterprise documents (governed RAG)

The most common enterprise GenAI pattern - same shape on every cloud

Business use	Employees ask questions and get grounded, cited answers from internal documents they are entitled to see.
Data flow	Ingest docs → chunk + embed → store vectors with ACL metadata → at query time: authN/authZ → entitlement-filtered retrieval + rerank → grounded generation → cited, audited answer.
Identity	User authenticates to the serving layer (native IdP); the app carries the user's entitlements into retrieval.
Security	Private endpoints for model + retrieval + storage; security-trimmed retrieval; content safety on input/output; secrets in the vault; CMK.
Monitoring	Log prompts, retrieved context IDs, and outputs; track answer quality/groundedness and token cost.
Cost drivers	Tokens (context size), embeddings, vector storage/queries, and model choice.
Best-fit provider conditions	Follow the data: Oracle data → OCI; AWS lake → AWS; Microsoft/M365 → Azure; BigQuery/GCP data → Google. Verify models/region.
Common mistakes	Retrieval not entitlement-filtered; stale index; sending whole documents (cost); no citations/audit; connecting the model to raw production data.
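
The query-time half of this data flow can be sketched in a few lines. Everything here is a stand-in under stated assumptions: `answer_query`, `fake_retrieve`, and `fake_generate` are hypothetical names, and a real deployment would call the cloud's retrieval and model services and write to a durable audit store instead of a Python list.

```python
import json
import time

def answer_query(user, question, retrieve, generate, audit_log):
    """Query-time flow: entitlement-filtered retrieval -> generation -> audit record."""
    chunks = retrieve(question, user["groups"])  # must already be security-trimmed
    if not chunks:
        answer = "No answer found in documents you can access."
    else:
        answer = generate(question, [c["text"] for c in chunks])
    audit_log.append(json.dumps({
        "ts": time.time(),
        "user": user["id"],
        "question": question,
        "context_ids": [c["id"] for c in chunks],  # log IDs, not full text
        "answer": answer,
    }))
    return {"answer": answer, "citations": [c["id"] for c in chunks]}

# Stand-ins for the real retrieval and model services (hypothetical).
def fake_retrieve(question, groups):
    corpus = [{"id": "kb-007", "text": "VPN setup guide", "groups": {"all-staff"}}]
    return [c for c in corpus if c["groups"] & set(groups)]

def fake_generate(question, contexts):
    return f"Based on {len(contexts)} source(s): see the VPN setup guide."

log = []
result = answer_query({"id": "u1", "groups": ["all-staff"]}, "How do I set up VPN?",
                      fake_retrieve, fake_generate, log)
```

The explicit "not found" branch and the citation list address two of the common mistakes in the table: answers with no sources, and a model asked to answer from nothing.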

Common mistakes across all AI patterns

Connecting the model/agent to raw production OLTP instead of a governed serving layer.
Retrieval not security-trimmed - data leaks across users.
Sending entire documents to the model (cost + context dilution) instead of retrieved chunks.
No citations, no audit trail - answers can't be explained or defended.
Stale indexes; poor chunking - wrong answers regardless of model.
Designing around one model that later changes availability/price.

Pair with the single-cloud deep-dive portals for full per-cloud AI architectures →

13. AI Workload Decision Matrix

For each AI workload, this matrix lists which providers are strong candidates and why, in deliberately balanced wording. A "strong candidate" reflects natural advantages under common conditions, not a verdict; any provider can be valid depending on your data, ecosystem, and skills.

Last reviewed: July 2026. Fit depends on your specifics and current model/region availability - verify.

How to read this - follow the data

The single strongest signal for AI-platform fit is usually where your data already lives and which ecosystem you already operate. Example: Azure may be a strong fit when the enterprise is standardized on Microsoft 365, Entra ID, and Azure OpenAI. AWS may be strong when the org already uses S3, Bedrock, and SageMaker. GCP may be strong where BigQuery, Vertex AI, and analytics are central. OCI may be strong where Oracle Database, AI Vector Search, and Oracle enterprise workloads are central. Apply this balance throughout.

Workload	Strong candidate(s)	Why they fit	Services to evaluate	Main trade-off / risk
Enterprise RAG	All (follow the data)	Every cloud has managed RAG; fit follows where docs/data live	Bedrock KB / AI Search / Vertex Search / OCI Agents	Retrieval quality + security trimming, not model
Chatbot over internal docs	All	Standard RAG pattern everywhere	Managed RAG + vector store	Chunking/freshness
Chat with relational database	Follow the DB	In-DB AI where data lives (Oracle 23ai, BigQuery, AlloyDB)	Select AI / pgvector / BigQuery ML	Never raw prod OLTP; serving layer
Natural language to SQL	OCI (Select AI), GCP (BigQuery+Gemini)	Native NL-to-SQL where the DB/warehouse is	Select AI, BigQuery+Gemini, Databricks Genie	Uncontrolled dynamic SQL
AI over Oracle Database	OCI (others via extract)	Oracle DB 23ai AI Vector Search + Select AI in the DB	OCI GenAI + DB 23ai	Others need data extraction
AI over Microsoft ecosystem	Azure	M365/Entra/Copilot + Azure OpenAI integration	Azure OpenAI, AI Search, Copilot	Ecosystem lock-in
AI over Google data ecosystem	GCP	BigQuery ML + Vertex + Gemini integration	BigQuery ML, Vertex AI	Verify enterprise familiarity
AI over AWS data lake	AWS	S3 + Bedrock + SageMaker + native governance	Bedrock, SageMaker, OpenSearch	Complexity
Document extraction / OCR at scale	All (close equivalents)	Mature document AI on all four	Textract / Doc Intelligence / Document AI / OCI DU	Human review of low-confidence
Contact center AI	AWS (Connect), GCP (CCAI)	Turnkey contact-center platforms	Connect+Contact Lens, CCAI	Recording compliance
Code assistant	All (verify)	Amazon Q, GitHub Copilot, Gemini Code Assist	Q Developer, Copilot, Gemini	IP/secret leakage in prompts
ML model training / deployment	AWS, GCP (all valid)	SageMaker + Vertex breadth; Databricks cross-cloud	SageMaker, Vertex, Azure ML, Databricks	GPU quota; lock-in
Time-series forecasting / anomaly	All (verify managed service)	ML platforms + BigQuery ML; some standalone services deprecated	BigQuery ML, SageMaker, OCI Anomaly	Verify current managed path
Image / video analysis	All (Video: AWS/GCP stronger)	Vision on all; video coverage differs	Rekognition, Vision AI, AI Vision, OCI Vision	Privacy for face/biometric
Speech transcription	All (close equivalents)	Mature speech on all four	Transcribe, Speech, Speech-to-Text, OCI Speech	Accent/domain accuracy
Semantic / enterprise search	All	Kendra/Vertex Search turnkey; AI Search; OCI	Kendra, Vertex Search, AI Search	Security trimming across sources
Real-time AI inference	All	Managed endpoints everywhere; custom silicon differs	Managed endpoints; GPU/TPU/Inferentia	Idle-endpoint + latency cost
Regulated AI workload	All (verify)	Private endpoints + governance on all; compliance varies	Private networking + guardrails + audit	Verify certifications/data terms
On-prem / hybrid AI	Varies (verify)	watsonx (portable), Azure Arc, some on-prem model options	watsonx, Arc, OSS models on-prem	Verify on-prem model support
Multicloud AI platform	Neutral platforms	Databricks, Snowflake Cortex, Hugging Face, OSS run across clouds	Databricks, Snowflake, HF, OTel	Trade deep native features for portability

Architect note

Decide by: where the data lives, the ecosystem you operate, required models + region availability, governance/compliance needs, cost model, and team skills. The right answer is often "the AI platform closest to the data," and sometimes a neutral platform (Databricks, Snowflake, OSS) when portability matters more than deep native integration.

See the single-cloud deep-dive portals for full per-cloud AI depth →

14. Cost Comparison for AI Services

The cost drivers that dominate AI spend, how they map across providers, and neutral optimization guidance. No provider is cheapest overall - it depends on the workload.

Last reviewed: July 2026. AI pricing changes constantly - verify all rates on vendor pricing pages.

TL;DR

AI cost is driven by tokens (input + output), embeddings, fine-tuning, model hosting / endpoint uptime, GPU hours, vector storage + queries, and applied-AI usage (OCR pages, speech minutes) - plus the usual data transfer, logging, and private-networking costs. The biggest surprises are context size (tokens), idle endpoints/GPUs, and vector-search at scale. There is no cheapest AI cloud; the answer depends on volume, model choice, and architecture. Optimize the architecture (retrieval quality, context size, caching) before shopping list prices.

AI cost drivers, side by side

AI workload	Main cost driver	Cost consideration (all clouds)	Cost control	Gotcha
LLM chat / RAG	Input + output tokens	Context size dominates; output tokens often priced higher	Retrieve minimal context; smaller models; cache	Sending whole documents blows up token cost
Embeddings	Tokens embedded	One-time (ingest) + per-query	Batch + cache; re-embed only changed content	Re-embedding everything on each run
Fine-tuning	Training tokens/hours	Upfront cost; may not beat good RAG	Try RAG/prompting first	Fine-tuning when RAG would suffice
Model hosting / endpoints	Endpoint uptime	24x7 endpoints cost even when idle	Scale-to-zero / batch / serverless	Idle endpoints are silent spend
Provisioned throughput	Reserved capacity	Predictable cost + throughput vs on-demand	Match to sustained volume	Over-provisioning for spiky load
GPU / training	Accelerator hours	Dominant for training; custom silicon may cut cost	Spot + reservations; right-size	Idle GPUs; data-pipeline starvation
Vector search	Storage + queries	Grows with corpus + query rate	Right-size indexes; filter early	Unbounded index growth
Document AI / OCR	Pages processed	Per-page pricing	Pre-filter; process only needed pages	Reprocessing whole archives
Speech	Audio minutes	Per-minute; real-time may cost more	Batch where latency allows	Transcribing everything
Logging / monitoring	Ingestion volume	Prompt/output logging adds up	Sample; set retention	Logging full payloads unbounded

AI cost optimization (portable)

Cost note - optimize architecture first

Use smaller models where they suffice - many tasks don't need the largest model.
Cache common responses and embeddings; don't recompute.
Reduce context size - retrieve the smallest sufficient chunks; don't send whole documents.
Improve retrieval quality - better retrieval means fewer tokens and better answers.
Prefer prebuilt AI services (Document AI, NLP) over an LLM for well-defined tasks - cheaper and more predictable.
Batch where real-time isn't required; shut down idle endpoints/GPUs/notebooks.
Right-size vector indexes and control logging volume.
Track cost per user, per document, per workflow, and per business process - not just per service - so you can see which use cases are worth it.
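
For the embedding-cache item, one simple approach is to hash each chunk and re-embed only when the hash changes. This is a sketch under assumptions: `chunks_to_embed` and the in-memory `seen` dictionary are hypothetical, and a real pipeline would persist the hashes and record them only after the embedding call succeeds.

```python
import hashlib

def chunks_to_embed(chunks, seen_hashes):
    """Return only chunks whose content hash is new, and record their hashes.

    `chunks` maps chunk ID -> text; `seen_hashes` maps chunk ID -> last embedded hash.
    """
    changed = {}
    for chunk_id, text in chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes.get(chunk_id) != digest:
            changed[chunk_id] = text
            seen_hashes[chunk_id] = digest
    return changed

seen = {}
first = chunks_to_embed({"a": "alpha", "b": "beta"}, seen)      # both chunks are new
second = chunks_to_embed({"a": "alpha", "b": "beta v2"}, seen)  # only "b" changed
```

On the second run only the edited chunk is sent for embedding, which avoids the "re-embedding everything on each run" gotcha.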

There is no single cheapest AI provider

The cheapest option depends on your model choice, token volumes, whether you self-host or use managed APIs, GPU needs, and architecture efficiency. A well-designed RAG app on a mid-size model can cost a fraction of a poorly-designed one on a frontier model - on the same cloud. Fix the architecture, then compare post-discount prices for your actual usage.
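
The effect of context size can be checked with simple arithmetic. The prices and volumes below are placeholders chosen for illustration, not any vendor's rates, and `monthly_token_cost` is a hypothetical helper.

```python
def monthly_token_cost(requests, input_tokens, output_tokens,
                       price_in_per_million, price_out_per_million):
    """Estimated monthly spend for a chat/RAG workload, in the price's currency."""
    per_request = (input_tokens * price_in_per_million
                   + output_tokens * price_out_per_million) / 1_000_000
    return requests * per_request

# Placeholder numbers for illustration only, not real prices.
# Same 100,000 requests/month and 500 output tokens per answer in both cases.
whole_docs = monthly_token_cost(100_000, 40_000, 500, 3.00, 15.00)  # full documents, larger model
retrieved = monthly_token_cost(100_000, 3_000, 500, 0.50, 2.00)     # retrieved chunks, smaller model
```

With these assumed inputs the first design comes to about 12,750 per month and the second to about 250, roughly a 51x difference driven by architecture rather than by provider. Substitute current list or negotiated prices before drawing conclusions.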

Pair with the cloud cost sections in the single-cloud deep-dive portals →

15. AI Risk and Architecture Warnings

The specific ways enterprise AI goes wrong - what can happen, which patterns are affected, how to reduce it, what to monitor, and whether it is production-ready. These risks apply across all providers.

Last reviewed: July 2026. Treat AI as a new attack surface - verify controls with current vendor docs.

TL;DR

Enterprise AI risk is dominated by a handful of failure modes: hallucination, prompt injection, data leakage, over-permissioned agents, direct production-database access, uncontrolled dynamic SQL, poor/stale retrieval, and missing auditability. None are provider-specific - they are architecture and governance problems. The mitigations are consistent: governed serving layer, least privilege, security-trimmed retrieval, human approval for actions, content safety, and full logging. Build these in before going to production.

Risk	What can go wrong	Affected patterns	How to reduce	Monitor	Production-ready?
Hallucination	Confident but wrong answers	All GenAI	RAG grounding + citations; validate output; human review for decisions	Groundedness; user feedback	Yes, with validation + citations
Prompt injection	Untrusted content hijacks instructions	RAG, agents, doc processing	Content safety / prompt shields; isolate + sanitize retrieved/user content; least privilege	Injection attempts; anomalies	Requires strong governance
Data leakage	Model reveals context it shouldn't	RAG, agents	Security-trimmed retrieval before generation; output filtering; DLP	Access anomalies; output scans	Requires strong governance
Over-permissioned agents	Agent does more than intended	Agents	Least-privilege scoped identity; per-tool permissions; approval gates	Tool calls; actions	Requires strong governance
Direct production DB access	Agent/LLM queries live OLTP	Chat-with-DB, NL-to-SQL, agents	Never direct; use governed API / curated views / read replica	DB access source	Not recommended without review
Uncontrolled dynamic SQL	Free-form SQL against production	NL-to-SQL	Validated, parameterized, read-only SQL on curated schema only	Queries executed	Not recommended without review
Poor retrieval quality	Irrelevant context → wrong answers	RAG, search	Better chunking; hybrid + rerank; evaluate retrieval	Retrieval relevance metrics	Yes, with evaluation
Stale data	Answers from outdated index	RAG	Automate re-indexing; track freshness	Index age	Yes, with refresh
Lack of auditability	Can't explain/defend an answer	All	Log prompts + context IDs + outputs	Log completeness	Requires strong governance
No human approval	AI acts without oversight	Agents, ops AI	Approval gates for writes/actions	Actions taken	Requires strong governance
Inconsistent answers	Same question, different answers	All GenAI	Lower temperature; deterministic checks; caching	Answer variance	Good for experimentation
Model version changes	Behavior shifts on model update	All	Pin/version models; re-evaluate on change; abstraction layer	Model version; eval scores	Yes, with eval on change
Region availability changes	Model/service unavailable in region	All	Verify + have fallback; abstraction layer	Availability	Yes, with fallback plan
Vendor lock-in	Hard to switch platform/model	All	Abstraction layer; open formats/standards	Coupling	Manageable with design
Hidden cost growth	Token/GPU/vector spend creeps up	All	Budgets/alerts; per-workflow cost tracking; optimization	Cost per user/workflow	Yes, with monitoring
Compliance gaps	Data/PII handled improperly	All	Data classification; retention terms; legal review	Data flows	Requires review
Shadow AI	Ungoverned AI touching real data	Org-wide	AI use-case inventory + approval; guardrails by policy	New AI usage	Requires strong governance
Weak monitoring	Problems unseen until users complain	All	Quality + safety + cost monitoring from day one	Quality/safety/cost	Yes, with observability

The two non-negotiables

(1) No AI agent or model gets direct access to a production OLTP database or runs uncontrolled dynamic SQL - always through a governed API / curated data / read-only serving layer with validated queries. (2) Access control is enforced at retrieval, before generation, and everything is logged - the model must never receive data the user isn't entitled to, and every answer must be traceable to its sources. These two hold on every provider.
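
The first rule is best enforced by the database rather than by inspecting SQL strings. The sketch below uses SQLite's authorizer callback from Python's standard library as a stand-in for a read-only role granted only on curated views. The table names and the `run_generated_sql` helper are hypothetical, and this authorizer also denies SQL functions to keep the example small.

```python
import sqlite3

ALLOWED_TABLES = {"sales_summary"}  # the curated schema (hypothetical)

def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow SELECTs that read only allow-listed tables; deny everything else."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

def run_generated_sql(conn, sql, params=()):
    """Run model-generated SQL; return rows, or None if the database rejects it."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        return None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_summary (region TEXT, total REAL);
    CREATE TABLE employees_raw (name TEXT, salary REAL);
    INSERT INTO sales_summary VALUES ('EMEA', 120.0), ('APAC', 95.5);
""")
conn.set_authorizer(read_only_authorizer)  # from here on, only curated reads pass

# User-supplied values travel as bound parameters, never spliced into the SQL text.
ok_rows = run_generated_sql(conn, "SELECT total FROM sales_summary WHERE region = ?", ("EMEA",))
blocked_table = run_generated_sql(conn, "SELECT * FROM employees_raw")
blocked_write = run_generated_sql(conn, "DELETE FROM sales_summary")
```

The curated read succeeds, while the read of a non-curated table and the write are both rejected by the database itself. On a production database the equivalent control is a dedicated read-only identity with grants limited to curated views, ideally on a replica.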

See sections 4, 5, and 11 for the controls that mitigate these risks →

16. Troubleshooting AI Workloads

Runbooks for the failures AI workloads actually hit - symptoms, likely causes, cloud-specific checks, fixes, and prevention. The method is portable; the tools differ by provider.

Last reviewed: July 2026. Verify service-specific checks with current vendor docs.

Portable method

AI failures split into three buckets: infrastructure/access (quota, region, IAM, private endpoint), retrieval/data (chunking, freshness, relevance, ingestion), and quality/safety (hallucination, guardrails, injection). Diagnose in that order. Check the provider's model/endpoint logs, quotas, and region availability first - many "the AI is broken" tickets are a disabled model, an exhausted quota, or a region mismatch.

⚑ GenAI model not responding / endpoint timeout / high latency

Causes: model not enabled/available in the region; quota/throughput exceeded; endpoint cold-start or under-provisioned; oversized context; network/private-endpoint issue. Checks: model availability in your region; quota/throughput limits; endpoint metrics/logs (Bedrock/Azure OpenAI/Vertex/OCI); token count of the request. Fix: enable the model / request region access; raise quota or use provisioned throughput; reduce context; scale/warm the endpoint. Prevention: verify region + quota early; add retries with backoff; cap context size.
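
"Retries with backoff" can be sketched as capped exponential backoff with jitter. `TransientModelError` is a hypothetical stand-in for whatever throttling or timeout exception your SDK raises, and the delay values are assumptions to tune.

```python
import random
import time

class TransientModelError(Exception):
    """Hypothetical stand-in for a throttling/timeout error from a model endpoint."""

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(); on TransientModelError retry with capped exponential backoff + jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientModelError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries

attempts = {"n": 0}

def flaky_model_call():
    """Fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientModelError("throttled")
    return "ok"

# The demo passes a no-op sleep so it runs instantly.
result = call_with_backoff(flaky_model_call, sleep=lambda s: None)
```

Retry only errors that are genuinely transient (throttling, timeouts). Retrying a "model not enabled in region" or IAM error just delays the real fix.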

⚑ Token limit exceeded

Causes: context (retrieved chunks + history + prompt) exceeds the model's context window. Fix: retrieve fewer/smaller chunks; truncate history; summarize; use a larger-context model if justified. Prevention: budget tokens; measure context size; rerank to fewer, higher-quality chunks.
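
Token budgeting can be sketched as a greedy fit over reranked chunks. The `rough_token_count` function is a crude assumption (about four characters per token); use the target model's own tokenizer in practice, and reserve part of the window for the prompt, history, and the answer.

```python
def fit_chunks_to_budget(chunks, budget_tokens, count_tokens):
    """Keep best-first chunks until the token budget is used up."""
    kept, used = [], 0
    for chunk in chunks:  # assumed ranked best-first
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip chunks that don't fit; smaller later ones may still fit
        kept.append(chunk)
        used += cost
    return kept, used

def rough_token_count(text):
    """Crude estimate (~4 characters per token); use the model's tokenizer in practice."""
    return max(1, len(text) // 4)

chunks = ["a" * 400, "b" * 800, "c" * 200]  # about 100, 200, and 50 tokens
kept, used = fit_chunks_to_budget(chunks, budget_tokens=160, count_tokens=rough_token_count)
```

With a 160-token budget the first and third chunks are kept and the oversized second chunk is skipped. Stopping at the first chunk that does not fit is an equally valid policy when rank order matters more than fill rate.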

⚑ RAG answers are poor / vector search returns irrelevant results

Causes: bad chunking; wrong embedding model; no reranking; pure-vector missing keyword matches; stale index; retrieving too few/many chunks. Checks: inspect retrieved chunks for the query; evaluate retrieval relevance separately from generation; index freshness. Fix: improve chunking; add hybrid (keyword+vector) + reranking; refresh the index; tune top-k. Prevention: evaluate retrieval as its own metric; automate re-indexing. (Retrieval quality usually matters more than the model.)
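
Evaluating retrieval as its own metric can start with two standard measures, recall@k and mean reciprocal rank (MRR), computed over a small labeled set of queries. The document IDs and the two-query `eval_set` below are made up for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant doc, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical labeled queries: what the retriever returned vs. what is relevant.
eval_set = [
    {"retrieved": ["d3", "d1", "d9"], "relevant": {"d1", "d2"}},
    {"retrieved": ["d5", "d6", "d7"], "relevant": {"d5"}},
]
mean_recall = sum(recall_at_k(q["retrieved"], q["relevant"], 3) for q in eval_set) / len(eval_set)
mrr = sum(reciprocal_rank(q["retrieved"], q["relevant"]) for q in eval_set) / len(eval_set)
```

Tracking these numbers before and after a chunking, embedding, or reranking change shows whether retrieval improved, independently of what the model then generates.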

⚑ Document ingestion / embedding job failed

Causes: unsupported file type/size; permission to read source; embedding model quota; malformed content; timeout. Checks: ingestion/pipeline logs; source access (IAM); quota. Fix: convert/split files; grant read access; batch and retry; raise quota. Prevention: validate inputs; batch large corpora; monitor the pipeline.

⚑ Agent calls wrong tool / runs unsafe query

Causes: ambiguous tool descriptions; over-broad tool permissions; no human approval; direct DB access. Checks: agent trace (which tool, which input); the tool's identity/permissions. Fix: sharpen tool descriptions; scope each tool with least privilege; add approval gates; route DB access through a governed read-only API. Prevention: never give agents raw DB access; require approval for writes; test tool selection.
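
Per-tool scoping and approval gates can be enforced in the layer that dispatches tool calls, independent of the agent framework. The `Tool` type, `invoke_tool`, and the tool names are hypothetical. A real system would also run each tool under its own least-privilege identity.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]
    writes: bool = False  # True for any tool that changes state

class ApprovalRequired(Exception):
    """Raised when a write tool is called without recorded human approval."""

def invoke_tool(tool, args, agent_allowed_tools, approved=False):
    """Enforce a per-agent tool allow-list and an approval gate for write tools."""
    if tool.name not in agent_allowed_tools:
        raise PermissionError(f"agent may not call {tool.name}")
    if tool.writes and not approved:
        raise ApprovalRequired(f"{tool.name} needs human approval")
    return tool.run(args)

lookup = Tool("lookup_ticket", lambda a: f"ticket {a['id']}: open")
restart = Tool("restart_service", lambda a: f"restarted {a['service']}", writes=True)
allowed = {"lookup_ticket", "restart_service"}

read_result = invoke_tool(lookup, {"id": 42}, allowed)  # read tool runs directly
try:
    invoke_tool(restart, {"service": "billing"}, allowed)  # write tool, no approval yet
    gate_held = False
except ApprovalRequired:
    gate_held = True
```

Because the check lives outside the model, a confused or injected agent can still request the wrong tool, but the request fails closed until a human approves it.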

⚑ Guardrail blocks valid response / prompt injection suspected

Causes: over-broad content-safety rule (false positive); or a genuine injection attempt in retrieved/user content. Checks: guardrail/content-safety logs; the blocked content. Fix: tune the rule / add an exception (false positive); or confirm and block the injection (isolate/sanitize retrieved content). Prevention: run content safety in report mode first; sanitize untrusted content; monitor injection patterns.

⚑ Private endpoint / IAM / quota / region issues

Private endpoint: DNS not resolving to the private endpoint; missing route/firewall; verify per cloud (PrivateLink / Private Endpoint+Private DNS / PSC / OCI private endpoint).
IAM denied: the workload/agent identity lacks the model/data role; check native IAM (OCI IAM / AWS IAM / Entra RBAC / Google IAM).
Quota exceeded: request an increase; use provisioned throughput.
Region: the model/service isn't in your region - request access or choose a supported region.
Cost spike: check token/GPU/endpoint/vector usage in cost tools; find idle endpoints or oversized context.

⚑ Logs missing / hallucination reported by business users

Logs missing: prompt/output logging not enabled (often off by default for privacy/cost); wrong log destination; retention expired - enable Bedrock/Azure OpenAI/Vertex/OCI logging and route to a central store.
Hallucination reported: check whether the answer was grounded (retrieved context) and cited; improve retrieval; add citations + a "not found" path; require human review for high-stakes answers; log the incident for evaluation.

See the per-cloud deep-dive portals for full cloud troubleshooting runbooks →

17. Learning Paths for AI Across Clouds

Learn AI services fastest by building on what you know. For each persona: what transfers, what does not, where to start, hands-on labs, common mistakes, and the outcome.

Last reviewed: July 2026. Pair with the single-cloud deep-dive portals and current vendor training.

The universal AI transfer map

Transfers well: RAG architecture, embeddings, chunking, the governed-serving-layer pattern, prompt engineering, vector-search concepts, and MLOps fundamentals - these are provider-neutral. Transfers poorly: each cloud's model catalog + region availability, the managed RAG/agent tooling, the native vector store, and the governance implementation. Learn the portable model once, then learn each cloud's differences.

Cloud architect learning AI across clouds

(OCI, AWS, Azure, or GCP architect learning the others' AI stacks.)

Already know: the cloud foundation, IAM, networking, and where data lives.
Transfers: RAG/agent architecture, private-endpoint patterns, governance principles - identical shapes.
Doesn't transfer: the specific GenAI platform, model catalog + region availability, managed RAG/agent tooling, and vector store.
Start with: the Matrix (1) → GenAI (2) → RAG (5) → Vector (6) → Governance (11). Use the matrix as your translation table.

Hands-on lab

Build the same governed RAG app (serving layer + vector store + model + audit) on a second cloud. You will hit exactly the differences: model access, managed-RAG tooling, and vector store.

Mistakes to avoid: assuming direct model equivalence; designing around one model's limits; skipping security-trimmed retrieval. Outcome: you can design a governed AI architecture on any of the four and know what to verify.

DBA / Data engineer learning GenAI and AI platforms

DBA - already know

Databases, SQL, access control, backup/DR - which directly enable in-database vector search and NL-to-SQL governance.

Data engineer - already know

Pipelines, SQL, Spark, data governance - which map to embeddings pipelines, retrieval, and warehouse-integrated AI.

Transfers

Vector search as a DB feature (Oracle 23ai, pgvector, BigQuery); entitlement-filtered retrieval as row/label security; in-DB/in-warehouse ML.

Doesn't transfer cleanly

Chunking/embedding quality's effect on answers; the governed-serving-layer requirement; agent/NL-to-SQL safety; model catalogs.

DBA note - your skills are an advantage

Your instincts about access control, query safety, and data governance are exactly what enterprise AI needs. The key new ideas: vectors live next to your data (use in-DB vector search to inherit your security), NL-to-SQL must be read-only/validated/parameterized on curated schemas, and agents never get raw DB access. Start with Vector Search (6) + AI for databases in the Matrix (1).
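One way to make "read-only, validated, parameterized, curated schema" concrete is to let the database engine enforce it rather than inspecting the SQL text. The sketch below uses SQLite's authorizer hook through Python's standard `sqlite3` module purely as a stand-in; the table names, `ALLOWED_TABLES`, and the helper functions are hypothetical. On a production database the same idea usually means a read-only role that is granted only curated views.

```python
import sqlite3

ALLOWED_TABLES = {"customers_curated"}  # hypothetical curated schema


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Permit SELECT statements that read only allow-listed tables."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY  # writes, DDL, functions, other tables


def run_generated_sql(conn, sql, params=()):
    """Execute model-generated SQL; values always travel as bound parameters."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected: {exc}") from exc


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers_curated (name TEXT, region TEXT)")
    conn.execute("CREATE TABLE customers_raw (name TEXT, ssn TEXT)")
    conn.execute("INSERT INTO customers_curated VALUES ('Ada', 'EU')")
    conn.execute("INSERT INTO customers_raw VALUES ('Ada', '000-00-0000')")
    conn.commit()
    conn.set_authorizer(read_only_authorizer)  # lock down after setup
    return conn
```

With this in place, a generated `SELECT name FROM customers_curated WHERE region = ?` runs, while a `DELETE` or a query against `customers_raw` raises `PermissionError`. The allow-list here is deliberately strict (it also blocks SQL functions such as `count`), so a real deployment would widen it carefully.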

Hands-on labs: build a RAG app with pgvector or Oracle 23ai over data you control, with entitlement-filtered retrieval; build a governed NL-to-SQL interface over a curated read-only schema.

Outcome: you can add AI to a database safely.

Security engineer / DevOps engineer learning AI

Security - transfers

Least privilege, private networking, key/secret management, audit logging, data-exfiltration control - all apply directly to AI.

Security - new

Prompt injection, data leakage via models, agent permissioning, content safety, and AI-specific audit (prompts/outputs).

DevOps - transfers

CI/CD, containers, IaC, and observability (OpenTelemetry) map to MLOps pipelines and model deployment.

DevOps - new

Model registry + versioning, drift/quality monitoring, prompt/eval management, and GPU/endpoint cost control.

Security note

Treat AI as a new attack surface. Your job: enforce security-trimmed retrieval, scope agent identities, put content safety on input+output, keep AI traffic on private endpoints, and log prompts+outputs. Start with Governance (11) + Risk (15).

Hands-on labs: Security - configure guardrails + private endpoints + prompt logging for a GenAI app; test prompt injection. DevOps - build an MLOps pipeline with a model registry and drift monitoring.

Outcome: you can secure and operate AI in production.
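For the drift-monitoring part of the DevOps lab, one commonly used statistic is the Population Stability Index (PSI), which compares how a feature is distributed in a baseline sample and in current traffic: PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the baseline and current bin fractions. PSI is an illustrative choice here, not something the managed platforms are known to use; the function name, bin edges, and `eps` floor are assumptions.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # number of edges at or below v
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give 0.0 and the value grows as the distributions diverge. The alert threshold is a judgment call to calibrate on your own data before wiring it into a pipeline.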

AI engineer, enterprise architect, or business analyst

AI engineer

Know: models, prompting, embeddings. Learn: enterprise governance, security-trimmed retrieval, cost control, and each cloud's managed tooling. Start: RAG (5), Agents (4), Governance (11).

Enterprise architect

Learn: the workload decision matrix (13), governance (11), cost drivers (14), and risk (15) - to choose platforms by workload, not hype. Start: Matrix (1) + Workloads (13).

Business analyst

Learn: what each pattern (12) can realistically do, the risks (15), and where value is real vs hype. Start: Home, Patterns (12), Risk (15).

Common to all

Follow the data, insist on governance and auditability, validate outputs, and verify model/region/cost before committing.

Mistakes to avoid

Choosing a platform on a benchmark; skipping governance ("we'll add it later"); connecting AI to production data without a serving layer; ignoring cost until the bill arrives; assuming a demo equals production-readiness.

Outcome: you can evaluate, choose, govern, and cost an enterprise AI use case across providers - and tell real value from hype.
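The "ignoring cost until the bill arrives" mistake is cheap to avoid with back-of-envelope arithmetic before any commitment. The sketch below assumes simple per-1,000-token pricing with separate input and output rates; the function name and every number in the usage note are placeholders, not real vendor prices, so substitute current figures from the provider's pricing page.

```python
def monthly_token_cost(requests_per_day, input_tokens, output_tokens,
                       price_in_per_1k, price_out_per_1k, days=30):
    """Estimate monthly model spend from average per-request token counts."""
    per_request = ((input_tokens / 1000) * price_in_per_1k
                   + (output_tokens / 1000) * price_out_per_1k)
    return requests_per_day * days * per_request
```

For example, 10,000 requests a day at 2,000 input and 500 output tokens each, with made-up rates of 0.001 and 0.002 per 1,000 tokens, comes to roughly 900 currency units a month. RAG makes the input side grow quickly, because every retrieved chunk is billed as input tokens.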

Final note

AI fluency across clouds means learning the portable model (RAG, agents, vector search, governance, and the cost/risk levers) once, then learning each provider's differences and current capabilities. Use the Matrix (section 1) as your translation table and verify everything fast-moving before you build.

Return to expertoracle.com for the per-cloud AI and deep-dive portals →