
AI Services Across Cloud Providers

A neutral deep-dive comparison of AI, ML, GenAI, agents, vector search, RAG, document/vision/speech AI, MLOps, governance, and AI infrastructure across OCI, AWS, Azure, Google Cloud, and other AI platforms - a practical reference for enterprise architects and engineers choosing the right AI platform for real workloads.

4+ providers compared · Searchable AI matrix · Match, maturity & readiness · Governance & cost · No "best AI cloud" claims
Last reviewed: July 2026. AI services change very fast - verify with current vendor documentation before production use.
THE CORE PRINCIPLE: NEUTRAL & PRACTICAL

This portal does not favor any provider and makes no "best AI cloud" claim. Some AI services are close equivalents, some only partially similar, and some have no direct match - we say which, and we mark anything fast-moving as needing verification. AI is the fastest-changing area in cloud: model availability, regions, pricing, token/context limits, fine-tuning, agent features, and data-handling terms shift constantly. Treat concrete claims here as a starting point to verify, not a guarantee.

How to read this portal

The heart is the AI Service Comparison Matrix (section 1): a searchable, filterable table mapping each AI capability across providers with a match rating, maturity, enterprise-readiness, key difference, and risk. Deep-dive sections (2-11) go further by category; the decision sections (12-14) help you choose by workload and cost; and the reference sections (15-17) cover risk, troubleshooting, and learning paths.

Match types

  • Exact: functionally equivalent
  • Close: same purpose, minor differences
  • Partial: overlaps part of the capability
  • Conceptual: similar goal, different model
  • No direct: no real counterpart
  • Specific: provider-specific capability
  • Verify: fast-moving, confirm current state

Maturity levels

Mature · Strong · Evolving · Limited · Preview / region-dependent · Needs verification

Enterprise readiness

Production-ready · Production-ready with constraints · Good for experimentation · Requires strong governance · Not recommended without review

Providers compared

OCI

OCI Generative AI + Agents, AI Vector Search in Oracle Database 23ai, Data Science, OCI AI Services.

AWS

Bedrock (multi-model), SageMaker, Kendra/OpenSearch, Textract/Rekognition/Comprehend.

Azure

Azure OpenAI / AI Foundry, Azure AI Search, Azure ML, AI Document Intelligence/Vision/Speech.

Google Cloud

Vertex AI / Gemini, Vertex AI Search, Vector Search, Document AI, BigQuery ML.

Other platforms, where relevant
Where they add meaningful comparison, this portal also references IBM watsonx, Databricks Mosaic AI, Snowflake Cortex, Hugging Face, and specialist vector databases (Pinecone, Weaviate, Milvus, Qdrant). Providers are not forced into every row - if there is no meaningful equivalent, it is marked "No direct equivalent" or "Not a primary service in this category."

Reading the callouts

  • Architect note: Design-time trade-offs.
  • DBA note: Database and vector-search behavior.
  • Security note: Exposure, access, and private networking.
  • Cost note: AI cost drivers and control.
  • AI governance note: Auditability, guardrails, responsible AI.
  • Common mistake: Errors teams make, incl. fake equivalency.
Accuracy, neutrality & independence
Independent educational resource, not affiliated with or endorsed by Oracle, Amazon, Microsoft, Google, IBM, or any AI vendor. Mappings are engineering judgments as of July 2026 and deliberately conservative. Because AI moves so quickly, verify model/region availability, pricing, limits, fine-tuning, agent features, and data-handling terms against each vendor's official documentation before any design or purchasing decision.

1. AI Service Comparison Matrix

A searchable, filterable map of AI capabilities across OCI, AWS, Azure, Google Cloud, and other platforms - with match rating, maturity, enterprise readiness, best-fit use, key difference, and risk.

Last reviewed: July 2026. AI mappings change fast - verify with current vendor docs.
AI Capability | OCI | AWS | Azure | Google Cloud | IBM / Other | Match

Match ratings are conservative and providers are not forced into every row. A "No direct" or "Conceptual" rating means the architecture changes when you move - read the key difference before assuming portability.

2. Generative AI Services Deep Dive

The main managed GenAI platforms compared neutrally - model access, RAG, agents, guardrails, private networking, enterprise controls, and best-fit use.

Last reviewed: July 2026. GenAI features/models change weekly - verify with current vendor docs.
TL;DR

Every major cloud has a managed GenAI platform: OCI Generative AI + Agents, Amazon Bedrock, Azure OpenAI / AI Foundry, Vertex AI / Gemini, plus IBM watsonx.ai, Databricks Mosaic AI, and Snowflake Cortex. They differ most in model catalog (Bedrock is multi-provider; Azure centers on OpenAI/GPT; Vertex on Gemini + open; OCI is curated) and in how you wire private data, agents, and guardrails. Choose by the models you need, region/quota, enterprise controls, and where your data already lives - not by benchmarks.

Platform comparison

Provider | GenAI platform | Model access | Agents | RAG | Guardrails | Private networking | Best-fit use | Key gotcha
OCI | Generative AI + GenAI Agents | Curated (Cohere, Llama, etc.) via API; dedicated clusters | GenAI Agents | Agent knowledge bases + DB Vector Search | Guardrails | Private endpoints | Oracle-data-centric RAG; enterprises on OCI | Verify model catalog + region availability
AWS | Amazon Bedrock (+ SageMaker JumpStart) | Many providers (Anthropic, Meta, Cohere, Amazon, etc.) | Bedrock Agents | Bedrock Knowledge Bases | Bedrock Guardrails | PrivateLink / VPC | Model choice + AWS-native governance | Model availability varies by region
Azure | Azure OpenAI / AI Foundry | OpenAI GPT family + catalog | AI Agent Service (Foundry) | Azure AI Search + OpenAI | AI Content Safety | Private Endpoints | OpenAI/GPT + Microsoft ecosystem | GPT model/region/quota gating
Google | Vertex AI / Gemini | Gemini + Model Garden (open + partner) | Vertex AI Agent Builder | Vertex AI Search / RAG Engine | Safety filters / Model Armor | Private Service Connect | Gemini + data/BigQuery integration | Feature/region availability
IBM/Other | watsonx.ai; Databricks Mosaic; Snowflake Cortex | Granite + open (IBM); open (Databricks); Cortex (Snowflake) | watsonx Orchestrate | watsonx Discovery; Cortex Search; Databricks | watsonx.governance | Varies | Governance focus (IBM); data-platform-native AI | Verify enterprise coverage per platform

How enterprise data is connected (common to all)

Across every platform, connecting private data safely follows the same shape: ingest → chunk → embed → store vectors → retrieve (entitlement-filtered) → ground the model → audit, all behind a governed serving layer. What differs is the managed convenience (Knowledge Bases, Vertex AI Search, AI Search integration) and where vectors live. See sections 5-6.

AI governance note - the enterprise controls that matter
Evaluate each platform on: IAM/identity integration (does it use the cloud's native identity?), private networking (private endpoints), prompt + output logging and retention, guardrails/content safety on input and output, data-use terms (is your data used to train the base model? Generally no for these enterprise services, but verify the current terms), and region availability. These, not raw model quality, usually decide enterprise fit.
Cost note
GenAI cost is driven by input + output tokens, embeddings, optional dedicated capacity/provisioned throughput, and fine-tuning. On-demand is simple; provisioned/dedicated gives predictable throughput at a fixed cost. Retrieve the smallest sufficient context, cache common responses, and use smaller models where they suffice (section 14).
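The token arithmetic behind this note can be sketched as follows. This is a minimal illustration, not a pricing tool: the `TokenPricing` type, the function names, and every price in the usage example are hypothetical placeholders, and real rates must come from the vendor's current price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical on-demand prices in USD per 1,000 tokens (not real vendor prices)."""
    input_per_1k: float
    output_per_1k: float


def on_demand_monthly_cost(requests_per_month: int, input_tokens: int,
                           output_tokens: int, pricing: TokenPricing) -> float:
    """Monthly on-demand cost for a workload with a fixed token count per request."""
    per_request = ((input_tokens / 1000) * pricing.input_per_1k
                   + (output_tokens / 1000) * pricing.output_per_1k)
    return requests_per_month * per_request


def break_even_requests(provisioned_monthly_fee: float, input_tokens: int,
                        output_tokens: int, pricing: TokenPricing) -> float:
    """Requests per month above which a fixed provisioned fee becomes cheaper."""
    per_request = on_demand_monthly_cost(1, input_tokens, output_tokens, pricing)
    return provisioned_monthly_fee / per_request


# Usage with placeholder numbers: halving the retrieved context lowers the bill.
pricing = TokenPricing(input_per_1k=0.5, output_per_1k=1.5)
full_context = on_demand_monthly_cost(1000, 2000, 500, pricing)
trimmed_context = on_demand_monthly_cost(1000, 1000, 500, pricing)
```

With these placeholder rates, retrieving a smaller context reduces only the input-token term, which is why "smallest sufficient context" is usually the first lever to try.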
Verify before choosing
The differentiators (available models, regions, quotas, agent/guardrail features, data-handling terms) move fastest here. Do not select a GenAI platform on an announcement or leaderboard - confirm the specific models and terms you need are available in your region today, and that the governance controls meet your requirements.

3. Foundation Model Comparison

How foundation-model access differs across providers - first-party, third-party, and open models, fine-tuning, and private deployment. Model availability changes constantly, so this is a shape, not a live catalog.

Last reviewed: July 2026. Model availability/pricing/limits change frequently - VERIFY before relying on any specific model.
Read this first
Model availability, pricing, token limits, context windows, fine-tuning support, and data-handling terms change frequently and vary by region. Everything below is a general shape to help you reason - treat specific model names as "verify with current vendor documentation." Do not design around a model without confirming it is available (and supported for your use) in your region today.
TL;DR

Providers differ in how open their model access is: Bedrock offers the broadest multi-vendor catalog (Anthropic, Meta, Cohere, Amazon, etc.); Azure centers on OpenAI GPT models; Google on Gemini plus Model Garden (open + partner); OCI offers a curated set (Cohere, Llama, etc.); Hugging Face and Databricks/others give access to a large open ecosystem. Abstract the model behind your own serving layer so you can switch as the landscape shifts.

Model access, side by side (verify current state)

Provider | Model platform | Example families (verify) | First-party | Third-party | Open-source | Fine-tuning | Private deploy | Notes
OCI | OCI Generative AI | Cohere, Meta Llama (verify) | Curated | Yes (partner) | Some | Some | Dedicated AI clusters | Curated catalog; verify current models
AWS | Amazon Bedrock | Anthropic Claude, Meta Llama, Cohere, Amazon Titan/Nova, Mistral (verify) | Titan/Nova | Broad (many vendors) | Yes (incl. via Bedrock/SageMaker) | Yes (varies by model) | VPC/PrivateLink | Broadest multi-vendor catalog
Azure | Azure OpenAI + Foundry catalog | OpenAI GPT family; catalog models (verify) | Microsoft/Phi | OpenAI + catalog | Via catalog | Yes (select models) | Private Endpoints | OpenAI-centric; region/quota gated
Google | Vertex AI + Model Garden | Gemini; open + partner models (verify) | Gemini/Gemma | Partner + open | Model Garden / open | Yes (select models) | PSC / private | Gemini + broad Garden
IBM / HF | watsonx.ai / Hugging Face | IBM Granite; large open ecosystem (verify) | Granite (IBM) | Some | Extensive (HF) | Yes | Varies | Open-model breadth (HF); governance (IBM)
Architect note - avoid model lock-in
Because the model landscape shifts monthly, put an abstraction layer (your serving API) between your application and the model. Standardize your prompts, retrieval, and evaluation around a provider-neutral interface so you can swap models/providers with a config change. Pick the platform for its enterprise controls and data locality, and keep the specific model a replaceable component.
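One way to picture the abstraction layer is a small provider-neutral interface with adapters selected by configuration. The sketch below is illustrative only: `TextModel`, `EchoModel`, `BACKENDS`, and the `model_backend` config key are hypothetical names, and the stand-in backend just echoes the prompt where a real adapter would wrap a vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class TextModel(Protocol):
    """Provider-neutral interface the application codes against."""
    def generate(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend; a real adapter would call a vendor SDK here."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Adapter factories keyed by a config value (backend names are hypothetical).
BACKENDS: Dict[str, Callable[[], TextModel]] = {
    "provider_a": lambda: EchoModel("provider_a"),
    "provider_b": lambda: EchoModel("provider_b"),
}


def load_model(config: Dict[str, str]) -> TextModel:
    """Pick the backend from configuration, not from application code."""
    key = config["model_backend"]
    if key not in BACKENDS:
        raise ValueError(f"unknown model backend: {key}")
    return BACKENDS[key]()


def answer(question: str, model: TextModel) -> str:
    """Application logic sees only the interface, never the provider."""
    prompt = f"Answer concisely: {question}"
    return model.generate(prompt)
```

Switching providers then means changing `model_backend` in configuration; prompts, retrieval, and evaluation stay on the application side of the interface.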
Common mistake
Designing an architecture around one specific model's context window, pricing, or a benchmark result - then discovering it is not available in your region, its price/limits changed, or a better/cheaper model shipped. Verify availability per region, and don't hard-couple to a single model.

4. AI Agents Comparison

Managed agent platforms compared - tool/function calling, knowledge grounding, memory, guardrails, human approval, and the enterprise risks that apply to all of them.

Last reviewed: July 2026. Agent features are evolving fast - verify capabilities with current vendor docs.
TL;DR

An agent is an LLM that can call tools/functions, retrieve knowledge, keep memory, and take multi-step actions - beyond a chatbot. Every cloud has one (OCI GenAI Agents, Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder) plus enterprise-SaaS agents (IBM watsonx Orchestrate, Salesforce Agentforce, ServiceNow). They are evolving fast and share the same risks: over-permissioning, direct database access, and unsafe dynamic SQL. The non-negotiable rule: agents act through governed APIs with least privilege, human approval for consequential actions, and full audit - never raw production databases.

Chatbot vs workflow bot vs autonomous agent

Type | What it does | Autonomy | Risk
Chatbot | Answers questions (optionally grounded via RAG) | None - responds only | Low (wrong answers)
Workflow bot | Follows defined steps, calls known tools | Bounded - fixed flow | Medium (calls real systems)
Autonomous agent | Plans, chooses tools, takes multi-step actions | High - decides its own path | High (unpredictable actions)
Architect note
Match autonomy to risk. Most enterprise value today is in chatbots (RAG) and bounded workflow bots, which are far easier to govern. Reserve autonomous agents for low-risk, well-audited tasks with human approval gates. More autonomy = more governance, monitoring, and blast-radius control required.

Agent platforms, side by side

Provider | Agent service | Tool calling | Knowledge / RAG | Workflow integration | Human approval | Governance | Best-fit use | Main risk
OCI | OCI Generative AI Agents | Yes | Knowledge bases + DB Vector Search | OCI services / APIs | Design-dependent | IAM + audit | Grounded assistants over Oracle data | Verify current tool/action coverage
AWS | Bedrock Agents | Yes (action groups) | Bedrock Knowledge Bases | Lambda / API | Design-dependent | IAM + Guardrails + CloudTrail | Tool-using assistants on AWS | Over-permissioned action groups
Azure | AI Agent Service (Foundry) | Yes | Azure AI Search | Logic Apps / Functions / APIs | Design-dependent | Entra + Content Safety + logging | Microsoft-ecosystem agents | Grounding + identity scoping
Google | Vertex AI Agent Builder / Agentspace | Yes | Vertex AI Search | Cloud Functions / APIs | Design-dependent | IAM + safety + audit | Search-grounded agents on GCP | Data-access scoping
Enterprise SaaS | watsonx Orchestrate; Salesforce Agentforce; ServiceNow | Yes | Product knowledge + connectors | Native to the SaaS platform | Often built-in | Platform governance | Agents inside a SaaS (CRM/ITSM/HR) | Scope to the SaaS; verify data flows

What to evaluate in an agent platform

  • Tool / function calling - how tools are defined, scoped, and permissioned (least privilege per tool).
  • Knowledge grounding - RAG quality + citations; entitlement-filtered retrieval.
  • Memory - session vs long-term; where it is stored and secured.
  • Guardrails + human approval - blocking unsafe actions; approval gates for consequential steps.
  • Enterprise identity - does the agent act as a scoped identity (not a shared super-user)?
  • Audit logging + monitoring - every tool call, retrieval, and action logged and reviewable.
Strong enterprise warning - applies to every agent platform
AI agents must not directly connect to production OLTP databases or run uncontrolled dynamic SQL. Instead, give agents access only through governed APIs, curated datasets, semantic layers, read-only reporting replicas, or controlled serving layers with least-privilege identities, validated/parameterized queries, human approval for writes or consequential actions, and full audit logging. Treat an agent as an untrusted actor with a scoped, monitored identity - not as a trusted service account with broad access.
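As a rough illustration of this rule, the sketch below puts a gateway between an agent and its tools: each agent identity has an explicit grant list, consequential tools require a named approver, handlers take structured arguments instead of free-form SQL, and every attempt is logged. All names (`Tool`, `ToolGateway`, `ORDERS`, the two example tools) are hypothetical, and the in-memory dict only stands in for a governed API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[..., Any]
    consequential: bool = False  # True for writes or other actions needing approval


@dataclass
class ToolGateway:
    tools: Dict[str, Tool]
    grants: Dict[str, Set[str]]  # agent id -> tool names that agent may call
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, agent_id: str, tool_name: str, args: Dict[str, Any],
             approved_by: Optional[str] = None) -> Any:
        tool = self.tools.get(tool_name)
        if tool is None or tool_name not in self.grants.get(agent_id, set()):
            decision = "denied"
        elif tool.consequential and approved_by is None:
            decision = "needs_approval"
        else:
            decision = "allowed"
        # Every attempt is logged, including refused ones.
        self.audit_log.append({"agent": agent_id, "tool": tool_name, "args": dict(args),
                               "approved_by": approved_by, "decision": decision})
        if decision != "allowed":
            raise PermissionError(f"{tool_name}: {decision}")
        return tool.handler(**args)


# Hypothetical tools backed by an in-memory dict standing in for a governed API.
ORDERS = {"A100": "shipped"}


def get_order_status(order_id: str) -> str:
    return ORDERS.get(order_id, "unknown")


def refund_order(order_id: str) -> str:
    ORDERS[order_id] = "refunded"
    return "refunded"


gateway = ToolGateway(
    tools={
        "get_order_status": Tool("get_order_status", get_order_status),
        "refund_order": Tool("refund_order", refund_order, consequential=True),
    },
    grants={"support-agent": {"get_order_status", "refund_order"}},
)
```

The design choice here is that the agent never holds credentials to the underlying store: it can only name a tool and pass arguments, and the gateway decides.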
AI governance note
Maintain an inventory of deployed agents (owner, purpose, tools, data access, approval gates), monitor their tool calls and actions, and require change review for new tools/permissions. An agent that gains a new tool has a new blast radius.

5. RAG and Knowledge Base Comparison

Retrieval-Augmented Generation - the model retrieves relevant enterprise data first, then answers grounded in it. Compared across providers, with architecture diagrams and the gotchas that decide answer quality.

Last reviewed: July 2026. RAG tooling evolves fast - verify managed features with current vendor docs.
TL;DR

RAG is the same pipeline everywhere: ingest → chunk → embed → store vectors → retrieve (entitlement-filtered) → rerank → ground the model → cite → audit. Managed options: Bedrock Knowledge Bases, Azure AI Search + OpenAI, Vertex AI Search / RAG Engine, OCI Agent knowledge bases + DB Vector Search, plus Databricks and Snowflake Cortex Search. The hard parts - chunking quality, retrieval relevance, security trimming, and index freshness - are the same on every platform and matter more than the model.

RAG options, side by side

Provider | Managed RAG | Vector store options | Reranking | Citations | Access control
OCI | GenAI Agents knowledge bases | Oracle DB 23ai AI Vector Search; OpenSearch; Object Storage | Design/model-dependent | Supported | DB/IAM entitlements
AWS | Bedrock Knowledge Bases | OpenSearch; Aurora pgvector; (others) | Rerank models | Supported | IAM + source ACLs
Azure | Azure AI Search (+ OpenAI) | AI Search vector; Cosmos DB; PostgreSQL pgvector | Semantic ranker | Supported | Security trimming in index
Google | Vertex AI Search / RAG Engine | Vertex Vector Search; AlloyDB/Cloud SQL pgvector; BigQuery | Ranking API | Supported | IAM + data-store ACLs
Other | Databricks; Snowflake Cortex Search; watsonx Discovery | Platform-native vector | Varies | Varies | Platform governance

RAG architectures

[Diagram]
Ingestion (offline): Docs (object storage/DB) → Chunk + embed → Vector store (+ metadata/ACL)
Query (runtime): User query → Serving layer (authz + guardrails) → Retrieve top-k (entitlement-filtered + rerank) → LLM (grounded) + citations → Answer
Alongside the runtime path: Audit (prompt + context IDs + output); Human approval (if action)
Enterprise RAG: offline ingest/embed/store; runtime query → governed serving layer → entitlement-filtered + reranked retrieval → grounded, cited answer; audit + optional human approval. Same shape on every cloud.
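The runtime half of this flow can be sketched in a few lines. This is a toy under stated assumptions: `embed` uses word counts where a real system would call an embedding model, the `Chunk` type and group names are hypothetical, and no reranker or LLM call is included. What it does show is the ordering that matters: entitlements are applied before similarity scoring, so the prompt can only contain chunks the user may see.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: FrozenSet[str]  # ACL stored with the vector at ingest time


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, user_groups: FrozenSet[str],
             index: List[Chunk], k: int = 2) -> List[Chunk]:
    # Entitlement filter first: unreadable chunks are never scored or returned.
    visible = [c for c in index if c.allowed_groups & user_groups]
    q = embed(query)
    ranked = sorted(visible, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def grounded_prompt(query: str, chunks: List[Chunk]) -> str:
    """Build a prompt that carries chunk IDs so the answer can cite them."""
    context = "\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return ("Answer using only the sources below and cite their IDs.\n"
            f"{context}\nQuestion: {query}")


INDEX = [
    Chunk("hr-1", "salary bands for engineering staff", frozenset({"hr"})),
    Chunk("pub-1", "office opening hours and holiday calendar", frozenset({"all"})),
    Chunk("pub-2", "engineering onboarding guide", frozenset({"all"})),
]
```

The chunk IDs returned by `retrieve` are also the natural values to write to an audit record next to the prompt and the answer.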
RAG with database-backed vectors

Vectors live in the operational DB (Oracle 23ai, AlloyDB, pgvector). Retrieval inherits DB IAM, backups, and row-level security - simplest path to entitlement-filtered retrieval. Best when data already lives in that DB.

RAG with object storage + search index

Docs in object storage, indexed by a search service (AI Search, Vertex AI Search, OpenSearch/Kendra). Best for unstructured document corpora and hybrid (keyword+vector) search with connectors.

RAG gotchas (identical on every platform)

  • Bad chunking creates bad answers - chunk size/overlap and document structure dominate quality.
  • Retrieval quality matters more than model hype - a great model on poor retrieval still hallucinates.
  • Security trimming is hard - and must happen before retrieval, not by filtering the answer. Enforce entitlements at index/retrieval time.
  • Stale indexes create wrong answers - automate re-indexing; track data freshness.
  • Vector search cost grows with corpus size and query rate - size indexes and monitor.
  • RAG does not eliminate hallucination - it reduces it; still validate outputs and require citations.
  • Access control before retrieval, not only after - never rely on the model to withhold data it was given.
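To make the chunk size/overlap point from the list above concrete, here is a minimal word-based chunker. It is a sketch only: `chunk_words` is a hypothetical helper, it splits on whitespace and ignores document structure (headings, tables, sentences), which a production chunker would usually respect.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk, at the price of a larger index.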
AI governance note
Log the retrieved context IDs alongside the prompt and answer so every response is traceable to its sources. Add a human-approval gate for any answer that triggers an action. Without this, you cannot explain or defend an answer to security or compliance.

6. Vector Search and AI Database Comparison

Where to store and search embeddings - in an operational database, a dedicated vector service, or a specialist vector DB - compared neutrally, with a decision guide by data location and scale.

Last reviewed: July 2026. Verify vector features (esp. Azure SQL / DynamoDB) with current vendor docs.
TL;DR

Two broad choices: vectors in an operational database (Oracle DB 23ai, pgvector on Aurora/AlloyDB/Cloud SQL/Azure PostgreSQL, Cosmos DB) - which inherit existing IAM, backups, and row-level security - or a dedicated vector/ANN service (Azure AI Search, Vertex Vector Search, OpenSearch, or specialist DBs like Pinecone/Weaviate/Milvus/Qdrant) for large-scale, low-latency semantic search. Keep vectors near the governed data when you can; go dedicated when scale/latency demands it.

Vector search options, side by side

Option | Where vectors live | Managed? | Hybrid search | Metadata filter | Security model | Best-fit
Oracle DB 23ai AI Vector Search | In Oracle Database | Yes (managed DB) | Yes (+ SQL) | Yes (SQL WHERE) | DB IAM + row/label security | Vectors next to Oracle relational data
Aurora/RDS pgvector | In PostgreSQL | Yes (managed DB) | Via SQL/extensions | Yes (SQL) | DB IAM | Existing Postgres on AWS
OpenSearch / Kendra | Dedicated index | Yes | Yes | Yes | Fine-grained + source ACLs | AWS-native large-scale search
Azure AI Search | Dedicated index | Yes | Yes (+ semantic) | Yes | Security trimming | Azure OpenAI RAG default
Cosmos DB / PG pgvector | In the database | Yes | Partial | Yes | DB RBAC | Vectors with operational data
Vertex Vector Search | Dedicated ANN | Yes | Filter-based | Yes | IAM | Very large-scale, low-latency
AlloyDB / Cloud SQL pgvector; BigQuery | In DB / warehouse | Yes | Via SQL | Yes | DB/warehouse security | Vectors with GCP data
Databricks / Snowflake Cortex Search | In the data platform | Yes | Yes | Yes | Platform governance | Vectors next to lakehouse/warehouse data
Pinecone / Weaviate / Milvus / Qdrant | Dedicated vector DB | Managed or self | Varies | Yes | Own model | Cloud-neutral / specialist scale
DBA note - in-DB vs dedicated
Keeping vectors in the operational database (Oracle 23ai, pgvector, AlloyDB) means retrieval inherits your existing IAM, backups, DR, and row/label-level security - the simplest path to entitlement-filtered retrieval, and you combine vector distance with ordinary SQL filters. A dedicated ANN service (Vertex Vector Search, Pinecone) wins on very large scale and low latency but adds another data store to secure and keep in sync. Choose by scale, latency, and where the source data already lives.
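The "vector distance plus ordinary SQL filters" idea can be illustrated with SQLite as a stand-in. This is an assumption-heavy sketch: SQLite has no native vector type, so embeddings are stored as JSON text and the distance is computed in Python after the SQL `WHERE` filter, whereas a database with vector support would compute the distance inside the query. The `docs` table, its columns, and the sample rows are hypothetical.

```python
import json
import math
import sqlite3
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_demo_db() -> sqlite3.Connection:
    """In-memory table holding relational columns and an embedding side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, region TEXT, embedding TEXT)")
    rows = [("eu-1", "EU", [1.0, 0.0]), ("eu-2", "EU", [0.0, 1.0]), ("us-1", "US", [1.0, 0.0])]
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?)",
                     [(doc_id, region, json.dumps(vec)) for doc_id, region, vec in rows])
    return conn


def search(conn: sqlite3.Connection, query_vec: Sequence[float],
           region: str, k: int = 2) -> List[Tuple[str, float]]:
    # Ordinary parameterized SQL filter first, then nearest-neighbour ranking.
    rows = conn.execute("SELECT id, embedding FROM docs WHERE region = ?",
                        (region,)).fetchall()
    scored = [(doc_id, cosine_distance(query_vec, json.loads(emb))) for doc_id, emb in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

Because the filter is plain SQL against the same table, whatever access rules already govern those rows also govern which vectors can be retrieved.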

Decision guide (neutral)

Situation | Neutral guidance
Data is in Oracle Database | Oracle DB 23ai AI Vector Search - vectors + relational data + governance in one place.
AWS-native RAG | OpenSearch or Aurora pgvector with Bedrock Knowledge Bases.
Azure OpenAI pattern | Azure AI Search (hybrid + semantic) is the common default.
GCP data + AI | Vertex Vector Search for scale; AlloyDB/BigQuery vectors to stay near the data.
Large-scale semantic search | Dedicated ANN (Vertex Vector Search, OpenSearch, or specialist DBs).
Existing PostgreSQL users | pgvector (any cloud) - lowest friction; verify performance at scale.
Data-warehouse-integrated search | BigQuery vector search, Snowflake Cortex Search, or Databricks Vector Search.
Cloud-neutral / specialist | Pinecone, Weaviate, Milvus, or Qdrant - portable across clouds.
Common mistake
Standing up a separate vector database when the data already lives in a database that supports vectors - now you have two stores to secure, back up, and keep in sync, plus a harder entitlement-filtering problem. Start in-DB unless scale/latency clearly requires a dedicated service.

7. Machine Learning Platform Comparison

Full ML/MLOps platforms compared - training, deployment, pipelines, registry, monitoring, and governance - plus neutral guidance on when a managed platform is worth it.

Last reviewed: July 2026. Verify feature depth and pricing with current vendor docs.
TL;DR

The end-to-end ML platforms - SageMaker, Vertex AI, Azure ML, OCI Data Science, plus Databricks Mosaic AI and IBM watsonx.ai - cover notebooks, training, deployment, pipelines, registry, and monitoring. SageMaker and Vertex are generally the broadest; Databricks is a common cross-cloud choice. Use a managed platform when you need reproducible pipelines, governance, and scale; a plain notebook or in-database ML may be enough for smaller work.

ML platforms, side by side

Provider | Platform | Best strength | Training | Deployment | MLOps | Governance | Best-fit users | Main limitation
OCI | Data Science | Oracle-data integration; AI Quick Actions | Jobs | Model Deployment | Pipelines | IAM + audit | Oracle-centric teams | Smaller ecosystem than AWS/GCP
AWS | SageMaker | Breadth + ecosystem | Managed/distributed | Endpoints (real-time/batch/serverless) | Pipelines + Registry + Monitor | Clarify + IAM | Broad ML teams | Complexity/choice overload
Azure | Azure Machine Learning | Microsoft + Responsible AI tooling | Managed/distributed | Managed endpoints | Pipelines + Registry | Responsible AI dashboard | Microsoft-centric teams | Learning curve; naming
Google | Vertex AI | Data + AI integration; TPUs | Managed/distributed (TPU) | Endpoints | Pipelines + Registry + Monitoring | Explainable AI | Data/AI-led teams | Enterprise familiarity varies
Databricks / IBM | Mosaic AI / watsonx.ai | Lakehouse-native (Databricks); governance (IBM) | Yes | Model Serving | MLflow / Workflows | Unity Catalog / watsonx.governance | Lakehouse or governance-led teams | Verify multi-cloud coverage

When to use what (neutral)

  • Use a managed ML platform when you need reproducible pipelines, a model registry with approvals, managed endpoints, monitoring, and governance at team/enterprise scale.
  • A simple notebook is enough for exploration, one-off analysis, or a single small model with light serving needs.
  • Kubernetes-based ML (Kubeflow, KServe) fits teams who want portability and already run Kubernetes - at the cost of more ops.
  • Data-warehouse ML (BigQuery ML, Redshift ML, Oracle in-DB, Snowflake) fits when the data lives in the warehouse and SQL-based ML is sufficient - minimal data movement.
  • Do not build ML infrastructure at all when a prebuilt AI service (Document AI, Language, Vision) or a foundation model already solves the task - many tasks that once needed a custom "ML project" are now API calls.
Architect note - reduce lock-in
Use open formats and standards (MLflow for registry/tracking, ONNX for models where practical, OpenTelemetry for monitoring, Kubeflow/containers for portable pipelines) so the ML platform is a productivity layer, not a cage. Deep native pipelines are fine when you are committed to one cloud; keep the portable option open if multi-cloud matters.
Cost note
ML cost is dominated by training compute (GPU/TPU hours) and always-on inference endpoints - and inference often exceeds training cost over a model's life. Right-size and reserve accelerators, use batch or scale-to-zero endpoints where latency allows, and shut down idle notebooks and endpoints (a very common source of waste).
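A quick calculation shows how an always-on endpoint can overtake a one-off training run. The figures are placeholders, not vendor prices: the function names are hypothetical, the hourly rates in the usage lines are invented, and 730 hours per month is an approximation (8,760 hours per year divided by 12).

```python
def training_cost(gpu_hours: float, gpu_hourly_rate: float) -> float:
    """One-off cost of a training run."""
    return gpu_hours * gpu_hourly_rate


def always_on_endpoint_cost(hourly_rate: float, months: float,
                            hours_per_month: float = 730.0) -> float:
    """Cost of an endpoint that runs 24x7 for the given number of months."""
    return hourly_rate * hours_per_month * months


def months_until_inference_exceeds_training(training: float, endpoint_hourly_rate: float,
                                            hours_per_month: float = 730.0) -> float:
    """Months of always-on serving after which inference spend passes training spend."""
    return training / (endpoint_hourly_rate * hours_per_month)


# Usage with placeholder rates: a 500 GPU-hour run vs a modest always-on endpoint.
run = training_cost(gpu_hours=500, gpu_hourly_rate=4.0)
crossover = months_until_inference_exceeds_training(run, endpoint_hourly_rate=1.0)
first_year_serving = always_on_endpoint_cost(hourly_rate=1.0, months=12)
```

Under these invented numbers the endpoint passes the training bill in under three months, which is why batch or scale-to-zero serving is worth evaluating wherever latency allows.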

8. AI Infrastructure and Accelerators

GPUs, custom accelerators (TPU, Trainium/Inferentia), Kubernetes for AI, and the infrastructure gotchas - quota, region availability, and idle-GPU cost - that decide whether an AI project ships.

Last reviewed: July 2026. Verify accelerator SKUs, quota, and region availability with current vendor docs.
TL;DR

NVIDIA GPUs are available on every cloud (portable); the differentiators are custom silicon - TPU (Google) and Trainium/Inferentia (AWS) - and bare-metal GPU breadth (OCI). All offer managed training/inference and Kubernetes for AI. The infrastructure realities that actually block projects are the same everywhere: GPU quota, region availability, data-pipeline bottlenecks, and idle-GPU cost. Verify accelerator availability in your region early.

AI infrastructure, side by side

Provider | Accelerator options | Managed training | Managed inference | Kubernetes for AI | Bare metal GPU | Best-fit workload | Main constraint
OCI | NVIDIA GPU shapes + bare metal; cluster networking (RDMA) | Data Science | Model Deployment | OKE | Broad | Large training on bare-metal GPU clusters | Verify GPU SKU by region
AWS | NVIDIA (P/G) + Trainium + Inferentia | SageMaker | SageMaker Endpoints | EKS | .metal (narrower) | Cost/perf at scale with custom silicon | Quota + custom-silicon lock-in
Azure | NVIDIA N-series (+ Maia) | Azure ML | Azure ML endpoints | AKS | Specialized | Microsoft-ecosystem AI + big GPU | Region/SKU availability
Google | NVIDIA GPU + TPU | Vertex Training | Vertex Endpoints | GKE | (Bare Metal Solution) | Large-scale training (TPU) + inference | TPU ties the serving stack
Other | NVIDIA (CoreWeave, Lambda, etc.) | Databricks | Model Serving | Kubernetes | Varies | GPU-focused / neutral | Verify integration + support
Architect note - portability vs cost
NVIDIA GPUs are the portable choice - the same model-serving stack runs across clouds. Custom silicon (TPU, Trainium/Inferentia) can offer better price/performance but ties your serving/training stack to that provider. If multi-cloud or portability matters, standardize on NVIDIA + open frameworks; if you are committed to one cloud and cost-sensitive at scale, evaluate custom silicon.

AI infrastructure gotchas (universal)

Gotchas
  • GPU quota can block projects - default quotas are low; request increases early on every cloud.
  • Region availability matters - the accelerator you want may not exist in your region; verify before designing.
  • The data pipeline can bottleneck GPUs - slow data loading starves expensive accelerators; size storage/network for the training I/O.
  • Network and storage can dominate performance - for distributed training, interconnect (RDMA/InfiniBand) and storage throughput often matter more than raw GPU count.
  • Idle GPUs are expensive - the most common AI-infra waste; auto-stop, schedule, or use spot for interruptible work.
  • Inference cost can exceed training cost over time - an always-on endpoint runs 24x7; batch or scale-to-zero where latency allows.
Cost note
Reserve or commit accelerator capacity for steady training; use spot/preemptible for fault-tolerant training; use batch or scale-to-zero inference where possible; and monitor idle GPU hours ruthlessly. Managed foundation-model APIs often beat self-hosting a model on GPUs unless you have sustained high volume.

9. Document, Vision, Speech, and Language AI

Prebuilt AI services for documents, images, audio, and text - close equivalents across clouds, but with differences in custom-model support, human review, language coverage, and privacy constraints.

Last reviewed: July 2026. Verify language coverage, features, and pricing with current vendor docs.
TL;DR

These prebuilt AI services are largely close equivalents across OCI/AWS/Azure/GCP - the same tasks with different names, differing in custom-model support, human-in-the-loop review, language coverage, and privacy handling. Increasingly, general-purpose LLMs overlap with some of these (extraction, classification, summarization) - choose the prebuilt service for accuracy/cost on well-defined tasks, and an LLM when flexibility matters. Verify language and feature coverage for your specific use.


Document AI (OCR, forms, tables)

Feature | OCI | AWS | Azure | Google
Service | Document Understanding | Textract | AI Document Intelligence | Document AI
OCR / tables / forms | Yes | Yes | Yes | Yes
Custom models | Yes | Yes (custom queries/adapters) | Yes (custom extraction) | Yes (custom processors)
Human review | Design-dependent | A2I | Design-dependent | Human-in-the-loop
Best-fit | Oracle-integrated doc pipelines | AWS doc pipelines, invoices | Microsoft-ecosystem forms | High-volume document extraction
Operations note
All return confidence scores - route low-confidence extractions to a human review queue rather than trusting them blindly. This human-in-the-loop step is what makes document AI production-ready.
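The routing step can be sketched as a simple threshold split. This is a minimal illustration: `ExtractedField`, `route_fields`, and the 0.9 default threshold are hypothetical, and a real pipeline would read fields and confidences from the document service's response and tune the threshold per field against reviewed samples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction service


def route_fields(fields: List[ExtractedField],
                 threshold: float = 0.9) -> Tuple[Dict[str, str], List[ExtractedField]]:
    """Accept fields at or above the threshold; queue the rest for human review."""
    accepted: Dict[str, str] = {}
    review_queue: List[ExtractedField] = []
    for f in fields:
        if f.confidence >= threshold:
            accepted[f.name] = f.value
        else:
            review_queue.append(f)
    return accepted, review_queue


# Usage with made-up extraction results for one invoice.
invoice = [
    ExtractedField("invoice_number", "INV-001", 0.99),
    ExtractedField("total", "1,250.00", 0.72),
]
accepted, review_queue = route_fields(invoice)
```

Keeping the low-confidence field objects intact (name, value, score) gives the reviewer the model's guess to confirm or correct instead of a blank form.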

Vision AI

Feature | OCI | AWS | Azure | Google
Service | OCI Vision | Rekognition | Azure AI Vision | Vision AI
Classification / detection | Yes | Yes | Yes | Yes
Custom vision | Yes | Custom Labels | Custom Vision | AutoML Vision
Face / moderation | Limited | Yes | Yes (gated) | Yes (gated)
Security note
Face recognition and biometric analysis carry privacy and legal constraints (consent, retention, jurisdiction) and are increasingly gated by vendors. Confirm the legal basis and vendor policy before using face/biometric features; prefer non-biometric approaches where possible.

Speech AI

Feature | OCI | AWS | Azure | Google
Speech-to-text | OCI Speech | Transcribe | Azure AI Speech | Speech-to-Text
Text-to-speech | (Speech) | Polly | Neural TTS | Text-to-Speech
Real-time / batch | Both | Both | Both | Both
Diarization | Yes | Yes | Yes | Yes
Common use - contact center
Speech-to-text + diarization + summarization (LLM) is the standard contact-center pattern on every cloud (see section 12). Verify language/accent coverage and real-time latency for your use, and mind compliance for recording/retaining calls.

Language AI

Feature | OCI | AWS | Azure | Google
Service | OCI Language | Comprehend | Azure AI Language | Natural Language AI
Sentiment / entities / PII | Yes | Yes | Yes | Yes
Classification | Yes (custom) | Yes (custom) | Yes (custom) | Yes (AutoML)
Translation | Yes | Translate (separate) | Translator (separate) | Cloud Translation (separate)
Cost note
For well-defined NLP tasks (sentiment, entity/PII extraction, classification), the prebuilt language services are usually cheaper and more predictable than calling an LLM - and often more accurate on narrow tasks. Reserve LLMs for open-ended language work; use prebuilt NLP for structured extraction at volume.

10. AI and Enterprise Search

AI-powered enterprise search - keyword, semantic, and hybrid - with connectors, security trimming, and RAG integration, compared across providers.

Last reviewed: July 2026. Verify connector coverage and security-trimming features with current vendor docs.
TL;DR

Enterprise search now blends keyword + vector (semantic) + hybrid ranking, with connectors to enterprise systems and (critically) security trimming so users only see what they are entitled to. Turnkey options: Kendra/OpenSearch, Azure AI Search, Vertex AI Search / Agentspace, plus OCI Search / AI Vector Search and Elastic/Glean. These are also the retrieval layer for RAG. The hard part - source-level access control preserved in the index - is the same everywhere.

AI search, side by side

Provider | Search service | Keyword | Vector | Hybrid | Connectors | RAG integration | Access control | Best use | Gotcha
OCI | OCI Search (OpenSearch) / AI Vector Search | Yes | Yes | Yes | Custom | Via GenAI Agents | DB/IAM | Oracle-data + open-source search | Assemble connectors
AWS | Amazon Kendra / OpenSearch | Yes | Yes | Yes | Many (Kendra) | Bedrock KB | Token-based ACLs | Turnkey enterprise search (Kendra) | Kendra cost at scale
Azure | Azure AI Search | Yes | Yes | Yes (+ semantic ranker) | Indexers | Azure OpenAI | Security trimming | Azure OpenAI RAG default | Design security trimming carefully
Google | Vertex AI Search / Agentspace | Yes | Yes | Yes | Connectors | Native RAG | IAM + ACLs | Turnkey grounded search | Verify connector coverage
Other | Elastic; Glean; OpenSearch managed | Yes | Yes | Yes | Broad | Varies | Own model | Cloud-neutral / SaaS-wide search | Verify governance model
Security note - the search-specific trap
The defining challenge of enterprise AI search is security trimming: results (and the context fed to an LLM) must reflect each user's permissions across every connected source. Enforce this in the index / at retrieval (document-level ACLs mirrored from the source), not by filtering the final answer - the model must never receive documents the user cannot see. Turnkey services (Kendra, Vertex AI Search) help, but you still map source ACLs correctly.
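A minimal sketch of trimming at retrieval time, under stated assumptions: the `Chunk` type, the group names, and the in-memory candidate list are hypothetical, and each chunk is assumed to carry an ACL mirrored from its source system. Managed search services normally apply this filter inside the index query; the point here is only the ordering, with the filter applied before any text can reach the prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float                           # relevance score from the search engine
    allowed_groups: frozenset = frozenset()  # ACL mirrored from the source system


def trimmed_retrieve(candidates, user_groups, top_k=3):
    """Drop chunks the user cannot see, then keep the top_k by score."""
    user_groups = frozenset(user_groups)
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


candidates = [
    Chunk("hr-001", "Salary bands ...", 0.92, frozenset({"hr"})),
    Chunk("eng-007", "Deploy runbook ...", 0.88, frozenset({"engineering", "sre"})),
    Chunk("pub-003", "Holiday calendar ...", 0.75, frozenset({"all-staff"})),
]

# Only the trimmed chunks are ever used to build the LLM prompt.
context = trimmed_retrieve(candidates, user_groups={"engineering", "all-staff"})
print([c.doc_id for c in context])  # ['eng-007', 'pub-003']
```

Note that the highest-scoring chunk is excluded because the user lacks the `hr` group. A chunk with an empty ACL is treated as visible to no one, which is the safer default when source permissions fail to sync.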
Architect note
Enterprise search and RAG retrieval are the same capability viewed two ways - one returns links, the other feeds an LLM. Build the governed, security-trimmed search/retrieval layer once and reuse it for both. Prefer hybrid (keyword + vector) ranking; pure vector search misses exact-match and rare-term queries.
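One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The portal does not prescribe a fusion method, so treat this as one illustrative option; the document IDs are made up, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# The keyword search finds an exact error-code match that vector search misses.
keyword_hits = ["err-4012-runbook", "doc-a"]
vector_hits = ["doc-a", "doc-b", "doc-c"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc-a', 'err-4012-runbook', 'doc-b', 'doc-c']
```

The exact-match document ranks second even though the vector ranking never returned it, which is the behavior pure vector search lacks. RRF uses only rank positions, so the two engines' raw scores do not need to be on a comparable scale.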

11. AI Governance, Security, and Responsible AI

The enterprise controls that make AI safe to run on real data - privacy, private networking, access control, logging, guardrails, responsible AI, and a production governance checklist that applies across providers.

Last reviewed: July 2026. AI governance features are new and evolving - verify with current vendor docs.
TL;DR

AI governance is mostly your configuration, not a product. The controls are the same across clouds - private endpoints, native IAM, prompt/output logging, guardrails/content safety, CMK encryption, data-retention terms, human approval, and responsible-AI checks - implemented with each cloud's tools (Bedrock Guardrails, Azure AI Content Safety + Entra + Private Link, Google Model Armor + VPC-SC, OCI IAM/Vault, IBM watsonx.governance). Treat AI as a new attack surface (prompt injection, data exfiltration) and govern it deliberately from day one.

Governance controls, side by side

Control | OCI | AWS | Azure | Google
Identity / access | IAM | IAM | Entra ID + RBAC | Cloud IAM
Private networking | Private endpoints | PrivateLink / VPC | Private Link / Private Endpoints | Private Service Connect
Guardrails / content safety | Guardrails | Bedrock Guardrails | AI Content Safety | Model Armor / safety filters
Key management | Vault (CMK) | KMS | Key Vault | Cloud KMS
Secrets | Vault | Secrets Manager | Key Vault | Secret Manager
Prompt / output logging | Audit + Logging | CloudTrail + Bedrock logs | Activity Log + Foundry logs | Audit Logs + Vertex logs
Data-exfil perimeter | (network + IAM) | (SCP + endpoints) | (Private Link + Policy) | VPC Service Controls
Sensitive-data discovery | Data Safe | Macie | Purview | Sensitive Data Protection
Responsible AI tooling | (guidance) | SageMaker Clarify | Responsible AI dashboard | Explainable AI
Model + use-case governance | (policy + inventory) | (SageMaker + Config) | (Purview + RAI) | (Registry + policy)
AI governance note - the new risks
Generative AI adds attack surface that traditional controls miss: prompt injection (untrusted content hijacking instructions), data leakage (the model revealing context it shouldn't), data exfiltration (an agent copying data out), and over-permissioned agents. Defend with content-safety on input and output, security-trimmed retrieval, least-privilege scoped identities, private networking, a data-exfiltration perimeter for sensitive data, and full prompt/output logging.

Production AI governance checklist (portable)

  • Approved use case - documented, with a business owner and a risk assessment.
  • Data classification - what data the AI touches, and its sensitivity level.
  • Model selection - approved model(s), with data-handling/retention terms verified.
  • Prompt logging policy - prompts + retrieved context IDs logged (per privacy rules).
  • Output review policy - outputs logged; human review for consequential answers.
  • Access control - least-privilege scoped identity; security-trimmed retrieval before generation.
  • Private networking - private endpoints for model, retrieval, and data services.
  • Data retention terms - confirmed the provider does not retain/train on your data (or terms accepted).
  • Human approval requirement - for any write/action or high-impact output.
  • Abuse / injection monitoring - content safety on input+output; prompt-injection detection.
  • Cost monitoring - token/GPU/vector spend tracked with budgets/alerts.
  • Incident response - a plan for a leaked prompt, bad answer, or compromised agent.
  • Legal / compliance review - completed for the use case and data.
  • Vendor documentation verified - model, region, features, and terms confirmed current.
Security note
Keep an inventory of AI use cases, models, and agents with owners, data access, and approval status - shadow AI (ungoverned experiments touching real data) is the fastest-growing risk. Encrypt with customer-managed keys, keep secrets in the vault, use private endpoints, and log whatever you may later need to explain or defend, within your privacy rules.

12. Enterprise AI Architecture Patterns

Common enterprise AI patterns, each mapped to OCI/AWS/Azure/GCP - the pattern shape is portable; the services differ. Every pattern shares the same governed-serving-layer backbone.

Last reviewed: July 2026. Service choices are examples - verify current best practices per vendor.
The portable backbone
Almost every enterprise AI pattern has the same shape: user → governed serving layer (authN/authZ + guardrails + logging) → security-trimmed retrieval → model → validated, audited output, over private networking. What changes per cloud is the model service, the retrieval/vector store, and the applied-AI services. Below, each pattern lists that mapping plus the risks that apply everywhere.

Pattern catalog

Pattern | OCI | AWS | Azure | Google | Key risk
Chat with documents | GenAI Agents + Vector Search | Bedrock KB + OpenSearch | Azure OpenAI + AI Search | Gemini + Vertex AI Search | Chunking/freshness; security trimming
Chat with database | Select AI / DB 23ai | Bedrock + curated views | OpenAI + curated views | Gemini + curated views | Never raw prod OLTP; use serving layer
Natural language to SQL | Select AI (Autonomous DB) | Bedrock + QuickSight Q | Fabric/Copilot + OpenAI | BigQuery + Gemini | Validate/parameterize; read-only
AI assistant for IT ops | Ops Insights + GenAI | DevOps Guru + Bedrock | Azure Monitor + Copilot | Active Assist + Gemini | Human approval before actions
AI assistant for business users | GenAI Agents + curated data | Bedrock + governed data | Copilot + governed data | Agentspace + governed data | Answer only from curated data
Customer support chatbot | Digital Assistant + GenAI | Lex + Bedrock | Bot Service + OpenAI | Dialogflow + Gemini | Grounding + human escalation
Contact center transcription/summary | Speech + GenAI | Connect + Contact Lens + Bedrock | Speech + OpenAI | CCAI + Gemini | Recording compliance; real-time
Invoice / document processing | Document Understanding | Textract + Bedrock | Document Intelligence + OpenAI | Document AI + Gemini | Human review of low-confidence
Enterprise knowledge search | OCI Search / Vector Search | Kendra / OpenSearch | Azure AI Search | Vertex AI Search / Agentspace | Source-level security trimming
RAG over object storage | Object Storage + Vector Search | S3 + Bedrock KB | Blob + AI Search | Cloud Storage + Vertex Search | Index freshness + ACLs
RAG over database | DB 23ai Vector Search | Aurora pgvector | Cosmos / PG pgvector | AlloyDB pgvector | Row-level entitlements
RAG over data warehouse | ADW + Select AI | Redshift ML + Bedrock | Fabric + OpenAI | BigQuery vector + Gemini | Query cost + column security
AI over Oracle EBS / ERP data | Read-only reporting layer + GenAI | Extract to lake + Bedrock | Extract + OpenAI | Extract + Gemini | Never live ERP; performance + governance
AI over CRM data | Governed API + GenAI | Bedrock + governed API | OpenAI + Dataverse/API | Gemini + governed API | PII handling; entitlements
AI code assistant | (GenAI + code models) | Amazon Q Developer | GitHub Copilot / Foundry | Gemini Code Assist | IP / secret leakage in prompts
MLOps train/deploy pipeline | Data Science pipelines | SageMaker Pipelines | Azure ML pipelines | Vertex Pipelines | Reproducibility; drift monitoring
Real-time recommendations | ML + serving | SageMaker + feature store | Azure ML + serving | Vertex + feature store | Latency; feature skew
Forecasting / anomaly detection | Anomaly Detection / ML | SageMaker / Lookout | Azure ML | BigQuery ML / Vertex | Verify current managed service
AI governance & audit | IAM + Audit + inventory | Guardrails + CloudTrail | Content Safety + Purview | Model Armor + audit | Shadow AI; missing audit trail

Featured pattern: governed enterprise RAG

Chat with enterprise documents (governed RAG)
The most common enterprise GenAI pattern - same shape on every cloud
Business use: Employees ask questions and get grounded, cited answers from internal documents they are entitled to see.
Data flow: Ingest docs → chunk + embed → store vectors with ACL metadata → at query time: authN/authZ → entitlement-filtered retrieval + rerank → grounded generation → cited, audited answer.
Identity: User authenticates to the serving layer (native IdP); the app carries the user's entitlements into retrieval.
Security: Private endpoints for model + retrieval + storage; security-trimmed retrieval; content safety on input/output; secrets in the vault; CMK.
Monitoring: Log prompts, retrieved context IDs, and outputs; track answer quality/groundedness and token cost.
Cost drivers: Tokens (context size), embeddings, vector storage/queries, and model choice.
Best-fit provider conditions: Follow the data: Oracle data → OCI; AWS lake → AWS; Microsoft/M365 → Azure; BigQuery/GCP data → Google. Verify models/region.
Common mistakes: Retrieval not entitlement-filtered; stale index; sending whole documents (cost); no citations/audit; connecting the model to raw production data.
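The ingest half of the data flow (chunk, then store with ACL metadata) can be sketched as below. This is a simplified illustration: fixed-size character windows with overlap are an assumption made here for brevity (production chunkers usually split on headings or sentences), the embedding step is omitted, and the record layout is hypothetical.

```python
def chunk_document(doc_id, text, allowed_groups, size=200, overlap=40):
    """Split text into overlapping character windows, each carrying the source ACL."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "chunk_id": f"{doc_id}#{len(chunks)}",
            "doc_id": doc_id,
            "text": text[start:start + size],
            "allowed_groups": sorted(allowed_groups),  # copied from the source document
        })
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_document("policy-42", "x" * 500, allowed_groups={"finance"})
print([(c["chunk_id"], len(c["text"])) for c in chunks])
# [('policy-42#0', 200), ('policy-42#1', 200), ('policy-42#2', 180)]
```

Copying the ACL onto every chunk at ingest is what makes entitlement-filtered retrieval possible later; it also means a permission change in the source must trigger a metadata update or re-index.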
Common mistakes across all AI patterns
  • Connecting the model/agent to raw production OLTP instead of a governed serving layer.
  • Retrieval not security-trimmed - data leaks across users.
  • Sending entire documents to the model (cost + context dilution) instead of retrieved chunks.
  • No citations, no audit trail - answers can't be explained or defended.
  • Stale indexes; poor chunking - wrong answers regardless of model.
  • Designing around one model that later changes availability/price.

13. AI Workload Decision Matrix

For each AI workload, which providers are strong candidates and why, in deliberately balanced wording. A "strong candidate" reflects natural advantages under common conditions, not a verdict; any provider can be valid depending on your data, ecosystem, and skills.

Last reviewed: July 2026. Fit depends on your specifics and current model/region availability - verify.
How to read this - follow the data
The single strongest signal for AI-platform fit is usually where your data already lives and which ecosystem you already operate. Example: Azure may be a strong fit when the enterprise is standardized on Microsoft 365, Entra ID, and Azure OpenAI. AWS may be strong when the org already uses S3, Bedrock, and SageMaker. GCP may be strong where BigQuery, Vertex AI, and analytics are central. OCI may be strong where Oracle Database, AI Vector Search, and Oracle enterprise workloads are central. Apply this balance throughout.
Workload | Strong candidate(s) | Why they fit | Services to evaluate | Main trade-off / risk
Enterprise RAG | All (follow the data) | Every cloud has managed RAG; fit follows where docs/data live | Bedrock KB / AI Search / Vertex Search / OCI Agents | Retrieval quality + security trimming, not model
Chatbot over internal docs | All | Standard RAG pattern everywhere | Managed RAG + vector store | Chunking/freshness
Chat with relational database | Follow the DB | In-DB AI where data lives (Oracle 23ai, BigQuery, AlloyDB) | Select AI / pgvector / BigQuery ML | Never raw prod OLTP; serving layer
Natural language to SQL | OCI (Select AI), GCP (BigQuery+Gemini) | Native NL-to-SQL where the DB/warehouse is | Select AI, BigQuery+Gemini, Databricks Genie | Uncontrolled dynamic SQL
AI over Oracle Database | OCI (others via extract) | Oracle DB 23ai AI Vector Search + Select AI in the DB | OCI GenAI + DB 23ai | Others need data extraction
AI over Microsoft ecosystem | Azure | M365/Entra/Copilot + Azure OpenAI integration | Azure OpenAI, AI Search, Copilot | Ecosystem lock-in
AI over Google data ecosystem | GCP | BigQuery ML + Vertex + Gemini integration | BigQuery ML, Vertex AI | Verify enterprise familiarity
AI over AWS data lake | AWS | S3 + Bedrock + SageMaker + native governance | Bedrock, SageMaker, OpenSearch | Complexity
Document extraction / OCR at scale | All (close equivalents) | Mature document AI on all four | Textract / Doc Intelligence / Document AI / OCI DU | Human review of low-confidence
Contact center AI | AWS (Connect), GCP (CCAI) | Turnkey contact-center platforms | Connect+Contact Lens, CCAI | Recording compliance
Code assistant | All (verify) | Amazon Q, GitHub Copilot, Gemini Code Assist | Q Developer, Copilot, Gemini | IP/secret leakage in prompts
ML model training / deployment | AWS, GCP (all valid) | SageMaker + Vertex breadth; Databricks cross-cloud | SageMaker, Vertex, Azure ML, Databricks | GPU quota; lock-in
Time-series forecasting / anomaly | All (verify managed service) | ML platforms + BigQuery ML; some standalone services deprecated | BigQuery ML, SageMaker, OCI Anomaly | Verify current managed path
Image / video analysis | All (Video: AWS/GCP stronger) | Vision on all; video coverage differs | Rekognition, Vision AI, AI Vision, OCI Vision | Privacy for face/biometric
Speech transcription | All (close equivalents) | Mature speech on all four | Transcribe, Speech, Speech-to-Text, OCI Speech | Accent/domain accuracy
Semantic / enterprise search | All | Kendra/Vertex Search turnkey; AI Search; OCI | Kendra, Vertex Search, AI Search | Security trimming across sources
Real-time AI inference | All | Managed endpoints everywhere; custom silicon differs | Managed endpoints; GPU/TPU/Inferentia | Idle-endpoint + latency cost
Regulated AI workload | All (verify) | Private endpoints + governance on all; compliance varies | Private networking + guardrails + audit | Verify certifications/data terms
On-prem / hybrid AI | Varies (verify) | watsonx (portable), Azure Arc, some on-prem model options | watsonx, Arc, OSS models on-prem | Verify on-prem model support
Multicloud AI platform | Neutral platforms | Databricks, Snowflake Cortex, Hugging Face, OSS run across clouds | Databricks, Snowflake, HF, OTel | Trade deep native features for portability
Architect note
Decide by: where the data lives, the ecosystem you operate, required models + region availability, governance/compliance needs, cost model, and team skills. The right answer is often "the AI platform closest to the data," and sometimes a neutral platform (Databricks, Snowflake, OSS) when portability matters more than deep native integration.

14. Cost Comparison for AI Services

The cost drivers that dominate AI spend, how they map across providers, and neutral optimization guidance. No provider is cheapest overall - it depends on the workload.

Last reviewed: July 2026. AI pricing changes constantly - verify all rates on vendor pricing pages.
TL;DR

AI cost is driven by tokens (input + output), embeddings, fine-tuning, model hosting / endpoint uptime, GPU hours, vector storage + queries, and applied-AI usage (OCR pages, speech minutes) - plus the usual data transfer, logging, and private-networking costs. The biggest surprises are context size (tokens), idle endpoints/GPUs, and vector-search at scale. There is no cheapest AI cloud; the answer depends on volume, model choice, and architecture. Optimize the architecture (retrieval quality, context size, caching) before shopping list prices.

AI cost drivers, side by side

AI workload | Main cost driver | Cost consideration (all clouds) | Cost control | Gotcha
LLM chat / RAG | Input + output tokens | Context size dominates; output tokens often priced higher | Retrieve minimal context; smaller models; cache | Sending whole documents blows up token cost
Embeddings | Tokens embedded | One-time (ingest) + per-query | Batch + cache; re-embed only changed content | Re-embedding everything on each run
Fine-tuning | Training tokens/hours | Upfront cost; may not beat good RAG | Try RAG/prompting first | Fine-tuning when RAG would suffice
Model hosting / endpoints | Endpoint uptime | 24x7 endpoints cost even when idle | Scale-to-zero / batch / serverless | Idle endpoints are silent spend
Provisioned throughput | Reserved capacity | Predictable cost + throughput vs on-demand | Match to sustained volume | Over-provisioning for spiky load
GPU / training | Accelerator hours | Dominant for training; custom silicon may cut cost | Spot + reservations; right-size | Idle GPUs; data-pipeline starvation
Vector search | Storage + queries | Grows with corpus + query rate | Right-size indexes; filter early | Unbounded index growth
Document AI / OCR | Pages processed | Per-page pricing | Pre-filter; process only needed pages | Reprocessing whole archives
Speech | Audio minutes | Per-minute; real-time may cost more | Batch where latency allows | Transcribing everything
Logging / monitoring | Ingestion volume | Prompt/output logging adds up | Sample; set retention | Logging full payloads unbounded

AI cost optimization (portable)

Cost note - optimize architecture first
  • Use smaller models where they suffice - many tasks don't need the largest model.
  • Cache common responses and embeddings; don't recompute.
  • Reduce context size - retrieve the smallest sufficient chunks; don't send whole documents.
  • Improve retrieval quality - better retrieval means fewer tokens and better answers.
  • Prefer prebuilt AI services (Document AI, NLP) over an LLM for well-defined tasks - cheaper and more predictable.
  • Batch where real-time isn't required; shut down idle endpoints/GPUs/notebooks.
  • Right-size vector indexes and control logging volume.
  • Track cost per user, per document, per workflow, and per business process - not just per service - so you can see which use cases are worth it.
Do not claim a cheapest AI provider
The cheapest option depends on your model choice, token volumes, whether you self-host or use managed APIs, GPU needs, and architecture efficiency. A well-designed RAG app on a mid-size model can cost a fraction of a poorly-designed one on a frontier model - on the same cloud. Fix the architecture, then compare post-discount prices for your actual usage.
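To make the architecture effect concrete, here is a small worked calculation. The per-token prices, token counts, and request volume are hypothetical round numbers chosen for illustration, not any vendor's rates; substitute your own negotiated prices and measured token counts.

```python
# Hypothetical prices per million tokens, for illustration only.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def request_cost(input_tokens, output_tokens):
    """Cost of one request in the same currency as the prices above."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


# Same question, same model, same answer length; only the context differs.
whole_doc = request_cost(input_tokens=60_000, output_tokens=500)  # entire document sent
retrieved = request_cost(input_tokens=3_000, output_tokens=500)   # a few retrieved chunks

requests_per_month = 100_000
print(f"per request: {whole_doc:.4f} vs {retrieved:.4f}")
# per request: 0.1875 vs 0.0165
print(f"per month:   {whole_doc * requests_per_month:,.0f} vs "
      f"{retrieved * requests_per_month:,.0f}")
# per month:   18,750 vs 1,650
```

Under these assumptions the retrieval-based design costs roughly a tenth as much on the same model and cloud, before any list-price comparison between providers.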

15. AI Risk and Architecture Warnings

The specific ways enterprise AI goes wrong - what can happen, which patterns are affected, how to reduce it, what to monitor, and whether it is production-ready. These risks apply across all providers.

Last reviewed: July 2026. Treat AI as a new attack surface - verify controls with current vendor docs.
TL;DR

Enterprise AI risk is dominated by a handful of failure modes: hallucination, prompt injection, data leakage, over-permissioned agents, direct production-database access, uncontrolled dynamic SQL, poor/stale retrieval, and missing auditability. None are provider-specific - they are architecture and governance problems. The mitigations are consistent: governed serving layer, least privilege, security-trimmed retrieval, human approval for actions, content safety, and full logging. Build these in before going to production.

Risk | What can go wrong | Affected patterns | How to reduce | Monitor | Production-ready?
Hallucination | Confident but wrong answers | All GenAI | RAG grounding + citations; validate output; human review for decisions | Groundedness; user feedback | Yes, with validation + citations
Prompt injection | Untrusted content hijacks instructions | RAG, agents, doc processing | Content safety / prompt shields; isolate + sanitize retrieved/user content; least privilege | Injection attempts; anomalies | Requires strong governance
Data leakage | Model reveals context it shouldn't | RAG, agents | Security-trimmed retrieval before generation; output filtering; DLP | Access anomalies; output scans | Requires strong governance
Over-permissioned agents | Agent does more than intended | Agents | Least-privilege scoped identity; per-tool permissions; approval gates | Tool calls; actions | Requires strong governance
Direct production DB access | Agent/LLM queries live OLTP | Chat-with-DB, NL-to-SQL, agents | Never direct; use governed API / curated views / read replica | DB access source | Not recommended without review
Uncontrolled dynamic SQL | Free-form SQL against production | NL-to-SQL | Validated, parameterized, read-only SQL on curated schema only | Queries executed | Not recommended without review
Poor retrieval quality | Irrelevant context → wrong answers | RAG, search | Better chunking; hybrid + rerank; evaluate retrieval | Retrieval relevance metrics | Yes, with evaluation
Stale data | Answers from outdated index | RAG | Automate re-indexing; track freshness | Index age | Yes, with refresh
Lack of auditability | Can't explain/defend an answer | All | Log prompts + context IDs + outputs | Log completeness | Requires strong governance
No human approval | AI acts without oversight | Agents, ops AI | Approval gates for writes/actions | Actions taken | Requires strong governance
Inconsistent answers | Same question, different answers | All GenAI | Lower temperature; deterministic checks; caching | Answer variance | Good for experimentation
Model version changes | Behavior shifts on model update | All | Pin/version models; re-evaluate on change; abstraction layer | Model version; eval scores | Yes, with eval on change
Region availability changes | Model/service unavailable in region | All | Verify + have fallback; abstraction layer | Availability | Yes, with fallback plan
Vendor lock-in | Hard to switch platform/model | All | Abstraction layer; open formats/standards | Coupling | Manageable with design
Hidden cost growth | Token/GPU/vector spend creeps up | All | Budgets/alerts; per-workflow cost tracking; optimization | Cost per user/workflow | Yes, with monitoring
Compliance gaps | Data/PII handled improperly | All | Data classification; retention terms; legal review | Data flows | Requires review
Shadow AI | Ungoverned AI touching real data | Org-wide | AI use-case inventory + approval; guardrails by policy | New AI usage | Requires strong governance
Weak monitoring | Problems unseen until users complain | All | Quality + safety + cost monitoring from day one | Quality/safety/cost | Yes, with observability
The two non-negotiables
(1) No AI agent or model gets direct access to a production OLTP database or runs uncontrolled dynamic SQL - always through a governed API / curated data / read-only serving layer with validated queries. (2) Access control happens before retrieval, and everything is logged - the model must never receive data the user isn't entitled to, and every answer must be traceable to its sources. These two hold on every provider.
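The first non-negotiable can be illustrated with a small sketch that uses SQLite from the Python standard library as a stand-in for a curated, read-only serving layer. The table names and the `run_generated_query` helper are hypothetical. In a real deployment the same intent is usually enforced with database-native controls (a read-only role, grants on curated views, a read replica) rather than application code alone.

```python
import sqlite3

ALLOWED_TABLES = {"sales_summary"}  # hypothetical curated table exposed to the model


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow SELECT statements that read only allowlisted tables; deny the rest."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY  # writes, DDL, functions, other tables


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_summary (region TEXT, total REAL)")
conn.execute("CREATE TABLE customers_raw (name TEXT, card_number TEXT)")
conn.execute("INSERT INTO sales_summary VALUES ('EMEA', 1200.0), ('APAC', 950.0)")
conn.commit()
conn.set_authorizer(read_only_authorizer)  # enforced for every later statement


def run_generated_query(sql, params=()):
    """Run model-generated SQL; values are bound as parameters, never inlined."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected: {exc}") from exc


print(run_generated_query(
    "SELECT region, total FROM sales_summary WHERE region = ?", ("EMEA",)))
# [('EMEA', 1200.0)]

for bad_sql in ("DELETE FROM sales_summary",
                "SELECT name, card_number FROM customers_raw"):
    try:
        run_generated_query(bad_sql)
    except PermissionError:
        print("rejected:", bad_sql)
```

The deny-by-default authorizer is deliberately strict (it even blocks SQL functions such as `count`), which is the right starting posture: open up specific capabilities once they have been reviewed, instead of trying to enumerate everything dangerous.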

16. Troubleshooting AI Workloads

Runbooks for the failures AI workloads actually hit - symptoms, likely causes, cloud-specific checks, fixes, and prevention. The method is portable; the tools differ by provider.

Last reviewed: July 2026. Verify service-specific checks with current vendor docs.
Portable method
AI failures split into three buckets: infrastructure/access (quota, region, IAM, private endpoint), retrieval/data (chunking, freshness, relevance, ingestion), and quality/safety (hallucination, guardrails, injection). Diagnose in that order. Check the provider's model/endpoint logs, quotas, and region availability first - many "the AI is broken" tickets are a disabled model, an exhausted quota, or a region mismatch.

⚑ GenAI model not responding / endpoint timeout / high latency

Causes: model not enabled/available in the region; quota/throughput exceeded; endpoint cold-start or under-provisioned; oversized context; network/private-endpoint issue. Checks: model availability in your region; quota/throughput limits; endpoint metrics/logs (Bedrock/Azure OpenAI/Vertex/OCI); token count of the request. Fix: enable the model / request region access; raise quota or use provisioned throughput; reduce context; scale/warm the endpoint. Prevention: verify region + quota early; add retries with backoff; cap context size.
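The "retries with backoff" prevention step might look like the sketch below. `TransientModelError` and `flaky_model_call` are hypothetical stand-ins, since each provider SDK raises its own throttling and timeout exceptions (and some SDKs already retry internally, so check before layering this on top).

```python
import random
import time


class TransientModelError(Exception):
    """Hypothetical stand-in for a throttling or timeout error from a model endpoint."""


def call_with_backoff(call, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Retry call() on transient errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientModelError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


attempts = {"n": 0}


def flaky_model_call():
    """Fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientModelError("throttled")
    return "ok"


delays = []
result = call_with_backoff(flaky_model_call, sleep=delays.append)  # record, don't sleep
print(result, len(delays))  # ok 2
```

Only errors known to be transient are retried; a denied IAM call or a disabled model will not fix itself, and retrying those only adds latency and cost.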

⚑ Token limit exceeded

Causes: context (retrieved chunks + history + prompt) exceeds the model's context window. Fix: retrieve fewer/smaller chunks; truncate history; summarize; use a larger-context model if justified. Prevention: budget tokens; measure context size; rerank to fewer, higher-quality chunks.
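Budgeting tokens can be sketched as a greedy fit over already-ranked chunks. The four-characters-per-token estimate is a rough heuristic assumed here for illustration; use the target model's own tokenizer for real limits, and reserve part of the window for the prompt, history, and the answer.

```python
def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_budget(ranked_chunks, budget_tokens):
    """Keep the highest-ranked chunks that fit within the budget, in rank order."""
    kept, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # too big for what is left; a smaller chunk later may still fit
        kept.append(chunk)
        used += cost
    return kept, used


# Chunks are assumed to be sorted best-first by the reranker.
ranked = ["a" * 400, "b" * 2000, "c" * 800]  # about 100, 500, and 200 tokens
kept, used = fit_to_budget(ranked, budget_tokens=350)
print([c[0] for c in kept], used)  # ['a', 'c'] 300
```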

⚑ RAG answers are poor / vector search returns irrelevant results

Causes: bad chunking; wrong embedding model; no reranking; pure-vector missing keyword matches; stale index; retrieving too few/many chunks. Checks: inspect retrieved chunks for the query; evaluate retrieval relevance separately from generation; index freshness. Fix: improve chunking; add hybrid (keyword+vector) + reranking; refresh the index; tune top-k. Prevention: evaluate retrieval as its own metric; automate re-indexing. (Retrieval quality usually matters more than the model.)
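Evaluating retrieval as its own metric needs only a small labeled set of queries with known relevant documents. The sketch below computes recall@k and mean reciprocal rank (MRR), two standard retrieval metrics; the evaluation data is made up for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical evaluation set: what the retriever returned vs. what a human marked relevant.
eval_set = [
    {"retrieved": ["d3", "d1", "d9"], "relevant": ["d1", "d2"]},
    {"retrieved": ["d7", "d4", "d5"], "relevant": ["d7"]},
]

mean_recall = sum(recall_at_k(e["retrieved"], e["relevant"], k=3) for e in eval_set) / len(eval_set)
mrr = sum(reciprocal_rank(e["retrieved"], e["relevant"]) for e in eval_set) / len(eval_set)
print(mean_recall, mrr)  # 0.75 0.75
```

Tracking these numbers before and after a chunking or embedding change shows whether retrieval actually improved, independently of how the model phrases its answers.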

⚑ Document ingestion / embedding job failed

Causes: unsupported file type/size; permission to read source; embedding model quota; malformed content; timeout. Checks: ingestion/pipeline logs; source access (IAM); quota. Fix: convert/split files; grant read access; batch and retry; raise quota. Prevention: validate inputs; batch large corpora; monitor the pipeline.

⚑ Agent calls wrong tool / runs unsafe query

Causes: ambiguous tool descriptions; over-broad tool permissions; no human approval; direct DB access. Checks: agent trace (which tool, which input); the tool's identity/permissions. Fix: sharpen tool descriptions; scope each tool with least privilege; add approval gates; route DB access through a governed read-only API. Prevention: never give agents raw DB access; require approval for writes; test tool selection.
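Least-privilege tool scoping and approval gates can be sketched as a thin layer between the agent and its tools. `Tool`, `ToolGate`, and the order functions are hypothetical names for illustration; agent frameworks and managed agent services expose their own mechanisms for the same idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., object]
    writes: bool = False  # write/action tools need explicit human approval


class ToolGate:
    """Exposes only allowlisted tools and blocks writes that lack an approver."""

    def __init__(self, tools, allowed_names):
        self._tools = {t.name: t for t in tools if t.name in allowed_names}

    def call(self, name, *, approved_by=None, **kwargs):
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool not permitted for this agent: {name}")
        if tool.writes and not approved_by:
            raise PermissionError(f"human approval required for: {name}")
        return tool.func(**kwargs)


tools = [
    Tool("lookup_order", lambda order_id: f"order {order_id}: shipped"),
    Tool("refund_order", lambda order_id: f"refunded {order_id}", writes=True),
    Tool("run_sql", lambda sql: "raw database access"),
]
# This agent is scoped to the two order tools; run_sql is never exposed to it.
gate = ToolGate(tools, allowed_names={"lookup_order", "refund_order"})

print(gate.call("lookup_order", order_id="A17"))                       # order A17: shipped
print(gate.call("refund_order", approved_by="alice", order_id="A17"))  # refunded A17
```

Calling `refund_order` without `approved_by`, or calling `run_sql` at all, raises `PermissionError`. In production the approval would come from a real workflow, and every call (tool, input, approver) would be written to the agent trace.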

⚑ Guardrail blocks valid response / prompt injection suspected

Causes: over-broad content-safety rule (false positive); or a genuine injection attempt in retrieved/user content. Checks: guardrail/content-safety logs; the blocked content. Fix: tune the rule / add an exception (false positive); or confirm and block the injection (isolate/sanitize retrieved content). Prevention: run content safety in report mode first; sanitize untrusted content; monitor injection patterns.

⚑ Private endpoint / IAM / quota / region issues

Private endpoint: DNS not resolving to the private endpoint; missing route/firewall; verify per cloud (PrivateLink / Private Endpoint+Private DNS / PSC / OCI private endpoint). IAM denied: the workload/agent identity lacks the model/data role; check native IAM (OCI IAM / AWS IAM / Entra RBAC / Google IAM). Quota exceeded: request an increase; use provisioned throughput. Region: the model/service isn't in your region - request access or choose a supported region. Cost spike: check token/GPU/endpoint/vector usage in cost tools; find idle endpoints or oversized context.

⚑ Logs missing / hallucination reported by business users

Logs missing: prompt/output logging not enabled (often off by default for privacy/cost); wrong log destination; retention expired - enable Bedrock/Azure OpenAI/Vertex/OCI logging and route to a central store. Hallucination reported: check whether the answer was grounded (retrieved context) and cited; improve retrieval; add citations + a "not found" path; require human review for high-stakes answers; log the incident for evaluation.

17. Learning Paths for AI Across Clouds

Learn AI services fastest by building on what you know. For each persona: what transfers, what does not, where to start, hands-on labs, common mistakes, and the outcome.

Last reviewed: July 2026. Pair with the single-cloud deep-dive portals and current vendor training.
The universal AI transfer map
Transfers well: RAG architecture, embeddings, chunking, the governed-serving-layer pattern, prompt engineering, vector-search concepts, and MLOps fundamentals - these are provider-neutral. Transfers poorly: each cloud's model catalog + region availability, the managed RAG/agent tooling, the native vector store, and the governance implementation. Learn the portable model once, then learn each cloud's differences.

Cloud architect learning AI across clouds

(OCI, AWS, Azure, or GCP architect learning the others' AI stacks.)

  • Already know: the cloud foundation, IAM, networking, and where data lives.
  • Transfers: RAG/agent architecture, private-endpoint patterns, governance principles - identical shapes.
  • Doesn't transfer: the specific GenAI platform, model catalog + region availability, managed RAG/agent tooling, and vector store.
  • Start with: the Matrix (1) → GenAI (2) → RAG (5) → Vector (6) → Governance (11). Use the matrix as your translation table.
Hands-on lab

Build the same governed RAG app (serving layer + vector store + model + audit) on a second cloud. You will run into exactly the differences that matter: model access, managed-RAG tooling, and the vector store.

Mistakes to avoid: assuming direct model equivalence; designing around one model's limits; skipping security-trimmed retrieval. Outcome: you can design a governed AI architecture on any of the four and know what to verify.

DBA / Data engineer learning GenAI and AI platforms

DBA - already know

Databases, SQL, access control, backup/DR - which directly enable in-database vector search and NL-to-SQL governance.

Data engineer - already know

Pipelines, SQL, Spark, data governance - which map to embeddings pipelines, retrieval, and warehouse-integrated AI.

Transfers

Vector search as a DB feature (Oracle 23ai, pgvector, BigQuery); entitlement-filtered retrieval as row/label security; in-DB/in-warehouse ML.

Doesn't transfer cleanly

Chunking/embedding quality's effect on answers; the governed-serving-layer requirement; agent/NL-to-SQL safety; model catalogs.

DBA note - your skills are an advantage
Your instincts about access control, query safety, and data governance are exactly what enterprise AI needs. The key new ideas: vectors live next to your data (use in-DB vector search to inherit your security), NL-to-SQL must be read-only/validated/parameterized on curated schemas, and agents never get raw DB access. Start with Vector Search (6) + AI for databases in the Matrix (1).

Hands-on labs: build a RAG app with pgvector or Oracle 23ai over data you control, with entitlement-filtered retrieval; build a governed NL-to-SQL over a curated read-only schema. Outcome: you can add AI to a database safely.

Security engineer / DevOps engineer learning AI

Security - transfers

Least privilege, private networking, key/secret management, audit logging, data-exfiltration control - all apply directly to AI.

Security - new

Prompt injection, data leakage via models, agent permissioning, content safety, and AI-specific audit (prompts/outputs).

DevOps - transfers

CI/CD, containers, IaC, and observability (OpenTelemetry) map to MLOps pipelines and model deployment.

DevOps - new

Model registry + versioning, drift/quality monitoring, prompt/eval management, and GPU/endpoint cost control.

Security note
Treat AI as a new attack surface. Your job: enforce security-trimmed retrieval, scope agent identities, put content safety on input+output, keep AI traffic on private endpoints, and log prompts+outputs. Start with Governance (11) + Risk (15).

Hands-on labs: Security - configure guardrails + private endpoints + prompt logging for a GenAI app; test prompt injection. DevOps - build an MLOps pipeline with a model registry and drift monitoring. Outcome: you can secure and operate AI in production.

AI engineer, enterprise architect, or business analyst

AI engineer

Know: models, prompting, embeddings. Learn: enterprise governance, security-trimmed retrieval, cost control, and each cloud's managed tooling. Start: RAG (5), Agents (4), Governance (11).

Enterprise architect

Learn: the workload decision matrix (13), governance (11), cost drivers (14), and risk (15) - to choose platforms by workload, not hype. Start: Matrix (1) + Workloads (13).

Business analyst

Learn: what each pattern (12) can realistically do, the risks (15), and where value is real vs hype. Start: Home, Patterns (12), Risk (15).

Common to all

Follow the data, insist on governance and auditability, validate outputs, and verify model/region/cost before committing.

Mistakes to avoid
Choosing a platform based on a benchmark alone; skipping governance ("we'll add it later"); connecting AI to production data without a serving layer; ignoring cost until the bill arrives; assuming a demo equals production-readiness.

Outcome: you can evaluate, choose, govern, and cost an enterprise AI use case across providers - and tell real value from hype.

Final note
AI fluency across clouds is learning the portable model - RAG, agents, vector search, governance, and the cost/risk levers - once, then learning each provider's differences and current capabilities. Use the Matrix (section 1) as your translation table and verify everything fast-moving before you build.