Not All AI Is the Same - Governing Data, Models, and Agents

Part 3 of a four-part series on AI governance — why traditional model risk management buckled, why data governance is the prerequisite most banks have not built, and how to govern the different categories of AI now sitting inside every banking institution

By: Paul Schaus

May 26, 2026

Part 1 of this series made the strategic case for AI governance as a 2026 board obligation. Part 2 argued that one framework will not fit every bank and laid out the community-versus-regional split. This article goes underneath the framework. Before a bank can operate the governance program, it must understand what it is actually governing — and "AI" is no longer a single thing.

A gradient-boosted credit scoring model running in production today, a transformer-based large language model summarizing customer correspondence, and an autonomous multi-agent system triaging compliance alerts are all "AI" in the press. They are three different governance problems. A program that treats them identically will under-govern the riskiest and over-govern the safest. And every one of them sits on top of a data layer the bank has to control before any of the rest of the program can work.

Data Governance Is the Foundation — and Most Banks Have Not Built It

Before a bank can have effective AI governance, it has to have effective data governance. That is not a sequencing suggestion. It is a structural prerequisite. AI consumes data for training, for retrieval, for inference, and for output validation. Bad data does not just produce bad AI. It produces bad AI at scale, with the speed and confidence of a production system. Every shortcut a bank takes in data governance shows up later in its AI outputs.

The CSI 2026 survey captures the scale of the gap directly: only 11 percent of community banking leaders rate their institution's data strategy as highly effective. That number is the most underappreciated finding in the survey, because it explains why AI governance feels so hard at most institutions — the foundation is not there yet. An institution that does not know where its customer data lives, how it is classified, who can access it, how long it is retained, or whether it is accurate cannot govern an AI system that consumes that data. The AI program inherits every data weakness underneath it and amplifies the consequences.

Six elements of data governance translate directly into AI governance prerequisites:

Data inventory and classification: the bank cannot answer "what AI uses customer PII?" without first answering "where does customer PII live?"
Data lineage: when an AI model produces an output (an adverse-action decision, a fraud alert, a personalized offer), the bank has to trace which data fed which decision. Without lineage, there is no fair-lending defense, no model validation, no incident root-cause analysis.
Data quality: statistical models tolerate noisy data within bounds; foundation models magnify it.
Data retention and minimization: a model trained on five years of customer transcripts is a five-year retention obligation whether the bank wrote it down or not.
Access controls: who can query the AI system, who can see the outputs, who can fine-tune the model.
Consent and purpose limitation: customer data collected for one purpose cannot be silently repurposed for AI training without disclosure.
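To make the inventory point concrete, here is a minimal Python sketch of how the question "what AI uses customer PII?" reduces to a join between a data inventory and an AI system register. Every name in it (the `Classification` labels, the asset names, the system names, the retention figures) is a hypothetical illustration, not taken from DCAM, DAMA-DMBOK, or any bank's actual register.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Illustrative labels only; a real scheme would be set by bank policy.
    PUBLIC = "public"
    INTERNAL = "internal"
    CUSTOMER_PII = "customer_pii"


@dataclass(frozen=True)
class DataAsset:
    name: str
    classification: Classification
    retention_years: int


@dataclass(frozen=True)
class AISystem:
    name: str
    consumes: tuple  # names of the data assets the system reads


def systems_using(classification, assets, systems):
    """Names of AI systems that read at least one asset with this classification."""
    flagged = {a.name for a in assets if a.classification is classification}
    return sorted(s.name for s in systems if flagged.intersection(s.consumes))


def unknown_inputs(assets, systems):
    """AI systems that read data the inventory does not know about."""
    known = {a.name for a in assets}
    gaps = {}
    for s in systems:
        missing = sorted(set(s.consumes) - known)
        if missing:
            gaps[s.name] = missing
    return gaps


# Hypothetical inventory and register for illustration.
ASSETS = [
    DataAsset("call_transcripts", Classification.CUSTOMER_PII, 5),
    DataAsset("core_accounts", Classification.CUSTOMER_PII, 7),
    DataAsset("branch_hours", Classification.PUBLIC, 1),
]
SYSTEMS = [
    AISystem("contact_center_summarizer", ("call_transcripts",)),
    AISystem("branch_locator_bot", ("branch_hours",)),
    AISystem("credit_scoring_model", ("core_accounts", "bureau_feed")),
]

pii_systems = systems_using(Classification.CUSTOMER_PII, ASSETS, SYSTEMS)
inventory_gaps = unknown_inputs(ASSETS, SYSTEMS)
```

The second function is the more telling one in practice: an AI system that reads an asset missing from the inventory (here the hypothetical `bureau_feed`) is exactly the case where the AI program inherits a data governance gap.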

For institutions that need a reference framework, the EDM Council's Data Capability Assessment Model (DCAM) and the DAMA Data Management Body of Knowledge (DAMA-DMBOK) are the standard data governance references. For banks above $50 billion in assets, BCBS 239 sets the principle-based standard the largest institutions are already operating under. The point is not which framework the bank adopts. The point is that the bank cannot operate AI governance on top of a data layer that has no governance.

Five Categories of Model and Why the Distinction Matters

There are, broadly, five categories of AI models worth distinguishing at the board level. Each carries a different risk profile, and each demands a different governance posture.

Traditional machine learning (logistic regression, decision trees, gradient-boosted models, random forests) remains the workhorse of bank credit, fraud, and pricing. These models are statistical, deterministic given fixed inputs, and have decades of validation methodology behind them. SR 11-7 was written for this category, and the OCC Bulletin 2025-26 model risk clarification still applies cleanly. Traditional ML is the easiest category to govern. It is also the category where banks already have governance, even if they sometimes need to refresh it.

Deep learning and neural networks sit between traditional ML and foundation models. Image-based check fraud detection, voice-pattern authentication, and certain credit and AML pattern-recognition models fall here. They retain the input-output structure of traditional ML but introduce opacity that explainability tools like SHAP and LIME can only partially resolve. Governance has to add explainability standards and adverse-action narrative requirements that statistical models did not need.

Foundation models and large language models — such as GPT (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta), and the growing open-source ecosystem — are the category that broke traditional MRM. They are non-deterministic. The same prompt can produce different outputs. They hallucinate. They were trained on data sets the bank cannot fully inspect. And they are typically accessed as a service from a third-party provider, which makes them simultaneously a model risk problem, a third-party risk problem, and a data privacy problem. Every LLM-powered customer chatbot, every generative AI document-drafting tool, every internal copilot is in this category. The Treasury FS AI RMF was written largely for this category, which is why it does not look like SR 11-7.

Small language models are emerging as the practical answer for many bank use cases. Trained or fine-tuned on smaller, domain-specific data, they offer most of the language capability of foundation models with materially lower cost, lower latency, better explainability, and the ability to run inside the bank's own infrastructure rather than at a third-party endpoint. For community banks especially, the SLM trajectory is worth tracking — and it is where the cost-to-control ratio is moving in the bank's favor.

Multimodal models combine text, image, voice, and sometimes structured data in a single system. The deepfake-enabled BEC threat is multimodal. So is the next generation of customer service tooling. Governance has to anticipate this convergence rather than govern each modality separately.

Underneath all of this is a concentration question most banks have not asked yet. If the bank's fraud platform uses Vendor A's LLM, its marketing engine uses Vendor B's LLM, and its contact center uses Vendor C's LLM — and all three vendors happen to be wrapping the same underlying foundation model from a single AI provider — the bank has a third-party concentration risk that does not show up on any current vendor risk register. Foundation model concentration is the AI equivalent of the cloud-provider concentration risk regulators began raising in 2023. The board should be asking which foundation models sit underneath the bank's AI stack.
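A rough Python sketch of the concentration check follows. It assumes the bank has already recorded, for each AI-enabled function, which foundation model provider sits underneath the vendor. The register entries, the provider names, and the 50 percent threshold are all hypothetical; the threshold in particular is an illustrative choice, not a regulatory figure.

```python
from collections import defaultdict

# Hypothetical register: (business function, vendor, underlying model provider).
AI_STACK = [
    ("fraud platform", "Vendor A", "Provider X"),
    ("marketing engine", "Vendor B", "Provider X"),
    ("contact center", "Vendor C", "Provider X"),
    ("document drafting", "Vendor D", "Provider Y"),
]


def foundation_model_concentration(stack, threshold=0.5):
    """Report providers that sit underneath at least `threshold` of AI functions."""
    by_provider = defaultdict(list)
    for function, vendor, provider in stack:
        by_provider[provider].append((function, vendor))

    total = len(stack)
    report = {}
    for provider, dependents in by_provider.items():
        share = len(dependents) / total
        if share >= threshold:
            report[provider] = {"share": share, "dependents": dependents}
    return report


concentration = foundation_model_concentration(AI_STACK)
```

In this example three different vendors resolve to the same provider, so a vendor risk register keyed on vendor name would show no concentration while a register keyed on the underlying model would show 75 percent.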

Agents Are a Different Conversation Entirely

An AI agent is not a chatbot. A chatbot produces text in response to a prompt. An agent produces actions. It takes a goal, decomposes it into steps, calls tools, queries systems, and executes — sometimes in a loop, sometimes with hand-offs to other agents, sometimes with no human in between the goal and the result.

This is the category the interagency MRM clarification referenced when it confirmed that traditional model risk management does not cleanly apply to "agentic" AI. It is also where the largest near-term productivity gains live. The Joshi framework submitted to OCC Docket OCC-2025-0669 in February 2026 demonstrates multi-agent systems built on frameworks like LangGraph, CrewAI, AutoGen, and LlamaIndex automating CRA data collection, needs assessment, public comment analysis, and goal tracking — projecting 30 to 45 percent cost reduction and 50 to 70 percent efficiency gains. Vendors across the FCC technology landscape are building agentic capabilities into their platforms, and core providers are following.

The governance challenge with agents is not the technology. It is the authority. The framing for the board is the one we used with junior employees a generation ago: an autonomous agent should never be allowed to take an action that a junior employee would not be allowed to take unsupervised. That principle generates the control set.

Four authority tiers should each carry distinct controls:

Read-only: the agent queries but does not act; lowest risk, lightest controls.
Recommend: the agent produces an output a human reviews and acts on; controls focus on output quality and review cadence.
Act-with-approval: the agent acts only after a human approves the specific action; common in compliance triage, credit, and operations.
Autonomous: the agent acts without per-action review; reserved for low-risk, high-volume tasks where the cumulative authority is bounded and the audit trail is complete. No agent should have autonomous authority over a regulated decision (credit, fair lending, BSA/AML, consumer disclosure, deposit pricing affecting customers) without an explicit, documented exception approved by the board's Risk Committee.
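The four tiers can be expressed as a simple authorization check. The Python below is a sketch under stated assumptions: the `AuthorityTier` names, the `REGULATED_DOMAINS` labels, and the `may_execute` function are hypothetical, and the fallback for an autonomous agent in a regulated domain (it may still act with per-action human approval) is one reasonable reading of the rule above rather than a prescribed design.

```python
from enum import IntEnum


class AuthorityTier(IntEnum):
    READ_ONLY = 1
    RECOMMEND = 2
    ACT_WITH_APPROVAL = 3
    AUTONOMOUS = 4


# Hypothetical labels for the regulated decision areas named in the text.
REGULATED_DOMAINS = {
    "credit",
    "fair_lending",
    "bsa_aml",
    "consumer_disclosure",
    "deposit_pricing",
}


def may_execute(tier, domain, human_approved=False, board_exception=False):
    """Return True if the agent itself may carry out the action."""
    if tier <= AuthorityTier.RECOMMEND:
        # Read-only and recommend agents never act; a human does.
        return False
    if tier == AuthorityTier.ACT_WITH_APPROVAL:
        return human_approved
    # Autonomous tier: regulated domains need a documented board exception.
    # Assumption: without one, the agent drops back to per-action approval.
    if domain in REGULATED_DOMAINS:
        return board_exception or human_approved
    return True
```

For example, `may_execute(AuthorityTier.AUTONOMOUS, "credit")` is refused, while the same call with `board_exception=True` or `human_approved=True` is allowed. Keeping the check in one function makes the exception path explicit and easy to log.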

Two other controls matter as much as the authority tier.

The audit trail is the first: it has to reconstruct the full chain of reasoning, meaning what the agent did, why it did it, what data it used, who supervised it, and when. Multi-agent systems compound the requirement because the chain crosses agents.
The kill switch is the second non-negotiable: every agentic system has to have an off switch that a defined individual is authorized to pull, at any time, without procedural delay. The CIO, the CISO, the CRO, and the Chief Compliance Officer should each have the authority to pull it independently. It should be tested annually, the way a business continuity plan is tested.
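Both controls can be sketched together in a few lines of Python. This is an illustration only: the `AuditRecord` fields mirror the questions the audit trail has to answer, the `parent_agent` field is an assumed way to follow a hand-off across agents, and the `AgentRuntime` class and role labels are hypothetical rather than part of any agent framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Officers who may each pull the switch independently (labels are illustrative).
KILL_SWITCH_ROLES = {"CIO", "CISO", "CRO", "CCO"}


@dataclass(frozen=True)
class AuditRecord:
    agent: str                          # which agent acted
    action: str                         # what it did
    rationale: str                      # why it did it
    data_used: tuple                    # what data it used
    supervisor: str                     # who supervised it
    timestamp: datetime                 # when
    parent_agent: Optional[str] = None  # hand-off source in a multi-agent chain


class AgentRuntime:
    def __init__(self):
        self.enabled = True
        self.trail = []

    def act(self, agent, action, rationale, data_used, supervisor, parent_agent=None):
        """Record an action; refuse outright once the kill switch is pulled."""
        if not self.enabled:
            raise RuntimeError("agentic system is switched off")
        record = AuditRecord(agent, action, rationale, tuple(data_used),
                             supervisor, datetime.now(timezone.utc), parent_agent)
        self.trail.append(record)
        return record

    def kill(self, role):
        """Switch the system off immediately; the pull itself is audited."""
        if role not in KILL_SWITCH_ROLES:
            raise PermissionError(f"{role} is not authorized to pull the kill switch")
        self.enabled = False
        self.trail.append(AuditRecord("runtime", "kill_switch", f"pulled by {role}",
                                      (), role, datetime.now(timezone.utc)))
```

An annual test of the switch then amounts to calling `kill` with each authorized role in a non-production environment and confirming that every subsequent `act` call is refused.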

The Bottom Line

Data is the floor. Models are the bank's exposure. Agents are the new authority question. A governance program that does not differentiate between them, and that puts a logistic regression credit model and an autonomous compliance agent under the same control set, will be both under-protective and over-burdensome at the same time.

The institutions that map their governance to the actual category of AI they are operating, and that build the data foundation underneath it before they scale, will turn AI into a defensible, examiner-ready capability. The institutions that govern by generic policy will find at the exam that the policy did not match the system.

Part 4 of this series, next week, closes the loop: what actually goes in the AI governance policy, the four control areas where most banks have the largest unaddressed gaps, the training program the workforce and the board both need, and the eight-metric scorecard the Risk Committee should be seeing every quarter.

CCG Catalyst advises community and regional banks, credit unions, and fintech companies on AI strategy, governance design, and regulatory readiness. If your institution is evaluating its AI governance framework, reach out to our team at www.ccgcatalyst.com.

By: Paul Schaus | Founder & Managing Partner, CCG Catalyst Consulting

Disclaimer: The views expressed in this article represent the perspective of CCG Catalyst Consulting based on our direct experience advising financial institutions. This commentary is intended to stimulate industry discussion and does not constitute legal, accounting, or regulatory advice.