Why your AI pilot shouldn't leave the EU -- and what "EU" actually means

Legal landscape as of mid-2026. Key precedent referenced: Latombe v European Commission (Case T-553/23, EU General Court, 3 September 2025). On-the-record vendor statement: Microsoft France testimony before the French Senate Commission of Inquiry on digital sovereignty, 10 June 2025. The law in this area moves -- check current status before making procurement decisions.

A procurement officer at a Danish organisation signs a contract for "Azure OpenAI, EU region." The box marked data stays in Europe is ticked. Legal nods. The pilot ships.

Six months later, a journalist asks where the data actually lives, who can be compelled to hand it over, and under whose laws. The answers turn out to be: Frankfurt, Microsoft Corporation, and the United States -- in that order, and the last one matters most.

This confusion isn't anyone's fault. The vendors have spent years collapsing three different ideas -- where data sits, whose laws apply to it, and who can technically reach it -- into a single marketing phrase. "EU region" sounds like sovereignty. It isn't.

This post is about what the words actually mean, why the distinction matters whether you're a government body, a bank, or a law firm, and what your real options look like in 2026.

How to read this post: if you're in procurement or leadership, the main sections are written for you in plain language. If you're an architect or CTO, the paragraphs marked "For the architect" give you the technical detail behind each claim. Skim past them or read them -- the argument lands either way.

The three things people mean by "EU data"

RFPs routinely conflate three separate things into one bullet point. They're not the same, and the gap between them is where the real risk sits.

Data residency is where the bytes physically sit. If your provider operates a data centre in Frankfurt and your data is stored there, you have residency in the EU. This is the easiest box to tick and the one vendors lead with.

Data sovereignty is whose laws apply to those bytes. A US-headquartered company storing your data in Frankfurt is still a US legal entity. The data is under the jurisdiction of whichever government can compel the company to hand it over. Residency in Frankfurt does not change this.

Operational sovereignty is who can technically access the data, under what controls, with what audit trail. This includes the vendor's own engineers, their support staff, their subcontractors, and -- in some cases -- the foreign governments those people might be compelled to cooperate with.

Most "EU region" offerings from US providers give you the first, partially the second, and rarely the third. The marketing copy treats all three as interchangeable. Your legal team almost certainly knows they're not, even if procurement doesn't.

For the architect: operational sovereignty is the hardest of the three to verify because it depends on internal vendor controls you can't directly audit. Even "sovereign cloud" SKUs with local operators usually run on a control plane and supply chain operated by the foreign parent. Microsoft Cloud for Sovereignty, for example, layers governance on top of an Azure substrate still operated by Microsoft Corporation. Better than nothing -- but not the same as a provider whose entire corporate structure sits inside the jurisdiction.

Why "EU region" from a US provider isn't EU jurisdiction

The mechanism is straightforward, and it stopped being a matter of analyst speculation on 10 June 2025.

On that date, before the French Senate Commission of Inquiry on digital sovereignty, Anton Carniaux, Director of Public and Legal Affairs at Microsoft France, was asked under oath whether Microsoft could guarantee that data of French citizens stored in Microsoft's French regions would never be transmitted to US authorities without French consent. His answer, on the record: "No, I cannot guarantee that." He explained that Microsoft must comply with US legal orders, that the company challenges requests it considers unfounded, and that no such request had ever occurred -- but that no contract or technical control could promise it never would.

That was one of the largest cloud vendors on the planet conceding, under parliamentary oath, that the Microsoft EU Data Boundary -- a real, multi-year engineering effort -- does not change the underlying legal exposure. Geography is not jurisdiction.

The law behind it is the US CLOUD Act, a 2018 statute that compels American companies to produce data they hold, anywhere in the world. It reaches any company subject to US jurisdiction -- which includes the major US cloud providers and their EU subsidiaries. Frankfurt and Dublin are irrelevant; the parent sits in Redmond or Mountain View, and that's whose legal system gets to ask. The principle isn't even US-specific: in September 2025, an Ontario court ordered OVH's Canadian subsidiary to hand over data held by group entities outside Canada, despite OVH's argument that disclosure would breach French law. Any provider with a corporate footprint in a jurisdiction is reachable by that jurisdiction's courts.

On the EU side, the European Court of Justice has weighed in twice -- striking down Safe Harbour in Schrems I (2015) and Privacy Shield in Schrems II (2020) on the grounds that US surveillance law gave EU citizens no meaningful redress. The current mechanism is the EU-US Data Privacy Framework (DPF), adopted in 2023. It survived its first legal challenge on 3 September 2025, when the EU General Court dismissed French MP Philippe Latombe's annulment action. Latombe appealed to the Court of Justice on 31 October 2025; the appeal is pending. The DPF is currently valid law -- and it is also the third framework in this lineage, with the cycle of adoption, challenge, and re-examination now repeated three times in fifteen years.

For most organisations, the DPF is enough to keep the lights on. For organisations under sector-specific rules -- DORA for financial services, NIS2 for critical infrastructure, GDPR Article 9 for health data, or national public-records statutes (in Denmark, offentlighedsloven) -- "enough to keep the lights on" is a weak place to build a multi-year platform, especially with the underlying framework under active appeal.

For the architect: the practical compellability test is whether the entity holding your data, or any entity that controls it, can be served with a US legal order it must obey. A US-headquartered provider operating an EU subsidiary fails this test even when the subsidiary is separately registered, because the parent retains operational control. Microsoft's EU Data Boundary -- completed in February 2025 across customer data, pseudonymised personal data, and professional services data -- meaningfully reduces transfers outside the EU during normal operations, but does not change the CLOUD Act analysis. That's what Microsoft France's Director of Public and Legal Affairs confirmed under oath in June 2025. Some vendors have responded by setting up "EU-operated" entities with independent boards, or by forming joint ventures with EU partners (Bleu in France, S3NS for Google Cloud) where the EU partner holds majority control. Read the actual corporate structure before treating these as equivalent to a fully EU-domiciled provider.
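
One way to make the compellability test concrete is to walk the control chain from the entity that holds your data up to its ultimate parent. The following is a minimal sketch, not a legal tool: the `Entity` record, the `compellable_under` helper, and the example company names are all hypothetical, and it only looks at place of incorporation, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A legal entity in a provider's corporate structure (hypothetical model)."""
    name: str
    jurisdiction: str                          # country code of incorporation
    controlled_by: Optional["Entity"] = None   # entity with operational control, if any


def compellable_under(entity: Entity, jurisdiction: str) -> list[str]:
    """Return names of entities in the control chain incorporated in `jurisdiction`.

    An empty list means no entity in the chain is incorporated there. A real
    assessment also has to consider branches, staff and contracts, which this
    sketch ignores.
    """
    hits: list[str] = []
    seen: set[str] = set()
    current: Optional[Entity] = entity
    while current is not None and current.name not in seen:
        seen.add(current.name)  # guard against accidental cycles in the data
        if current.jurisdiction == jurisdiction:
            hits.append(current.name)
        current = current.controlled_by
    return hits


# Illustrative structures only, not real corporate records.
us_parent = Entity("ExampleCloud Inc.", "US")
eu_subsidiary = Entity("ExampleCloud Ireland Ltd.", "IE", controlled_by=us_parent)
eu_native = Entity("ExampleAI SAS", "FR")

print(compellable_under(eu_subsidiary, "US"))  # ['ExampleCloud Inc.']
print(compellable_under(eu_native, "US"))      # []
```

The useful part is less the code than the input it forces you to collect: for every vendor in the stack, who controls whom, and where each of those entities is incorporated.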

The point isn't that US providers are doing anything wrong. They're operating exactly as their legal environment requires. The point is that "EU region" is a statement about geography, and your risk question is about jurisdiction.

Who this actually affects

It's tempting to frame this as a public sector problem. It isn't. The same exposure shows up across a much wider set of organisations.

Public sector. Municipalities, regional bodies, and ministries processing citizen data, case files, or internal correspondence are bound by GDPR and by national administrative law (in Denmark, forvaltningsloven and offentlighedsloven; equivalents exist across the EU). The question of who can be compelled to access citizen records under foreign law is not academic -- it's an audit finding waiting to happen.

Financial services. DORA, applicable since 17 January 2025, requires financial entities to maintain operational resilience and to know their third-party dependencies in detail. "We rely on a US provider whose transfer mechanism is on its third generation and currently under appeal at the CJEU" is a sentence DORA auditors will want explained, with a transfer impact assessment to back it up.

Healthcare. GDPR Article 9 treats health data as special category data with stricter handling. Patient records and clinical notes are exactly where jurisdictional clarity matters.

Legal. Privileged client material can't sit under foreign jurisdiction. A US court order reaching into a European law firm's case management system is the kind of scenario that ends careers.

Everyone else. Any organisation with works council agreements restricting employee data flows, NIS2 obligations, or sector-specific residency rules faces the same question.

The legal exposure shape is identical across all of them. Only the regulator that turns up differs -- and the size of the fine when they do.

This isn't a fringe position any more. The EU adopted a non-binding Declaration for European Digital Sovereignty on 18 November 2025. The European Commission's Digital Omnibus Regulation Proposal followed on 20 November 2025. And in April 2026, the EU's SEAL-3 sovereign cloud tender awarded €180M to providers explicitly chosen because they sit outside the reach of the CLOUD Act. EU institutions are now treating the residency-vs-sovereignty distinction as a strategic procurement question, not an academic one.

What the real options look like

There are four realistic tiers. Each has its place, and each has a cost.

Tier 1: US provider, "EU region." Data residency in the EU, jurisdiction in the US. Path of least resistance, cheapest to set up, best supported. The DPF makes it a legally valid transfer mechanism as of May 2026, and the hyperscalers' EU residency offerings (Microsoft EU Data Boundary, AWS European Sovereign Cloud, Google's EU-localised services) are real engineering work. Fine for workloads where the data isn't special-category, privileged, or citizen-record material -- and where the platform's lifespan is short enough that a future framework change is a problem you can absorb. Problematic for the categories above.

For the architect: this tier includes Azure OpenAI in EU regions, AWS Bedrock with EU model endpoints, Google Vertex AI with EU residency commitments, and the major SaaS layers built on top of them. The good news is they're all easy to use and well-documented. The bad news is the jurisdictional question is identical across all of them, and the DPF is the load-bearing legal mechanism. Plan accordingly.

Tier 2: EU-jurisdiction API provider. Companies like Mistral and Aleph Alpha operate under EU law, with infrastructure and corporate structure inside the EU. Sovereignty comes as a property of the vendor, not as a feature you have to negotiate. One caveat on Aleph Alpha specifically -- since April 2026 it has been folding into a merger with Canada's Cohere, so check the resulting corporate structure before treating it as a purely EU-domiciled provider. You give up some control -- you don't pick when models are upgraded, you depend on their uptime, the catalogue is narrower than the US providers'. For most use cases this is the sensible middle ground. The quality gap with frontier models has closed enough that "EU providers can't deliver good results" is no longer a credible objection.

For the architect: Mistral Medium 3.5 (128B, open weights, 256K context) lands around 77.6% on SWE-Bench Verified on Mistral's own figures -- ahead of the previous-generation proprietary flagship and within a couple of points of the current frontier on that benchmark. Mistral didn't publish the standard general-reasoning suites (MMLU, GPQA) alongside it, so the coding strength doesn't automatically transfer to reasoning-heavy work. The open weights also mean you can self-host the same model later without a re-implementation. Part 2 takes the capability and cost picture apart in detail.

Tier 3: Hybrid. Embeddings, retrieval, and document storage stay inside your perimeter. The generation step calls out to an EU-jurisdiction API. Sensitive content never crosses your network boundary in raw form -- only retrieved chunks, which you can scope, redact, or filter before they go anywhere. This is the sweet spot for most organisations. It covers the data-protection requirements without committing you to running frontier-quality inference locally, which is still expensive and operationally heavy in 2026.

For the architect: the pattern is a vector store inside your perimeter (Qdrant, Weaviate, pgvector, whatever your team is comfortable operating), an embedding model running locally on commodity GPU hardware (or even CPU for smaller models), and generation via a Tier 2 API. The generation call is the only place where any of your content leaves your network, and you can apply chunk-level controls before it does. Build the integration through an abstraction like Microsoft.Extensions.AI's IChatClient -- or the equivalent in your language -- so that swapping the generation provider doesn't require touching application code. Sovereignty insurance is cheap if you buy it on day one.
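
To illustrate what chunk-level controls can look like, here is a minimal Python sketch of the outbound step: retrieve locally, redact, then hand a single prompt to the generation call. Everything in it is an assumption made for illustration. The function names are hypothetical, the two regular expressions (a Danish CPR number shape and a simple email pattern) are examples rather than a complete redaction policy, and the retrieval and generation functions are stand-ins so the sketch runs without a vector store or an API key.

```python
import re
from typing import Callable

# Illustrative patterns only; a real deployment needs a reviewed, tested list.
CPR_PATTERN = re.compile(r"\b\d{6}-?\d{4}\b")                 # Danish CPR number shape
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(chunk: str) -> str:
    """Mask identifiers before a chunk is allowed to leave the perimeter."""
    chunk = CPR_PATTERN.sub("[CPR]", chunk)
    return EMAIL_PATTERN.sub("[EMAIL]", chunk)


def build_outbound_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Assemble the only text that crosses the network boundary."""
    context = "\n---\n".join(redact(c) for c in chunks[:max_chunks])
    return f"Context:\n{context}\n\nQuestion: {question}"


def answer(question: str,
           retrieve: Callable[[str], list[str]],
           generate: Callable[[str], str]) -> str:
    """Retrieve locally, redact, then make the single outbound generation call."""
    chunks = retrieve(question)                  # runs inside the perimeter
    prompt = build_outbound_prompt(question, chunks)
    return generate(prompt)                      # the only call that leaves the network


# Stand-ins for a local vector store and an external generation API.
def fake_retrieve(question: str) -> list[str]:
    return ["Case 12 concerns citizen 010190-1234, contact jens@example.dk."]


def fake_generate(prompt: str) -> str:
    return f"(model saw {len(prompt)} characters)"


outbound = build_outbound_prompt("Who is the contact?", fake_retrieve(""))
print(outbound)
print(answer("Who is the contact?", fake_retrieve, fake_generate))
```

Keeping prompt assembly in one function gives you one place to log, test, and audit exactly what leaves the network.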

Tier 4: Fully on-premises. Your hardware, your network, your inference. Maximum control, maximum operational burden. Justified when the data genuinely cannot leave the building -- some healthcare workloads, defence work, certain categories of legal material. The hardware story has improved a lot: a single workstation with enough VRAM now hosts models that would have needed a small cluster in 2024. But you're taking on model upgrades, GPU procurement, capacity planning, and the operational tail of keeping inference reliably available. For most organisations, Tier 3 covers the same risk at a fraction of the cost.

For the architect: Tier 4 is increasingly viable for the 30B-70B range on a 128GB unified-memory-class workstation (as of 2026, the Framework Desktop with Strix Halo is one current example) or comparable Nvidia builds. A 30B-class model runs on a single such machine and serves a small team. Frontier-quality generation still implies serious hardware investment -- but for the agent and retrieval workloads where most of the cost lives, it's no longer exotic. Part 2 breaks down the hardware classes and their costs.
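
The sizing claim can be sanity-checked with back-of-envelope arithmetic: memory for the weights is roughly parameter count times bytes per weight. The sketch below uses a hypothetical helper, `weight_memory_gb`, and counts weights only. KV cache, activations, and runtime overhead come on top, so treat the results as lower bounds.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for params in (30, 70):
    for bits in (16, 8, 4):
        print(f"{params}B at {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

On this arithmetic a 30B model needs about 60 GB at 16-bit precision and about 15 GB at 4-bit, while a 70B model needs about 140 GB at 16-bit. The upper end of the 30B-70B range therefore only fits in 128 GB with quantised weights.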

A legitimate counter-argument, taken seriously

Before we get to the action items, the counter-argument deserves a serious hearing.

Plenty of large EU organisations -- including ones running mature compliance functions -- have looked at this analysis and reached a defensible position: the DPF is currently valid law, the EU Data Boundary is a real engineering effort, our workloads don't involve special-category data, and the operational benefits of using a hyperscaler outweigh the marginal jurisdictional risk. They are not being naive. They have done the calculation, and the answer came out in favour of Tier 1.

This post isn't arguing those organisations are wrong. It's arguing the calculation is more complex than the marketing copy suggests, and the answer is specific to which workload, which data class, which time horizon. Three things worth holding in mind:

Operational certifications are not jurisdictional sovereignty. ISO 27001, SOC 2, BSI C5, Spain's ENS, France's SecNumCloud -- these are meaningful and well-implemented security and operational standards. Microsoft, AWS, and Google all hold an impressive set of them, and the certifications are not theatre. They address security posture, access controls, incident response, and operational discipline. They do not address the question of which government's courts can compel the provider to produce data. That's a different axis, and an organisation that conflates the two is missing the actual risk.

The risk is workload-specific. A code completion tool reading open-source repositories has approximately zero CLOUD Act exposure. A municipality running a chatbot over case files in a child protection unit has a great deal. A bank using AI to triage suspicious activity reports (SARs) sits in regulated territory where the audit trail itself becomes evidence. A law firm indexing privileged client documents for semantic search has, in one step, made every one of those documents discoverable under a jurisdiction that has nothing to do with their clients. The right question is not "do we use US providers" -- it's "for this specific data class, what happens in the worst case, and how likely is that scenario over the platform's lifespan?"

The risk is time-horizon-specific. A six-month pilot has a different exposure profile than a ten-year case management system. The DPF survived its first challenge in September 2025; the appeal is pending. Even if it survives the appeal, the cycle has now repeated three times. A platform you expect to run for a decade is making a longer bet on US-EU legal stability than a tactical pilot is.

The honest version of the argument is: for most organisations, for most workloads, Tier 1 is currently a defensible choice. For some organisations, for some workloads, it is not -- and the gap between those two cases is exactly what this post is trying to make visible.

What this means for your next AI project

Three concrete moves, regardless of which tier you end up choosing.

Map your data classes before picking a model. Not all workloads need the same tier of sovereignty. A chatbot on office hours is not a system summarising patient records. A small classification exercise up front saves you from over-engineering the easy cases and under-engineering the hard ones.
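
The classification exercise can start as something as small as a lookup table. The sketch below is one possible shape, with loud caveats: the data class names and the minimum tier assigned to each are hypothetical examples for discussion, not legal advice, and your own mapping should come out of a review with legal and compliance.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four tiers from this post; a higher number means more control."""
    US_PROVIDER_EU_REGION = 1
    EU_JURISDICTION_API = 2
    HYBRID = 3
    ON_PREMISES = 4


# Hypothetical policy table: minimum acceptable tier per data class.
MINIMUM_TIER = {
    "public_information": Tier.US_PROVIDER_EU_REGION,
    "internal_correspondence": Tier.EU_JURISDICTION_API,
    "citizen_case_files": Tier.HYBRID,
    "patient_records": Tier.HYBRID,
    "privileged_legal": Tier.ON_PREMISES,
}


def required_tier(data_classes: list[str]) -> Tier:
    """A workload needs the strictest tier among the data classes it touches."""
    if not data_classes:
        raise ValueError("List at least one data class for the workload")
    unknown = [c for c in data_classes if c not in MINIMUM_TIER]
    if unknown:
        # Unclassified data is a finding in itself, so fail loudly.
        raise ValueError(f"Unclassified data: {unknown}")
    return max(MINIMUM_TIER[c] for c in data_classes)


print(required_tier(["public_information"]).name)
print(required_tier(["public_information", "patient_records"]).name)
```

Under this example table, an office-hours chatbot touching only public information lands on Tier 1, and a workload that also touches patient records is pulled up to the hybrid tier.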

Push back on "EU region" claims in vendor conversations. Ask about corporate jurisdiction, who can be compelled to access the data, and what happens if the DPF is struck down. Good vendors will answer straight. Vendors who get defensive at the question are telling you something.

Build your integration at a layer that lets you swap providers. In .NET, Microsoft.Extensions.AI gives you an IChatClient abstraction that hides the specific provider behind a stable interface. Other languages have equivalents. Don't bake a vendor's SDK into your application code. The cost is a few hours of design work; the benefit is that "move off this provider" becomes a configuration change, not a rewrite. Sovereignty insurance is cheap on day one and expensive to retrofit.
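
For teams outside .NET, the same idea fits in a few lines. The Python sketch below is an illustration, not a port of Microsoft.Extensions.AI: the `ChatClient` protocol, the two placeholder clients, and the `provider` config key are all hypothetical names, and the clients return canned strings instead of calling a real SDK.

```python
from typing import Callable, Protocol


class ChatClient(Protocol):
    """Minimal provider-neutral interface (hypothetical, in the spirit of IChatClient)."""

    def complete(self, prompt: str) -> str: ...


class EuApiClient:
    """Placeholder for an EU-jurisdiction API provider; no real SDK is called."""

    def complete(self, prompt: str) -> str:
        return f"[eu-api] {prompt}"


class LocalModelClient:
    """Placeholder for on-premises inference."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


# The registry is the only place that knows about concrete providers.
PROVIDERS: dict[str, Callable[[], ChatClient]] = {
    "eu_api": EuApiClient,
    "local": LocalModelClient,
}


def client_from_config(config: dict[str, str]) -> ChatClient:
    """Pick the provider named in configuration."""
    factory = PROVIDERS.get(config["provider"])
    if factory is None:
        raise ValueError(f"Unknown provider: {config['provider']}")
    return factory()


def summarise(client: ChatClient, text: str) -> str:
    """Application code depends on the interface, never on a vendor SDK."""
    return client.complete(f"Summarise: {text}")


print(summarise(client_from_config({"provider": "eu_api"}), "meeting notes"))
print(summarise(client_from_config({"provider": "local"}), "meeting notes"))
```

Here `summarise` never changes when the provider does. Only the value of `provider` in configuration changes, which is the property that makes a later move between tiers cheap.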

What's next in this series

This post was about why the distinction matters. The next two go deeper:

Part 2 unpacks what "on-premises AI" actually means in 2026 -- the four tiers from this post, with honest assessments of cost, control, and operational burden for each.
Part 3 is the reference architecture: a working RAG system on EU-jurisdiction components, focused on the generation boundary -- the point where your choice of tier becomes a configuration decision rather than an architectural commitment -- with audit logging and the integration points that matter in real procurement.

The harder truth is that the residency-vs-sovereignty gap is now in the public record. Microsoft France's Director of Public and Legal Affairs confirmed it under oath. The European Commission is treating it as a strategic procurement question. The DPF is on its third generation and under appeal. None of this means US providers are off-limits -- many organisations will read the same evidence and reasonably stay on Tier 1. But the choice should be made knowingly, with the data classes mapped and the time horizon understood, not because the marketing copy said "EU region" and procurement assumed that settled the question.

The fix isn't dramatic. It's procedural. Map the data, push back on the vocabulary, build the abstraction, pick the tier that matches the actual risk. If your organisation is working through this -- public body, regulated private-sector business, or somewhere in between -- I'd be interested to hear what's showing up in your RFPs and design reviews. Email's in the footer.