Limits of Controller-Centric Data Governance
The Proxy Trap: Why Federated Queries and Obfuscation Cannot Fix the Trust Deficit in Data Governance
Introduction
The modern data landscape is witnessing a massive push toward privacy-preserving architectures. In health tech and clinical research networks, recent announcements like the rollout of the Cohort Discovery service on the Health Data Research Gateway promise a faster, safer way for researchers to identify relevant patient cohorts. Underlying infrastructure such as Hutch’s “Bunny” component exemplifies this shift: it is open-source, runs behind a controller’s local firewall to limit direct database exposure, and uses result obfuscation to simplify technical compliance with frameworks like the GDPR.
Yet, as security and privacy failures continue to mount, a critical structural question emerges: Does securing the data controller’s pipeline actually resolve individual privacy risk?
The answer is no. This architecture follows a “Controller-Centric Proxy” model. It solves localized engineering challenges, but it cannot repair the deeper, structural erosion of public trust, because it leaves the individual entirely outside the control loop.
1. Federated Querying, Trust Boundaries, and Their Limits
Health data infrastructure is rapidly shifting toward federated analytics and secure research environments (Trusted Research Environments and Secure Data Environments, or TREs/SDEs), where data remains local while queries move across systems. Projects such as HDR UK’s Cohort Discovery service aim to help researchers identify relevant datasets without direct data access.
However, beneath the surface of these improvements lies a deeper governance question: who controls the computation layer itself?
2. From Cohort Discovery to Federated Execution
HDR UK’s Cohort Discovery service allows researchers to determine whether relevant patient cohorts exist before requesting access to datasets. This reduces unnecessary data requests and improves research efficiency [1].
However, in federated systems, the actual execution of queries is often delegated to distributed components embedded within Secure Data Environments. These components operate inside institutional boundaries and enforce local policies while executing remote queries.
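The delegation pattern above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hutch or HDR UK code: `Record`, `Site`, and `federated_count` are hypothetical names, and each site is modelled as an in-memory list. The point is structural: row-level records never leave a site, yet the predicate that decides who is counted is written entirely by the requester and approved by the controller.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    """Hypothetical row-level record held inside one site."""
    age: int
    condition: str


Predicate = Callable[[Record], bool]


class Site:
    """A data holder that executes queries locally and releases only a count."""

    def __init__(self, name: str, records: list[Record]) -> None:
        self.name = name
        self._records = records  # stays inside the site boundary

    def run_cohort_query(self, predicate: Predicate) -> int:
        # Only the aggregate crosses the boundary, never the rows themselves.
        return sum(1 for r in self._records if predicate(r))


def federated_count(sites: list[Site], predicate: Predicate) -> dict[str, int]:
    """Fan the same query out to every site and collect per-site counts."""
    return {site.name: site.run_cohort_query(predicate) for site in sites}


if __name__ == "__main__":
    site_a = Site("site_a", [Record(54, "T2D"), Record(61, "T2D"), Record(40, "asthma")])
    site_b = Site("site_b", [Record(70, "T2D"), Record(35, "T2D")])
    older_diabetics = lambda r: r.condition == "T2D" and r.age >= 50
    print(federated_count([site_a, site_b], older_diabetics))  # {'site_a': 2, 'site_b': 1}
```

Note that no argument to `federated_count` represents the patients being counted; they appear only as rows.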
3. The Hutch “Bunny” Component and the Hidden Trust Boundary
Technical documentation from Hutch describes a lightweight query execution agent, commonly referred to as Bunny, designed to run inside secure environments behind institutional firewalls [2].
Bunny acts as a local execution agent capable of:
- Receiving queries from upstream task APIs
- Executing cohort queries on OMOP-compatible databases
- Returning aggregated or obfuscated outputs
- Operating within federated or standalone deployments
It is also designed for containerized deployment and scalable execution across multiple instances. A group of Bunny instances operating in parallel across distributed environments to process federated queries is informally called a “fluffle.”
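A fluffle is, in effect, a pool of identical workers draining a shared task source. The sketch below assumes an in-memory `queue.Queue` standing in for the upstream task API and a caller-supplied `execute` function standing in for cohort query execution; `run_fluffle` is a hypothetical name, not part of Bunny.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional


def run_fluffle(task_ids: List[int], execute: Callable[[int], int],
                n_workers: int = 3) -> Dict[int, int]:
    """Process queued tasks with several identical workers; return task_id -> result."""
    tasks: "queue.Queue[Optional[int]]" = queue.Queue()
    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            task_id = tasks.get()
            if task_id is None:  # sentinel: this worker should stop
                tasks.task_done()
                return
            outcome = execute(task_id)  # stand-in for running a cohort query
            with lock:  # protect the shared results dict
                results[task_id] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task_id in task_ids:
        tasks.put(task_id)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    for t in threads:
        t.join()
    return results
```

Adding workers raises throughput, but every worker executes whatever reaches the queue; the number of instances changes nothing about who decides which queries are admitted.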
What Bunny Solves, and What It Does Not
| Dimension | What Bunny Does | What It Does Not Do |
|---|---|---|
| Task | Executes federated cohort discovery queries | Does not assess necessity or proportionality of query intent |
| Risk | Applies obfuscation to reduce re-identification risk | Does not evaluate contextual risk for individuals |
| Control | Provides secure execution within institutional firewall | No patient-level authorization over data usage |
| Governance | Supports compliance via pseudonymization and access control | Does not replace legal accountability of controllers |
| Architecture | Enables scalable federated querying | Does not implement user-centric or per-element consent models |
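Low-number suppression and rounding are common ways to obfuscate aggregate counts. A minimal sketch follows; the parameter names and default values are assumptions for illustration and are not taken from Bunny's configuration, so consult the Hutch documentation for the agent's actual settings.

```python
def obfuscate_count(count: int, suppression_threshold: int = 10,
                    rounding_target: int = 10) -> int:
    """Apply low-number suppression, then round half up to a multiple of rounding_target.

    Both parameters are hypothetical defaults chosen for illustration.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Small cells are the easiest to re-identify, so they are reported as zero.
    if suppression_threshold > 0 and count < suppression_threshold:
        return 0
    if rounding_target > 0:
        return ((count + rounding_target // 2) // rounding_target) * rounding_target
    return count
```

Both thresholds are fixed by whoever operates the agent and apply identically to every count. Nothing in the function can reflect the circumstances of the people behind the number, which is the contextual-risk gap in the table above.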
4. Anatomy of the Controller-Centric Proxy
To understand why tools like Bunny are an incomplete solution for true data sovereignty, we must analyze what they do versus what they cannot do.
| Dimension | Technical Action (What Bunny Does) | Architectural Blind Spot (What It Leaves Unresolved) |
|---|---|---|
| Task Execution | Automates Cohort Discovery queries against structured databases (e.g., OMOP CDM). | Fails to assess whether the incoming query is genuinely necessary or proportionate. |
| Risk Mitigation | Obfuscates aggregated query results to reduce re-identification risks. | Fails to evaluate contextual risk—the specific real-world vulnerability of the individual whose data is queried. |
| Data Control | Grants data controllers a secure, auditable local execution layer (scalable across parallel instances, or “fluffles”). | Offers zero agency or visibility to patients regarding which specific data elements are exposed. |
| Governance | Automates technical compliance constraints (access logs, local network boundaries). | Operates purely as a proxy, shifting the ethical burden of justifying data use entirely back to the institution. |
This reveals the core problem: Localizing query execution does not change who defines purpose, necessity, and acceptable exposure. The architecture remains completely centralized around institutional logic.
5. The Core Structural Issue: Controller-Centric Proxy Governance
Systems like Bunny are best understood as controller-centric execution proxies. They operationalize lawful processing under frameworks such as GDPR and national PDPA regimes, but they do not alter the fundamental governance model.
Even with obfuscation, local deployment, and secure execution, the controller still determines:
- What constitutes “necessary processing”
- Which queries are permitted
- What outputs are acceptable
- How risk is interpreted institutionally
This is consistent with enforcement actions such as the CNIL’s decision against IQVIA, where pseudonymization and formal authorization did not eliminate structural risks of identifiability and downstream processing exposure [3].
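The four controller decisions listed above can be made concrete with a hypothetical policy object. `QueryRequest` and `ControllerPolicy` are illustrative names, not part of any real system; the sketch only shows where each decision lives. Every field is configured by the institution, and `permits` has no input through which a patient could accept or refuse a particular use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    requester: str
    purpose: str
    fields: frozenset[str]


@dataclass
class ControllerPolicy:
    """Hypothetical policy: every input is set by the institution."""
    approved_requesters: set[str]  # which queries are permitted
    approved_purposes: set[str]    # what counts as "necessary processing"
    releasable_fields: set[str]    # what outputs are acceptable
    min_cell_size: int = 10        # how risk is interpreted institutionally

    def permits(self, request: QueryRequest) -> bool:
        # No parameter here represents the data subject's view.
        return (
            request.requester in self.approved_requesters
            and request.purpose in self.approved_purposes
            and request.fields <= self.releasable_fields
        )

    def releasable(self, count: int) -> bool:
        return count >= self.min_cell_size
```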
6. The Missing Layer: Contextual and Individual Risk
Current systems largely operate under institutional risk models:
- Risk is assessed by controllers
- Consent is generalized and static
- Outputs are governed by policy, not by individual context
However, contextual harm is not institutional. It is personal. The same dataset can be:
- Operationally benign for a hospital
- Highly sensitive or dangerous for an individual
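The difference can be expressed as two scoring functions. This is a toy model under explicit assumptions: the weights in `BASE_SENSITIVITY` and the per-person multipliers are invented for illustration, and neither function is a validated risk metric. An institutional score depends only on which fields are released; a contextual score also takes the individual's circumstances into account, for example a multiplier on `postcode` for someone whose safety depends on their address staying private.

```python
from __future__ import annotations

# Hypothetical base weights, chosen only to illustrate the idea.
BASE_SENSITIVITY = {"diagnosis": 2, "medication": 2, "postcode": 1}


def institutional_risk(fields: set[str]) -> int:
    """One score for everyone: depends only on which fields are released."""
    return sum(BASE_SENSITIVITY.get(f, 0) for f in fields)


def contextual_risk(fields: set[str], context: dict[str, int]) -> int:
    """Scale each field by a person-specific multiplier (default 1)."""
    return sum(BASE_SENSITIVITY.get(f, 0) * context.get(f, 1) for f in fields)


if __name__ == "__main__":
    released = {"diagnosis", "postcode"}
    print(institutional_risk(released))                # 3
    print(contextual_risk(released, {}))               # 3
    print(contextual_risk(released, {"postcode": 5}))  # 2 + 5 = 7
```

The same release scores 3 for the institution in every case, while the individual's score more than doubles once their context is included.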
7. Toward Smart Data: A User-Centric Architecture
A Smart Data model fundamentally repositions control away from institutional proxies toward individuals themselves.
| Principle | Description |
|---|---|
| Per-element authorization | Users control specific data fields per use-case |
| Process-specific consent | Consent is tied to each processing event |
| Real-time cryptographic control | Data access requires user-issued tokens |
| Contextual risk awareness | Risk evaluated with user context, not only institutional logic |
| Revocability & auditability | Users can withdraw access dynamically with full trace logs |
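Several of these principles can be shown as a small data structure. The sketch below is hypothetical and deliberately minimal: a `ConsentRegistry` keyed on (purpose, element) pairs gives per-element, process-specific authorization, `revoke` gives dynamic withdrawal, and an append-only `audit_log` records every grant, revocation, and access decision. Cryptographic enforcement is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegistry:
    # (purpose, element) pairs the user currently authorizes
    grants: set[tuple[str, str]] = field(default_factory=set)
    # append-only trail of (UTC timestamp, action, purpose, element)
    audit_log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def _log(self, action: str, purpose: str, element: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, purpose, element))

    def grant(self, purpose: str, element: str) -> None:
        self.grants.add((purpose, element))
        self._log("grant", purpose, element)

    def revoke(self, purpose: str, element: str) -> None:
        self.grants.discard((purpose, element))
        self._log("revoke", purpose, element)

    def is_authorized(self, purpose: str, element: str) -> bool:
        # Consent is checked per element and per purpose, at the time of use.
        allowed = (purpose, element) in self.grants
        self._log("allow" if allowed else "deny", purpose, element)
        return allowed
```

A grant for `("trial_123", "hba1c")` says nothing about `postcode`, or about any other purpose; each must be authorized separately.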
8. The Fallacy of "Federated Means Safe"
A dominant myth in modern data engineering states that if data remains localized behind a firewall and queries are federated across an ecosystem, privacy risk drops to zero.
This view confuses computation with sovereignty. Changing where a query is processed does not protect an individual from the systemic harms of:
- Institutional over-collection justified on the grounds that processing is technically “compliant.”
- Context loss when data elements are cross-referenced across downstream systems.
- The re-identification threshold: the false comfort that pseudonymization alone prevents identification.
The French data protection authority exposed exactly this blind spot in CNIL Deliberation n° SAN-2026-008 (May 26, 2026). In its landmark ruling against IQVIA Operations France, the CNIL rejected claims that massive, pseudonymized health data warehouses qualified as anonymous data. The regulator highlighted that, despite technical protections, unique identifiers, deep data categories, and the ability to cross-reference datasets meant the line between pseudonymized and re-identifiable data remained razor-thin.
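The cross-referencing risk can be illustrated with a toy linkage attack. All records below are invented, and `link` is a hypothetical helper: it joins a pseudonymized table to an auxiliary table on quasi-identifiers (district, birth year, sex) and reports a match only when a combination is unique in the pseudonymized data, the situation that k-anonymity with k ≥ 2 is meant to prevent.

```python
from __future__ import annotations

from collections import Counter

QUASI_IDENTIFIERS = ("district", "birth_year", "sex")

# Pseudonymized extract: names removed, quasi-identifiers kept (invented data).
PSEUDONYMIZED = [
    {"pid": "a91f", "district": "LS6", "birth_year": 1984, "sex": "F", "diagnosis": "HIV"},
    {"pid": "c20e", "district": "LS6", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"pid": "7b3d", "district": "LS6", "birth_year": 1990, "sex": "M", "diagnosis": "T2D"},
]

# Auxiliary data an attacker might already hold (invented data).
AUXILIARY = [
    {"name": "Alice", "district": "LS6", "birth_year": 1984, "sex": "F"},
    {"name": "Bob", "district": "LS6", "birth_year": 1990, "sex": "M"},
]


def qi_key(row: dict) -> tuple:
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def link(pseudo: list[dict], aux: list[dict]) -> dict[str, str]:
    """Return name -> diagnosis for people whose quasi-identifiers are unique."""
    frequency = Counter(qi_key(r) for r in pseudo)
    by_key = {qi_key(r): r for r in pseudo}
    matches = {}
    for person in aux:
        k = qi_key(person)
        if frequency.get(k) == 1:  # a group of size 1 means re-identification
            matches[person["name"]] = by_key[k]["diagnosis"]
    return matches


if __name__ == "__main__":
    print(link(PSEUDONYMIZED, AUXILIARY))  # {'Alice': 'HIV'}
```

Bob escapes only because two records share his combination; one more attribute, such as a full postcode, could make him unique as well.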
When a system relies on a controller-centric view, risk levels are evaluated based on business liability and regulatory check-boxes rather than the user’s personal safety. Consequently, high-risk data elements can slip through, leaving the individual exposed while the platform confidently claims full compliance.
9. Shifting to a True "Smart Data" Architecture
To truly restore user trust, data protection must transition away from tools that merely optimize the controller’s workflows and adopt a Smart Data model. As argued in the architectural thesis "Why Control Must Sit With the User," true data sovereignty requires moving the access control mechanism entirely to the data subject.
Unlike a proxy query system, a Smart Data infrastructure fundamentally transforms the data movement loop:
[Controller-Centric Model]
User Data ──> [Controller Database] ──(Blind Query Execution)──> Automated Results
[Smart Data Model]
User Data ──> [Encrypted Vault] ──(Requires Real-Time Cryptographic Token)──> Selective Disclosure
▲
[User Authorization Gateway]
- Per-Element & Per-Process Authorization: Instead of blanket institutional approvals to search a database, the individual determines exactly which data elements are visible for a single, specific processing operation.
- Real-Time Cryptographic Tokens: The data controller cannot move, analyze, or execute queries against an individual's data block without a real-time, user-side cryptographic authorization token.
- Contextual Risk Assessment: Risk is evaluated from the user’s context, not the corporate view. Users are dynamically informed about the unique vulnerabilities associated with disclosing specific data categories.
- Dynamic Revocation: Consent isn’t a one-time event trapped in an audit log; permissions can be dynamically and selectively revoked in real time.
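The token-gated disclosure described above can be sketched with Python's standard library. This is a simplified illustration, not a production protocol: the claim names (`sub`, `purpose`, `elements`, `exp`, `jti`) are borrowed loosely from JWT conventions, `issue_token` and `disclose` are hypothetical names, and HMAC with a shared key is used only to keep the example stdlib-only. In a real design the user-side gateway would sign with an asymmetric key (for example Ed25519) so the vault could verify tokens but never mint them.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64e(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(key: bytes, subject: str, purpose: str, elements: list[str],
                ttl_s: int, now: float | None = None) -> tuple[str, str]:
    """User-side gateway: authorize specific elements for one purpose, for a limited time."""
    now = time.time() if now is None else now
    token_id = secrets.token_hex(8)  # lets the user revoke this specific grant
    claims = {"sub": subject, "purpose": purpose, "elements": sorted(elements),
              "exp": now + ttl_s, "jti": token_id}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return _b64e(payload) + "." + _b64e(sig), token_id


def disclose(key: bytes, token: str, subject: str, record: dict, purpose: str,
             revoked: set[str], now: float | None = None) -> dict:
    """Vault side: verify the token, then release only the authorized elements."""
    now = time.time() if now is None else now
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise PermissionError("invalid signature")
    claims = json.loads(payload)
    if claims["jti"] in revoked:
        raise PermissionError("token revoked")
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    if claims["sub"] != subject or claims["purpose"] != purpose:
        raise PermissionError("token not valid for this subject or purpose")
    allowed = set(claims["elements"])
    return {name: value for name, value in record.items() if name in allowed}
```

Because `disclose` filters on the token's `elements`, a request against the full record still returns only what the user authorized; revocation works by adding the token's `jti` to `revoked`.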
10. The Collapse of MyIdentity: A Failure of Trust in the Controller-Centric Proxy Model
The theoretical limitations of the controller-centric proxy model are no longer confined to academic debate [4]; they have played out in real time with the collapse of major public-private identity frameworks. A definitive example is the 2026 suspension of the UK’s MyIdentity scheme, a multi-million-pound initiative designed as a reusable "digital identity passport" for the housing, financial, and legal sectors.
Despite significant public funding and multi-sector backing, organizers were forced to put the project on hold following a damning UK Home Affairs Committee report titled "Mandatory to Manageable: The Government's Plans for Digital ID." While political mismanagement played a role, the deeper commercial and social rejection of MyIdentity stems primarily from a flawed, proxy-based architecture that lacked the means to build genuine user trust.
The Fatal Architectural Flaws of the Federated Proxy Model:
- The "Identity-Passing" Illusion: While marketed as a passport, the framework merely automated the task of the data controller. Once a user authorized an identity token, the underlying attributes were pushed into the relying organization's local database. Because it lacked a decentralized Smart Data loop, users completely lost downstream visibility and real-time revocation power the moment the data left their app.
- The Liability Paradox for Businesses: Under a controller-centric framework, private firms remain ultimately liable for Anti-Money Laundering (AML) and Customer Due Diligence (CDD) compliance, even when using third-party proxies. Because the system was not a cryptographically sovereign user-to-service handshake, organizations were hesitant to assume the staggering legal liabilities of trusting a proxy’s data assessment.
- The Synthetic Identity Loophole: Because MyIdentity and its parallel frameworks rely on legacy architecture—initial physical document scans matched against biometric facial recognition—they are structurally defenseless against modern AI threats. The rapid rise of deepfakes and automated video injection scams meant identity providers could no longer guarantee the integrity of the passport itself, breaking the chain of trust for professional indemnity insurers.
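The "Identity-Passing" point can be illustrated with two hypothetical relying-party designs. This sketch models the mechanism the first bullet describes; it is not a model of MyIdentity's actual implementation, and `Wallet`, `CopyingRelyingParty`, and `OnDemandRelyingParty` are invented names. Once attributes are copied into a relying party's database, revoking authorization in the wallet has no effect on the copy; a party that must ask the wallet on every use loses access at the next request.

```python
from __future__ import annotations


class Wallet:
    """User-held attributes and the set of parties currently authorized."""

    def __init__(self, attributes: dict[str, str]) -> None:
        self._attributes = dict(attributes)
        self._authorized: set[str] = set()

    def authorize(self, party: str) -> None:
        self._authorized.add(party)

    def revoke(self, party: str) -> None:
        self._authorized.discard(party)

    def present(self, party: str, names: list[str]) -> dict[str, str]:
        if party not in self._authorized:
            raise PermissionError(f"{party} is not authorized")
        return {n: self._attributes[n] for n in names}


class CopyingRelyingParty:
    """Proxy pattern: pulls attributes once and stores them locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._db: dict[str, str] = {}

    def onboard(self, wallet: Wallet, names: list[str]) -> None:
        self._db.update(wallet.present(self.name, names))

    def lookup(self, attr: str) -> str | None:
        return self._db.get(attr)  # unaffected by any later revocation


class OnDemandRelyingParty:
    """Asks the wallet on every use, so revocation applies at the next request."""

    def __init__(self, name: str, wallet: Wallet) -> None:
        self.name = name
        self._wallet = wallet

    def lookup(self, attr: str) -> str | None:
        try:
            return self._wallet.present(self.name, [attr])[attr]
        except PermissionError:
            return None
```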
“The core lesson of the MyIdentity failure is clear: optimizing corporate compliance pipelines under the guise of an 'identity passport' does not equal user empowerment. When users realize that a digital identity scheme is just a faster proxy for institutional data collection, trust dissolves.”
By attempting to create a state-sanctioned, centralized mirror of private sector data-gathering instead of abstracting verification directly to the individual, the framework managed to increase transaction friction, raise user costs, and spark widespread public opposition. It stands as empirical proof that a federated proxy model cannot survive. Trust cannot be achieved by optimizing a system that leaves the user powerless.
11. Conclusion: A Fundamental Rebalancing of Power
Tools like Hutch's Bunny represent necessary engineering progress for local infrastructure optimization, helping data custodians handle high concurrent query volumes securely. However, optimizing a flawed, controller-centric power dynamic does not fix the underlying trust crisis.
The real-world limits of this approach are no longer hypothetical; they are actively playing out in the structural collapse of major public-private identity initiatives. A definitive example is the market failure of the UK’s MyIdentity trust framework. Despite years of multi-sector backing to streamline high-friction housing and financial verification, the system fragmented under the weight of the "Proxy Trap." Because it merely automated compliance workflows rather than establishing a cryptographically sovereign user-to-service loop, private firms were left legally exposed—ultimately forcing MyIdentity to advise over 250 property and tech firms to suspend further investment entirely.
This operational breakdown was mirrored at the state level when the UK Home Affairs Committee released its scathing May 2026 report, "Mandatory to Manageable: The Government's Plans for Digital ID." The report highlighted a catastrophic collapse of public trust, marked by an anti-digital-ID petition that amassed nearly 3 million signatures. Both failures serve as empirical proof that when an identity framework is built as a state-sanctioned or corporate-driven proxy to optimize institutional pipelines—leaving users without per-element control or real-time revocation power—the ecosystem will be soundly rejected by both businesses and citizens.
True structural data protection is not achieved by building a cleaner proxy for the institution. It is achieved by embedding a "personal data passport" layer directly into the architecture. By shifting to a user-centric model where the individual controls data decryption and movement via cryptographic verification, the digital economy can finally achieve a framework where data mobility is enabled, but absolute data sovereignty is retained.
For a deeper architectural look at why this control loop must sit directly with the user, read:
👉 Why Control Must Sit With the User




