SHARE — Spanish Health Data Space for Cancer Research
Acuratio's Federated Learning Module delivers compute-to-data training across the SHARE health data space — prognosis and survival models on 30,000+ lung-cancer patients from 197 Spanish hospitals, with clinical records never leaving each institution.
- Patients covered: 30,000+
- Hospitals (GECP): 197
- Coordinator: Hospital U. Puerta de Hierro
- Standards: EDC · DID:WEB · DSP/DCP
The SHARE project (SpanisH cAnceR data spacE) is building Spain’s first federated, EU-interoperable health data space dedicated to lung-cancer research — pulling clinical, genomic, quality-of-life and ambulatory-monitoring data from more than 30,000 patients across the 197 hospitals of the Spanish Lung Cancer Group (GECP). It is coordinated by Hospital Universitario Puerta de Hierro Majadahonda (HUPHM) and funded by the Spanish Ministry for Digital Transformation and Public Function through the EU NextGeneration-EU Recovery Plan. Lung cancer is the leading cause of oncological mortality in Spain — more than 23,000 deaths a year — and the practical obstacle to better prognostic models has never been the science; it is that no hospital can hand over its data.
Acuratio is the federated-learning partner for the project. Our contribution to SHARE is the Federated Learning Module, which plugs into each hospital’s Eclipse Dataspace Components (EDC) connector and turns the dataspace into a working training fabric without moving a single clinical record.
Architecturally, the module is a Master–Slave system that lives natively inside the dataspace. A Federated Learning Slave is deployed alongside every hospital’s EDC connector and executes training inside the hospital’s network boundary, reading local data through the CLARIFY Data Proxy. A Federated Learning Master, integrated with its own connector, discovers participating nodes via the federated catalog, negotiates contracts under a dedicated “Federated Model Creators” use case, validates each participant’s credentials (DID:WEB identifiers and Verifiable Credentials, exchanged over the Dataspace Protocol, DSP, and the Decentralized Claims Protocol, DCP) and only then distributes the global model. Slaves train locally and return only model parameters or gradients; the Master aggregates them round after round until convergence. Data never crosses the hospital boundary, and every authorisation, contract, and access event is logged for audit.
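The round structure described above can be illustrated with a minimal sketch. It is not Acuratio's implementation: the class `SlaveNode`, the functions `federated_round` and `run_training`, the one-parameter linear model, the sample-weighted averaging rule (plain federated averaging) and the boolean `credential_valid` flag standing in for the DID:WEB and Verifiable Credential check are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SlaveNode:
    """Hypothetical stand-in for a Federated Learning Slave inside one hospital."""
    name: str
    features: List[float]          # local data; never returned to the master
    targets: List[float]
    credential_valid: bool = True  # placeholder for the real credential check

    def train(self, weight: float, lr: float = 0.05, epochs: int = 5) -> Tuple[float, int]:
        """Fit y = w * x by local gradient descent; return only (weight, sample count)."""
        n = len(self.features)
        for _ in range(epochs):
            grad = sum(2 * (weight * x - y) * x
                       for x, y in zip(self.features, self.targets)) / n
            weight -= lr * grad
        return weight, n


def federated_round(global_weight: float, nodes: List[SlaveNode]) -> float:
    """One round: admit validated nodes, train locally, average by sample count."""
    admitted = [node for node in nodes if node.credential_valid]
    if not admitted:
        raise ValueError("no participant passed credential validation")
    updates = [node.train(global_weight) for node in admitted]
    total = sum(count for _, count in updates)
    return sum(w * count for w, count in updates) / total


def run_training(nodes: List[SlaveNode], rounds: int = 50, tol: float = 1e-6) -> float:
    """Repeat rounds until the global weight stops changing or the budget runs out."""
    weight = 0.0
    for _ in range(rounds):
        new_weight = federated_round(weight, nodes)
        if abs(new_weight - weight) < tol:
            return new_weight
        weight = new_weight
    return weight


if __name__ == "__main__":
    # Synthetic data only: both nodes hold samples of y = 3x.
    hospitals = [
        SlaveNode("hospital-a", [1.0, 2.0], [3.0, 6.0]),
        SlaveNode("hospital-b", [0.5, 1.5, 2.5], [1.5, 4.5, 7.5]),
    ]
    print(run_training(hospitals))
```

Only the pair returned by `train` leaves a node, which mirrors the rule that parameters travel and records do not. A real deployment would replace the flag with actual credential verification and the toy model with the project's prognosis and survival models.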
On top of the FL primitives, Acuratio ships precompiled Rust binaries inside the analytics container. They implement Private Set Intersection, private identifier matching, and the decryption routines needed to operate over shared identifiers and encrypted datasets without exposing the underlying records. These are the building blocks of cross-institution cohort analysis where even patient IDs are sensitive.
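To make the idea of Private Set Intersection concrete, here is a toy sketch of one classic construction, Diffie-Hellman-style double blinding. It is an assumption for illustration only: the source does not say which PSI protocol the Rust binaries use, the function names are hypothetical, both parties are simulated in one process, and the modulus is chosen for readability, not security.

```python
import hashlib
import math
import secrets
from typing import List

# Toy group: integers modulo the prime 2**255 - 19. Illustrative only; a real
# system would rely on a vetted elliptic-curve group and an audited library.
P = 2**255 - 19


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a non-zero element modulo P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1


def new_secret() -> int:
    """Pick a secret exponent coprime to P - 1, so blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values: List[int], secret: int) -> List[int]:
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def private_set_intersection(ids_a: List[str], ids_b: List[str]) -> List[str]:
    """Return the identifiers of party A that party B also holds."""
    key_a, key_b = new_secret(), new_secret()

    # Each party blinds its own hashed identifiers and sends them across.
    a_once = blind([hash_to_group(i) for i in ids_a], key_a)
    b_once = blind([hash_to_group(i) for i in ids_b], key_b)

    # Each party blinds the other's values again. Exponentiation commutes,
    # so a shared identifier yields the same doubly blinded value.
    a_twice = blind(a_once, key_b)       # computed by B, returned in order
    b_twice = set(blind(b_once, key_a))  # computed by A

    # A learns which of its own identifiers match, and nothing else in clear.
    return [i for i, v in zip(ids_a, a_twice) if v in b_twice]


if __name__ == "__main__":
    # Made-up identifiers, not real patient IDs.
    print(private_set_intersection(["ES-001", "ES-002", "ES-003"],
                                   ["ES-003", "ES-004", "ES-001"]))
```

Neither side ever sees the other's raw identifiers, only blinded values, which is the property that matters when patient IDs themselves are sensitive.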
The integration tests are already green: the CLARIFY Proxy, the CLARIFY Connector (with the Federated Slave) and the Acuratio Federated Master operate synchronously end-to-end, with data staying inside HUPHM’s infrastructure and only model parameters travelling. The next phase scales the same pattern from one provider to a federated set of GECP hospitals as both providers and consumers — laying a concrete precedent for Spain’s emerging National Health Data Space (ENDS) and for federated AI in the European Health Data Space.