A new study has found that electrocardiogram (ECG) signals – often shared publicly for medical research – can be linked back to individual people, challenging assumptions about how health data is protected and shared.
In tests using data from 109 participants across several public datasets, the system correctly matched ECG signals to individuals 85 percent of the time, and it remained effective even when noise was added to the data. A public version of the study is available on arXiv.
“Current practices often assume that health data, including ECG, becomes safe once it has been ‘de-identified’ by stripping names or obvious identifiers. Our findings show that this assumption no longer holds. Because ECG signals carry stable, individual-specific patterns, they can act as biometric identifiers,” said Ziyu Wang, a co-author of the research.
The researchers note that ECG data retain distinctive, individual-specific patterns over time. Unlike demographic data, these patterns cannot be removed or generalized without diminishing the dataset's medical value, leaving health organizations with a dilemma: they want to share data for research while still protecting patient identities.
Wang and his team outline four steps for policymakers and data stewards to address these risks:
- Reclassify ECG as biometric data. ECG should be treated with the same sensitivity as other biometric identifiers, requiring stronger protection than ordinary clinical data.
- Mandate risk assessment and informed consent. Providers should estimate re-identification risk before sharing data, and patients should acknowledge potential linkage risks in consent forms. Data consumers must be bound by policies that prohibit cross-linking beyond approved uses.
- Enforce cross-institution safeguards. Research collaborations should operate under controlled-access agreements and audited environments, with minimized or generalized metadata such as exact timestamps or device identifiers.
- Strengthen patient and data consumer awareness. Patients should be informed that ECG carries biometric re-identification risks, and researchers and companies must adhere to explicit restrictions on reuse and cross-linking.
Technically, the team employed a Vision Transformer model to analyze time-series patterns in the signals. The attack operates in two stages: first, it matches a sample to the most similar known individual; second, it decides whether the sample actually comes from someone outside the known set. Even when the attacker did not know exactly which individuals were in the public dataset, the approach remained highly effective: at a defined confidence threshold, only about 14 percent of signals were misclassified.
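To make the two-stage logic concrete, the sketch below shows one way such an open-set matching step could look. It is an illustrative reconstruction, not the authors' code: the random-projection `embed` function stands in for the trained Vision Transformer, and the gallery identities, signal length, and similarity threshold are all assumptions.

```python
# Illustrative sketch of a two-stage, open-set re-identification attack on ECG.
# The encoder below is a random-projection stand-in for the study's Vision
# Transformer; the enrolled identities, signal length, and threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 128
SIG_LEN = 5000  # e.g., a 10-second ECG strip at 500 Hz (assumed)

# Stand-in encoder: maps a raw ECG strip to a unit-norm embedding vector.
W = rng.normal(size=(SIG_LEN, EMBED_DIM))
def embed(ecg: np.ndarray) -> np.ndarray:
    z = ecg @ W
    return z / np.linalg.norm(z)

# Build a gallery of embeddings for known (enrolled) individuals.
gallery = {pid: embed(rng.normal(size=SIG_LEN))
           for pid in ["patient_A", "patient_B", "patient_C"]}

def identify(ecg: np.ndarray, threshold: float = 0.8):
    """Two-stage decision:
    1) find the closest enrolled individual by cosine similarity;
    2) if the best score falls below the confidence threshold, declare the
       sample as coming from someone outside the known set ("unknown")."""
    q = embed(ecg)
    scores = {pid: float(q @ g) for pid, g in gallery.items()}
    best_id, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return "unknown", best_score
    return best_id, best_score

# Usage: probe with a fresh (here random) ECG strip.
label, score = identify(rng.normal(size=SIG_LEN))
print(label, round(score, 3))
```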
Beyond ECG, the researchers noted that similar linkage risks could affect other biosignals. They highlighted that photoplethysmography (PPG), which encodes stable cardiovascular features, showed comparable vulnerability, and they warned that as consumer devices grow more capable, data such as voice and electroencephalogram (EEG) recordings could face rising re-identification risks as well.
For healthcare security, the authors argue that stronger privacy protections for biosignals are essential. “For telehealth providers and wearable companies, the first step is to acknowledge that ECG and similar biosignals should be treated as biometric data, with the same sensitivity as fingerprints or voice,” Wang said. “This means implementing privacy-focused consent and ensuring that patients understand how these signals may be used and shared.” He also noted that privacy measures should target the most identity-revealing regions of the signal and that the field is exploring generative AI approaches to selectively modify or regenerate these personal patterns while preserving medically valuable features for research and model training.
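To illustrate the direction Wang describes, the toy sketch below selectively alters only the most identity-revealing samples of a signal. Everything in it is hypothetical: the saliency scores are random placeholders, and simple interpolation stands in for the generative regeneration the team is exploring.

```python
# Toy sketch of targeting only the most identity-revealing regions of a signal.
# Saliency scores and the interpolation step are placeholders, not the paper's method.
import numpy as np

def anonymize_regions(ecg: np.ndarray, saliency: np.ndarray,
                      keep_fraction: float = 0.8) -> np.ndarray:
    """Replace the top (1 - keep_fraction) most identity-revealing samples with a
    locally interpolated value, leaving the rest of the waveform untouched."""
    out = ecg.copy()
    cutoff = np.quantile(saliency, keep_fraction)
    mask = saliency > cutoff                     # samples flagged as identity-revealing
    idx = np.arange(len(ecg))
    # Linear interpolation from untouched samples stands in for a generative
    # model that would regenerate these regions with realistic, non-identifying detail.
    out[mask] = np.interp(idx[mask], idx[~mask], ecg[~mask])
    return out

# Usage with synthetic data: a noisy sinusoid as the "ECG" and random saliency.
rng = np.random.default_rng(1)
ecg = np.sin(np.linspace(0, 20 * np.pi, 5000)) + 0.05 * rng.normal(size=5000)
saliency = rng.random(5000)
print(anonymize_regions(ecg, saliency).shape)
```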
Experts say the findings underscore a broader warning: biometric data is expanding beyond traditional identifiers, demanding evolving protections as healthcare data-sharing and analytics advance.