Researchers propose observational audit to detect label leakage in machine learning models

In a new paper, researchers introduce an observational auditing framework that measures whether machine learning models reveal their training labels. The method aims to lower the engineering burden of privacy testing because it requires no changes to the training data.

Previous privacy audits commonly relied on canaries, artificial records inserted into the training data, to test whether a model memorized them and later revealed them. That tactic exposed privacy problems but created operational hurdles, because many production training pipelines restrict changes to datasets and require additional review before any modification.

The observational audit works by giving an auditor access to a mixed set of labels after the model is trained: some labels are the originals used in training, and others are produced by a proxy model. An attacker then attempts to guess which records kept their original labels; success above chance indicates that the trained model leaked label information. The auditor converts the attacker's score into a privacy metric that can be compared across methods.
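The audit loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `build_audit_set`, `attack`, and `audit_metrics` are hypothetical. Each record is assumed to show either its training label or its proxy label with equal probability, so chance accuracy is 0.5. The attacker is assumed to threshold a per-record score from the trained model, such as its confidence in the shown label. The conversion to a privacy number uses the standard pure differential-privacy test bound TPR ≤ e^ε · FPR with add-one smoothing. The article does not say which metric the paper actually reports.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def build_audit_set(
    train_labels: Sequence[int],
    proxy_labels: Sequence[int],
    rng: random.Random,
) -> Tuple[List[int], List[bool]]:
    """Show each record's training label with probability 1/2, else its proxy label.

    Returns the shown labels and the hidden ground truth (True = original label).
    """
    shown, is_original = [], []
    for y_train, y_proxy in zip(train_labels, proxy_labels):
        keep = rng.random() < 0.5
        shown.append(y_train if keep else y_proxy)
        is_original.append(keep)
    return shown, is_original


def attack(
    shown: Sequence[int],
    score: Callable[[int, int], float],
    threshold: float,
) -> List[bool]:
    """Guess 'original' when the trained model's score for (record, shown label) is high.

    `score` is a hypothetical hook, e.g. the model's probability for the shown label.
    """
    return [score(i, y) >= threshold for i, y in enumerate(shown)]


def audit_metrics(guesses: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (attack accuracy, empirical epsilon estimate).

    Assumption: epsilon is estimated from the pure-DP bound TPR <= e^eps * FPR,
    checked in both directions, using add-one-smoothed rates so that a perfect
    attack yields a large finite value instead of a division by zero.
    """
    tp = sum(1 for g, t in zip(guesses, truth) if g and t)
    fp = sum(1 for g, t in zip(guesses, truth) if g and not t)
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    accuracy = sum(1 for g, t in zip(guesses, truth) if g == t) / len(truth)
    # Add-one smoothing keeps both rates strictly inside (0, 1).
    tpr = (tp + 1) / (pos + 2)
    fpr = (fp + 1) / (neg + 2)
    eps = max(0.0, math.log(tpr / fpr), math.log((1 - fpr) / (1 - tpr)))
    return accuracy, eps
```

Records whose proxy label happens to equal the training label are indistinguishable by construction. A real audit would also report a confidence bound on ε rather than the smoothed point estimate used here.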

The study notes that the proxy labels need not match the training labels exactly, and that an earlier checkpoint of the same model can serve as the proxy, which avoids training an additional model. This design is intended to make the audit practical for systems that cannot be modified for each test.
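Generating proxy labels from an earlier checkpoint might look like the sketch below. The function name `proxy_labels_from_checkpoint` and its input format (one predicted probability vector per record) are hypothetical. Sampling from the checkpoint's predicted distribution, rather than taking its single most likely class, is an assumption made for this illustration; the article does not describe how the paper turns checkpoint outputs into labels.

```python
import random
from typing import List, Sequence


def proxy_labels_from_checkpoint(
    checkpoint_probs: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[int]:
    """Draw one proxy label per record from an earlier checkpoint's predictions.

    `checkpoint_probs[i]` is assumed to be the checkpoint's class-probability
    vector for record i; no extra model is trained.
    """
    labels = []
    for probs in checkpoint_probs:
        classes = range(len(probs))
        # Sample a class in proportion to the checkpoint's predicted probability.
        labels.append(rng.choices(classes, weights=probs, k=1)[0])
    return labels
```

Sampling keeps the proxy labels from all being the checkpoint's top prediction, which would otherwise make them systematically more "confident-looking" than real labels.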

Tests on two datasets, a small image collection commonly used in research and a large click dataset collected over 24 days, showed a consistent pattern: stronger label-privacy settings reduced the auditor's ability to distinguish training labels from proxy labels, while looser settings increased detectability. The authors also compared the new observational method with the planted-record approach and found that both surfaced similar privacy issues.
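The direction of that trend can be illustrated with a toy calculation. It does not reproduce the paper's experiments or numbers. The article does not name the label-privacy mechanism the authors used, so this sketch swaps in randomized response, a standard label-DP mechanism in which smaller ε means stronger privacy. It also assumes a worst-case model that memorizes its noisy training label exactly, a proxy label drawn uniformly from the other classes, and an attacker who guesses "original" whenever the shown label equals the memorized one. Under those assumptions the expected attack accuracy is ½·p + ½·(1 − (1 − p)/(K − 1)), where p is the probability that randomized response keeps the true label and K is the number of classes.

```python
import math


def rr_keep_prob(eps: float, num_classes: int) -> float:
    """Randomized response: keep the true label with prob e^eps / (e^eps + K - 1),
    otherwise report one of the other K - 1 labels uniformly at random."""
    return math.exp(eps) / (math.exp(eps) + num_classes - 1)


def expected_attack_accuracy(eps: float, num_classes: int) -> float:
    """Expected accuracy of the toy attacker when half the shown labels are originals.

    Assumes the model memorizes its noisy label exactly and the proxy label is a
    uniformly random class different from the true label (num_classes >= 2).
    """
    p = rr_keep_prob(eps, num_classes)
    # Shown label is the original: the guess is right when the noise kept it.
    hit_original = p
    # Shown label is the proxy: the guess is right unless the noisy label
    # happens to equal the proxy, which has probability (1 - p) / (K - 1).
    hit_proxy = 1 - (1 - p) / (num_classes - 1)
    return 0.5 * hit_original + 0.5 * hit_proxy


for eps in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    print(f"eps={eps:>4}: attack accuracy {expected_attack_accuracy(eps, 10):.3f}")
```

At ε = 0 the expression reduces to exactly 0.5 for any number of classes, so the attacker does no better than chance. As ε grows, accuracy rises toward 1, matching the qualitative pattern the article describes.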

The researchers argue that the approach lowers the complexity of privacy auditing and can be applied where altering training data is impractical, allowing firms to test for label leakage without planted records or additional training. This summary does not include the study's full numeric results or say whether code is available; readers should consult the paper for detailed results and experimental parameters.