New Approach Effectively Protects Sensitive AI Training Data
Data privacy comes at a cost. Security techniques that keep sensitive user data, such as customer addresses, from being extracted from AI models by attackers often reduce those models' accuracy.

Figure 1. Innovative Method Safeguards Sensitive AI Training Data Efficiently.
MIT researchers recently developed a framework based on a new privacy metric, PAC Privacy, which helps maintain an AI model's performance while ensuring sensitive data, such as medical images or financial records, remain safe from potential attackers. Now, they have made the technique more computationally efficient, improved the balance between accuracy and privacy, and created a formal template that can be applied to privatize virtually any algorithm, even without access to its inner workings (Figure 1).
The team applied this updated version of PAC Privacy to privatize several classic algorithms used in data analysis and machine-learning tasks.
They also found that more "stable" algorithms are easier to privatize with this method. A stable algorithm's predictions remain consistent even when its training data is slightly altered. This greater stability helps the algorithm make more accurate predictions on previously unseen data.
According to the researchers, the new PAC Privacy framework's increased efficiency and its four-step template for implementation make it more practical for real-world applications.
“We often see robustness and privacy as unrelated or even conflicting with creating high-performance algorithms. First, we build a working algorithm, then we make it robust, and finally, we add privacy. Our work shows that this is not always the right approach. If you make your algorithm perform better in various settings, you can essentially achieve privacy for free,” says Mayuri Sridhar, an MIT graduate student and lead author of a paper on this privacy framework.
She is joined in the paper by Hanshen Xiao, PhD ’24, who will start as an assistant professor at Purdue University in the fall, and senior author Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering at MIT. The research will be presented at the IEEE Symposium on Security and Privacy.
Estimating Noise
To protect sensitive data used in training an AI model, engineers often add noise—randomness—to make it harder for adversaries to reverse-engineer the original training data. However, this noise can degrade a model's accuracy, so minimizing the noise is desirable.
PAC Privacy automatically estimates the smallest amount of noise required to achieve a specified level of privacy.
The original PAC Privacy algorithm runs an AI model multiple times on different dataset samples, measuring variance and correlations among these outputs to estimate how much noise to add.
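To make this concrete, here is a rough Python sketch of that estimation loop. The function name train_and_output, the subsampling scheme, and the parameter values are illustrative assumptions for this article, not the paper's actual implementation.

import numpy as np

def estimate_output_covariance(train_and_output, dataset, n_trials=500,
                               subsample_frac=0.5, rng=None):
    # Run a black-box algorithm on many random subsamples of the data and
    # estimate the covariance of its vector-valued outputs. The dataset is
    # assumed to be a NumPy array of examples; all names here are
    # illustrative, not the published PAC Privacy code.
    rng = np.random.default_rng() if rng is None else rng
    n = len(dataset)
    outputs = []
    for _ in range(n_trials):
        # Draw a random subsample and run the algorithm on it.
        idx = rng.choice(n, size=int(subsample_frac * n), replace=False)
        outputs.append(train_and_output(dataset[idx]))
    outputs = np.stack(outputs)            # shape: (n_trials, output_dim)
    return np.cov(outputs, rowvar=False)   # full output covariance matrix

How much these outputs move when the underlying data changes is what determines how much noise has to be added.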
The new version of PAC Privacy operates in the same manner but eliminates the need to represent the entire data correlation matrix. It only needs the output variances, making it much faster and better suited for larger datasets.
“Because the thing you are estimating is much smaller than the entire covariance matrix, you can do it much faster,” Sridhar explains, enabling scalability for large datasets.
Adding noise can reduce the utility of the results, so minimizing this loss is crucial. The original PAC Privacy algorithm was limited to adding isotropic noise (uniform in all directions). The new version, however, estimates anisotropic noise tailored to the specific characteristics of the training data, allowing for less noise while achieving the same level of privacy and improving the accuracy of the privatized algorithm.
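The difference can be pictured with a short sketch: the first function adds one uniform noise level in every direction, while the second scales the noise per output coordinate using only the estimated variances. The scale parameter stands in for the privacy-dependent calibration constant, which the actual framework derives formally; everything here is an illustrative assumption rather than the paper's code, and outputs are assumed to be 1-D vectors.

import numpy as np

def add_isotropic_noise(output, trial_outputs, scale=1.0, rng=None):
    # One noise level in all directions, set from the largest observed
    # output variance (an illustrative stand-in for the original approach).
    rng = np.random.default_rng() if rng is None else rng
    sigma = scale * np.sqrt(trial_outputs.var(axis=0).max())
    return output + rng.normal(0.0, sigma, size=output.shape)

def add_anisotropic_noise(output, trial_outputs, scale=1.0, rng=None):
    # Per-coordinate noise using only the output variances, so directions
    # in which the output barely varies receive far less noise.
    rng = np.random.default_rng() if rng is None else rng
    sigmas = scale * np.sqrt(trial_outputs.var(axis=0))
    return output + rng.normal(0.0, sigmas, size=output.shape)

In the anisotropic case, output coordinates that are already stable are perturbed only slightly, which is where the accuracy gain comes from.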
Privacy and Stability
Sridhar hypothesized that more stable algorithms would be easier to privatize using PAC Privacy. She tested this hypothesis with the more efficient version of PAC Privacy on several classical algorithms.
Stable algorithms exhibit less variance in their outputs when their training data is slightly modified. PAC Privacy divides the dataset into chunks, runs the algorithm on each chunk, and measures the variance among outputs. The greater the variance, the more noise is required to privatize the algorithm.
By employing stability techniques to reduce the variance in an algorithm’s outputs, one can reduce the amount of noise needed to privatize it, Sridhar explains.
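A small, self-contained experiment illustrates the point. The clipped mean below is a generic stabilization chosen only for this example, not a technique taken from the paper, and the data are synthetic.

import numpy as np

def output_spread(algorithm, dataset, n_chunks=10):
    # Split the data into disjoint chunks, run the algorithm on each chunk,
    # and report the per-coordinate variance of the outputs: the quantity
    # that, per the article, determines how much noise privatization needs.
    chunks = np.array_split(dataset, n_chunks)
    outputs = np.stack([np.atleast_1d(algorithm(chunk)) for chunk in chunks])
    return outputs.var(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)
data[::100] += rng.normal(0.0, 200.0, size=10)          # inject a few extreme outliers

plain_mean = lambda x: x.mean()                         # unstable under outliers
clipped_mean = lambda x: np.clip(x, 0.0, 10.0).mean()   # a simple stabilization

print(output_spread(plain_mean, data))     # large spread: would need heavy noise
print(output_spread(clipped_mean, data))   # small spread: needs much less noise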
The team showed that the privacy guarantees remained strong across different tested algorithms, and the new PAC Privacy version required far fewer trials to estimate the noise. They also tested the method in attack simulations, demonstrating that its privacy guarantees withstood state-of-the-art attacks.
“We want to explore how algorithms could be co-designed with PAC Privacy from the start to ensure they are more stable, secure, and robust,” says Devadas. The researchers also plan to test the method with more complex algorithms and further explore the privacy-utility tradeoff.
"The question now is: When do these win-win situations occur, and how can we make them happen more often?" Sridhar concludes.
Source: MIT NEWS
Cite this article:
Priyadharshini S (2025), New Approach Effectively Protects Sensitive AI Training Data, AnaTechMaz, pp.123