Hugging Face: AI/ML Hub Vulnerable to Training Model Corruption


Models accounting for hundreds of thousands of downloads could be compromised, according to HiddenLayer researchers

HiddenLayer, a security firm specializing in artificial intelligence (AI) models, has uncovered a critical vulnerability on Hugging Face, the renowned platform often likened to the “GitHub of AI/ML.” The platform is celebrated for propelling AI projects forward by giving developers a space to share open-source code, models, and data. This new research, however, highlights a security flaw that puts the vast number of AI models hosted on the site at risk, potentially affecting hundreds of thousands of users.

The issue centers on Hugging Face’s SFconvertbot, a service designed to enhance security by converting machine learning models from PyTorch’s pickle-based format into the safer Safetensors format. The investigation, titled “Silent Sabotage: Hijacking Safetensors Conversion on Hugging Face,” reveals that this well-intentioned service could become a gateway for security breaches. An attacker who hijacked the conversion process could submit malicious code or compromised models via pull requests to any public repository on the platform. Private repositories are exposed as well: the user token that owners supply when converting a private model could be stolen, granting unauthorized access to private models and datasets.
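The security motivation behind the conversion service is that PyTorch’s legacy checkpoint format is built on Python’s pickle protocol, which can execute arbitrary code the moment a file is loaded, whereas a Safetensors file contains only a JSON header and raw tensor bytes. A minimal sketch using the standard-library `pickle` module (not an actual model file, and with a harmless stand-in payload) illustrates the underlying risk:

```python
import pickle

# Any object whose __reduce__ returns a (callable, args) pair has that
# callable invoked during pickle.load() -- this is by design in pickle.
class MaliciousPayload:
    def __reduce__(self):
        # Harmless stand-in: a real attacker could return os.system or
        # code that exfiltrates the victim's access tokens instead.
        return (eval, ("'arbitrary code ran'",))

blob = pickle.dumps(MaliciousPayload())

# Merely deserializing the blob runs the embedded callable; the victim
# never has to import or call anything from the "model" explicitly.
result = pickle.loads(blob)
print(result)  # prints: arbitrary code ran
```

Safetensors avoids this entire class of attack because loading it is pure data parsing with no code execution, which is precisely why a hijacked conversion pipeline is so damaging: it sits between the unsafe format and the format users are told to trust.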

The implications of such vulnerabilities are profound and far-reaching. Chris “Tito” Sestito, Co-Founder and CEO of HiddenLayer, said: “The compromise of the conversion service has the potential to rapidly affect the millions of users who rely on these models to kick-start their AI projects, creating a full supply chain issue. Users of the Hugging Face platform place trust not only in the models hosted there but also in the reputable companies behind them, such as Google and Microsoft, making them all the more susceptible to this type of attack. This vulnerability extends beyond any single company hosting a model.” 

The research points out that among the top 10 most downloaded models from Google and Microsoft, those converted by the bot accounted for over 16 million downloads in the past month alone. Considering the platform hosts over 500,000 models, the scale of the issue is staggering. The bot itself has made some 42,657 pull requests to repositories on the site to date, any of which could have carried a compromised model had the service been hijacked.

The research conducted by HiddenLayer not only exposed the method by which attackers could steal tokens from the official Safetensors conversion bot but also demonstrated how they could leverage this access to manipulate the service. This could lead to the distribution of malicious models across numerous repositories or even allow attackers to access and modify private repositories and datasets.

The potential consequences of such an attack are enormous: an adversary could replace a legitimate model with their own, push malicious models out to repositories en masse, or access private repositories and datasets. Nor does a prior conversion close the window. Where a repository has already been converted, a malicious actor could still submit a new pull request, and whenever a new iteration of a PyTorch binary is uploaded and then converted through a compromised conversion service, repositories with hundreds of thousands of downloads could be affected.

Hugging Face plays a pivotal role in the AI/ML community, offering resources that facilitate model sharing, accelerate training, and minimize environmental impact. Yet, despite the platform’s efforts to secure its ecosystem, this vulnerability underscores the challenges in protecting AI models from sophisticated cyber threats. It reveals a potential for widespread supply chain attacks that could compromise the integrity of countless AI projects. The findings serve as a stark reminder of the importance of robust security measures in the rapidly evolving field of artificial intelligence.


Follow Brilliance Security Magazine on Twitter and LinkedIn to ensure you receive alerts for the most up-to-date security and cybersecurity news and information.