US commerce unit expands AI model testing agreements with Google, Microsoft and xAI

The Center for AI Standards and Innovation, a unit of the US Department of Commerce, has signed agreements with Google DeepMind, Microsoft and xAI that would let it evaluate frontier AI models before public release, according to a NIST release on Tuesday.

KEY FACTS

  • Scope: CAISI will conduct pre-deployment evaluations and targeted research on frontier AI capabilities.
  • Companies: Google DeepMind, Microsoft and xAI join Anthropic and OpenAI, which signed earlier agreements.
  • Earlier deals: Similar arrangements were announced in August 2024, under the unit's former name, the US Artificial Intelligence Safety Institute.
  • Government move: Bloomberg reported the White House is preparing an executive order for a vetting system for new AI models.

The agreements give the agency a role in testing advanced systems before they reach the public. The NIST release says the work is intended to better assess frontier AI capabilities and advance AI security.

Microsoft said the arrangement and others like it are important for building trust in advanced AI systems. It said testing and safeguards need to become more rigorous as model capabilities improve.

Industry analyst Fritz Jean-Louis said the approach points to more proactive security for agentic AI by allowing government-led testing before and after deployment. He said the move could improve visibility into autonomous behavior, though questions remain about intellectual property protection.

The announcement follows earlier agreements with Anthropic and OpenAI, and it comes as Washington weighs broader federal oversight of new models. Carmi Levy said the recent announcements suggest a significant policy shift toward tighter partnerships with AI vendors and more clearly defined cybersecurity and safety rules.

WHY IT MATTERS

The new agreements could give federal officials earlier insight into model behavior before public release, while also shaping how major AI vendors test and document safety. They may also become part of a wider US system for assessing security risks in advanced AI.