AI Safety Cooperation: The Collective Action Problem That Could Undermine the Whole Effort

In 2019, OpenAI published a policy research paper making an argument that would grow more urgent with every subsequent model release: that individual AI companies operating in competitive markets will systematically under-invest in safety unless something forces coordination.

The paper identified what economists call a collective action problem. A single company that slows development to do thorough safety testing loses market share to rivals who do not. The rational response — if you are optimizing for competitive position — is to match your rivals’ pace and cut safety corners at the margin. Multiply that incentive across the industry and you get a race to the bottom, even if every participant would prefer a world where everyone invested heavily in safety.
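
The incentive structure described above can be sketched as a two-player game. This is a minimal illustration, not a model taken from the 2019 paper: the payoff numbers are hypothetical, and the names `PAYOFFS`, `best_response`, and `nash_equilibria` are introduced here only for the example. It shows that cutting corners is each lab's best response whatever the rival does, even though both labs would do better if both invested.

```python
from itertools import product

ACTIONS = ("invest", "cut")

# Hypothetical payoffs in arbitrary units (assumed for illustration only).
# PAYOFFS[(a, b)] = (payoff to lab A, payoff to lab B)
PAYOFFS = {
    ("invest", "invest"): (3, 3),  # both test thoroughly and share the market
    ("invest", "cut"): (0, 4),     # A slows down and loses share to B
    ("cut", "invest"): (4, 0),     # B slows down and loses share to A
    ("cut", "cut"): (1, 1),        # race to the bottom
}


def best_response(rival_action: str) -> str:
    """Return lab A's payoff-maximizing action given the rival's action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, rival_action)][0])


def nash_equilibria() -> list:
    """Return action profiles where neither lab gains by switching alone."""
    equilibria = []
    for a, b in product(ACTIONS, ACTIONS):
        a_stays = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in ACTIONS)
        b_stays = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in ACTIONS)
        if a_stays and b_stays:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    for rival in ACTIONS:
        print(f"Rival plays {rival!r}: best response is {best_response(rival)!r}")
    print("Equilibria:", nash_equilibria())
```

With these assumed payoffs the only stable outcome is mutual corner-cutting, which pays each lab less than mutual investment would. Coordination mechanisms aim to close that gap, either by changing the payoffs or by making commitments credible.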

To counter this dynamic, the paper proposed four broad strategies:

Communicating risks and benefits. Companies should be transparent about what their systems can and cannot do, including where they fail. The goal is to build shared understanding of the risk landscape so that companies, regulators, and the public are working from the same factual baseline.

Technical collaboration. Joint safety research — sharing evaluation methodologies, red-teaming results, and interpretability tools — reduces the cost of safety work across the industry and prevents duplicated effort on problems that affect everyone.

Increased transparency. Publishing information about training data, evaluation benchmarks, and safety testing lets external parties scrutinize claims and identify gaps. Transparency also creates reputational pressure: a company that publicly commits to a safety standard faces higher costs if it visibly fails to meet it.

Incentivizing standards. Industry-wide technical standards for AI safety — similar to how ISO/IEC standards operate in other sectors — create a shared floor that all companies must clear. Standards backed by certification reduce the competitive incentive to cut corners because cutting corners becomes a certification liability.

What Changed After 2019

The 2019 paper was written before the large-language-model wave transformed the competitive landscape. GPT-2 had just been released; the race dynamics the paper described were largely theoretical. By 2023 they were operational, playing out in real time across OpenAI, Google DeepMind, Anthropic, Meta, and dozens of smaller entrants.

The voluntary coordination mechanisms the paper proposed showed their limits under that pressure. Multi-stakeholder bodies like the Partnership on AI produced principles and frameworks but no binding commitments. Companies published model cards and system cards, but the content was self-reported and the format varied widely. Technical collaboration remained limited by competitive sensitivity around training runs and alignment research.

Regulators noticed the gap. The EU AI Act, which entered into force in August 2024, imposes mandatory conformity assessments on high-risk AI systems and requires providers of general-purpose AI models above a certain compute threshold to conduct adversarial testing and report serious incidents to the EU AI Office. The Act does not rely on voluntary industry cooperation; it mandates outcomes through law, with enforcement teeth: under Article 99, fines run up to 7% of global annual turnover for prohibited practices and up to 3% for most other violations.

In the United States, NIST published the AI Risk Management Framework in January 2023 — a voluntary framework structured around four functions: Govern, Map, Measure, and Manage. The NIST AI RMF is not a regulation, but federal agencies increasingly reference it as an expected baseline, and it is shaping state-level legislative drafts.

What neither the EU Act nor the NIST framework fully resolves is the original problem the 2019 paper identified: the incentive structure for frontier model development still rewards speed. Mandatory frameworks impose floors, but floors are not the same as optimal safety investment. A company can clear a conformity assessment and still be running faster than its risk-management processes can catch up.

What the Strategies Look Like in Practice

Seven years after the paper was published, each of the four strategies is partially implemented — none is fully realized.

Communicating risks and benefits has improved, but inconsistently. Major labs publish system cards and model cards, though the depth of disclosure varies and there is no standardized format that permits cross-company comparison.

Technical collaboration exists primarily through academic channels and multi-stakeholder bodies. The commitments are real but the scope remains limited compared to what frontier-model safety requires.

Increased transparency has advanced through mandatory incident reporting under the EU AI Act for high-risk systems, but voluntary transparency disclosures remain uneven. The compute thresholds in the Act mean the largest models face the most scrutiny — a defensible design choice, but one that leaves a substantial middle tier under-examined.

Incentivizing standards is the least mature of the four. ISO/IEC 42001 (AI management systems) was published in 2023 and is beginning to appear in enterprise procurement requirements. Harmonised standards under the EU AI Act are still under development by CEN-CENELEC. The standards infrastructure that would create durable safety floors does not yet exist at scale.

What AI Product Teams Should Do This Quarter

Given where the regulatory and voluntary frameworks actually stand, product teams at AI companies face a practical question: what coordination and disclosure work is genuinely required, and what remains aspirational?

Three concrete steps:

1. Map your system against the EU AI Act’s risk categories. If you are placing a system on the EU market, you need to know whether it falls into a prohibited use, a high-risk category under Annex III, or the GPAI tier. The consequences (conformity assessment requirements, transparency obligations, incident reporting timelines) differ substantially across those categories. This mapping is legal’s job, with engineering input on actual system behavior, and it should be documented before a regulator asks for it.

2. Adopt the NIST AI RMF Govern and Map functions as your internal baseline. Even though the framework is voluntary, GRC teams at your enterprise customers are asking for evidence of AI risk management. The Govern function covers organizational policies and accountability structures; the Map function covers risk identification and classification. If you do not have documented processes for both, build them before the next customer audit cycle, because the question is coming regardless of whether a law requires it.

3. Track harmonised standards development under the EU AI Act. CEN-CENELEC is developing the technical standards that will create a presumption of conformity for AI systems under the Act. Companies that participate in standards development shape the requirements; companies that wait to read the final text are playing catch-up with rules they had no voice in writing. Assign someone on engineering or policy to track this and engage in the comment process.
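
The three steps above can be tracked with a simple internal record per system. The following is a minimal sketch under stated assumptions: `RiskTier`, `SystemRecord`, and `open_actions` are hypothetical names invented for this example, the tiers listed are only those named in this article, and nothing here is a legal determination or an official schema from the EU AI Act or NIST.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RiskTier(Enum):
    """Categories named in this article; not an exhaustive legal taxonomy."""
    UNCLASSIFIED = "not yet mapped"
    PROHIBITED = "prohibited use"
    HIGH_RISK = "high-risk (Annex III)"
    GPAI = "general-purpose AI model"


@dataclass
class SystemRecord:
    """Hypothetical per-system tracking record for the three steps."""
    name: str
    eu_market: bool
    risk_tier: RiskTier = RiskTier.UNCLASSIFIED
    govern_documented: bool = False        # NIST AI RMF Govern processes written down
    map_documented: bool = False           # NIST AI RMF Map processes written down
    standards_owner: Optional[str] = None  # person tracking CEN-CENELEC work


def open_actions(rec: SystemRecord) -> List[str]:
    """List the steps from this section that are still outstanding."""
    actions = []
    if rec.eu_market and rec.risk_tier is RiskTier.UNCLASSIFIED:
        actions.append("Map the system against the EU AI Act risk categories")
    if not rec.govern_documented:
        actions.append("Document NIST AI RMF Govern processes")
    if not rec.map_documented:
        actions.append("Document NIST AI RMF Map processes")
    if rec.standards_owner is None:
        actions.append("Assign an owner to track harmonised standards")
    return actions


if __name__ == "__main__":
    record = SystemRecord(name="example-assistant", eu_market=True)
    for action in open_actions(record):
        print("-", action)
```

A record like this does not replace the legal analysis; it only makes the outstanding work visible before a customer audit or regulator request arrives.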

The collective action problem the 2019 paper identified has not gone away. Mandatory frameworks have raised the floor, but the floor is not the ceiling, and competitive pressure is as strong as it has ever been. The four strategies that paper proposed are still the right framework. They are just now being built by regulators, because the industry did not build them voluntarily fast enough.

Sources

Why responsible AI development needs cooperation on safety: OpenAI’s 2019 policy research paper identifying the collective action problem in AI safety investment and proposing four coordination strategies for industry. https://openai.com/index/cooperation-on-safety

NIST AI Risk Management Framework: NIST AI RMF 1.0, published January 2023, provides a voluntary structured approach to AI risk management across Govern, Map, Measure, and Manage functions; increasingly referenced as a baseline in federal and state AI governance contexts. https://airc.nist.gov/Home

OECD AI Principles: The OECD Principles on AI, first adopted in 2019 and updated in 2024, provide an international policy baseline that has influenced both the EU AI Act’s framing and the NIST AI RMF. https://oecd.ai/en/ai-principles