RAND: 2024 report evaluating the impact of artificial intelligence on national security and public safety (English version)
ANTON SHENK

Evaluating Artificial Intelligence for National Security and Public Safety: Insights from Frontier Model Evaluation Science Day

Conference Proceedings

Frontier Model Evaluation Science Day assembled more than 100 leading experts in artificial intelligence (AI), national security, and policy to address the emerging challenges of evaluating threats from advanced AI systems. The day's agenda was structured around four tracks, each focusing on a unique aspect of AI evaluation science and policy. These tracks were developed to address fundamental issues in the field while keeping the meeting agenda and invitation list manageable. The meeting's focus on evaluation methodology provided a specialized forum for in-depth discussion, distinguishing it from broader AI security topics covered in other venues. The four tracks were as follows:

• The chemistry and biology track focused on the intersection of AI with chemical and biological risks. This track utilized insights from previous evaluations of general-purpose and domain-specific AI models and aimed to identify current and future evaluation needs, including integrating wet lab validation and automated lab processes.

• The loss of control track explored scenarios in which AI systems could operate beyond the intended boundaries set by their developers or users—including AI systems deceiving humans or acting autonomously. These discussions aimed to identify early warning signs and explore strategies to prevent loss of control of AI systems.

• The risk-agnostic methods track sought to outline comprehensive and universal approaches to evaluating AI models, spanning such topics as red teaming, automated benchmarking, and task design. Its objective was to forge a versatile framework for assessing AI systems' capabilities, applicable across varied risk scenarios, to ensure that evaluations are consistently rigorous and at the forefront of the science.

• The collaboration and coordination track aimed to connect stakeholders in government, industry, and civil society to develop a shared understanding of the objectives of evaluation science. Discussions in this track centered on establishing key policy timelines and deliverables, thresholds for dangerous AI capabilities, and voluntary risk management policies for scaling AI capabilities.

The workshop proceedings synthesize insights from these sessions, outline the complexities of evaluating AI for dangerous capabilities, and highlight the collaborative effort required to formulate effective policy.

Track 1: Chemistry and Biology

The chemistry and biology (chem-bio) track illuminated the intersection of AI with chem-bio risks, incorporating insights from evaluations of general-purpose and domain-specific models. This section details lessons learned from completed model evaluations, needs and priorities for subsequent rounds of evaluations, and considerations for wet lab validation of model outputs.

Lessons Learned from Completed Model Evaluations

Embracing Complexity in Chem-Bio Model Assessments

This session highlighted the persistence of threat actors and the complex evolution of chem-bio threats. During the discussion, one participant observed a potential limitation of existing evaluation methods, suggesting that marking an entire task as failed because of early setbacks might not fully capture the resilience and adaptability of threat actors. This critique posits that a more nuanced approach accounting for threat progression and troubleshooting—such as knowing the proportion of sub-steps that succeed—could provide a more comprehensive and continuous understanding of the threat landscape. Tabletop exercises were proposed to explore the dynamics of troubleshooting and iteration further; however, their effectiveness in this context remains to be tested.

Navigating the Complexities of Dual-Use Dangers and Domain-Specific Models

In this session, participants noted that formulating concrete, detailed threat models is crucial for understanding the landscape of potential AI-enabled chem-bio threats and the actors behind them. This process might involve exploring new capabilities that malicious actors previously did not have access to or accelerating and simplifying existing processes, making them more accessible to a broader variety of actors.

Once threat models are identified and the appropriate evaluation undertaken, a notable challenge still remains, as identified by one participant: the difficulty for evaluators to accurately embody malicious actors. Although not unique to AI/chem-bio red teaming, this challenge arises, according to the participant, from a tendency to underestimate the likelihood that certain actions could succeed, leading evaluators to potentially overlook the full extent of what a malicious actor might attempt and achieve.

Moreover, this difficulty sets the stage for addressing an even more formidable challenge: developing countermeasures against dual-use threats. These threats embody the inherent difficulty in differentiating between chem-bio knowledge that is helpful or benign and that which can be misused or pose a significant threat, such as designing or reconstituting pathogens more severe and deadly than those found in nature. Addressing these threats, which might involve masking malicious objectives behind seemingly benign actions, necessitates a broader approach to mitigation—including such measures as know-your-customer rules—than model-level interventions.

Building on this complexity, session participants underscored the critical need to evaluate both domain-specific models (e.g., biological design tools) and general-purpose foundation models meticulously. The emphasis on domain-specific models stems from the unique risk profile associated with their misuse.

Needs and Priorities for Next Round of Model Evaluations

Access to Models and Evaluation Tools

The next session underscored the crucial need for independent researchers to have access to both proprietary (closed-source) models and robust evaluation tools. This access could take various forms, such as black-box testing, white-box testing, fine-tuning, or support from model developers to facilitate researcher interaction.¹ Currently, legal and contractual frameworks governing data sharing and the conduct of evaluations are significant barriers to such access; nondisclosure agreements create opacity over study designs, evaluations already performed on models, and evaluation outputs. This lack of transparency inhibits the ability of the research community to thoroughly assess model capabilities and potential risks. To bridge this gap, participants highlighted the importance of establishing mechanisms for greater visibility into the model development phase—particularly for models planned to be open-sourced, because of the inability to control their diffusion once deployed. The formation of a consortium or another independent body can play a crucial role in coordinating on each of these challenges. Such an entity could facilitate discussions, mediate among stakeholders, and help clarify the legal and contractual aspects of running dangerous capability evaluations, thereby streamlining the process for all involved.

¹ For more information on various forms of access and their implications for AI audits, see Casper et al., 2024.

Identifying Risks

A core component of this session was dedicated to identifying and categorizing AI-enabled chem-bio risks.

Central to the discussions were concerns about the fine line between enhancing the understanding of model capabilities and the potential misinterpretation of such validation efforts as steps toward creating harmful substances. Moreover, the session highlighted apprehensions regarding the validation of models' potential to facilitate chem-bio threats.
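The partial-credit idea raised in the chem-bio track (recording the proportion of sub-steps a model or actor completes, rather than failing the entire task at the first setback) can be sketched in a few lines. This is a minimal illustration, not anything proposed at the workshop; the sub-step names and scoring functions are invented for the example.

```python
# Sketch of partial-credit scoring for multi-step task evaluations,
# contrasted with all-or-nothing scoring. Sub-step names are hypothetical.
from dataclasses import dataclass


@dataclass
class SubStep:
    name: str
    succeeded: bool


def binary_score(steps: list[SubStep]) -> float:
    """All-or-nothing: any failed sub-step fails the whole task."""
    return 1.0 if all(s.succeeded for s in steps) else 0.0


def partial_credit_score(steps: list[SubStep]) -> float:
    """Proportion of sub-steps that succeeded: a more continuous signal
    of how far through the task an actor progressed."""
    return sum(s.succeeded for s in steps) / len(steps)


if __name__ == "__main__":
    task = [
        SubStep("gather-background-information", True),
        SubStep("draft-initial-plan", True),
        SubStep("troubleshoot-early-setback", False),
        SubStep("complete-final-step", True),
    ]
    print(binary_score(task))          # 0.0 — the task "failed" outright
    print(partial_credit_score(task))  # 0.75 — 3 of 4 sub-steps succeeded
```

The contrast makes the participant's critique concrete: the binary score discards the fact that most sub-steps succeeded, while the proportional score preserves a continuous view of threat progression.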