Stanford Team Develops a Blueprint for a National Research Cloud

(Originally published by the Stanford Institute for Human-Centered Artificial Intelligence on October 6, 2021)

Scholars and students across disciplines outline how to build a resource that would lower the computing and data barriers to academic AI research.

This month, researchers affiliated with the Stanford Institute for Human-Centered Artificial Intelligence released a blueprint for how to build a National Research Cloud (NRC), a system that would give the broader academic AI research community access to the expensive computing and data resources needed to conduct fundamental and non-commercial AI research.

The report, the culmination of a multidisciplinary two-quarter practicum offered at Stanford Law School and based on dozens of interviews with leading computer scientists, government officials, and policy experts, outlines the steps necessary to create this resource. It focuses on key issues such as technical infrastructure, the data access model, and the organizational structure of such an effort, and addresses important considerations including privacy, intellectual property, and ethics.

“Throughout our research, we saw three primary themes,” says Stanford Law Professor Daniel E. Ho, an associate director of Stanford HAI. “We saw the need to rebalance AI research for long-term, academic, and non-commercial research; we saw systemic challenges in how non-industry researchers access compute and data resources; and we saw a range of legal and policy challenges that need to be addressed for implementing the NRC.”

The final report, which incorporated feedback from a wide range of government officials, academics, civil society groups, and industry representatives, offers a series of recommendations for the task force formed this year under the National Defense Authorization Act to study the feasibility of an NRC.

In this conversation, the paper’s authors — Dan Ho; HAI Privacy and Data Policy Fellow Jennifer King; and HAI Director of Policy Russell Wald, who helped shepherd the NRC proposal into legislation — discuss their main findings.

The National Research Cloud is a twofold project — offering not only compute power for scholars to conduct resource-heavy AI research, but also datasets, the basis of AI projects. How significant is the challenge of building this cloud? 

Daniel Ho, William Benjamin Scott and Luna M. Scott Professor of Law

Ho: In theory, the concept of the National Research Cloud is simple: provide greater access to compute power and data to enable basic academic and non-commercial research. In practice, it is a massive undertaking. How should such a resource be built? Who should have access to it? How can government data be provided in a secure, privacy-preserving fashion? Government can play an important role in providing such a resource. Designed well, an NRC could foster long-term AI innovation, shift the center of gravity toward non-commercial domains, improve accountability by broadening who has access to resources, and strengthen public sector capacity and oversight. Our report delves into each of these challenges at some length to consider how best to achieve these lofty goals.

Your practicum analyzed what it would take to launch such a venture. What are the most significant hurdles your team discovered?

King: There are two major challenges. The first is how to build such a resource. On the one hand, the federal government has extensive experience building some of the most powerful supercomputers in the world. On the other hand, many researchers have gravitated toward the commercial cloud, which could scale more easily in the near term. We suggest a hybrid investment that builds on programs providing academic access to commercial cloud providers, such as the CloudBank program run by the National Science Foundation (NSF), while also investing in projects to build out infrastructure via grants to academic institutions (as the NSF does) or via contracts (as federal agencies do). It is generally recognized that owning infrastructure is cheaper when demand is nearly continuous, and these investments can help determine the optimal path forward. This approach also allows for flexibility, depending on how the NRC is ultimately used by both researchers and government agencies.
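To make that rent-versus-own tradeoff concrete, here is a minimal back-of-the-envelope sketch in Python. All of the prices and lifetimes below are hypothetical placeholders, not figures from the report or any provider; the point is only that owning wins once sustained utilization clears a break-even threshold.

```python
# Back-of-the-envelope comparison of renting cloud GPUs vs. owning hardware.
# All numbers are hypothetical placeholders, not quotes from any provider.

def annual_rental_cost(hourly_rate: float, utilization: float) -> float:
    """Cost of renting one GPU for a year at a given utilization (0 to 1)."""
    return hourly_rate * 24 * 365 * utilization

def annual_ownership_cost(purchase_price: float, lifetime_years: float,
                          yearly_opex: float) -> float:
    """Amortized cost of owning one GPU: depreciation plus power, hosting, staff."""
    return purchase_price / lifetime_years + yearly_opex

def breakeven_utilization(hourly_rate: float, purchase_price: float,
                          lifetime_years: float, yearly_opex: float) -> float:
    """Utilization above which owning becomes cheaper than renting."""
    return annual_ownership_cost(purchase_price, lifetime_years, yearly_opex) / (
        hourly_rate * 24 * 365)

if __name__ == "__main__":
    # Hypothetical inputs: $3/hr rental, $15,000 purchase, 4-year life, $2,000/yr opex.
    u = breakeven_utilization(hourly_rate=3.0, purchase_price=15_000,
                              lifetime_years=4, yearly_opex=2_000)
    print(f"Owning wins above ~{u:.0%} sustained utilization")  # ~22% with these inputs
```

Under these made-up inputs, owned hardware wins above roughly 22 percent sustained utilization, which is why near-continuous demand favors building infrastructure while burstier academic workloads favor the commercial cloud.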

The second major challenge is that there are serious privacy concerns with making government data about people more broadly available, such as health-related data from agencies like Veterans Affairs. Our report points to important initiatives like the proposal for a National Secure Data Service that can work in tandem with the NRC to figure out the right institutional framework that is secure and privacy-preserving, but also enables AI research to address some of the socially important questions posed by the public sector.

The U.S. government is a data treasure trove — weather data, housing prices, agriculture production, energy use, voting records, health care trends. Do you suggest all government data be available to scholars under an NRC?

Ho: Dissemination of government data has to be conducted in a secure, privacy-protecting way. We suggest a tiered access approach that builds on existing government frameworks for secure data access based on risk. Some datasets, such as weather and agricultural data, are lower risk, and the NRC can be a great platform for enabling broad access to and analysis of such data, fueling research advances. Take the example of satellite imagery. The U.S. government once charged around $600 per satellite image from what is known as the Landsat Program. When the government made this imagery freely available in 2008, it fueled innovation, generating, by one estimate, $3 billion to $4 billion in annual economic benefits and providing critical insights on climate change, poverty, and habitat modification.

For access to higher-risk data, we recommend that the NRC establish a streamlined data access application process, with potential input from government agencies, and host the data in a secure environment with technical privacy measures where appropriate. This proposal is closely connected to important calls for a “National Secure Data Service,” which aims to streamline data access to address critical questions for evidence-based policy.
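As a rough illustration of how such a tiered model could be expressed in software, here is a short Python sketch. The tier names, example datasets, and access rules are our own illustrative assumptions, not a specification from the report.

```python
# Illustrative sketch of a risk-tiered data-access policy. Tier names,
# datasets, and rules are assumptions for illustration only, not the
# report's actual specification.
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    OPEN = 1        # e.g., weather or agricultural data: broad open access
    CONTROLLED = 2  # requires an approved data access application
    RESTRICTED = 3  # approved application plus a secure, privacy-preserving enclave

@dataclass
class Dataset:
    name: str
    tier: RiskTier

@dataclass
class AccessRequest:
    dataset: Dataset
    application_approved: bool = False
    secure_enclave: bool = False

def grant_access(req: AccessRequest) -> bool:
    """Each higher tier layers requirements on top of the tier below it."""
    if req.dataset.tier is RiskTier.OPEN:
        return True
    if req.dataset.tier is RiskTier.CONTROLLED:
        return req.application_approved
    return req.application_approved and req.secure_enclave

# Open satellite imagery needs no review; hypothetical health records need both
# an approved application and a secure enclave.
print(grant_access(AccessRequest(Dataset("landsat_imagery", RiskTier.OPEN))))          # True
print(grant_access(AccessRequest(Dataset("va_health_records", RiskTier.RESTRICTED))))  # False
```

The design point is simply that lower-risk tiers stay frictionless while higher-risk tiers accumulate safeguards, mirroring the streamlined-application-plus-secure-environment model described above.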

Conceivably, if any scholar has access to the NRC’s compute power, some may develop AI projects that raise ethical concerns. How would the NRC protect itself from being used by bad actors?

Jennifer King, HAI Privacy and Data Policy Fellow

King: We thought extensively about how to ensure that use of the NRC abides by ethical standards. One of the main countervailing concerns is the role of government in reviewing research for ethics when AI ethics standards are, at this point, still quite vague. We make three recommendations for how the NRC can improve the current state of affairs.

First, when researchers apply for access beyond base-level compute or data, they should provide an ethics impact statement; HAI’s experiment with this approach in its own grant proposal process showed promising results. Second, the NRC should develop a petition process for handling complaints of unethical conduct or research practices. We caution that this review process should be held to a high standard, given concerns about government review of academic speech, and be conducted by an independent panel. Consider, for instance, whether you would want a political appointee deciding whether an NRC user may conduct research on a contested topic. Third, one reason we recommend that the NRC begin with academic researchers is that academia offers at least some baseline protections, including human subjects review and peer review. Ultimately, however, we recommend further investment in alternative models for embedding ethical considerations, such as the NIH’s approach of funding bioethicists as part of a research team.
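For concreteness, here is a small Python sketch of how these three safeguards might compose into a single review pipeline. The stage names, fields, and thresholds are purely our own assumptions, not a design from the White Paper.

```python
# Hypothetical sketch of the three safeguards above as a single pipeline.
# Stage names, fields, and thresholds are illustrative assumptions, not a
# design specified in the White Paper.
from dataclasses import dataclass

@dataclass
class Application:
    researcher: str
    beyond_base_tier: bool           # requesting more than base compute/data?
    ethics_impact_statement: str = ""

@dataclass
class Complaint:
    target: str
    description: str

def admit(app: Application) -> bool:
    """Safeguard 1: requests beyond the base tier need an ethics impact statement."""
    return (not app.beyond_base_tier) or bool(app.ethics_impact_statement.strip())

def uphold_complaint(complaint: Complaint, panel_votes: list[bool],
                     supermajority: float = 0.75) -> bool:
    """Safeguards 2 and 3: complaints go to an independent panel and are held
    to a high standard (here, a supermajority) before any action is taken."""
    if not panel_votes:
        return False
    return sum(panel_votes) / len(panel_votes) >= supermajority

# A base-tier request needs no statement; a 2-of-4 panel vote does not clear the bar.
print(admit(Application("a_scholar", beyond_base_tier=False)))  # True
print(uphold_complaint(Complaint("project_x", "alleged misuse"),
                       [True, True, False, False]))             # False
```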

How did HAI get involved in this effort?

Wald: HAI’s core mission is to promote human-centered AI, which fundamentally requires broadening who has access to, input on, and oversight of AI resources. In October 2019, HAI Co-Directors Fei-Fei Li and John Etchemendy proposed the idea of a public cloud that uses federal government datasets to offset the resource gap facing academic AI researchers. Later that year, they began reaching out to universities across the country to explore their computing needs and socialize the idea of a national research cloud. In March 2020, they announced that 21 other universities were joining Stanford in a call to create a National Research Cloud. The National Security Commission on Artificial Intelligence also included the concept in its first set of recommendations in March 2020. HAI later worked with a coalition spanning academia, civil society, and industry to encourage Congress to pass the National AI Research Resource Task Force Act, which became law on Jan. 1, 2021.

We are hopeful that, if structured properly, the NRC will broaden access to resources for students and faculty at U.S. colleges and universities, allowing more people to scrutinize the impacts of AI. Our report emphasizes that researchers across disciplines, from the humanities and social sciences to engineering, medicine, law, and beyond, must be involved in this work to ensure the technology is examined critically from diverse perspectives. An NRC can play a key role in ensuring this broad access.

One criticism that led to the concept of the NRC is that, right now, only wealthy private companies have both the computing power needed for AI projects and the dollars to pay for datasets. Would private corporations have a role in the NRC?

Ho: One of the concerns motivating the NRC is the so-called “brain drain” of AI researchers out of academia and into industry, which weakens the infrastructure for basic and non-commercial research. Our White Paper therefore focuses on expanding access for academic researchers.

That said, we expect the NRC will involve industry in some capacity, whether through contracts to build high-performance computing centers, the supply of hardware, or the direct provision of cloud computing resources, as in NSF’s CloudBank program. As Jen mentioned, we recommend exploring both building public infrastructure and creating public-private partnerships. Numerous other countries have built the equivalent of national research clouds (e.g., Compute Canada and Japan’s Fugaku system), and the National Science Foundation has invested significantly in building high-performance computing centers across universities. A promising example of a public-private partnership in the United States was the COVID-19 HPC Consortium, a partnership across 43 academic, industry, and federal government entities, which provisioned 50,000 GPUs and 6.8 million cores to combat COVID-19.

What are the next steps? What do you want the task force to do with this information?

Wald: The task force has a series of benchmarks set by law. In June, the task force was formally constituted and announced. In the spring of 2022, it will submit an initial report, and by fall of 2022 it will share a final report with Congress and the president. Many government boards are subject to transparency requirements, so the task force is holding open meetings that the public can attend, and anyone can access minutes of prior meetings on the task force’s website. The Office of Science and Technology Policy and NSF also issued a Request for Information seeking public comment on this project; we submitted this White Paper as part of that process.

Having convened a stellar group of engineering, business, and law students, interviewed a wide range of stakeholders, and engaged in this research, we hope the White Paper proves useful to the task force in spelling out the key NRC design decisions.