Jan 10 2024
Software

Agencies Exploring AI Use Cases Should Pay Close Attention to Governance

To deploy artificial intelligence, the government must protect data and training models.

There’s a hilarious scene in season four of the TV show The Office featuring in which main character Michael Scott willfully steers his rental car into Lake Scranton after his GPS tells him to make a right turn. The moment the lake water crests over the windshield is comedy gold.

The show is trying to make a point about the value of human-directed versus machine-directed decision-making, and as agencies begin evaluating use cases for artificial intelligence, they’re at risk for similar kinds of wrong turns if they don’t pay close attention to governance and assurance.

Popular generative AI chatbots have created excitement around large language models. Civilian, military and intelligence agencies are all considering ways they can apply LLMs to serve constituents better, operate more efficiently, and achieve their missions with greater speed and precision.

As agencies embark on their AI journeys, both the government and the public have rightly become concerned about the ethical use of AI. In response, President Joe Biden issued his Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence on Oct. 30 to help guide them as they leverage the technology.

Click the banner below to learn how to modernize your agency's digital experience.

The order builds on the administration’s Blueprint for an AI Bill of Rights and the Department of Defense’s earlier documentation outlining ethical AI principles. But while these efforts are well-intentioned and necessary, they risk emphasizing general guidelines for responsible AI over more immediate and practical solutions for safeguarding AI against error and misuse.

AI governance and assurance are necessary for the technology to become truly trustworthy, democratized and ubiquitous. The former covers the security and privacy of data used to train AI models, and the latter covers policies for the use of the technology and whether users can trust its outputs. Together, these efforts help ensure that AI models and outputs from agencies are secure, accurate and trustworthy.

How to Secure the Complete AI Lifecycle

AI is a lot like an iceberg. The part people are most aware of — dazzling, lightning-fast content, decisions or predictions — represents the visible tip, but it is only a small fraction of what makes AI work.

Enabling ChatGPT to produce a grocery list in the style of Shakespeare required years of data curation and model training.

Steven Orrin
Training with the wrong data set, or with data that has been compromised, can result in bias and inaccuracies.”

Steven Orrin Federal CTO and Senior Principal Engineer, Intel

Creation of an AI solution follows a complex lifecycle, and governance and assurance must cover it end to end. In particular, AI models require security and transparency through three phases:

Data ingestion: The data corpus used to train an AI model has a significant impact on whether outputs are accurate and trustworthy. For example, an AI solution intended to provide guidance on medications for the general population must be diverse across the board; it can’t be trained only with data on Caucasian males under age 25. Training with the wrong data set, or with data that has been compromised, can result in bias and inaccuracies.

It helps to have teams of data scientists with diverse backgrounds and experiences working on the data sets. They can help ensure unbiased data, adherence to AI ethical principles and the creation of trusted and responsible AI models at the beginning of the lifecycle.

Model implementation: Many AI solutions today are black box implementations, where the AI algorithm and the data used to train it remain under wraps. For many government use cases, this process can erode public trust.

DISCOVER: FedTech Influencers identify AI security issues.

Civilian agencies that don’t deal with more privileged data will typically use an open-source AI model trained on broad data sets taken from the internet. If they further train the model with agency-specific data, they’ll need to make sure the data is anonymized or that personally identifiable information is otherwise protected.

If the solution is intended to make recommendations about, say, which citizens should have their tax returns audited, then the agency should be transparent about how those decisions are made.

Model optimization: A characteristic of LLMs is that they’re continually fine-tuned with new data. Ideally these updates will make them more accurate, but they can also cause outputs to drift or degrade over time if the data becomes less representative of real-world conditions.

This reality also introduces security concerns because AI models can be poisoned with false or junk data, so it’s imperative that organizations carefully manage the data being used to refine their AI models.

New AI Capabilities Require New Security Protections

AI promises a wealth of new capabilities, but it also introduces a range of new cybersecurity threats, including:

Poisoning: Poisoning introduces false data into model training to trick the AI solution into producing inaccurate results.

Fuzzing: Malicious fuzzing presents an AI system with random “fuzz” of both valid and invalid data to reveal weaknesses.

Spoofing: Spoofing presents the AI solution with false or misleading information to trick it into making incorrect predictions or decisions.

Prompt injection: With prompt injection, attackers input malicious queries with the intention of generating outputs that contain sensitive information.

Protecting against these AI-specific threats involves cyber practices such as strong identity and access controls, penetration testing and continuous monitoring of outputs. Purpose-built solutions, such as input validation and anti-spoofing tools, are also increasingly available and valuable.

Another powerful way to protect AI data, models and outputs is through encryption while data is at rest (stored on a drive), in transit (traversing a network) and in use (being queried in a CPU). For many years, encryption was practical only when data was at rest or in transit; with new technology, however, it’s now available and ready for data in use.

RELATED: The White House wants agencies using quantum cryptography by 2035.

Such confidential computing is enabled by technology that sets aside a portion of the CPU as a secure enclave. Data and applications in the enclave are encrypted with a key that’s unique to the CPU, and the data remains encrypted as users access it.

Confidential computing is available in the latest generation of microprocessors and can encrypt data at the virtual machine or container level. Public cloud providers are also beginning to offer confidential computing services.

For agencies, confidential computing addresses a core tenet of zero-trust security because it applies protections to the data itself.

Encryption of data at rest, in transit and in use strengthens security across the AI lifecycle, from data ingestion to model implementation and optimization. By securing the building blocks of AI systems, organizations can achieve governance and assurance to make models and outputs more accurate for the agencies that use them and more trustworthy for the constituencies they serve.

Just_Super/Getty Images
Close

Become an Insider

Unlock white papers, personalized recommendations and other premium content for an in-depth look at evolving IT