Research Guides: Generative AI Guide: Evaluating AI Outputs

VALID-AI Test

Critically Evaluating GenAI Outputs

When it comes to evaluating text, information and other outputs from GenAI, there are a few key factors to consider. One important aspect is the accuracy of the tool. How reliable are the results it produces? Can it consistently identify patterns and make predictions that are correct? Another important factor is the transparency of the tool. Can you understand how it works and how it makes decisions? Is it clear what data it is using and how it is being processed?

Another important consideration is the fairness and bias of the tool. Is the tool treating all individuals equally, or is it exhibiting bias towards certain groups of people? GenAI trained on biased datasets can perpetuate discrimination and inequality.

Finally, it's essential to consider the ethical implications of using GenAI tools. Are there potential risks or unintended consequences of relying on this technology? Is the tool being used in a way that respects privacy and human dignity? By carefully evaluating AI tools on these different criteria, we can ensure that we are using them in a responsible and effective way.

A quick way to critically evaluate generated content is to use the acronym:

VALID-AI

V: Validate data
- Assess the quality and reliability of the data used to train and test the AI model
- Is the data representative, unbiased, and relevant to the problem at hand?
A: Analyze algorithms
- Closely examine the algorithms used in the AI model
- How much reinforcement learning from human feedback went into the AI model?
- Does that human feedback introduce bias into the results?
L: Legal and ethical considerations
- Evaluate whether the GenAI tool complies with legal regulations and ethical guidelines
- Is the GenAI model creator following laws regarding data privacy and consent?
I: Interpret how it works
- Assess whether the GenAI's "decisions" can be explained and understood by humans
- When using the GenAI model, is it clear how the AI is operating in the backend?
- Does querying with certain prompts result in a refusal to generate?
D: Diversity and bias
- Evaluate whether the training data and the GenAI tool itself are diverse enough to handle a wide range of inputs and scenarios
- Is there obvious bias in the AI outputs that needs to be corrected?
- Does the tool respond well to certain prompts and produce ineffective results with others?
A: Accuracy check
- Assess how accurate the GenAI model's predictions or classifications are compared to ground truth or human judgments
- Check GenAI-generated results against real-world examples where relevant. Does the generated content reflect verifiable truths?
I
- Are you using GenAI tools in an ethical way?
- Does your use of these tools affect any of the original creators when prompted with their names/projects/styles?
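
For readers who want to make the accuracy check more concrete, the comparison against ground truth can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the sample answers, and the prompt categories are all hypothetical, and the exact-match comparison is a simplifying assumption. Real generated text rarely matches a reference word for word, so in practice a human reviewer usually decides whether each output is correct.

```python
from collections import defaultdict


def accuracy(outputs, truths):
    """Return the fraction of outputs that exactly match the verified answers."""
    if len(outputs) != len(truths):
        raise ValueError("outputs and truths must be the same length")
    if not outputs:
        raise ValueError("need at least one example to score")
    matches = sum(o == t for o, t in zip(outputs, truths))
    return matches / len(outputs)


def accuracy_by_group(outputs, truths, groups):
    """Return accuracy separately for each prompt category."""
    if not (len(outputs) == len(truths) == len(groups)):
        raise ValueError("outputs, truths, and groups must be the same length")
    counts = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for o, t, g in zip(outputs, truths, groups):
        counts[g][0] += int(o == t)
        counts[g][1] += 1
    return {g: correct / total for g, (correct, total) in counts.items()}


# Hypothetical, hand-made data: answers copied from a GenAI tool,
# the answers a person verified, and a category label for each prompt.
generated = ["Paris", "1967", "Mercury", "Elephant"]
verified = ["Paris", "1969", "Mercury", "Blue whale"]
categories = ["geography", "history", "science", "science"]

print(accuracy(generated, verified))
print(accuracy_by_group(generated, verified, categories))
```

Breaking the score down by category also speaks to the diversity and bias questions above: a tool that scores well overall may still perform poorly on one type of prompt.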

Adapted from the University of Toronto Libraries https://guides.library.utoronto.ca/image-gen-ai/critical-evaluation
Artificial Intelligence for Image Research © 2023 by Cathryn Copper is licensed under CC BY-NC-SA 4.0