Bias & Fairness
Addressing bias in AI systems
AI agents learn from data, and data reflects the history, prejudices, and inequalities of the real world. Without careful intervention, agents can perpetuate or even amplify these biases.
Types of Bias
1. Training Data Bias
If the data used to train the model is not representative, its predictions will be skewed toward the groups that dominate the data.
- Example: A facial recognition system trained mostly on light-skinned faces performs poorly on dark-skinned faces.
2. Historical Bias
Data might accurately reflect reality, but that reality is unjust.
- Example: An algorithm predicting recidivism might be biased against certain demographics because historical arrest rates are biased, not because the individuals are inherently more criminal.
3. Evaluation Bias
The benchmarks used to test the model might themselves be narrow or culturally specific, hiding the model's failures in other contexts.
Fairness Metrics
How do we measure if an agent is "fair"? There is no single mathematical definition, and metrics often conflict.
- Demographic Parity: The acceptance rate (e.g., for a loan) should be equal across all groups.
- Equalized Odds: The error rates (false positives/negatives) should be equal across groups.
- Calibration: If the model predicts a 70% risk, it should be correct 70% of the time for all groups.
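The following is a minimal sketch of how these three metrics could be computed with NumPy. The inputs `y_true`, `y_pred`, `y_prob`, and `group` are hypothetical arrays of ground-truth labels, binary decisions, predicted probabilities, and group identifiers; this is an illustration of the definitions above, not a production audit tool.

```python
import numpy as np

def demographic_parity(y_pred, group):
    """Acceptance rate per group; parity means these rates are (near) equal."""
    return {g: y_pred[group == g].mean() for g in np.unique(group)}

def equalized_odds(y_true, y_pred, group):
    """False-positive and false-negative rates per group."""
    rates = {}
    for g in np.unique(group):
        t, p = y_true[group == g], y_pred[group == g]
        fpr = ((p == 1) & (t == 0)).sum() / max((t == 0).sum(), 1)
        fnr = ((p == 0) & (t == 1)).sum() / max((t == 1).sum(), 1)
        rates[g] = {"fpr": fpr, "fnr": fnr}
    return rates

def calibration(y_true, y_prob, group, bins=10):
    """Observed positive rate per predicted-probability bin, per group."""
    edges = np.linspace(0, 1, bins + 1)
    result = {}
    for g in np.unique(group):
        t, p = y_true[group == g], y_prob[group == g]
        idx = np.digitize(p, edges[1:-1])
        result[g] = [t[idx == b].mean() if (idx == b).any() else None
                     for b in range(bins)]
    return result
```

Comparing the per-group numbers returned by these functions makes the conflict between metrics concrete: forcing equal acceptance rates, for instance, can move the error rates or calibration curves apart.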
Mitigation Strategies
Pre-processing (Data)
- Oversampling: Deliberately adding more data from underrepresented groups.
- Debiasing: Reweighting data or removing sensitive attributes (though sensitive attributes often correlate with other features, so removal alone rarely suffices).
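A minimal sketch of both ideas, assuming hypothetical NumPy training arrays `X`, `y`, and `group`: naive oversampling duplicates rows from an underrepresented group, and reweighting assigns each sample a weight inversely proportional to its group's frequency.

```python
import numpy as np

def oversample_group(X, y, group, target_group, factor=2):
    """Duplicate rows of the target group so it appears `factor` times as often."""
    extra = np.repeat(np.where(group == target_group)[0], factor - 1)
    idx = np.concatenate([np.arange(len(X)), extra])
    return X[idx], y[idx], group[idx]

def group_weights(group):
    """Per-sample weights inversely proportional to group frequency."""
    values, counts = np.unique(group, return_counts=True)
    freq = dict(zip(values, counts / len(group)))
    return np.array([1.0 / freq[g] for g in group])
```

The resulting weights can typically be passed to a learner's `sample_weight` argument, while oversampled data is used as-is.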
In-processing (Model)
- Adversarial Training: Training a "critic" network to punish the main model if it relies on protected characteristics (like race or gender) to make predictions.
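Below is a minimal PyTorch-style sketch of one possible adversarial-debiasing loop, not a complete training script. A critic tries to recover the protected attribute from the predictor's hidden representation; the predictor is trained to perform its task while making the critic fail. The layer sizes, learning rates, and the penalty weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(32, 16), nn.ReLU())  # shared representation
task_head = nn.Linear(16, 1)                             # main prediction
critic = nn.Linear(16, 1)                                 # predicts protected attribute

opt_main = torch.optim.Adam(
    list(predictor.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 0.5  # strength of the fairness penalty (hypothetical value)

def train_step(x, y, a):
    """x: (batch, 32) features, y: (batch, 1) labels, a: (batch, 1) protected attribute."""
    # 1) Update the critic to predict the protected attribute from the representation.
    h = predictor(x).detach()
    critic_loss = bce(critic(h), a)
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # 2) Update the predictor: do well on the task while fooling the critic.
    #    The negated critic term rewards representations from which the
    #    protected attribute cannot be recovered.
    h = predictor(x)
    main_loss = bce(task_head(h), y) - lam * bce(critic(h), a)
    opt_main.zero_grad()
    main_loss.backward()
    opt_main.step()
```

Alternating these two updates is the core of the approach; practical systems add gradient-reversal layers, careful tuning of `lam`, and monitoring of both task accuracy and the critic's accuracy.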
Post-processing (Output)
- Thresholding: Adjusting the decision threshold for different groups to satisfy a chosen fairness metric.
- Safety Filters: Layers that intercept and block biased or toxic outputs before they reach the user.
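As a concrete illustration of the thresholding idea, here is a minimal sketch that picks a separate threshold per group so that roughly the same fraction of each group is accepted (demographic parity on the outputs). The arrays `y_prob` and `group` are hypothetical model scores and group identifiers.

```python
import numpy as np

def equalize_acceptance(y_prob, group, target_rate=0.3):
    """Choose each group's threshold as the (1 - target_rate) quantile of its scores,
    so approximately `target_rate` of every group is accepted."""
    thresholds = {g: np.quantile(y_prob[group == g], 1 - target_rate)
                  for g in np.unique(group)}
    decisions = np.array([y_prob[i] >= thresholds[group[i]]
                          for i in range(len(y_prob))])
    return decisions, thresholds
```

Note that equalizing acceptance rates this way can shift error rates between groups, which is exactly the metric conflict discussed above.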
The Challenge of Nuance
Language models can exhibit subtle biases, such as professional stereotypes (e.g., assuming a "doctor" is male and a "nurse" is female). Detecting and correcting these in open-ended generation is an ongoing research challenge.
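One common way to surface such stereotypes is a template-based probe: fill profession templates, sample completions, and count gendered pronouns. The sketch below assumes a hypothetical `generate()` function that returns a text completion for a prompt; real evaluations use much larger template sets and statistical testing.

```python
import re

TEMPLATES = ["The {job} said that", "The {job} walked in because"]
JOBS = ["doctor", "nurse", "engineer", "teacher"]

def pronoun_counts(generate, samples=20):
    """Count 'he' vs. 'she' in sampled completions for each profession."""
    counts = {job: {"he": 0, "she": 0} for job in JOBS}
    for job in JOBS:
        for template in TEMPLATES:
            for _ in range(samples):
                text = generate(template.format(job=job)).lower()
                counts[job]["he"] += len(re.findall(r"\bhe\b", text))
                counts[job]["she"] += len(re.findall(r"\bshe\b", text))
    return counts
```

Skewed counts (e.g., "doctor" overwhelmingly continued with "he") indicate the kind of subtle professional stereotype described above, though open-ended generation makes exhaustive detection difficult.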