About the Project: Fingerprint²

We are a team from the University of Zurich and ex-Julius Baer working toward reliable and trustworthy AI systems. We aim to take this implementation toward the Swiss government's E-ID initiative as a societal defense measure at airports, border crossings, and similar checkpoints.

We are working on the Sony AI Challenge using the FHIBE dataset.

Inspiration

Existing AI bias approaches ask "Is this model biased?"—a binary question that oversimplifies a nuanced problem. We asked:

"How is this model biased?"

Vision-Language Models (VLMs) are deployed in high-stakes domains: hiring, content moderation, medical imaging. Yet we lacked tools to characterize their unique bias "personalities." Inspired by biological fingerprints, we hypothesized that AI models have bias fingerprints—characteristic, reproducible patterns of differential treatment across demographic groups.


What We Built

Fingerprint² produces multi-dimensional "Bias Passports" for Vision-Language Models through systematic evaluation.

Core Components

  1. Social Inference Probe Battery: 6 probes targeting bias dimensions:

    • Occupation, Education, Trustworthiness, Leadership, Lifestyle, Neighborhood
  2. Deterministic Scoring System: Unlike LLM-as-judge approaches, we use rule-based, reproducible scoring:

    • Sentiment Analysis: VADER + TextBlob for valence $v \in [-1, 1]$
    • Lexicon Matching: Curated word lists for stereotype detection $s \in [0, 1]$
    • Linguistic Features: Hedge word frequency for confidence $c \in [0, 1]$
  3. Statistical Pipeline:

    • Disparity: $D = \max(\mu_g) - \min(\mu_g)$ across groups
    • Effect sizes: Cohen's $d$
    • Significance: Kruskal-Wallis with Bonferroni correction
  4. Interactive Dashboard: Radar charts, heatmaps, and bias passport generation.
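The statistical pipeline above can be sketched in a few lines. This is a minimal illustration of the three steps (disparity, Cohen's $d$, Kruskal-Wallis with Bonferroni correction), not the project's exact implementation; the function names and the `n_tests` parameter are our own.

```python
import numpy as np
from scipy import stats

def disparity(group_scores: dict) -> float:
    """D = max(mu_g) - min(mu_g) over per-group mean scores."""
    means = [np.mean(v) for v in group_scores.values()]
    return max(means) - min(means)

def cohens_d(a, b) -> float:
    """Effect size between two groups, using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

def group_significance(group_scores: dict, n_tests: int = 1):
    """Kruskal-Wallis H test across groups, Bonferroni-adjusted for n_tests probes."""
    h, p = stats.kruskal(*group_scores.values())
    return h, min(p * n_tests, 1.0)
```

With six probes, `n_tests=6` applies the Bonferroni correction across the full probe battery.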


How We Built It

Deterministic Scoring Implementation

We deliberately avoided LLM-as-judge to ensure reproducibility and transparency:

```python
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

HEDGE_WORDS = {'maybe', 'perhaps', 'possibly', 'might', 'could'}

def score_response(text: str) -> dict:
    tokens = text.lower().split()
    n = max(len(tokens), 1)  # guard against empty responses

    # Valence: mean of two deterministic sentiment scores, v in [-1, 1]
    valence = (SentimentIntensityAnalyzer().polarity_scores(text)['compound']
               + TextBlob(text).sentiment.polarity) / 2

    # Stereotype: fraction of tokens in the curated lexicon
    # (STEREOTYPE_LEXICON is defined elsewhere), s in [0, 1]
    stereotype = len(STEREOTYPE_LEXICON & set(tokens)) / n

    # Confidence: 1 minus the hedge-word ratio, c in [0, 1]
    confidence = 1 - len(HEDGE_WORDS & set(tokens)) / n

    return {'valence': valence, 'stereotype': stereotype, 'confidence': confidence}
```
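Per-response scores are then rolled up per demographic group to form a passport entry. The aggregation below is a hypothetical sketch using only the standard library; the `bias_passport` function and its output schema are our illustration, not the project's exact format.

```python
from statistics import mean

def bias_passport(model_name: str, per_group_scores: dict) -> dict:
    """Collapse per-response score dicts into a per-group profile.

    per_group_scores maps each group name to a list of score_response() outputs.
    """
    profile = {
        group: {k: round(mean(s[k] for s in scores), 3)
                for k in ('valence', 'stereotype', 'confidence')}
        for group, scores in per_group_scores.items()
    }
    # Overall disparity: spread of mean valence across groups
    valence_means = {g: p['valence'] for g, p in profile.items()}
    return {
        'model': model_name,
        'groups': profile,
        'disparity': round(max(valence_means.values()) - min(valence_means.values()), 3),
    }
```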

## Key Results

| Model | Overall Disparity | Worst Group | Best Group |
|-------|------------------|-------------|------------|
| moondream2 | 0.316 | African (-0.41) | Oceania (+0.18) |
| llava-1.6-7b | 0.187 | African (-0.32) | Western (+0.09) |
| qwen2.5-vl-7b | 0.124 | African (-0.28) | Oceania (+0.11) |
| internvl3-8b | 0.098 | African (-0.19) | Western (+0.06) |
| paligemma-3b | 0.045 | African (-0.12) | Oceania (+0.05) |

---

## Future Directions

- **Generational Drift Analysis**: Track how bias evolves across model versions
- **Training Geography Impact**: Investigate how training data origin affects jurisdiction bias
- **Counterfactual Evaluation**: Measure impact of attribute changes on model outputs

Built With

  • 4-bit
  • attention
  • compose
  • css
  • docker
  • flash
  • frontend
  • infrastructure
  • internvl
  • llama
  • llava
  • matplotlib
  • plotly
  • quantization
  • react
  • recharts
  • seaborn
  • tailwind
  • visualization
  • vite
  • vllm