Implementation and Extended Replication: Unbiased Look at Dataset Bias
(Figure: CNN architecture used for image classification)
Introduction
This project examines dataset bias in image classification, inspired by the seminal work of Torralba and Efros (2011). It explores biases in the Caltech101, MSRC, and VOC-2007 datasets using a traditional pipeline of Histogram of Oriented Gradients (HOG) features with Support Vector Machines (SVM) as well as modern Convolutional Neural Networks (CNNs).
Literature Review
Biases in Image Datasets
- Selection Bias: Occurs during dataset creation, leading to over- or underrepresentation of certain categories (e.g., too many racing cars).
- Framing Bias: Arises from how images are composed or presented (e.g., all "car_side" images in Caltech101 are black and white).
- Label Bias: Results from errors in image labeling or from ambiguous semantic categories, impacting algorithm performance.
Methods
Datasets Description
- Caltech101: Created in 2003, with about 9,000 images across 101 categories, including "car_side" images.
- MSRC: Microsoft Research Cambridge dataset with around 4,300 labeled images.
- PASCAL VOC-2007: Contains around 5,000 annotated images spanning 20 object classes, including "car".
Preprocessing
- Standardized all images to 224x224 pixels.
- Applied binary labeling for car versus non-car images across all three datasets (a loading and labeling sketch follows below).
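The following is a minimal sketch of these preprocessing steps. It assumes images are stored on disk with one folder per category and that folder names such as `car_side` identify the positive class; the `load_binary_dataset` helper and the keyword matching are illustrative, not part of the original pipeline.

```python
# Minimal preprocessing sketch (assumed layout: one folder per category,
# with "car" folders identifying the positive class).
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = (224, 224)  # all images standardized to 224x224 pixels

def load_binary_dataset(root, car_keywords=("car",)):
    """Load images under `root`, resize them to 224x224, and assign a binary
    car / non-car label from the parent folder name (assumption: folder
    names such as 'car_side' mark the positive class)."""
    images, labels = [], []
    for path in Path(root).rglob("*.jpg"):
        img = Image.open(path).convert("RGB").resize(IMG_SIZE)
        images.append(np.asarray(img, dtype=np.float32) / 255.0)
        labels.append(int(any(k in path.parent.name.lower() for k in car_keywords)))
    return np.stack(images), np.array(labels)
```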
SVM Approach
- Implemented HOG feature extraction followed by SVM classification (a minimal sketch follows this list).
- Used a two-class SVM for binary car versus non-car classification.
- Examined F1 scores for cross-dataset generalization, observing significant performance drops.
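A minimal sketch of the HOG + linear SVM pipeline is given below. The HOG parameters, the `LinearSVC` settings, and the `X_train`/`X_test` variable names are assumptions for illustration; the report does not specify the exact configuration.

```python
# HOG + linear SVM sketch (assumed: X_train/X_test are 224x224 RGB arrays
# from the preprocessing step; hyperparameters are illustrative).
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

def hog_features(images):
    """Extract HOG descriptors from a batch of RGB images."""
    return np.array([
        hog(rgb2gray(img), orientations=9,
            pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images
    ])

# Train on one dataset (e.g. Caltech101) ...
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(hog_features(X_train), y_train)

# ... then evaluate within-dataset and cross-dataset (e.g. on VOC-2007)
print("F1:", f1_score(y_test, clf.predict(hog_features(X_test))))
```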
CNN Approach
- Developed a CNN architecture with convolutional layers (32, 64, and 128 filters), max-pooling layers, dense layers (128 neurons), and dropout layers; a sketch follows this list.
- Trained using the Adam optimizer with binary cross-entropy loss.
- Used early stopping to mitigate overfitting.
- Evaluated F1 scores and observed performance degradation across different datasets.
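A sketch of such a CNN in Keras is shown below, following the layer sizes listed above; the kernel sizes, dropout rate, and early-stopping patience are illustrative assumptions rather than values taken from the original implementation.

```python
# CNN sketch matching the architecture described above (32/64/128 conv
# filters, max pooling, a 128-unit dense layer, dropout); kernel sizes and
# the dropout rate are assumptions, not taken from the original report.
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(224, 224, 3)):
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # binary car / non-car output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping to mitigate overfitting (patience value is illustrative)
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=30, batch_size=32, callbacks=[early_stop])
```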
Findings
SVM Results
- Demonstrated lower F1 scores in cross-dataset testing, indicating susceptibility to dataset biases.
- Highlighted the need for multi-label datasets to improve classification accuracy.
CNN Results
- Showed similar performance trends as SVM but with different magnitudes of performance drops.
- Noted instances of complete failure (F1 score of 0) in specific cross-dataset tests, suggesting deeper biases or model limitations.
Discussion
- Discussed limitations such as computational constraints affecting model capacity.
- Proposed future research directions including using more recent datasets and advanced object detection models.
- Reflected on methodological biases introduced by the algorithms themselves, which can distort the assessment of dataset bias.
Authors
- Christian Hobelsberger
- Erwin Spoelstra
- Sanjit Dasgupta