Bias and Fairness Detection in Datasets Using a CNN
DOI: https://doi.org/10.70670/sra.v3i2.828

Abstract
As artificial intelligence (AI) continues to play a growing role in decision-making processes across sensitive domains such as healthcare, finance, recruitment, and law enforcement, concerns regarding algorithmic bias and fairness have become increasingly critical. These concerns often originate from imbalanced or biased training data, which can lead to discriminatory outcomes and reduced trust in AI systems. This research presents a system based on a Convolutional Neural Network (CNN) designed to detect bias and evaluate fairness in datasets before they are used for model training. The proposed system analyzes class distribution and applies statistical fairness metrics to assess whether a dataset is balanced or skewed toward specific outcomes or demographic groups. At its core, the system employs a CNN trained to identify imbalances within the data, particularly in multi-class classification scenarios. The model is complemented by additional fairness metrics, such as demographic parity and equal opportunity, which together provide a more comprehensive evaluation of potential bias.
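For reference, demographic parity holds when the rate of positive predictions is equal across demographic groups, while equal opportunity holds when the true-positive rate is equal across groups. The sketch below shows one way these two metrics can be computed for a binary classifier and a binary protected attribute; the function names and toy data are illustrative assumptions, not taken from the proposed system.

import numpy as np

def demographic_parity_difference(y_pred, group):
    # Gap in positive-prediction rates between the two groups;
    # a value of 0 indicates demographic parity.
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    # Gap in true-positive rates (recall) between the two groups;
    # a value of 0 indicates equal opportunity.
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr_0 = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr_1 = y_pred[(group == 1) & (y_true == 1)].mean()
    return tpr_0 - tpr_1

# Toy data: binary labels, model predictions, and protected attribute.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_difference(y_pred, group))        # -0.25
print(equal_opportunity_difference(y_true, y_pred, group)) # -0.33...

A dataset-level audit of the kind described here would report such gaps per class or per demographic group, flagging a dataset as skewed when the gaps exceed a chosen threshold.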
To ensure robustness and adaptability, the system was tested on a variety of public datasets as well as two custom-designed datasets developed during the course of the project. These custom datasets include encrypted files to reflect real-world complexities such as privacy-preserving data formats and secure data handling. The model successfully processed these inputs and produced accurate predictions of fairness and bias.
The user-friendly interface allows users to upload datasets, view predictions, and understand fairness scores visually, making the tool suitable for both technical and non-technical stakeholders. The system aims to support AI practitioners by offering an early-stage evaluation method that improves dataset transparency, increases trust in AI outcomes, and reduces the risk of unintended discrimination.
Overall, this research contributes a practical, scalable, and ethical solution for bias detection at the dataset level, serving as a step forward in the broader effort to promote fairness and accountability in artificial intelligence.