TY - JOUR
T1 - Weighing the benefits and risks of collecting race and ethnicity data in clinical settings for medical artificial intelligence
AU - Fiske, Amelia
AU - Blacker, Sarah
AU - Geneviève, Lester Darryl
AU - Willem, Theresa
AU - Fritzsche, Marie Christine
AU - Buyx, Alena
AU - Celi, Leo Anthony
AU - McLennan, Stuart
N1 - Publisher Copyright: © 2025 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license
PY - 2025/4
Y1 - 2025/4
N2 - Many countries around the world do not collect race and ethnicity data in clinical settings. Without such identified data, it is difficult to identify biases in the training data or output of a given artificial intelligence (AI) algorithm, and to work towards medical AI tools that do not exclude or further harm marginalised groups. However, the collection of these data also poses specific risks to racially minoritised populations and other marginalised groups. This Viewpoint weighs the risks of collecting race and ethnicity data in clinical settings against the risks of not collecting those data. The collection of more comprehensive identified data (ie, data that include personal attributes such as race, ethnicity, and sex) has the possibility to benefit racially minoritised populations that have historically faced worse health outcomes and health-care access, and inadequate representation in research. However, the collection of extensive demographic data raises important concerns that include the construction of intersectional social categories (ie, race and its shifting meaning in different sociopolitical contexts), the risks of biological reductionism, and the potential for misuse, particularly in situations of historical exclusion, violence, conflict, genocide, and colonialism. Careful navigation of identified data collection is key to building better AI algorithms and to work towards medicine that does not exclude or harm marginalised groups.
UR - http://www.scopus.com/inward/record.url?scp=105000715313&partnerID=8YFLogxK
U2 - 10.1016/j.landig.2025.01.003
DO - 10.1016/j.landig.2025.01.003
M3 - Review article
AN - SCOPUS:105000715313
SN - 2589-7500
VL - 7
SP - e286
EP - e294
JO - The Lancet Digital Health
JF - The Lancet Digital Health
IS - 4
ER -