Using Data Science to Understand the Film Industry's Gender Gap

Dima Kagan, Thomas Chesney, Michael Fire

Research output: Contribution to journal › Article › peer-review

16 Scopus citations
26 Downloads (Pure)


Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.

Original language: English
Article number: 92
Journal: Palgrave Communications
Issue number: 1
State: Published - 1 Dec 2020


  • Computer Science - Social and Information Networks
  • Computer Science - Computers and Society
  • Physics - Data Analysis
  • Statistics and Probability

ASJC Scopus subject areas

  • General Arts and Humanities
  • General Social Sciences
  • General Psychology
  • Economics, Econometrics and Finance (all)

