Using Data Science to Understand the Film Industry's Gender Gap

Dima Kagan, Thomas Chesney, Michael Fire

Research output: Working paper/Preprint › Preprint



Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.
Original language: English (GB)
State: Published - 1 Mar 2019

Publication series

Name: arXiv preprint


  • Computer Science - Social and Information Networks
  • Computer Science - Computers and Society
  • Physics - Data Analysis, Statistics and Probability

