TY - CHAP
T1 - Towards Automatic Textual Summarization of Movies
AU - Liu, Chang
AU - Last, Mark
AU - Shmilovici, Armin
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021/7/11
Y1 - 2021/7/11
AB - With the rapidly increasing number of online video resources, the ability to automatically understand those videos is becoming more and more important, since it is almost impossible for people to watch all of them and provide textual descriptions. The duration of online videos varies over an extremely wide range, from several seconds to more than five hours. In this paper, we focus on long videos, especially full-length movies, and propose the first pipeline for automatically generating textual summaries of such movies. The proposed system takes an entire movie as input (including subtitles), splits it into scenes, generates a one-sentence description for each scene, and summarizes those descriptions and subtitles into a final summary. In our initial experiment on a popular cinema movie (Forrest Gump), we utilize several existing algorithms and software tools to implement the different components of our system. Most importantly, we use the S2VT (Sequence to Sequence - Video to Text) algorithm for scene description generation and MUSEEC (MUltilingual SEntence Extraction and Compression) for extractive text summarization. We present preliminary results from our prototype experimental framework. An evaluation of the resulting textual summaries for a movie consisting of 156 scenes demonstrates the feasibility of the approach: the summary contains the descriptions of three of the four most important scenes/storylines in the movie. Although the summaries are far from satisfactory, we argue that the current results demonstrate the merit of our approach.
UR - http://www.scopus.com/inward/record.url?scp=85088470651&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-47124-8_39
DO - 10.1007/978-3-030-47124-8_39
M3 - Chapter
AN - SCOPUS:85088470651
SN - 978-3-030-47123-1
T3 - Studies in Fuzziness and Soft Computing
SP - 481
EP - 491
BT - Studies in Fuzziness and Soft Computing
PB - Springer
CY - Cham
ER -