Eddie Elliott | Time Data

RCDb (now named Watchwith) is tagging thousands of movies and TV shows with metadata that identifies time spans of actors/characters, products, scenes, music, and more. A typical feature length movie could have several thousand of these tags describing its content. RCDb uses the term Time Data for this timeline metadata. I spent a couple years at RCDb analyzing and visualizing Time Data for QC purposes and to help evaluate various data collection methods. I also got to work with faculty and students at USC's School of Cinematic Arts for a semester, exploring ways Time Data might can be used in scholarly study of films (Movie Tagger Alpha).

A few samples of my Time Data visualization work...

MetaBubbles

I started playing with visualizing Time Data just out of curiosity (while I was busy doing my "real" job at the time, writing code for Blu-ray discs). I was lucky to have access to the metadata early in the Time Data effort, where almost any visualization work offered unexpected insight into the collection/QC process. Some byproducts of my tinkering with visuals of the metadata were tools to complement and aid human review of the metadata. One of those tools was MetaBubbler, a utility that presents Time Data synchronized in time over the corresponding video. I used the MetaBubble video to check anomalies that would appear in my visualizations. For example, there is a long stretch in The Matrix where no actors appear in the film's scene stack. That turned out to be an animation sequence. In contrast, Mamma Mia!'s scene stack has spans with a couple dozen actors or more. Viewing the MetaBubbled video, I learned that all the actors in dance scenes had been tagged. Other times, an oddity would appear in a visualization, and double checking the MetaBubble video would verify that there was a flaw in the data. MetaBubbler started as a utility to augment visualizations, and soon became a mainstay of the Time Data QC process.

In this Mad Men MetaBubbles screenshot, actors appear in yellow and products in orange, complete with references back to the database entry for each item...

MetaBubbles screenshot for Mad Men, showing Time Data for four actors and three products

MetaBubbles view of a Mad Men episode

The Wild Hogs Time Data contains extensive product information. In this shot, each motorcycle model, as well as the helmet model and the sound track has been tagged...

MetaBubbles screenshot for Wild Hogs, showing Time Data for three actors and seven products

MetaBubbles view from Wild Hogs

More MetaBubbles samples from the Movie Tagger Alpha project (click below to view):

Scene Stacks

Starting with a movie timeline, Time Data partitions it by scenes:

scene partitions timeline

When we stack the scenes first to last, starting at the top, we quickly gain a sense of the rhythm of the narrative by comparing scene lengths. For example, in Pulp Fiction, it appears that about half the scenes are relatively short, and there is one exceptionally long scene about a third of the way into the film. In most movies, the credit roll appears as a long "scene" at the tail of the film. blank Pulp Fiction scene stack

Next, we add the actor and product data to the scenes. Time Data defines the span of each distinct actor or product appearance as an event with in- and out-points, and a description. Actor spans appear in blue. The darker the blue, the more actors that appear simultaneously at that point in the film. The magenta spans along the tops of scene strips indicate the presence of products. We can see that Pulp Fiction typically has very few actors appearing at once, but there are a few brief spots packed with more actors. Also, most scenes in Pulp Fiction include identifiable products, but there are a handful with no products. Pulp Fiction scene stack with actor and product information

More examples of scene stacks... (click below for details)

Big Picture

This poster shows the 1,637 titles in the dataset at the time the poster was produced, giving a sense of the historical span of the titles, and a summary glimpse of the data collected for each. The shades of blue delineate decades. We can see that the 1990s (the darkest blue, at the bottom of the poster) contained about 35% of the Time Data titles when this poster was made.

4' x 8' Time Data Big Picture poster, visualizint data for over 1,600 films

The following is a detail from the top-left of the poster. The earliest films in the Time Data collection date back to 1921. Each film is represented as a timeline strip, with marks indicating scene breaks. Each strip is labeled with the film's title, year, and scene/actor/product counts.

detail of the Time Data Big Picture poster, showing the heads of some films from 1921 to 1962

Detail of upper-left corner of the poster

Some counts from this set of films:

Titles

1,637 titles
min duration - 0:45:46, Everest
med duration - 1:49:24, The Wrestler
max duration - 4:11:15, Cleopatra

Scenes

85,864 scenes
min - 14 scenes, Tarzan - The Ape Man
med - 50 scenes, Brothers
max - 107 scenes, The Lord of the Rings: The Return of the King

Actors

1,846,174 actor events
min - 72 events, Richard Pryor - Live on Sunset Strip
med - 1,055 events, Duplicity
max - 3,892, D2 - The Mighty Ducks

Products

370, 433 product events
min - 9 events, Cleopatra, Tarzan - The Ape Man, Mulan
med - 181 events, In the Valley of Elah
max - 1,561 events, The Blind Side

That represents more than 4 months of continuous video, with the in- or out-point of some event occurring every 2.3 second on average. That's a whole lot of activity that has been meticulously collected and painstakingly verified.

Overview

Early in the Time Data project, when we had tagged only 87 films, I sorted them by duration and plotted each film's timeline, with marks at scene breaks. In this set, the durations range from around 75 minutes (Pokemon: The First Movie) to nearly 161 minutes (Out of Africa). As you might expect, children's films tend to appear on short end of this list.

overview graph of 87 films, sorted by duration

The surprising thing thing with this graph is how nearly linear the progression of film durations is for most of the films. Then, at around 130 minutes durations, the final 10% of the films burst into longer durations.

detail of the overvier graph, showing the exceptionally long films - longest: Out of Africa, at 2:40:58

Co-occurrence Diagrams

One indicator of relationships among characters is how much time they appear simultaneously in a film, which is easy to calculate from Time Data. The following network diagram shows co-occurrences of actors in Minority Report. Actor nodes are sized according to their total screen-time in the film. John Anderton, played by Tom Cruise, is the central character, appearing more than twice as often than any other character, so he ends up as a big blue dot in the center of the diagram. The lines between nodes indicate two actors that appeared simultaneously, and the line thickness reflects how much time the characters appear together. I used Gephi to compose the diagram. Gephi's layout and clustering algorithms positioned the nodes and colored the nodes and edges. I'm not sure how well the clustering and coloring represent actual cliques of characters in the film.

co-occurence diagram for Minority Report, showing Tom Cruise at the center, with far more screen time than all other actors

detail of the Minority Report co-occurence diagram, showing some of the connects to Tom Cruise

Titles

Scenes

Actors

Products

more Time Data visuals: Movie Tagger Alpha