The final part of this blog series shows two illustrative examples of measuring character prominence in broadcast TV. We start with a simple, more intuitive example to understand the measurement of screen time. It shows 30 seconds of an episode of the TV show, Mock the Week (BBC2), along with a chart showing which individuals are relatively more and less prominent.
Demo of measuring prominence through screen time
We apply a face detector, a type of machine learning model, to identify faces on screen. Each detected face is marked with a rectangular ‘bounding box’. A continuous sequence of detections of the same face is called a ‘face track’. The same person can have multiple face tracks in a clip, for example if the camera cuts to them multiple times. Each track carries additional data such as timestamps and face sizes. These data are then aggregated and processed to compute an overall measure of prominence.
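To make the idea of a face track concrete, here is a minimal sketch in Python of how per-frame detections might be linked into tracks. It is an illustration under stated assumptions, not the pipeline used for the demos: the `Detection` and `FaceTrack` classes, the `build_tracks` function, and the overlap and time-gap thresholds are all hypothetical. It assumes a detection joins an existing track when its bounding box overlaps that track's most recent box closely and soon enough after it.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    t: float      # timestamp in seconds
    box: tuple    # bounding box as (x1, y1, x2, y2) in pixels


@dataclass
class FaceTrack:
    detections: list = field(default_factory=list)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def build_tracks(detections, min_iou=0.5, max_gap=1.0):
    """Greedily link detections into tracks (thresholds are assumptions)."""
    tracks = []
    for det in sorted(detections, key=lambda d: d.t):
        best, best_iou = None, min_iou
        for track in tracks:
            last = track.detections[-1]
            # Only extend a track with a later detection that follows closely in time.
            if 0 < det.t - last.t <= max_gap:
                overlap = iou(last.box, det.box)
                if overlap >= best_iou:
                    best, best_iou = track, overlap
        if best is None:
            # No suitable track: this detection starts a new one.
            best = FaceTrack()
            tracks.append(best)
        best.detections.append(det)
    return tracks


# Face A is on screen for three samples, leaves, and returns after a cut.
a_box, b_box = (0, 0, 10, 10), (50, 50, 60, 60)
demo = [Detection(0, a_box), Detection(0, b_box), Detection(1, a_box),
        Detection(1, b_box), Detection(2, a_box), Detection(10, a_box)]
demo_tracks = build_tracks(demo)
```

In this toy input the same face produces two separate tracks because of the gap in time, which mirrors the point above that one person can have several face tracks in a clip.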
We define prominence as time spent on screen (as a clear and sufficiently large face), with longer duration indicating higher relative prominence. This ‘relative prominence score’ is mostly driven by screen time, but also combines different aspects of prominence: for example, extra weight is given to a larger face in a sea of faces, and to longer periods of screen time when the face is the only one on screen. The numerical value is not meaningful on its own, but it allows for an approximation of who the most (and least) prominent people are in a particular video clip or episode.
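As a rough illustration of how such a score could be computed, the sketch below adds up screen time per person and applies a bonus for being the only face on screen or the largest face among several. The function names, the input format and the weights (`solo_bonus`, `size_bonus`) are assumptions made for this example. They are not the values or the exact formula behind the charts shown here.

```python
from collections import defaultdict


def prominence_scores(frames, seconds_per_frame=1.0,
                      solo_bonus=0.5, size_bonus=0.25):
    """Score each person from a list of sampled frames.

    Each frame is a dict mapping a person to their face area in pixels.
    The bonus weights are illustrative assumptions.
    """
    scores = defaultdict(float)
    for faces in frames:
        if not faces:
            continue  # nobody detected in this frame
        largest = max(faces, key=faces.get)
        for person in faces:
            weight = 1.0                      # base credit for screen time
            if len(faces) == 1:
                weight += solo_bonus          # only face on screen
            elif person == largest:
                weight += size_bonus          # largest face among several
            scores[person] += weight * seconds_per_frame
    return dict(scores)


def rank(scores):
    """Return people ordered from most to least prominent."""
    return sorted(scores, key=scores.get, reverse=True)


# Four sampled frames; the last one contains no faces.
demo_frames = [{"A": 100}, {"A": 80, "B": 40}, {"B": 50, "C": 20}, {}]
demo_scores = prominence_scores(demo_frames)
demo_ranking = rank(demo_scores)
```

Only the ordering returned by `rank` is meant to be interpreted. As noted above, the raw numbers have no meaning outside the clip they were computed for.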
Demo where measuring prominence is more difficult
While panel shows usually have many clear faces towards the camera, other programmes can pose more of a challenge. We use an episode from the American sitcom Black-ish Season 1 (ABC) to test and discuss the feasibility of generating character prominence metrics when there is a greater variety of camera angles and face sizes.
The short video clip is similarly processed to create a relative prominence score for faces that appear on screen. The video is broken down into even smaller components called ‘scenes’, as shown in the bottom-left chart. In this short clip, the grandma (played by Jenifer Lewis) was the most prominent, followed by the grandchildren next to her. This ranking of prominence is based entirely on the visual information from the short video. Processing is faster than real time when only one image per second is used.
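The one-image-per-second sampling mentioned above can be sketched as follows. This is a minimal example, assuming the video's frame rate and total frame count are already known; the function name and parameters are hypothetical, and the 25 frames per second in the usage lines is an assumed figure rather than a property of the clips shown.

```python
def sample_frame_indices(total_frames, fps, samples_per_second=1.0):
    """Return the indices of the frames to analyse when downsampling."""
    step = fps / samples_per_second   # source frames between two samples
    indices = []
    i = 0
    while round(i * step) < total_frames:
        indices.append(round(i * step))
        i += 1
    return indices


# A 30-second clip at an assumed 25 frames per second has 750 frames.
demo_indices = sample_frame_indices(total_frames=750, fps=25)
```

Under these assumptions the face detector runs on 30 frames rather than 750, which is the kind of reduction that makes faster-than-real-time processing plausible.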
Naturally, there are limitations to measuring prominence with a computer model: faces that are partially out of view or seen from the side are sometimes not detected, for example. Missed detections can also be due to a face being too small or blurry. Screen time also does not capture everything: a character can be a scene stealer with limited screen time, or say or do something which is highly impactful. The prominence scores can be expanded to include information like who is speaking. For the short demo clips shown here, we manually annotated all occurrences (face tracks) of the same character. Recent techniques such as face clustering with unknown numbers of characters can help scale up this analysis.
Despite the limitations, the illustrative demo shows how computer vision can be used to measure relative prominence on screen. The method provides a ranking of more (and less) prominent characters, which can be combined with further analysis to generate new insights about on-screen representation.
How to widen the evidence base with computer vision
The methods here can be extended in many directions: we propose potential applications to widen the evidence base around on-screen representation for four groups.
First, for diversity leads and monitors, computer vision could be used, supplemented with manual review, to generate more frequent and richer data about representation. By plugging evidence gaps (beyond presence, and across under-represented groups), computer vision can generate real-time insights into on-screen representation in broadcasts. Measurements can prompt rethinking around the stories which are told and funded.
Second, for content producers, computer vision may offer opportunities to create new features that help viewers find major and minor characters.
It is not just the responsibility of the regulator or diversity groups to request and compile data. The content producer can potentially use richer data around character prominence to create new production features. This can generate additional value for viewers and fans, e.g. allowing viewers to interact with a visual summary of more and less prominent characters across a series.
Third, for editors and commissioners, computer vision can be used to analyse character prominence before a show airs. Currently, evidence is gathered long after the broadcast date. As processing an episode for prominence metrics is quicker than real time, especially with downsampling, it is possible to run it post-production, so some concerns around representation can be addressed upstream. These methods can be helpful during commissioning, or during screen-writing or editing in between series.
Finally, researchers can form partnerships to better understand the models under which content rights holders can open up broadcast data for research, so that collections can be more commonly treated as data, and responsibly opened up to answer important social research questions.
For these application areas to develop, a great deal more research is needed to address the ethical and logistical barriers to wide deployment across different types of programmes. Current benchmark datasets, against which research methods are evaluated, often include just a few TV programmes. Faces on screen vary greatly in viewpoint, head pose, face size, skin reflectance and lighting. We need a better understanding of how these methods scale to a range of programmes. More annotated datasets can be shared and data standards for on-screen representation can be developed.
Inclusion is more than just numbers
Increasingly, screen industry bodies are formally addressing diversity. In 2020, for example, the new BAFTA diversity steering group was established and several broadcasters (the BBC, Channel 4, ITV, and Sky) renewed their inclusion and diversity commitments, all referencing the global anti-racism movement.
In this series of blogs, we focus on diversity data and computer vision. This is because a big part of evidencing progress revolves around effective measurement. But it bears repeating that representation is just one part of inclusion. And inclusion is of course far more than a numbers game.
If measurements are used for box-ticking, the resulting representations will be tokenistic. Truly embedding inclusion into cultural production requires pushing systemic levers, such as re-evaluating creative risk, giving opportunities to and funding storytellers from different backgrounds, and regularly re-thinking what stories are worth telling and how these should be told. The ultimate goal of the measurements discussed here is that no group of people persistently feels mis- or under-represented by our mass media. As the BFI says, inclusion “fuels creativity, engages new audiences and makes good business sense.” A measurably more representative broadcast landscape is one step towards that goal.
The broadcast content we used was made available via Learning on Screen’s BoB archive with permission from the Educational Recording Agency.