AI Lund Lunch seminar: Human and automated audio description for media accessibility
Title: Human and automated audio description for media accessibility
Speaker: Jana Holsanova, Senior Lecturer in Cognitive Science, Lund University
When: 20 April at 12.00-13.15
Spoken language: English
Audio description provides access to audiovisual materials for people who are blind or visually impaired (BVI), offering a richer and more detailed understanding and experience of film and video. The task of the audio describer is to select relevant information from the visual scene (environments, objects, characters, their appearance, facial expressions, gestures, body movements and actions) and verbalise it in vivid descriptions that evoke inner images for the target audiences (Holsanova et al. 2020).
The audio describer must consider which information is conveyed by the sound, music and dialogue (and can therefore be perceived by BVI audiences) and which information is expressed only visually (and cannot be perceived by BVI audiences) in order to decide what needs to be described from the visual scene, how, and when (Holsanova 2022).
In my presentation, I will illustrate some of the important activities of an audio describer and summarise the challenges. On the basis of results from the MeMAD project (Braun et al. 2020, Starr et al. 2020), I will then compare the performance of a human audio describer with computer-generated video descriptions and illustrate what today's automatic systems can handle and where they have difficulties, for instance with visual saliency and narrative relevance, contextualisation and inferential capacity, coherence, and temporal and narrative continuity. Finally, I will suggest that a model of event segmentation (the human ability to conceive the boundaries of when a narrative event starts and ends and what it contains) could be used to achieve more human-like automated video description.