| Abstract |
| --- |
| Audio Description (AD), also called Video Description, is a vital accessibility resource for blind and visually impaired people. Automating it is challenging, as it involves describing scenery, actions, emotions, and characters. This paper presents an approach to automatically describe the characters in a video or image by combining Deep Learning (DL), face detection, facial expression recognition techniques, and speech synthesizers. Our proposal applies the detection tools and DL models to the visual data and generates an audio description from the results. To evaluate the feasibility of the approach, we developed a proof of concept and ran computational experiments on it. |
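The pipeline the abstract outlines (detection, DL classification, description, audio output) can be sketched as below. Every function, class, and label here is a hypothetical stub for illustration only, not the authors' implementation; a real system would plug in an actual face detector, expression classifier, and TTS engine at the marked points.

```python
# Minimal sketch of a character audio-description pipeline, assuming three
# stages: (1) face detection + expression classification, (2) composing a
# textual description, (3) speech synthesis. All names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Face:
    name: str        # identity from a (hypothetical) face-recognition model
    expression: str  # label from a (hypothetical) expression classifier


def detect_and_classify(frame) -> List[Face]:
    """Stub standing in for the DL detection/classification stage."""
    # A real implementation would run face detection and expression
    # recognition on `frame`; here we return fixed placeholder results.
    return [Face(name="person 1", expression="happy"),
            Face(name="person 2", expression="neutral")]


def compose_description(faces: List[Face]) -> str:
    """Turn structured detections into one sentence for the audio track."""
    if not faces:
        return "No characters are visible."
    parts = [f"{f.name} looks {f.expression}" for f in faces]
    return "In the scene, " + "; ".join(parts) + "."


def synthesize(text: str) -> None:
    """Placeholder for a text-to-speech engine call."""
    print(text)


if __name__ == "__main__":
    synthesize(compose_description(detect_and_classify(frame=None)))
```

The design choice worth noting is the intermediate structured representation (`Face`): it decouples the vision models from the language/audio stage, so either side can be swapped independently.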
Year | DOI | Venue
---|---|---
2021 | 10.1145/3470482.3479617 | Proceedings of the 27th Brazilian Symposium on Multimedia and the Web (WebMedia '21)

Keywords | DocType | Citations
---|---|---
accessibility, deep learning, blind people, audio description | Conference | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 5
Name | Order | Citations | PageRank
---|---|---|---
Itamar Rocha Filho | 1 | 0 | 0.34 |
Felipe Honorato | 2 | 0 | 0.34 |
J. Wallace Lucena | 3 | 0 | 0.34 |
J. Pedro Teixeira | 4 | 0 | 0.34 |
Tiago Maritan | 5 | 0 | 0.34 |