Multimodal Content Processing Across Media Sources

Uploaded by

Ariane Vincent
Multimodal Content Processing across Media Sources
INTRODUCTION
● The Visual Audio Text Summarizer addresses the growing demand for efficient tools in online education by distilling the essential points of educational videos.

● It combines the visual and audio elements of a video and applies natural language processing and machine learning techniques to generate concise text summaries, making video content more accessible and useful.

● With a user-friendly design, the system aims to change how users interact with educational videos and to support a more effective and personalized learning experience.
OBJECTIVE
● Develop a system that automatically extracts key information from videos and creates a concise, informative summary of each video.

● Provide viewers with a more user-friendly and time-efficient way to navigate and
access video content, reducing the need to watch lengthy videos in their entirety.
PROBLEM STATEMENT
● In today's digital landscape, the vast and ever-growing volume of video content poses a
significant challenge for users seeking to efficiently access and comprehend
information.
● Videos often require a substantial time investment to watch in full. Furthermore,
content creators frequently publish lengthy videos, making it impractical for users
with time constraints or limited attention spans to consume the entire content.
● The rapid growth of online audio and video content presents both a challenge and an opportunity for product teams working in this space. To address it, we propose a Visual Audio Text Summarization approach.
LITERATURE SURVEY
● Title: Learning To Generate Popular Headlines, Amin Omidvar and Aijun An
Technology/Protocol: Headline Popularity Prediction Model; BART, ProphetNet, T5.
Advantages: High-quality headlines.
Disadvantages: Lack of explainability.

● Title: AI-based video summarization using ffmpeg and NLP, Hansaraj Wankhede, R Bharathi Kumar, Ramtekkar, Rachana Chawke, 2023
Technology/Protocol: FFmpeg.
Advantages: Identifying different speakers.
Disadvantages: Small database for summarization.

● Title: Video summarization using speech recognition and text summarization, Tirath Tyagi, Lakshaya Dhari, Yash Nigam, Renuka Nagpal, 2023
Technology/Protocol: Automatic Speech Recognition (ASR), NLP.
Advantages: Handling unpunctuated transcripts.
Disadvantages: Extractive summarization.

● Title: Abstractive Summarizer for YouTube Videos, Sulochana Devi, Rahul Nadar, Alfredprem Lucas, 2023
Technology/Protocol: Natural Language Processing, Machine Learning, Abstractive summarization.
Advantages: Summarizing videos based on their subtitles is the fastest way of generating a summary.
Disadvantages: It does not make use of sentences from the original content.
PROPOSED SYSTEM
PREPROCESSING
● The preprocessing module routes each video to one of two paths: subtitle extraction or video-to-audio conversion.
● As its first step, the system checks whether the video contains subtitles.
● If subtitles are detected, the video is routed to the subtitle extraction path.
● If no subtitles are found, the video is routed to the audio extraction path.
● The subtitle check and path assignment are performed with ffmpeg.
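
The routing step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the subtitle check is done with ffprobe (the inspection tool that ships with FFmpeg), and the function names probe_subtitle_streams and choose_path and the two path labels are hypothetical.

```python
import subprocess
from typing import Callable, List


def probe_subtitle_streams(video_path: str) -> List[str]:
    """Return the indices of subtitle streams reported by ffprobe.

    Assumes ffprobe (part of the FFmpeg suite) is installed and on PATH.
    """
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "s",           # subtitle streams only
        "-show_entries", "stream=index",  # print just the stream index
        "-of", "csv=p=0",                 # one bare value per line
        video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def choose_path(
    video_path: str,
    probe: Callable[[str], List[str]] = probe_subtitle_streams,
) -> str:
    """Route a video to subtitle extraction if it has subtitles, else to audio extraction."""
    # The path labels are hypothetical names used only in this sketch.
    return "subtitle_extraction" if probe(video_path) else "audio_extraction"
```

Passing the probe as a parameter keeps the routing decision separate from the ffprobe call, so the decision can be checked without a real video file.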
AUDIO EXTRACTION MODULE
● Extracts the audio track from the video.
● Performed using MoviePy.
● MoviePy is a Python library for video editing, processing, and manipulation. It offers a straightforward way to extract audio from a video file.
● The extracted audio is then forwarded to the text conversion module, which transcribes the spoken words into written text.
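
A minimal sketch of the audio extraction step with MoviePy is shown below. The helper names audio_path_for and extract_audio, and the choice of a .wav output next to the input file, are assumptions made for illustration. The import is tried both ways because the location of VideoFileClip differs between MoviePy 1.x and 2.x.

```python
from pathlib import Path


def audio_path_for(video_path: str, suffix: str = ".wav") -> str:
    """Derive the output audio path by swapping the video's file extension."""
    return str(Path(video_path).with_suffix(suffix))


def extract_audio(video_path: str) -> str:
    """Write the video's audio track to a file and return that file's path."""
    # Imported lazily so the path helper works without MoviePy installed.
    try:
        from moviepy import VideoFileClip          # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip   # MoviePy 1.x

    out_path = audio_path_for(video_path)
    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(out_path)
    finally:
        clip.close()  # release the underlying ffmpeg reader
    return out_path
```

The returned path is what would be handed to the text conversion module.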
SUBTITLE EXTRACTION MODULE
● Extracts subtitles from the video.
● If the user input is a YouTube URL, extraction is performed with the YouTube Transcript API.
● YouTube Transcript API: accesses and retrieves transcripts of videos hosted on the YouTube platform.
● If an .srt file is available with the video, it is fetched directly and the subtitle text is extracted.
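
The two helpers below sketch the parts of this module that do not depend on an external service: pulling the video ID out of a YouTube URL (the value a transcript lookup would need) and reducing an .srt file to plain text. Both function names are hypothetical, and the call to the transcript library itself is left out because its interface varies between versions.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtube.com/watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,000".
_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str) -> str:
    """Drop cue numbers, timing lines, and blank lines; join the rest into one string.

    Simplification: a caption line made up only of digits is treated as a cue number.
    """
    kept = []
    for raw in srt.splitlines():
        line = raw.strip()
        if not line or line.isdigit() or _TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)
```

The plain text returned by srt_to_text can go straight to the summarization module, skipping audio extraction and transcription.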
TEXT CONVERSION MODULE
● Converts audio into text.
● Either of two models can be used: IBM Watson or Whisper.
● IBM Watson: extracts text from audio using the IBM Watson model.
● Whisper: extracts text using a pretrained OpenAI Whisper model.
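
As an illustration of the Whisper option, the sketch below wraps the open-source openai-whisper package. The function name transcribe_with_whisper, the "base" model size, and the load_model parameter are assumptions for this example; the slides do not say which model size the system uses.

```python
from typing import Callable, Optional


def transcribe_with_whisper(
    audio_path: str,
    model_name: str = "base",
    load_model: Optional[Callable] = None,
) -> str:
    """Transcribe an audio file and return the recognized text.

    load_model defaults to whisper.load_model; it is a parameter so that a
    different loader (or a stub) can be substituted.
    """
    if load_model is None:
        import whisper  # provided by the openai-whisper package
        load_model = whisper.load_model

    model = load_model(model_name)         # downloads weights on first use
    result = model.transcribe(audio_path)  # dict with "text" and "segments"
    return result["text"].strip()
```

Larger model sizes generally trade speed for accuracy, so the model name is left as an argument rather than fixed.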
SUMMARIZATION MODULE

● The extracted text is summarized, that is, condensed into a shorter, more concise version.
● There are two approaches to summarization: abstractive and extractive.
● Abstractive Summarization: This method uses pre-trained models such as BART and T5 to generate summaries by rewriting and synthesizing the content in a new form.
● Extractive Summarization: This method uses techniques such as TF-IDF scoring, with tools such as NLTK, to select and compile key sentences directly from the original text without altering their content.
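
The extractive approach can be sketched with TF-IDF sentence scoring in plain Python. This is a simplified illustration rather than the system's implementation: it uses regular expressions instead of NLTK for sentence splitting and tokenization, treats each sentence as a document when computing IDF, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the num_sentences highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of (term frequency) * (inverse document frequency) over the sentence.
        scores.append(sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        ))

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Because sentences are copied verbatim and kept in their original order, the output stays faithful to the transcript, which is the defining property of the extractive approach.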
CONCLUSION
● Visual Audio Text Summarization is an innovative approach to tackle the challenge
of efficiently accessing and comprehending the vast amount of video content
available today. It can provide users with a concise and informative summary of
lengthy videos, saving them time and effort.

● It presents a promising solution for product teams working in the online audio and
video content space, enabling users to quickly grasp the key points of lengthy
videos.
THANK YOU
