Multimodal Content Processing Across Media Sources
INTRODUCTION
● The Visual Audio Text Summarizer is introduced to meet the increasing demand for
efficient tools in online education by streamlining the process of distilling essential
insights from educational videos.
● With a user-friendly design, this system aims to revolutionize how users interact
with educational videos, contributing to a more effective and personalized learning
experience in the digital education era.
OBJECTIVE
● Develop a system that automatically extracts key information from videos and
creates a concise, informative summary of the video.
● Provide viewers with a more user-friendly and time-efficient way to navigate and
access video content, reducing the need to watch lengthy videos in their entirety.
PROBLEM STATEMENT
● In today's digital landscape, the vast and ever-growing volume of video content poses a
significant challenge for users seeking to efficiently access and comprehend
information.
● Videos often require a substantial time investment to watch in full. Furthermore,
content creators frequently publish lengthy videos, making it impractical for users
with time constraints or limited attention spans to consume the entire content.
● The rapid growth of online audio and video content presents a dual challenge and
opportunity for product teams working in this space. To tackle this, we propose an
innovative Visual Audio Text Summarization approach.
LITERATURE SURVEY
TITLE: Abstractive Summarizer for YouTube Videos (Sulochana Devi, Rahul Nadar,
Alfredprem Lucas, 2023)
TECHNOLOGY/PROTOCOL: Natural Language Processing, Machine Learning, Abstractive
Summarization
ADVANTAGES: Summarizing videos based on their subtitles is the fastest way of
generating a summary.
DISADVANTAGES: It does not make use of sentences from the original content.
PROPOSED SYSTEM
PREPROCESSING
● The system includes a preprocessing module dedicated to categorizing videos
into two distinct paths: subtitle extraction or video to audio conversion.
● As part of its initial processing step, the system determines the presence of
subtitles within the video content.
● If subtitles are detected within the video, it is routed to the path designated
for subtitle extraction.
● Otherwise, the video is directed to the path designated for audio extraction.
● Video categorization and path assignment are carried out using ffmpeg.
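The routing decision above can be sketched with ffmpeg's companion tool ffprobe, which can report a file's subtitle streams as JSON. This is a minimal illustration, assuming ffmpeg is installed; the function names (`probe_command`, `has_subtitles`, `route`) are hypothetical, not part of the deck's implementation.

```python
import json
import subprocess

def probe_command(video_path):
    # ffprobe invocation that lists only subtitle streams ("-select_streams s") as JSON
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name",
        "-of", "json",
        video_path,
    ]

def has_subtitles(video_path):
    """Return True if the video contains at least one embedded subtitle stream."""
    out = subprocess.run(probe_command(video_path),
                         capture_output=True, text=True, check=True)
    return len(json.loads(out.stdout).get("streams", [])) > 0

def route(video_path):
    # Decide which preprocessing path the video takes
    return "subtitle_extraction" if has_subtitles(video_path) else "audio_extraction"
```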
AUDIO EXTRACTION MODULE
● Process of extracting the audio track from the video.
● Performed using MoviePy.
● MoviePy is a Python library used for video editing, processing, and
manipulation tasks. When it comes to extracting audio from a video file,
MoviePy offers a convenient and straightforward solution.
● After extracting the audio, the output is then forwarded to a text conversion
module to transcribe the spoken words into written text.
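The MoviePy step can be sketched as below. This is a minimal sketch assuming MoviePy 1.x (where the import path is `moviepy.editor`); the helper names are illustrative, and the third-party import is kept inside the function so the module loads even without MoviePy installed.

```python
def audio_output_path(video_path, ext=".wav"):
    # Derive the audio filename from the video filename, e.g. lecture.mp4 -> lecture.wav
    stem = video_path.rsplit(".", 1)[0]
    return stem + ext

def extract_audio(video_path):
    """Extract the audio track and write it next to the video file."""
    # Lazy import: requires `pip install moviepy` (import path shown is MoviePy 1.x)
    from moviepy.editor import VideoFileClip
    out_path = audio_output_path(video_path)
    with VideoFileClip(video_path) as clip:
        clip.audio.write_audiofile(out_path)
    return out_path
```

The returned path is then handed to the text conversion module.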
SUBTITLE EXTRACTION MODULE
● Extraction of subtitles from the video.
● Performed using the YouTube Transcript API when the user input is a YouTube URL.
● YouTube Transcript API: accesses and retrieves transcripts of videos hosted on
the YouTube platform.
● If an .srt subtitle file is available for the video, it is fetched directly and
the subtitles are extracted.
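A sketch of the YouTube path: first isolate the video ID from the URL, then fetch the transcript. This assumes the classic `YouTubeTranscriptApi.get_transcript` call from the `youtube-transcript-api` package (newer releases expose a slightly different interface); the helper names are illustrative.

```python
from urllib.parse import urlparse, parse_qs

def video_id_from_url(url):
    """Pull the video ID out of a YouTube URL (watch?v=... or youtu.be/... form)."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query)["v"][0]

def fetch_transcript_text(url):
    # Lazy import: requires `pip install youtube-transcript-api`
    from youtube_transcript_api import YouTubeTranscriptApi
    segments = YouTubeTranscriptApi.get_transcript(video_id_from_url(url))
    # Each segment carries "text", "start", and "duration"; join the text parts
    return " ".join(seg["text"] for seg in segments)
```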
TEXT CONVERSION MODULE
● Process of converting audio into text.
● The conversion can be performed by either of two models: IBM Watson and Whisper.
● IBM Watson: extraction of text from audio using the IBM Watson speech-to-text
model.
● Whisper: extraction of text using a pretrained OpenAI Whisper model.
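The Whisper path can be sketched as follows, using the `openai-whisper` package's `load_model`/`transcribe` calls; the wrapper and cleanup helper names are illustrative, and the model size "base" is an assumed default.

```python
def transcribe_with_whisper(audio_path, model_size="base"):
    """Transcribe an audio file with a pretrained Whisper model."""
    # Lazy import: requires `pip install openai-whisper` (and ffmpeg on the PATH)
    import whisper
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"]

def normalize_transcript(text):
    # Collapse runs of whitespace so downstream summarization sees clean sentences
    return " ".join(text.split())
```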
SUMMARIZATION MODULE
● Condenses the transcribed or extracted text into a concise summary of the video.
● It presents a promising solution for product teams working in the online audio
and video content space, enabling users to quickly grasp the key points of
lengthy videos.
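The deck does not name a specific summarization model, so as a stand-in, here is a minimal extractive sketch in the spirit of frequency-based sentence ranking (related to the TextRank approach cited in the references): sentences are scored by the average frequency of their words, and the top scorers are kept in original order. All names here are illustrative.

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, preserving their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    # Word frequencies over the whole transcript
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        toks = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

A real deployment would substitute a trained abstractive model, but the interface (text in, short summary out) is the same.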
REFERENCES
[1] Z. Cernekova, I. Pitas, and C. Nikou, "Information Theory-Based Shot Cut/Fade
Detection and Video Summarization," IEEE.
[2] M. B. Andra, Department of Computer Science, Kumamoto University, Japan,
"Automatic Lecture Video Content Summarization with Attention-based Recurrent
Neural Network."
[3] S. E. F. de Avila, A. da Luz Jr., A. de A. Araújo, and M. Cord, Computer
Science Department, Federal University of Minas Gerais, "VSUMM: An Approach for
Automatic Video Summarization and Quantitative Evaluation," in IEEE Conference,
2016.
[4] P. Banerjee, Society for Natural Language Technology Research, Kolkata,
India, "Automatic Detection of Handwritten Texts from Video Frames of Lectures."
[5] S. Sah et al., "Semantic Text Summarization of Long Videos," in 2017 IEEE
Winter Conference on Applications of Computer Vision (WACV), 2017, pp. 989-997.
DOI: 10.1109/WACV.2017.115.
[6] A. Dilawari and M. U. G. Khan, "ASoVS: Abstractive Summarization of Video
Sequences," IEEE Access, vol. 7, 2019, pp. 29253-29263.
DOI: 10.1109/ACCESS.2019.2902507.
[8] H. Li et al., "Read, Watch, Listen, and Summarize: Multi-Modal Summarization
for Asynchronous Text, Image, Audio and Video," IEEE Transactions on Knowledge
and Data Engineering, vol. 31, no. 5, 2019, pp. 996-1009.
DOI: 10.1109/TKDE.2018.2848260.
[9] C. Zhang and Y. Tian, "Automatic video description generation via LSTM with
joint two-stream encoding," in 2016 23rd International Conference on Pattern
Recognition (ICPR), 2016, pp. 2924-2929. DOI: 10.1109/ICPR.2016.7900081.
[10] D. Ten, "Keyword and Sentence Extraction with TextRank (pytextrank)," 2018
(accessed November 7, 2020). URL: https://xang1234.github.io/textrank/.
[11] K. Davila and R. Zanibbi, "Whiteboard video summarization via spatiotemporal
conflict minimization," in Proc. 14th IAPR Int. Conf. Document Anal. Recognit.
(ICDAR), Nov. 2017, pp. 355-362.
THANK YOU