
Volume 8, Issue 10, October – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Automated Extraction and Augmentation of Key Information from Audio using Speech Recognition and Text Summarization
S Swaroop Kaushik1, Sanjit Kangovi2
Bangalore, India

Abstract:- Audio lectures and speeches contain a wealth of valuable information, but reviewing and extracting the key points can be tedious and time-consuming. This paper presents an automated system that uses speech recognition and text summarization techniques to identify and summarize the most salient content from spoken presentations. Audio is first transcribed to text via a speech recognition engine. The resulting text is then processed by an extractive summarization algorithm based on term frequency-inverse document frequency (TF-IDF) to extract the most important points. These summarized points can optionally be used to generate relevant supplementary URLs that provide additional context or resources related to the topics covered. This system was developed to enable quick review of lectures and speeches by automatically delivering condensed, relevant summaries.

Keywords:- Speech Recognition, Extractive Summarization, TF-IDF, URLs.

I. INTRODUCTION

The COVID-19 pandemic compelled a widespread transition from traditional in-person classrooms to online education platforms. However, many students struggle to attend or to stay consistently focused during virtual classes. Missing sessions or having difficulty concentrating can hinder learning and retention when classes are held remotely. The unique challenges posed by online learning environments require adapted solutions to support students as education moved online amid the pandemic.

Amidst the global shift to online education, the impact has been significant, affecting approximately 1.2 billion children across 186 countries, as highlighted by a report from UNESCO. To address these issues, this paper presents a system that utilizes natural language processing (NLP) techniques to enhance the online education experience.

NLP allows computer algorithms to analyze and comprehend human language. Its applications range from handwriting recognition and speech recognition to chatbots and automatic text summarization.

In the context of online education, automatic speech recognition (ASR) plays a crucial role. ASR algorithms convert spoken language into a written format, allowing for accurate translation of verbal communication into textual form. It is important to note that ASR differs from voice recognition, which focuses on identifying an individual's specific voice.

Meanwhile, text summarization condenses lengthy documents into concise overviews by extracting their main points. With summarized content, it becomes easier to review online lectures and workshops.

By integrating ASR and text summarization, this paper leverages NLP to mitigate the difficulties of online learning. Automated transcription of classes, combined with summarized notes, provides students and professionals with streamlined, digestible information. This assists comprehension and retention as education continues adapting to a virtual landscape.

The paper exemplifies how NLP and AI can enhance remote collaboration and learning during an unprecedented shift to online platforms. With customized tools that target unique challenges, technology can facilitate engagement, understanding, and memory despite the limitations of distance learning.

There are two main methods for summarizing a given text:

 Abstractive Summarization
In abstractive summarization, the source text is paraphrased and shortened as required. Compared with extractive methods, an abstraction algorithm can avoid grammatical inconsistencies. Abstractive summarization uses trained data to create new phrases and sentences that convey the most important information from the text.

 Extractive Summarization
Extractive summarization algorithms extract the key phrases from the source text and integrate them to generate a summary. No changes are made to the words or phrases of the source text, and the summary is assembled according to the given metrics. Extractive summarization algorithms do not require an exhaustive training dataset and are comparatively less complex than abstractive techniques, which makes them widely popular.
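The contrast can be made concrete with a toy extractive selector: its output is always a subset of the source sentences, taken verbatim. The sketch below is illustrative only (simple frequency-based scoring and a function name of our own), not the TF-IDF system described later in the paper:

```python
import re
from collections import Counter

def toy_extractive_summary(text, n_sentences=1):
    """Pick the n highest-scoring sentences verbatim from the source.

    Sentences are scored by the summed frequency of their words, a
    simple stand-in for the metrics real extractive summarizers use.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by how many frequent words it contains.
    scored = [(sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())), s)
              for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:n_sentences]]

doc = ("Speech recognition converts audio to text. "
       "Summarization shortens the text. "
       "Together, speech recognition and text summarization condense audio into notes.")
print(toy_extractive_summary(doc))
```

Every sentence the selector returns occurs verbatim in the input, which is exactly the property that separates extractive from abstractive output.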

IJISRT23OCT1674 www.ijisrt.com 1811


This paper introduces a method wherein real-time audio is converted into textual data by a speech-to-text module. The textual data is then used as input to the text summarizer, which uses TF-IDF to generate a shortened, summarized form of the text through extractive summarization. Furthermore, based on the keywords from the summary, a web scraping algorithm searches the web for data related to those keywords and fetches the corresponding URLs. This method is considerably advantageous for minimizing human effort in time-crunch situations, such as speeches or lectures, where summarizing the information on the go might pose a challenge.

The remainder of this paper is organized as follows. Section II provides a literature review, examining existing research related to the paper's domain. Section III outlines the theoretical underpinnings of the algorithms used. The system procedure is then explained in Section IV. Section V describes the simulation methodology, while Section VI discusses the results obtained. Finally, Section VII concludes by considering potential applications and directions for future work.

II. LITERATURE SURVEY

The objective of this literature survey is to evaluate the available methodologies for implementing text summarization and to note how problems such as high dimensionality, poor transparency, and the unavailability of proper datasets have been solved.

One notable approach is the use of Generative Adversarial Networks (GANs) [1]. This framework consists of a generator and a discriminator. The generator employs reinforcement learning techniques to generate rewarded summaries, overcoming exposure bias and non-differentiable metrics. The discriminator acts as a text classifier, distinguishing between machine-generated and human-generated summaries. Through a minimax game optimization process, the proposed GAN-based approach achieves high-quality abstractive summaries. The generator incorporates bidirectional LSTM encoders, attention-based LSTM decoders, and a switching pointer-generator network. The discriminator utilizes a CNN architecture with pooling and classification layers. This research significantly advances abstractive summarization methods, improving the quality and effectiveness of summary generation.

In [2], an abstractive text summarization approach using sequence-to-sequence Recurrent Neural Networks (RNNs) is presented. Originally designed for machine translation tasks, this model is adapted for summarizing food reviews. It employs a two-layered RNN network with LSTM cells and attention mechanisms applied to the target text. By maximizing the conditional probability of the target text sequence, the model generates comprehensive summaries. However, this approach requires a substantial amount of training data and entails long computational time and resource-intensive hardware. Nevertheless, it demonstrates the effectiveness of sequence-to-sequence RNNs in abstractive text summarization, specifically in the domain of food reviews.

In addition to these approaches, other methods have been proposed. In [3], a supervised learning approach using K-nearest neighbors (KNN) for extractive text summarization is presented. The authors modify the traditional KNN model to consider feature and feature-value similarity, aiming to enhance the summarization process. By mapping texts to numerical vectors and assigning different weights to selected neighbors, the model generates summaries based on the semantic relations between words. However, this approach requires pre-trained tables for each domain during training, adding manual effort to the process. Nonetheless, it offers a promising technique for extractive summarization with improved similarity measures and a modified KNN algorithm.

Furthermore, [4] proposes a pointer-generator abstractive model with part-of-speech features, combining word vectors, parts of speech, a CNN, and a bi-directional LSTM. TF-IDF is explored for keyword relevance in summarization [5], and K-means clustering with TF-IDF is introduced for sentence clustering and summarization [6]. Each method has its own strengths and weaknesses: abstractive summaries are more human-like but require extensive training data, while extractive methods are more adaptable but often lack natural phrasing. To enhance extractive summarization, additional attention layers and network models have been suggested.

Considering all the surveyed methods, it can be inferred that extractive summarization with the TF-IDF algorithm is suitable for the chosen problem statement. The chosen algorithm does not require large datasets to train a model, which makes it applicable to all subject classes. It also has an advantage in time complexity over abstractive summarization methodologies.

III. THEORY

A. NLTK
NLTK is a standard Python library that comes with prebuilt functions and utilities for convenience of usage and implementation. It is among the most widely used NLP and computational linguistics libraries. For our experiments, we used the nltk library for data pre-processing steps such as tokenization. Tokenization in NLP is the process of breaking down or separating text into smaller units called tokens. We used two types of tokenization in our experiments:

Sentence tokenization - Sentence tokenization, performed using the sent_tokenize method, involves splitting text into individual sentences. This tokenizer ensures that individual words within a sentence remain intact during the tokenization process.

Word tokenization - The process of segmenting a sentence into individual words is called word tokenization. The word_tokenize function, whose result is a list of words, has been used. In machine learning applications, the word
tokenization output can be transformed into a data frame to improve text understanding. It can also be used as input for subsequent text cleaning processes, such as stemming, punctuation removal, and numeric character removal.

Additionally, we employed the following NLTK functionalities for text normalization:

 Stop Words Removal:
Stop words are words that occur commonly in a corpus, such as "the", "a", "an", and "in", and must be eliminated from the corpus to achieve text normalization.

 Stemming:
Stemming is the process of producing morphological variants of a root/base word. It is the reduction of inflection from words. Words with the same origin are reduced to a form that may or may not itself be a word. The PorterStemmer function has been used for stemming.

B. Beautiful Soup and Cloud Client Libraries:
Beautiful Soup is a Python package used for parsing HTML and XML documents. The Cloud Client Libraries in Python are used to access Google Cloud APIs programmatically, and they make the APIs easier to grasp by offering high-level abstractions. These libraries have been used for speech processing and web scraping in our experiments to retrieve links to important topics from the summarized text.

IV. PROCEDURE

The proposed methodology is a three-step process. The first step converts speech data to text data using the Speech Recognizer class in Python. The second step summarizes the text data into a readable and concise summary with the NLTK libraries by implementing the Term Frequency-Inverse Document Frequency (TF-IDF) methodology, using extractive summarization. After obtaining the summary, the last step improves learning by analyzing the output to provide links to websites relevant to the topic using web scraping.

A. Conversion of Raw Data from Speech to Text
The first step involves converting raw speech data into text data using Python's speech recognizer class. The Speech Recognizer library is installed and imported. The Recognizer class utilizes a microphone to recognize and record speech input. The recorded speech data is passed to the Recognizer class's recognize_google method, which converts the audio into text.

B. Generation of a Summary using TF-IDF Methodology
The second step focuses on generating a readable and concise summary using the TF-IDF methodology and the NLTK libraries. The proposed method for text summarization encompasses the following steps:

 Data Pre-Processing
The data pre-processing involves the following methods:

 Tokenization –
In tokenization, the text data is divided into smaller parts called tokens. Two types of tokenization are employed: word tokenization, which splits sentences into words using the word_tokenize method, and sentence tokenization, which splits text into individual sentences using the sent_tokenize method. Tokenization aids in pattern finding, stemming, and lemmatization.

 Stemming –
Stemming is performed using the PorterStemmer algorithm to produce morphological variants of root/base words. This process reduces redundancy, as a word stem and its inflected/derived words often convey the same meaning.

 Creating a Frequency Table
Finding the most or least common terms, depending on the algorithm's requirements, and calculating word frequencies are crucial. To this end, a dictionary was created containing all of the terms in the corpus as keys and their frequency of occurrence as values.

 Calculation of Term Frequency (TF)
This stage involves calculating each word's term frequency (TF) and creating a matrix. TF denotes the frequency with which a word occurs in a sentence relative to the sentence's total word count. Words with comparable frequencies have similar TF scores, as can be seen by comparing the frequency table to the TF matrix.

 Creating a Table for Documents per Word
This is again a simple table that helps in calculating the IDF matrix. The number of sentences that contain a particular word is counted, which is then used to generate the documents-per-word table.

 Calculation of Inverse Document Frequency (IDF)
In this step, the IDF score is determined by taking the log10 of the total number of sentences divided by the number of sentences that contain the word. The IDF matrix quantifies the uniqueness of words in a paragraph.

 Calculation of TF-IDF
The TF-IDF score is obtained by multiplying the TF and IDF values for each key-value pair from the TF and IDF matrices. The resulting TF-IDF matrix assigns a weightage to the words within a document.

 Scoring the Sentences
We used the TF-IDF scores to weight each sentence based on the words it contains. Individual sentences were taken from the tokenized sentences, and a sentence score was computed to determine which sentences are the most essential. Following the computation of scores, the user-provided retention rate was used to determine which sentences rank highest and are summarized accordingly.
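The scoring pipeline above (frequency table, TF, documents per word, IDF, TF-IDF, and sentence scores) can be sketched in plain Python. This is a minimal illustration under simplifying assumptions: regex-based tokenization stands in for NLTK, no stop-word removal or stemming is applied, and the function name is our own, not the authors' code. The average sentence score is used as the cut-off, as described in the paper.

```python
import math
import re

def tfidf_sentence_scores(text):
    """Score sentences by summed TF-IDF:
    frequency table -> TF -> documents-per-word -> IDF -> TF-IDF -> score."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # Documents-per-word table: number of sentences containing each word.
    docs_per_word = {}
    for words in tokenized:
        for w in set(words):
            docs_per_word[w] = docs_per_word.get(w, 0) + 1

    n = len(sentences)
    scores = []
    for words in tokenized:
        # Frequency table and TF for this sentence.
        freq = {}
        for w in words:
            freq[w] = freq.get(w, 0) + 1
        tf = {w: c / len(words) for w, c in freq.items()}
        # IDF: log10(total sentences / sentences containing the word).
        idf = {w: math.log10(n / docs_per_word[w]) for w in freq}
        # Sentence score: sum of the TF-IDF weights of its words.
        scores.append(sum(tf[w] * idf[w] for w in freq))
    return sentences, scores

text = ("TF-IDF weighs words by rarity. Common words get low weight. "
        "Rare words get high weight. Rare words get high weight.")
sentences, scores = tfidf_sentence_scores(text)
threshold = sum(scores) / len(scores)  # average sentence score as threshold
summary = [s for s, sc in zip(sentences, scores) if sc >= threshold]
```

Words that appear in every sentence receive an IDF of zero and contribute nothing to a sentence's score, which is how the method favors distinctive sentences.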

 Determining the Threshold
The threshold, which determines the inclusion or exclusion of sentences in the summary, can be found in various ways. In this paper, the average sentence score is calculated to generate the threshold.

 Generation of Summary
The final summary is generated by considering the sentences whose scores exceed the threshold, ensuring an accurate and concise summary.

C. Generation of Relevant URLs
The third step involves generating URLs relevant to the summarized text. Web scraping techniques are employed to extract links related to the generated summary. This process entails two parts: the crawler and the scraper. The crawler searches the web by following links, while the scraper extracts data from websites. The Beautiful Soup and Google libraries are utilized for this purpose. The number of links generated can be controlled by adjusting the values of the 'num', 'stop', and 'pause' parameters.

V. SIMULATION

To evaluate the performance of the summarization model, several simulations were conducted using Python and Google Colab. Speech recordings on three distinct topics - history, science, and geography - were provided, mimicking classroom lectures of 903, 1478, and 760 words respectively. For each topic, the threshold coefficient of the model was varied across values ranging from 0.3 to 1.2. With each threshold value, the model generated a summary and the output word count was recorded. It was observed that lower threshold coefficients resulted in summaries with higher word counts, while increasing the threshold reduced the word count and summary length. This is because the model passes only those sentences whose sentence score exceeds the specified threshold. Higher threshold values ensure that only the most salient sentences are included, generating more precise and concise summaries. The results demonstrate that the summarization model can effectively adjust summary brevity by tuning the threshold parameter. Across the diverse lecture topics, the model successfully produced condensed summaries while preserving the core content.

Additionally, the web scraping model was implemented to fetch relevant links related to the summarization topics. The number of generated links could also be varied as needed.

This simulation study provides promising indications that the approach can condense long-form speech into concise summaries of key points through automated extraction. Although Google Colab was utilized, the model can be implemented in any Python-based integrated development environment, such as Jupyter Notebook.

Table 1: Simulation Results

Type of Input | No. of Input Words | Avg. Sentence Score | Threshold Coeff. | No. of Output Words
History       | 903                | 0.08830773536       | 0.7              | 312
History       | 903                | 0.08830773536       | 0.3              | 432
Science       | 1478               | 0.07899019456       | 0.9              | 699
Science       | 1478               | 0.07899019456       | 1.2              | 220
Geography     | 760                | 0.07737628511       | 0.8              | 483
Geography     | 760                | 0.07737628511       | 1.1              | 160
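The trend visible in Table 1, where a higher threshold coefficient yields a shorter summary, follows directly from the selection rule: a sentence is kept only when its score exceeds the coefficient times the average sentence score. A minimal sketch of that rule (the scores below are hypothetical, not taken from the paper's lectures):

```python
def select_sentences(scores, coeff):
    """Keep the indices of sentences whose score exceeds
    coeff * average score, mirroring the threshold rule above."""
    avg = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > coeff * avg]

# Hypothetical sentence scores for illustration only.
scores = [0.12, 0.05, 0.09, 0.20, 0.02, 0.07]
low = select_sentences(scores, 0.3)   # permissive: longer summary
high = select_sentences(scores, 1.1)  # strict: shorter summary
```

Raising the coefficient can only shrink the kept set, which matches the monotonic drop in output word count within each topic of Table 1.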

VI. RESULTS

In multiple trials conducted with different test cases, output was obtained by converting raw data from speech to text, generating a summary using the TF-IDF methodology, and generating relevant links related to the topic.

The model was verified to convert raw data, either live speech or recorded audio, into text, after which it summarized the converted speech using the TF-IDF methodology: raw data of about 800-1500 words was reduced to a summary of 200-500 words by varying the threshold coefficient. Finally, the crawler and scraper generated relevant URLs. This simplifies the understanding process for the user and helps with in-depth analysis.

VII. CONCLUSION

In conclusion, this paper demonstrates the potential of using speech recognition and extractive text summarization techniques to automatically extract key information from audio lectures and speeches. The speech-to-text conversion allows the textual content to be processed by the TF-IDF text summarizer, identifying the most salient points. These summaries can then be used to recommend relevant supplementary materials through URL generation, enhancing the knowledge gained from the original audio source. This approach could be useful for improving the accessibility of audio materials and enabling quick review of lectures or speeches. Further work on improving speech recognition accuracy and summarization quality could make this solution even more effective. Overall, this paper shows promising techniques for automatically extracting and augmenting important information from real-life audio sources. Looking ahead, this system could potentially be integrated into a dedicated hardware device to enable portable, automated lecture or speech summarization. Users could record audio through the device and easily review the generated summary and recommendations immediately after events. Developing this into a handheld gadget would further increase the accessibility and convenience of the solution.



REFERENCES

[1]. L. Liu, et al., "Generative adversarial network for abstractive text summarization," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[2]. A. K. Mohammad Masum, S. Abujar, M. A. Islam Talukder, A. K. M. S. Azad Rabby and S. A. Hossain, "Abstractive method of text summarization with sequence to sequence RNNs," 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 2019, pp. 1-5, doi: 10.1109/ICCCNT45670.2019.8944620.
[3]. T. Jo, "K nearest neighbor for text summarization using feature similarity," 2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), Khartoum, Sudan, 2017, pp. 1-5, doi: 10.1109/ICCCCEE.2017.7866705.
[4]. S. Ren and Z. Zhang, "Pointer-Generator Abstractive Text Summarization Model with Part of Speech Features," 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), 2019.
[5]. S. Qaiser and R. Ali, "Text mining: Use of TF-IDF to examine the relevance of words to documents," International Journal of Computer Applications, vol. 181, no. 1, pp. 25-29, 2018.
[6]. R. Khan, Y. Qian and S. Naeem, "Extractive based text summarization using k-means and tf-idf," International Journal of Information Engineering and Electronic Business, vol. 10, no. 3, p. 33, 2019.
[7]. J. N. Madhuri and R. Ganesh Kumar, "Extractive Text Summarization Using Sentence Ranking," 2019 International Conference on Data Science and Communication (IconDSC), 2019, pp. 1-3, doi: 10.1109/IconDSC.2019.8817040.
[8]. P. M. ee, S. Santra, S. Bhowmick, A. Paul, P. Chatterjee and A. Deyasi, "Development of GUI for Text-to-Speech Recognition using Natural Language Processing," 2018 2nd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), 2018, pp. 1-4, doi: 10.1109/IEMENTECH.2018.8465238.
[9]. J. Zenkert, A. Klahold and M. Fathi, "Towards Extractive Text Summarization Using Multidimensional Knowledge Representation," 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 2018, pp. 0826-0831, doi: 10.1109/EIT.2018.8500186.
[10]. D. Inouye and J. K. Kalita, "Comparing Twitter Summarization Algorithms for Multiple Post Summaries," 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, Boston, MA, USA, 2011, pp. 298-306, doi: 10.1109/PASSAT/SocialCom.2011.31.
[11]. G. Xu, Y. Meng, X. Qiu, Z. Yu and X. Wu, "Sentiment Analysis of Comment Texts Based on BiLSTM," IEEE Access, vol. 7, pp. 51522-51532, 2019, doi: 10.1109/ACCESS.2019.2909919.
[12]. H. P. Luhn, "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, vol. 2, no. 2, pp. 159-165, Apr. 1958, doi: 10.1147/rd.22.0159.
[13]. V. Gupta and G. S. Lehal, "A survey of text summarization extractive techniques," Journal of Emerging Technologies in Web Intelligence, vol. 2, no. 3, pp. 258-268, 2010.
[14]. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez and K. Kochut, "A brief survey of text mining: Classification, clustering and extraction techniques," arXiv preprint arXiv:1707.02919, 2017.
