
2024 IEEE 3rd World Conference on Applied Intelligence and Computing (AIC)

Abstractive Text Summarization using Transformer Architecture
979-8-3503-8459-8/24/$31.00 ©2024 IEEE | DOI: 10.1109/AIC61668.2024.10730840

Shubham Dhapola
Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
[email protected]

Siddhant Goel
Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
[email protected]

Daksh Rawat
Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
[email protected]

Satvik Vats, SMIEEE
Computer Science and Engineering, Graphic Era Hill University; Adjunct Professor, Graphic Era Deemed to be University, Dehradun, 248002, India
[email protected]

Vikrant Sharma, SMIEEE
Computer Science and Engineering, Graphic Era Hill University; Adjunct Professor, Graphic Era Deemed to be University, Dehradun, 248002, India
[email protected]

Abstract— The internet holds large amounts of textual data, which has become a double-edged sword: the information is always available, but due to its high volume it is hard to find the things we are looking for. In the past, subject experts were employed to create summaries, which was very time consuming and not financially viable. These summaries were made by reading lengthy texts, but often gave redundant results when the source articles were similar. To address this problem, Automatic Text Summarization can be used. It generates concise summaries by capturing the document's essence, which makes it easier for users to find the key points of the original text. Imagine quickly grasping an article's core message before deciding on a deeper dive; this efficiency is invaluable in today's information-driven world. This paper proposes an advanced "abstractive" text summarization system that delves deeper than simply extracting key sentences. By leveraging Natural Language Processing techniques, the system analyses the text's structure, identifies salient points, and grasps their relationships. By comprehending the underlying context and logic, the system generates summaries using novel phrasings that accurately reflect the content, paving the way for a revolution in information navigation.

Keywords—Summarization, Natural Language Processing, Transformers, Deep Learning

I. INTRODUCTION

Every day, we encounter vast amounts of information, most of it in the form of text: written articles, blogs, newspapers, etc. To understand the content of these articles, we need to read each one thoroughly, and longer articles can be time consuming to comprehend. Articles on similar topics require extra effort to extract the non-redundant information because of the overlap between them. This paper discusses an approach to abstractive text summarization that reads an entire document and generates a summary containing the essence of the original. The main goal of text summarization is to develop a system that can process an article's data using different Natural Language Processing techniques, reducing the time an individual needs to read and understand lengthy articles.

Due to the internet explosion, a large amount of data is at our fingertips, but this has also created the challenge of information overload. There is far more data available than we can process, and finding what we need is a struggle. Many processes and methods are used to extract relevant information from text, such as search engines, recommendation systems, and question-answering systems. In addition, techniques like text summarization and information extraction simplify complex information by extracting the key points of lengthy text, making it easier to understand.

Automatic text summarization faces several hurdles, including pinpointing the main topic, analyzing the content, crafting the summary, and judging the quality of the model. Currently, most systems rely on the "extractive" approach to text summarization, which only finds the key sentences in the text and combines them into a summary. However, there is a need for "abstractive" text summarization models, which understand the actual meaning of the input text and then summarize it in meaningful new sentences. This technology would be a game-changer for navigating the ever-growing sea of online information. This paper aims to develop a system that automatically generates a summary for a given text with better accuracy, by understanding the core meaning of the text rather than just copying existing sentences.

II. LITERATURE SURVEY

K. D. Garg et al. [1] describe text summarization as shortening a text document while preserving its context and general idea; every summary should carry the main idea of the source document. They discuss using several NLP techniques to extract text from documents.

They used an unsupervised machine learning technique to develop a Punjabi text summarizer, which is an extractive summarizer. Their pipeline includes tokenizing the text, removing stop words, building a similarity matrix, ranking sentences with the similarity matrix, and creating the summary.

K. Mona Teja et al. [2] note that blind people read text in different ways, one of them being Braille script, which is slow and requires a lot of skill. The authors propose a new solution to help visually impaired people: their research summarizes news articles to decrease reading time. They compare methods such as the Text Ranking algorithm, Luhn's algorithm, and the Latent Semantic Analysis algorithm.

Taeho Jo [3] proposed a modified KNN approach in which words are treated as features, framing text summarization as a classification problem. The original document is divided into small chunks of paragraphs and sentences, and each chunk is classified as summary or non-summary; the chunks classified as summary are selected to form the final summary. The modified KNN shows better performance and a more compact representation of the data.

N. Moratanch et al. [4] compare various abstractive text summarization techniques. There are two main approaches: semantic-based and structure-based abstractive text summarization. The authors also discuss the techniques and challenges encountered while implementing abstractive summarization approaches.

Pankaj Gupta et al. [5] analyzed techniques used for text summarization and sentiment analysis. Their approach involves extracting the emotions of the text using two machine learning algorithms, the Naïve Bayes classifier and Support Vector Machines, for sentiment analysis. Text summarization leverages Natural Language Processing and the semantic characteristics of sentences to determine the importance of words and sentences for inclusion in the final summary. Their paper surveys research in text summarization and sentiment analysis, evaluating the pros and cons of different strategies and techniques.

Dharmendra Hingu et al. [6] discuss extractive text summarization, using Wikipedia articles as input to their text-scoring system. In the initial step, sentences are tokenized by pattern matching with regular expressions. Conventional methods are then used to score the sentences, which drives the classification of whether a sentence should appear in the final result. Scoring helps identify the words that can be used in sentences to generate accurate summaries; they report that citation-based sentence scoring gives better output.

C. Prakash et al. [7] observe that, because of the digital revolution, a huge amount of data is available online, but it is not always accurate, and search engines return far more data than humans can process. We therefore need the gist of the data without having to read the whole document. They propose 'SAAR', a human-aided text summarizer for single documents: if the user likes the generated summary they can finalize it, otherwise the model generates a new one. Performance was tested using precision, recall, and F1-score.

Kavita Ganesan et al. [8] introduced Opinosis, a graph-based framework for abstractive text summarization aimed at processing data such as documents, movies, and reviews. The approach constructs an Opinosis graph from the text, in which nodes represent words. The framework exploits graph properties such as collapsible structures, redundancy capture, and gapped subsequence capture to generate abstractive summaries. Valid paths with high redundancy scores are selected from the generated graph and ordered by descending redundancy score, and the Jaccard index is used to remove duplicate paths when comparing against human-generated summaries.

Chin-Yew Lin [9] first introduced ROUGE (Recall-Oriented Understudy for Gisting Evaluation), a set of metrics that evaluates the quality of summaries by comparing them to reference summaries; it is widely used in natural language processing tasks. Its variants include ROUGE-L, ROUGE-N, ROUGE-W, and ROUGE-S. ROUGE is effective for evaluating both single- and multi-document summaries.

III. METHODOLOGY

Figure 1: Methodology and its workflow

Figure 1 shows the overall workflow of the project. Initially, the text is pre-processed and then given to the transformer model, which generates the summarized text. We then evaluate the output using ROUGE scores.

A. Pre-Processing
Data preprocessing cleans and transforms raw data into the format required for analysis; it reduces noise and enhances the quality of the data. It typically involves handling missing values, handling categorical data, feature engineering, and splitting the dataset [10]. Tokenization is an important step in the model: it converts words into tokens. The maximum and minimum lengths of the summary are also determined at this stage.
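To make this stage concrete, the following is a minimal sketch, assuming the Hugging Face transformers library and the t5-base checkpoint (the model reported in Section IV). The paper does not name its tooling, so the library choice, cleaning rule, and length values are illustrative assumptions; the sketch is carried through to generation so that the summary-length bounds fixed here are visible end to end.

```python
# A minimal pre-processing-to-summary sketch. Assumes the Hugging Face
# `transformers` library and the t5-base checkpoint; all parameter
# values are illustrative, not taken from the paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def summarize(article: str, min_len: int = 30, max_len: int = 150) -> str:
    # Noise reduction: collapse whitespace left over from scraping or PDFs.
    text = " ".join(article.split())
    # Tokenization: convert words into token ids (T5 expects a task prefix).
    inputs = tokenizer("summarize: " + text, max_length=512,
                       truncation=True, return_tensors="pt")
    # The minimum and maximum summary lengths, fixed at this stage,
    # constrain the decoder described in Section B.
    ids = model.generate(inputs["input_ids"], num_beams=4,
                         min_length=min_len, max_length=max_len)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```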
B. The Transformer Architecture

The Transformer is an important deep learning architecture introduced by Google researchers in the paper "Attention Is All You Need" by Vaswani et al. Its main goal was to support natural language processing tasks, but it has since been applied to other domains by creating suitable embeddings.

The Transformer architecture is more advanced than RNNs and LSTMs. It relies on a special kind of attention mechanism called self-attention, which finds global dependencies between input and output sequences. This makes efficient parallelization possible, something RNNs and LSTMs cannot offer.

The Transformer is used in various Natural Language Processing techniques. It handles sequential tasks while successfully dealing with long-term dependencies. Before the multi-head attention operation, positional encoding is applied so that the pattern and order of the words in each input sequence are known to the model. This also minimizes processing time, because different parts of the data can be processed at the same time. The Transformer therefore departs from the sequential RNN style of processing the data [11]; instead, it connects multi-head attention layers to feed-forward network layers.

Among the types of attention, self-attention (intra-attention) relates different positions within a single sequence in order to compute a representation of that sequence. In the Transformer, both the encoder and the decoder gain the attention mechanism through 'multi-head attention' layers, and Transformer models use these encoder and decoder roles for the text summarization process.

Figure 2: Basic Transformer Architecture

Figure 2 shows the major components of the Transformer architecture: the encoder and the decoder. It is not necessary to use both components in every Transformer model.

1. Encoder: The encoder manipulates the input sequence and creates a useful representation of it. It consists primarily of several layers of self-attention mechanisms and feed-forward neural networks; within each layer, all token positions are processed in parallel. The output of the encoder is a collection of contextualized representations, one for each token in the input sequence, capturing both local and global dependencies.

2. Decoder: The decoder generates the output sequence based on the input received from the encoder. It consists of multiple layers of self-attention and feed-forward neural networks. During decoding, the encoder's output and the previously generated tokens are used to generate each new token.

3. Attention: Attention is the foundation of the Transformer architecture and is used to capture dependencies between different parts of the input and output sequences. It computes a weighted sum of the input representations, where the weights are based on the relevance of each input token to the token currently being processed. Attention mechanisms help transformers capture long-range dependencies and improve performance on various sequence-to-sequence tasks.

The encoder analyses the text, converting words into numerical representations and understanding the relationships between them; this creates a compressed understanding of the document. The decoder then uses this information, with attention focusing on the key parts, to build a concise summary word by word. This method allows faster processing and a superior grasp of complex documents compared to traditional techniques.
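To make the weighted sum described in point 3 concrete, the sketch below implements the standard scaled dot-product attention of Vaswani et al., Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. This is the textbook formulation, shown for illustration; it is not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al.)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # query-key relevance
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(6, 64)                       # 6 tokens, model width 64
context = scaled_dot_product_attention(x, x, x)  # shape (6, 64)
```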

Figure 3: Transformer Architecture

Figure 3 shows the different layers of the encoder and the decoder in the Transformer architecture. The encoder uses self-attention to process input sequences and generate embeddings, while the decoder uses self-attention and cross-attention to produce output sequences, enabling efficient parallel processing and capturing long-range dependencies.

Natural language processing techniques are used in abstractive summarization to process the input data. A new, summarized document is generated from the input document by extracting its useful and important information with this architecture.

C. Evaluation Metrics

ROUGE is a set of metrics used to evaluate the quality of summaries by comparing them to reference summaries. It measures the extent of similarity between the generated summary (or translation) and the reference summaries (or translations).

1. ROUGE-N (unigram, bigram, n-gram): measures the overlap of n-grams between the system-generated summary and the reference summary.

$$\mathrm{Precision} = \frac{N_{gO}}{N_{gGS}} \tag{1}$$

where $N_{gO}$ is the number of overlapping n-grams and $N_{gGS}$ is the number of n-grams in the generated summary.

$$\mathrm{Recall} = \frac{N_{gO}}{N_{gRS}} \tag{2}$$

where $N_{gRS}$ is the number of n-grams in the reference summary.

$$\mathrm{F1\text{-}Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

2. ROUGE-L: measures the longest common subsequence (LCS) between the system and reference summaries.

$$\mathrm{Precision} = \frac{Len_{LCS}}{N_{wGS}} \tag{4}$$

where $Len_{LCS}$ is the length of the longest common subsequence and $N_{wGS}$ is the number of words in the generated summary.

$$\mathrm{Recall} = \frac{Len_{LCS}}{N_{wRS}} \tag{5}$$

where $N_{wRS}$ is the number of words in the reference summary.

IV. RESULT

The ROUGE score results for the Text-To-Text Transfer Transformer (T5) model are presented in Table 1.

Table 1: Result on the Dataset using the T5-Base Model

              ROUGE-N    ROUGE-L
  F1-Score    0.64       0.67
  Precision   0.89       0.83
  Recall      0.56       0.57

Figure 4: ROUGE Score

Figure 4 plots the ROUGE-L and ROUGE-N scores for the dataset across all the reported parameters: Recall, Precision, and F1-Score.
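Scores like those in Table 1 can be reproduced with an off-the-shelf scorer. The sketch below uses Google's rouge-score package, an assumed tooling choice since the paper does not state how its scores were computed; the reference and generated strings are toy examples.

```python
# Hedged example: ROUGE-1 and ROUGE-L via the `rouge-score` package
# (pip install rouge-score). The paper does not specify its tooling.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "the cat sat on the mat"
generated = "a cat was sitting on the mat"

scores = scorer.score(reference, generated)  # score(target, prediction)
for name, s in scores.items():
    # Each result carries precision, recall, and F1, as in Eqs. (1)-(5).
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```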
V. CONCLUSION

This research on abstractive text summarization highlights several key insights and contributions. We found that some models work better for abstractive summarization and others for extractive summarization. We also found that the transformer architecture is the best architecture for text summarization problems, as it solves the long-range dependency and parallelization problems present in RNNs and LSTMs.

Furthermore, the research shows that this text summarization model works best on non-conversational text input.

VI. FUTURE WORK

A lot of promising work remains in the field of text summarization, such as inventing new architectures or using transfer learning so that models perform better in varied contexts. The following directions could be explored:

• Multi-document summarization, combining multiple documents into one cohesive summary.
• Methods that control the level of abstraction in the summary.
• User-centered approaches to summarization that consider user preference, need, and context.

REFERENCES

[1] K. D. Garg, V. Khullar and A. K. Agarwal, "Unsupervised machine learning approach for extractive Punjabi text summarization," in 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 750-754. IEEE, 2021.
[2] K. M. Teja, S. M. Sai and P. S. Kushagra, "Smart Summarizer for Blind People," in 2018 3rd International Conference on Inventive Computation Technologies (ICICT), pp. 15-18. IEEE, 2018.
[3] T. Jo, "K Nearest Neighbor for Text Summarization using Feature Similarity," International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), 2017.
[4] N. Moratanch and S. Chitrakala, "A Survey on Abstractive Text Summarization," International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2016.
[5] P. Gupta, R. Tiwari and N. Robert, "Sentiment analysis and text summarization of online reviews: A survey," in 2016 International Conference on Communication and Signal Processing (ICCSP), pp. 0241-0245. IEEE, 2016.
[6] D. Hingu, D. Shah and S. S. Udmale, "Automatic text summarization of Wikipedia articles," International Conference on Communication, Information & Computing Technology (ICCICT), 2015.
[7] C. Prakash and A. Shukla, "Human aided text summarizer 'SAAR' using reinforcement learning," 2014 International Conference on Soft Computing and Machine Intelligence, pp. 83-87, 2014.
[8] K. Ganesan, C. Zhai and J. Han, "Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions," Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 340-348, 2010.
[9] C.-Y. Lin, "ROUGE: A Package for Automatic Evaluation of Summaries," Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, Barcelona, Spain, 2004.
[10] S. Vats et al., "Incremental learning-based cascaded model for detection and localization of tuberculosis from chest x-ray images," Expert Systems with Applications, vol. 238, p. 122129, Mar. 2024, doi: 10.1016/j.eswa.2023.122129.
[11] V. Sharma et al., "OGAS: Omni-directional Glider Assisted Scheme for autonomous deployment of sensor nodes in open area wireless sensor network," ISA Transactions, Aug. 2022, doi: 10.1016/j.isatra.2022.08.001.

