Abstractive Text Summarization Using Transformer Architecture
Abstract— The internet has large amounts of textual data, which has become a double-edged sword. The information is always available, but due to its high volume it is hard to find what we are looking for. In the past, subject experts were employed to create summaries, which was very time consuming and not financially viable. These summaries were made by reading lengthy texts, but often gave redundant results when working on similar articles. To address this problem, Automatic Text Summarization can be used. It helps in generating concise summaries using information retrieval, capturing the document's essence and making it easier for users to find the key points of the original text. Imagine quickly grasping an article's core message before deciding on a deeper dive. This efficiency is invaluable in today's information-driven world. This paper proposes an advanced "abstractive" text summarization system that delves deeper than simply extracting key sentences. By leveraging Natural Language Processing techniques, the system analyses the text's structure, identifies salient points, and grasps their relationships. By comprehending the underlying context and logic, the system generates summaries using novel phrasings that accurately reflect the content, paving the way for a revolution in information navigation.

Keywords—Summarization, Natural Language Processing, Transformers, Deep Learning

I. INTRODUCTION

Every day, we encounter vast amounts of information. Most of this information is in the form of text: written articles, blogs, newspapers, etc. To understand the content of these articles, we need to read each article thoroughly. Longer articles can be time consuming to comprehend, and articles on similar topics require more effort to extract the non-redundant information because of the overlap between them. This paper discusses an approach to abstractive text summarization that reads an entire document and generates a summary containing the essence of the original document. The main goal of text summarization is to develop a system that can process an article's data using different Natural Language Processing techniques. This method helps in reducing the time taken by an individual to read and understand lengthy articles.

Due to the internet explosion, a large amount of data is at our fingertips, but it has also created the challenge of information overload. There is much more data present than we can process, and finding what we need is a struggle. Many processes and methods are used to extract relevant information from text, such as search engines, recommendation systems, and question-answering systems. Additionally, techniques like text summarization and information extraction simplify complex information by extracting key points and condensing lengthy text, making it simpler and easier to understand.

Automatic text summarization faces several hurdles, including pinpointing the main topic, analyzing the content, crafting the summary for the text, and judging the quality of the model. Currently, most systems rely on the "extractive" approach to text summarization, which only finds the key sentences in the text and produces a summary by combining them. However, there is a need for "abstractive" text summarization models, which understand the actual meaning of the input text and then summarize it by composing meaningful sentences. This technology would be a game-changer for navigating the ever-growing sea of online information. This paper aims to develop a system that can automatically generate a summary for a given text with better accuracy by understanding the core meaning of the text and not by just copying existing sentences.

II. LITERATURE SURVEY

K. D. Garg et al. [1] describe text summarization as shortening the text document while preserving its context and general idea. Every summary should convey the main idea of the source document. They discuss using several NLP techniques to extract text from documents.
A. Pre-Processing
Data preprocessing is used to clean and transform raw data into the format required for analysis. It reduces noise and enhances the quality of the data.

Data preprocessing involves handling missing values, handling categorical data, feature engineering, and splitting the dataset [10]. Tokenization is an important task in the model: it converts words into tokens. The maximum and minimum lengths of the summary are also determined at this stage.
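A minimal pre-processing and tokenization sketch is shown below; the paper does not specify its exact pipeline, so the cleaning function, the Hugging Face t5-base tokenizer, and the length limits are illustrative assumptions rather than the authors' settings.

```python
# Illustrative pre-processing sketch (assumed, not the authors' exact pipeline).
import re
from transformers import T5Tokenizer  # requires transformers + sentencepiece

MAX_INPUT_TOKENS = 512    # assumed input limit for the model
MAX_SUMMARY_TOKENS = 150  # assumed maximum summary length
MIN_SUMMARY_TOKENS = 40   # assumed minimum summary length

def clean_text(text):
    """Reduce noise: handle missing values and collapse extra whitespace."""
    if not text:
        return ""
    return re.sub(r"\s+", " ", text).strip()

tokenizer = T5Tokenizer.from_pretrained("t5-base")

def tokenize(article):
    """Convert the cleaned article into token ids, truncated to the input limit."""
    return tokenizer("summarize: " + clean_text(article),
                     max_length=MAX_INPUT_TOKENS,
                     truncation=True,
                     return_tensors="pt")
```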
B. The Transformer Architecture

The Transformer is an important deep learning architecture introduced by Google researchers in the "Attention Is All You Need" paper by Vaswani et al. The main goal of this architecture was to support natural language processing tasks, but it has been applied to other domains as well by creating suitable embeddings.

The Transformer architecture is more advanced than RNNs and LSTMs. It relies on a special kind of attention mechanism called self-attention, which helps in finding global dependencies between input and output sequences. This makes parallelization possible and efficient, which is not possible with RNNs and LSTMs.

The Transformer architecture is utilized in various Natural Language Processing techniques. It processes entire sequences at once and, at the same time, successfully deals with long-term dependencies. It is important to note that, prior to the multi-head attention operation, positional encoding is applied so that the order of the words in each input sequence is acknowledged by the model. This also helps minimize the time taken to process different sets of data at the same time. The Transformer network therefore departs from the sequential RNN process for handling the dataset [11]. Instead, it employs stacked layers that connect the multi-head attention layer and the feed-forward network layers. Among the several types of attention, self-attention or intra-attention is the process of relating different positions within a single sequence in order to compute a representation of that sequence. In the Transformer network, both the encoder and the decoder gain the ability of the attention mechanism with the help of the multi-head attention layer. Transformer models utilize these encoder and decoder roles for the text summarization process.

Figure 2: Basic Transformer Architecture

The above diagram shows the major components of the Transformer architecture, namely the encoder and the decoder. It is not necessary to use both components in a Transformer architecture.

1. Encoder: It is concerned with the manipulation of input sequences and the creation of a useful representation of them. The encoder primarily consists of several layers of feed-forward neural networks and self-attention mechanisms. Each layer in the encoder processes all tokens of the sequence in parallel. The output of the encoder is a collection of contextualized representations for each token in the input sequence, which captures both local and global dependencies.

2. Decoder: It is responsible for generating the output sequence based on the input received from the encoder. The decoder consists of multiple layers containing self-attention and feed-forward neural network sub-layers. During the decoding process, the encoder's output and the previously generated tokens are used for generating new tokens.

3. Attention: It is the foundation of the Transformer architecture and is used to capture dependencies between different parts of the input and output sequences. It computes a weighted sum of the input representations, where the weights are based on the relevance of each input token to the current token being processed. Attention mechanisms help Transformers capture long-range dependencies and improve performance on various sequence-to-sequence tasks.

The encoder analyses the text, converting words into numerical representations and understanding the relationships between them. This creates a compressed understanding of the document. The decoder then uses this information and attention to focus on key parts, building a concise summary word by word. This method allows for faster processing and a superior grasp of complex documents compared to traditional techniques.
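To make the attention mechanism described above concrete, the following is a small didactic sketch of scaled dot-product self-attention in Python with NumPy; it is a simplified single-head version, not the paper's implementation, and the matrix sizes are arbitrary examples.

```python
# Didactic single-head self-attention (simplified; real Transformers use
# multi-head attention plus positional encoding and feed-forward layers).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq, Wk, Wv: projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Each row holds the relevance of every token to the current token.
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len)
    return weights @ V                          # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                    # 6 tokens, d_model = 16 (example)
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # -> (6, 16)
```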
Recall = NgO / NgRS    (2)

where NgO = number of overlapping n-grams and NgRS = number of n-grams in the reference summary.

F1-Score = (2 × Precision × Recall) / (Precision + Recall)    (3)

2. ROUGE-L: Measures the longest common subsequence (LCS) between the system and reference summaries.

Precision = LenLCS / NwGS    (4)

Recall = LenLCS / NwRS    (5)

where LenLCS = length of the longest common subsequence, NwGS = number of words in the generated summary, and NwRS = number of words in the reference summary.
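The paper does not include code for these metrics, so the block below is a self-contained Python sketch that follows Eqs. (2)-(5) directly; the function names and the example summaries are illustrative.

```python
# Illustrative ROUGE-N and ROUGE-L computation following Eqs. (2)-(5).
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams for a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated, reference, n=1):
    """Precision, recall (Eq. 2) and F1 (Eq. 3) over n-gram overlap."""
    gen, ref = generated.lower().split(), reference.lower().split()
    g, r = ngram_counts(gen, n), ngram_counts(ref, n)
    overlap = sum((g & r).values())                    # NgO
    precision = overlap / max(sum(g.values()), 1)      # NgO / n-grams in generated
    recall = overlap / max(sum(r.values()), 1)         # NgO / NgRS
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(generated, reference):
    """Precision (Eq. 4), recall (Eq. 5) and F1 based on the LCS."""
    gen, ref = generated.lower().split(), reference.lower().split()
    lcs = lcs_length(gen, ref)                         # LenLCS
    precision = lcs / max(len(gen), 1)                 # LenLCS / NwGS
    recall = lcs / max(len(ref), 1)                    # LenLCS / NwRS
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# Example (illustrative strings, not from the paper's dataset):
print(rouge_n("the model summarizes the article", "the model summarizes a long article"))
print(rouge_l("the model summarizes the article", "the model summarizes a long article"))
```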
Figure 3: Transformer Architecture

The above diagram shows the different layers of the encoder and decoder in the Transformer architecture. The encoder uses self-attention to process input sequences and generate embeddings, while the decoder uses self-attention and cross-attention to produce output sequences, enabling efficient parallel processing and capturing long-range dependencies.

IV. RESULT

The ROUGE score results for the Text-To-Text Transfer Transformer (T5) model are presented in Table 1.

Table 1: Result on the Dataset using T5-Base Model (ROUGE-N and ROUGE-L)
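The paper does not list its inference setup; a rough sketch of how the evaluated T5-Base model could be run to produce a summary, assuming the public Hugging Face "t5-base" checkpoint and illustrative generation parameters, is given below.

```python
# Illustrative inference with the public t5-base checkpoint (assumed setup,
# not the authors' exact configuration or hyperparameters).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

article = "Long input article text goes here."   # placeholder document

# T5 is a text-to-text model, so the task is signalled with a prefix.
inputs = tokenizer("summarize: " + article,
                   max_length=512, truncation=True, return_tensors="pt")

summary_ids = model.generate(inputs["input_ids"],
                             max_length=150, min_length=40,
                             num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The generated summary can then be compared against a reference summary with the ROUGE functions sketched earlier.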
Figure 4 represents the ROUGE-L and ROUGE-N scores for the dataset with all the parameters, namely Recall, Precision and F1-Score.

V. CONCLUSION

In conclusion, this research on abstractive text summarization highlights the key insights and contributions that have emerged from our work. We found that some models work better with abstractive summarization and some with extractive summarization. We also found that the Transformer architecture is the best architecture for text summarization problems, as it solves the problems of long-range dependencies and parallelization that are present in RNNs and LSTMs.

Furthermore, the research shows that this text summarization model works best on non-conversational text input.

VI. FUTURE WORK

A lot of promising work remains to be done in the field of text summarization. This means inventing new architectures or using transfer learning to allow models to perform better in varied contexts.

The following scenarios could be explored:

• Multi-document summarization, to combine multiple documents and provide a cohesive summary.

• Methods that help control the level of abstraction in the summary.

• User-centered approaches to summarization that consider user preference, need and context.

REFERENCES

[1] Garg, K.D., Khullar, V. and Agarwal, A.K., 2021, August. Unsupervised machine learning approach for extractive Punjabi text summarization. In 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN) (pp. 750-754). IEEE.

[2] Teja, K.M., Sai, S.M. and Kushagra, P.S., 2018, November. Smart Summarizer for Blind People. In 2018 3rd International Conference on Inventive Computation Technologies (ICICT) (pp. 15-18). IEEE.

[3] Taeho Jo, "K Nearest Neighbor for Text Summarization using Feature Similarity." International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), 2017.

[4] N. Moratanch, S. Chitrakala, "A Survey on Abstractive Text Summarization." International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2016.

[5] Gupta, P., Tiwari, R. and Robert, N., 2016, April. Sentiment analysis and text summarization of online reviews: A survey. In 2016 International Conference on Communication and Signal Processing (ICCSP) (pp. 0241-0245). IEEE.

[6] Dharmendra Hingu, Deep Shah, Sandeep S. Udmale, "Automatic …" International Conference on Communication, Information & Computing Technology (ICCICT), 2015.

[7] Prakash, C. and Shukla, A., "Human aided text summarizer SAAR using reinforcement learning." 2014 International Conference on Soft Computing and Machine Intelligence, pp. 83-87, 2014.

[8] Kavita Ganesan, ChengXiang Zhai, Jiawei Han, "Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions." Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 340-348, 2010.

[9] Chin-Yew Lin, "ROUGE: A Package for Automatic Evaluation of Summaries." Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, Barcelona, Spain.

[10] S. Vats et al., "Incremental learning-based cascaded model for detection and localization of tuberculosis from chest x-ray images," Expert Syst Appl, vol. 238, p. 122129, Mar. 2024, doi: 10.1016/j.eswa.2023.122129.

[11] V. Sharma et al., "OGAS: Omni-directional Glider Assisted Scheme for autonomous deployment of sensor nodes in open area wireless sensor network," ISA Trans., Aug. 2022, doi: 10.1016/j.isatra.2022.08.001.