Analysis of Legal Case Document Automated Summarizer
Solan, India
Authorized licensed use limited to: Zhejiang University. Downloaded on January 28,2024 at 06:59:56 UTC from IEEE Xplore. Restrictions apply.
C. Extractive summarization
The most relevant sentences are selected from the input documents and assembled into a summary.

D. Abstractive summarization
The summary is generated automatically using new sentences derived from the input document.
Text summarization has various applications, such as media monitoring, newsletters, financial research, legal contract analysis, social media marketing, question answering and bots.

IV. THE TASKS IN ARCHITECTURE OF AN AUTOMATED TEXT SUMMARIZATION SYSTEM
A. Pre-Processing
A structured representation of the original text [10] is produced using linguistic techniques such as stop-word removal, sentence segmentation, stemming, part-of-speech tagging, and word tokenization.

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad \text{(eq-1)}$$
The recall is the ratio of the true positives to the sum of the true positives and the false negatives on a dataset, as shown in eq-2.
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad \text{(eq-2)}$$
The F-score, or F1-score, is used as a measure of accuracy on a dataset. It combines the model's precision and recall and is defined as the harmonic mean of precision and recall.
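As a minimal sketch of the precision, recall and F-score definitions above, the snippet below computes the three measures from raw counts. The function name `precision_recall_f1` and the example counts are illustrative assumptions, not values from the paper; the F-score is written in its usual harmonic-mean form, F = 2PR / (P + R).

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # eq-1
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # eq-2
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return precision, recall, f1


# Hypothetical counts: 8 sentences correctly selected, 2 wrongly selected, 4 missed.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Guarding the divisions keeps the function defined when a system selects nothing or when the reference contains no positive items.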
VII. ALGORITHMS OF LEGAL CASE DOCUMENTS AUTOMATED SUMMARIZATION

A. C4.5, Winnow, Naive Bayes, and SVM
Machine-learning methodologies such as C4.5, Winnow, Naïve Bayes, and SVM have been applied to information extraction that relies on semantic processing of legal-domain case documents.
These algorithms are based on characteristics such as cue words, sentence length, location, quotations, objects and thematic phrases.

B. FLEXICON, SALOMON, LEXA
Several systems were developed with the explicit goal of summarizing legal-domain documents.
FLEXICON stands for Fast Legal Expert Consultant and was one of the earliest systems developed. It is keyword based: it looks up terms in a large database to find the important regions of the text.
SALOMON, which was introduced later, uses cosine similarity to group similar regions of text [4]. LEXA is another system, which relies on citation analysis to generate automated summaries.

C. Hybrid Systems
A hybrid system results from combining techniques and methods from artificial intelligence [11]. The following techniques were used in hybrid systems to provide automated summarization of legal case documents.
• Matching technique using keyword or key phrase
• Case-based technique
The hybrid system framework takes as input the legal judgement documents requested by the legal experts.
The system is loaded with past cases; roles are assigned during preprocessing, so that a structured text is maintained. When the user searches for certain case details, the key phrase is matched and the relevant summarized documents are returned.

D. Feature-based Techniques of Extraction
1) Preprocessing method
The preprocessing method cleans noisy text, including spelling and grammatical mistakes [12]. The pre-processing methods applied are Case Folding, Word Stemming, Identification of Key Phrases, Sentence Segmentation and Tokenization, and Removal of Stop Words, Punctuation and White Spaces.
Case Folding is the translation of uppercase characters to lower case, for example "CRIMINAL" to "criminal".
Removal of Stop Words is the removal of insignificant words that appear frequently but do not contribute relevant meaning to text processing.
Punctuation Removal is the removal of all unwanted punctuation from the text document except the dot (.), which acts as the sentence separator.
In legal-domain documents, extra white spaces are inserted to format the document, which adds to its size; Extra White Space Removal is required to reduce the size of the document.
Word Stemming is a method in which a root word is produced by removing the prefixes and suffixes of words.
Identification of Key Phrases finds relevant phrases using a relative frequency technique.
Sentence Segmentation is the process of identifying the sentences in a paragraph and separating them.
Tokenization is the process that splits a sentence into individual words.

2) Extraction of Features of Sentence
After pre-processing, sentence feature extraction is applied, where each sentence of the document is represented as a point in a vector space. Eleven features were used in the extraction methodology; they are listed as:

a) Sentence-Position feature
In legal-domain documents, sentences appearing in the first and last paragraphs are expected to contain important information, hence the highest score is given to them.

b) Proper Noun
A sentence that contains a greater number of named entities, which start with an uppercase letter and are referred to as proper nouns, may be considered a relevant sentence to be included in an automated document summary.

c) Sentence-Length
The Sentence-Length feature helps identify the best sentences for the summary. The sentence length is normalized based on the number of words in the sentence.

d) Tf * isf
Term weight measures the importance of a sentence by tracking how often its terms recur within the document, termed the raw term frequency. Because importance may not increase in proportion to the raw term frequency, the inverse sentence frequency may be needed to filter the recurring words, so that only the important ones contribute to the sentence score.

e) Sentence-Sentence Similarity
The similarity between neighboring sentences of a document is found using the cosine-similarity measure.

f) Citation
Citations are used in legal documents to make references. India follows the common law system, in which judgements can be used as references in other cases. The abbreviations "V." and "VS." denote citations.

g) Local and Layout Features
Every legal document has a layout structure that includes a head-note with information such as the court, the judge, the names of the petitioner and respondent, and dates. The Local Features and the Layout Features capture details related to the head-note of the legal document.

h) Paragraph-Structure feature
Every legal document has a distinct internal paragraph format, hence a high score can be assigned to the first and last paragraphs.

i) Thematic-Word
In a legal-domain document, the final decision, called the main theme, is indicated by thematic words and should be included in the summary.
Thematic words are the words and phrases that include legal vocabulary for the training data.

Six automated tools were chosen for comparison, four of which were online, namely Auto Summarizer, Text Summarizer, Split Brain, and SMMRY; the remaining two were Apple Inc.'s Summary and Galgani et al.'s summarization [4]. Legal domain experts were asked to provide summaries for some selected cases.
The ROUGE scores of each automated system, compared against the legal experts' summaries, are shown in Table I.
Case Summarizer could cover the most important points of the case and proved to be the most effective in generating a reasonable flow of events.

Table I. ROUGE scores of automatically generated text summaries, from [4].
Metrics   Auto Sum   Text Sum   Split Brain   SMMRY   Apple Sum   Case-Sum   Galgani
Rouge1    .207       .183       .241          .248    .175        .194       .132
RougeL    .017       .015       .056          .062    .033        .061       .017
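The pre-processing steps and two of the sentence features described in subsection D (Tf * isf and sentence-sentence similarity) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of [12]: the stop-word list, the function names, the use of isf = log(N / sf) and the averaging of term weights per sentence are all assumptions, and stemming and key-phrase identification are left out for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use a fuller list).
STOP_WORDS = {"the", "a", "an", "of", "is", "was", "and", "to", "in", "by", "with"}


def preprocess(text: str) -> list[list[str]]:
    """Return one token list per sentence after the cleaning steps."""
    text = text.lower()                        # case folding
    text = re.sub(r"[^\w\s.]", " ", text)      # punctuation removal, keeping the dot
    text = re.sub(r"\s+", " ", text).strip()   # extra white-space removal
    sentences = [s.strip() for s in text.split(".") if s.strip()]  # sentence segmentation
    # tokenization followed by stop-word removal
    return [[w for w in s.split() if w not in STOP_WORDS] for s in sentences]


def tf_isf_scores(sentences: list[list[str]]) -> list[float]:
    """Score each sentence as the mean tf * isf weight of its terms."""
    n = len(sentences)
    tf = Counter(w for s in sentences for w in s)        # raw term frequency in the document
    sf = Counter(w for s in sentences for w in set(s))   # number of sentences containing the term
    scores = []
    for s in sentences:
        total = sum(tf[w] * math.log(n / sf[w]) for w in s)
        scores.append(total / len(s) if s else 0.0)
    return scores


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two sentences represented as bags of words."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical three-sentence document.
text = "The appellant filed an appeal.   The appeal was dismissed by the court, with costs. The court order is final."
sentences = preprocess(text)
scores = tf_isf_scores(sentences)
neighbour_sims = [cosine_similarity(a, b) for a, b in zip(sentences, sentences[1:])]
print(sentences)
print(scores)
print(neighbour_sims)
```

Because the dot is the only sentence separator here, abbreviations such as "V." would be split as sentence ends; a real system would need to handle such citation markers before segmentation.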
The LetSum system shows the best performance, as it achieved the highest scores.

Table II. ROUGE scores of statistical evaluations, with LetSum showing the best scores; reference values from [6].
ID                  ROUGE 1   ROUGE 2   ROUGE 3   ROUGE 4   ROUGE L
Baseline            .47244    .27569    .19391    .14472    .34683
Word                .44473    .21295    .13747    .09727    .29652
Pertinence Mining   .32833    .15127    .09798    .07151    .22375
LetSum              .57500    .31381    .20708    .15036    .45185
Mead                .45581    .22314    .14241    .10064    .32089

G. Graphical Model (CRF)
Graphical models were among the models suggested for the automated legal text summarization task.
In the automated summarization process, the preprocessing stage starts with an input legal document, which is divided into sentence segments, and [...] which are specific to the structure of legal documents. Other NLP steps, such as stemming and stop-word filtering, are also carried out in the preprocessing stage.
The resulting intelligible words are used in a term distribution model, the K-mixture model, for the normalization of terms. The document is segmented by identifying the rhetorical status of each sentence using a graphical model (CRFs). The sentences selected by the term distribution model are then presented as a structured summary.

Case Arguing      .824   .787   .805
Case History      .808   .796   .802
Arguments         .860   .846   .853
Ratio Decidendi   .924   .901   .912
Final Decision    .986   .962   .974
Micro Average     .896   .864   .879

VIII. COMPARATIVE STUDY OF ALGORITHMS OF AUTOMATED SUMMARIZATION OF LEGAL TEXT
The ROUGE scores are generated by comparing the automatically generated summary with the gold-standard summary.
Table IV shows the ROUGE scores obtained by the algorithmic methods of legal text summarization on Indian Supreme Court case judgments.

Table IV. Results of different extractive legal summarization techniques, from [8].
                           Rouge 1             Rouge 2             Rouge L
Methods           Type   F-Score   Re-call   F-Score   Re-call   F-Score   Re-call
Graphical Model   S      .386      .351      .171      .159      .297      .343      2.4
LetSum            S      .408      .298      .112      .073      .235      .371      10.16
Case Summarizer   US     .198      .139      .094      .063      .094      .154      7.95
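The comparison of an automated summary against a gold-standard summary can be illustrated with a simplified ROUGE-N computation. The sketch below is not the official ROUGE package [1]: it assumes whitespace tokenization, a single reference summary and no stemming, and the function names and example sentences are hypothetical. It counts clipped n-gram overlaps and reports recall and F-score, the two quantities listed per metric in Table IV.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams contained in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Simplified ROUGE-N between one candidate and one reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped count of matching n-grams
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f_score": f_score}


# Hypothetical gold-standard and automated summaries.
reference = "the court dismissed the appeal with costs"
candidate = "the court dismissed the appeal"
rouge1 = rouge_n(candidate, reference, n=1)
rouge2 = rouge_n(candidate, reference, n=2)
print(rouge1)
print(rouge2)
```

ROUGE-L, which is based on the longest common subsequence rather than fixed-length n-grams, is not covered by this sketch.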
A drawback noticed with LetSum is that it has the highest execution time, which can impact online applications.
Legal-domain experts analysed the overall quality of the automated summaries produced by LetSum and found that they covered the facts of the case, the statutes and the precedents.
Among the different legal text summarization techniques, the Graphical Model technique and the LetSum technique have comparable performance, while the Case Summarizer technique performs comparatively poorly.

IX. CONCLUSION
Manual summarization of dense legal text is difficult, so automated summarization appears to be a time-saving technology that offers instant reference for legal issues. Many automated legal text summarizers exist in the literature, but their results still fall short of manual text summarization. The computer still finds it difficult to identify and understand the "most relevant" parts of a legal text. The main goal of applying AI is not to replace legal experts but to help them in their work. Hence the automated result should be regarded as a reference. Legal professionals can deal with the complex cases, while the simple cases can be handled by the automated model. However, for safety, these automatically handled simple cases still need to be reviewed by experts. LegalAI must act as a means of support for the legal system.
In this paper, we have given special emphasis to different legal-domain text summarization techniques. This survey shows that significant work has been done on extraction-based summarization, but abstractive summarization of legal text documents still has to be explored in detail.
The text summarization methodologies applied to legal case documents could not give the ruling or holding of the case together with its reasoning, which is an important aspect of summary generation because it would help legal experts use the summary for reference.
A successful text summarization technique is judged by how easily a user can understand the automated summary of the document.
Abstraction-based summarization is closest to human summarization, but it is a large and complex area in which substantial research is still in progress. The methodologies implemented for legal documents could not provide the reasoning behind the holding or ruling of the case. This feature plays an important role in the automated summary, as it lets legal experts decide whether or not to include the legal case as a reference. Therefore, more research is still required.

REFERENCES
[1] Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74-81).
[2] Yang, A., et al. (2018). Adaptations of ROUGE and BLEU to better evaluate machine reading comprehension task. arXiv preprint arXiv:1806.03578.
[3] Yang, A., Liu, K., Liu, J., Lyu, Y., & Li, S. (2018). Adaptations of ROUGE and BLEU to better evaluate machine reading comprehension task. arXiv preprint arXiv:1806.03578.
[4] Polsley, S., Jhunjhunwala, P., & Huang, R. (2016). Case summarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262).
[5] Saravanan, M., Ravindran, B., & Raman, S. (2006). Improving legal document summarization using graphical models. Frontiers in Artificial Intelligence and Applications, 152, 51.
[6] Farzindar, A., & Lapalme, G. (2004). LetSum, an automatic legal text summarizing system. In T. Gordon (Ed.), Legal Knowledge and Information Systems: JURIX 2004, the Seventeenth Annual Conference (pp. 11-18). Amsterdam: IOS Press.
[7] Radev, D., Otterbacher, J., Qi, H., & Tam, D. (2003). MEAD ReDUCs: Michigan at DUC 2003. In Proceedings of the Document Understanding Conference.
[8] Bhattacharya, P., Hiware, K., Rajgaria, S., Pochhi, N., Ghosh, K., & Ghosh, S. (2019). A comparative study of summarization algorithms applied to legal case judgments. In European Conference on Information Retrieval (pp. 413-428). Springer, Cham.
[9] Zhong, H., Xiao, C., Tu, C., Zhang, T., Liu, Z., & Sun, M. (2020). How does NLP benefit legal system: A summary of legal artificial intelligence. arXiv preprint arXiv:2004.12158.
[10] Gupta, V., & Lehal, G. S. (2010). A survey of text summarization extractive techniques. Journal of Emerging Technologies in Web Intelligence, 2(3), 258-268.
[11] Kavila, S. D., Puli, V., Raju, G. P., & Bandaru, R. (2013). An automatic legal document summarization and search using hybrid system. In Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) (pp. 229-236). Springer, Berlin, Heidelberg.
[12] Megala, S. S., Kavitha, A., & Marimuthu, A. (2014). Feature extraction based legal document summarization. International Journal of Advance Research in Computer Science and Management Studies, 2(12), 346-352.