Plain English Summarization of Contracts
and case reports (Galgani et al., 2012), as well as information extraction on patents (Tseng et al., 2007; Tang et al., 2012). While some companies have conducted proprietary research on the summarization of contracts, this information sits behind a large pay-wall and is geared toward law professionals rather than the general public.

In an attempt to motivate advancement in this area, we have collected 446 sets of contract sections and corresponding reference summaries which can be used as a test set for such a task (the dataset is available at https://github.com/lauramanor/legal_summarization). We have compiled these sets from two websites dedicated to explaining complicated legal documents in plain English.

Rather than attempt to summarize an entire document, these sources summarize each document at the section level. In this way, the reader can reference the more detailed text if need be. The summaries in this dataset were reviewed for quality by the first author, who has three years of professional contract drafting experience.

The dataset we propose contains 446 sets of parallel text. We show the level of abstraction through the number of novel words in the reference summaries, which is significantly higher than in the abstractive single-document summaries created for the shared tasks of the Document Understanding Conference (DUC) in 2002 (Over et al., 2007), a standard dataset used for single-document news summarization. Additionally, we use several common readability metrics to show that there is an average six-year reading level difference between the original documents and the reference summaries in our legal dataset.

In initial experiments on this dataset, we employ popular unsupervised extractive summarization models such as TextRank (Mihalcea and Tarau, 2004) and Greedy KL (Haghighi and Vanderwende, 2009), as well as lead baselines. We show that these methods perform substantially worse on this dataset than on DUC 2002, which highlights how challenging the task is. As there is currently no dataset in this domain large enough for supervised methods, we suggest the use of methods developed for simplification and/or style transfer.

In this paper, we begin by discussing how this task relates to the current state of text summarization and similar tasks in Section 2. We then introduce the novel dataset and provide details on its level of abstraction, compression, and readability in Section 3. Next, we provide results and analysis on the performance of extractive summarization baselines on our data in Section 5. Finally, we discuss the potential for unsupervised systems in this genre in Section 6.

2 Related work

Given a document, the goal of single-document summarization is to produce a shortened summary that captures the document's main semantic content (Nenkova et al., 2011). Existing research spans several genres, including news (Over et al., 2007; See et al., 2017; Grusky et al., 2018), scientific writing (TAC, 2014; Jaidka et al., 2016; Yasunaga et al., 2019), legal case reports (Galgani et al., 2012), etc. A critical factor in successful summarization research is the availability of a dataset of parallel document/human-summary pairs for system evaluation. However, no such publicly available resource exists to date for the summarization of contracts; we present the first dataset in this genre. Note that unlike other genres, where human summaries paired with original documents can be found at scale, e.g., the CNN/DailyMail dataset (See et al., 2017), resources of this kind are yet to be curated or created for contracts. As traditional supervised summarization systems require large datasets of this type, the resources released here are intended for evaluation rather than training. Additionally, as a first step, we restrict our initial experiments to unsupervised baselines, which do not require training on large datasets.

The dataset we present summarizes contracts in plain English. While there is no precise definition of plain English, the general philosophy is to make a text readily accessible to as many English speakers as possible (Mellinkoff, 2004; Tiersma, 2000). Guidelines for plain English often suggest a preference for words with Saxon rather than Latin/Romance etymologies, the use of short words, sentences, and paragraphs, etc. (Tiersma, 2000; Kimble, 2006; see also https://plainlanguage.gov/guidelines/). In this respect, the proposed task involves some level of text simplification, as we will discuss in Section 4.2. However, existing resources for text simplification target literacy/reading levels (Xu et al., 2015) or learners of English as a second language (Zhu et al., 2010). Additionally, these models are trained using Wikipedia or news articles, which are quite different from legal documents. Unsupervised style transfer systems, in contrast, are trained without access to sentence-aligned parallel corpora; they require only semantically similar texts (Shen et al., 2017; Yang et al., 2018; Li et al., 2018). To the best of our knowledge, however, there is no existing dataset to facilitate the transfer of legal language to plain English.
3 Data
This section introduces a dataset compiled from two websites dedicated to explaining unilateral contracts in plain English: TL;DRLegal (https://tldrlegal.com/) and TOS;DR (https://tosdr.org/, CC BY-SA 3.0). These websites clarify language within legal documents by providing summaries for specific sections of the original documents. The data was collected using Scrapy (https://scrapy.org/) and a JSON interface provided by each website's API. Summaries are submitted and maintained by members of each website's community; neither website requires community members to be law professionals.

Figure 1: Unique n-grams in the reference summary, contrasting our legal dataset with DUC 2002 single document summarization data.

3.1 TL;DRLegal

TL;DRLegal focuses mostly on software licenses; however, we only scraped documents related to specific companies rather than generic licenses (e.g., Creative Commons). The scraped data consists of 84 sets sourced from 9 documents: Pokemon GO Terms of Service, TLDRLegal Terms of Service, Minecraft End User Licence Agreement, YouTube Terms of Service, Android SDK License Agreement (June 2014), Google Play Game Services (May 15th, 2013), Facebook Terms of Service (Statement of Rights and Responsibilities), Dropbox Terms of Service, and Apple Website Terms of Service.

Each set consists of a portion of the original agreement text and a summary written in plain English. Examples of the original text and the summary are shown in Table 2.

3.2 TOS;DR

TOS;DR tends to focus on topics related to user data and privacy. We scraped 421 sets of parallel text sourced from 166 documents by 122 companies. Each set consists of a portion of an agreement text (e.g., Terms of Use, Privacy Policy, Terms of Service) and 1-3 human-written summaries.

While the multiple references can be useful for system development and evaluation, the quality of these summaries varied greatly. Therefore, each text was examined by the first author, who has three years of professional experience in contract drafting for a software company. A total of 361 sets had at least one quality summary in the set. For each, the annotator selected the most informative summary to be used in this paper.

Of the 361 accepted summaries, 152 (more than two-fifths) are 'templatic' summaries. A summary was deemed templatic if it could be found in more than one summary set, either word-for-word or with just the service name changed. However, among the 152 templatic summaries which were selected as the best of their set, there were 111 unique summaries. This indicates that the templatic summaries selected for the final dataset are relatively unique.

A total of 369 summaries were rejected outright for a variety of reasons, including summaries that: were a repetition of another summary for the same source snippet (291), were an exact quote of the original text (63), included opinionated language that could not be inferred from the original text (24), or only described the topic of the quote but not its content (20). We also rejected any summaries that are longer than the original texts they summarize. Annotated examples from TOS;DR can be found in Table 3.
Source Facebook Terms of Service (Statement of Rights and Responsibilities) - November 15, 2013
Original Text Our goal is to deliver advertising and other commercial or sponsored content that is valuable to our
users and advertisers. In order to help us do that, you agree to the following: You give us permission
to use your name, profile picture, content, and information in connection with commercial, sponsored,
or related content (such as a brand you like) served or enhanced by us. This means, for example, that
you permit a business or other entity to pay us to display your name and/or profile picture with your
content or information, without any compensation to you. If you have selected a specific audience
for your content or information, we will respect your choice when we use it. We do not give your
content or information to advertisers without your consent. You understand that we may not always
identify paid services and communications as such.
Summary Facebook can use any of your stuff for any reason they want without paying you, for advertising in
particular.
Source Pokemon GO Terms of Service - July 1, 2016
Original Text We may cancel, suspend, or terminate your Account and your access to your Trading Items, Virtual
Money, Virtual Goods, the Content, or the Services, in our sole discretion and without prior notice,
including if (a) your Account is inactive (i.e., not used or logged into) for one year; (b) you fail to
comply with these Terms; (c) we suspect fraud or misuse by you of Trading Items, Virtual Money,
Virtual Goods, or other Content; (d) we suspect any other unlawful activity associated with your
Account; or (e) we are acting to protect the Services, our systems, the App, any of our users, or
the reputation of Niantic, TPC, or TPCI. We have no obligation or responsibility to, and will not
reimburse or refund, you for any Trading Items, Virtual Money, or Virtual Goods lost due to such
cancellation, suspension, or termination. You acknowledge that Niantic is not required to provide a
refund for any reason, and that you will not receive money or other compensation for unused Virtual
Money and Virtual Goods when your Account is closed, whether such closure was voluntary or
involuntary. We have the right to offer, modify, eliminate, and/or terminate Trading Items, Virtual
Money, Virtual Goods, the Content, and/or the Services, or any portion thereof, at any time, without
notice or liability to you. If we discontinue the use of Virtual Money or Virtual Goods, we will
provide at least 60 days advance notice to you by posting a notice on the Site or App or through other
communications.
Summary If you haven’t played for a year, you mess up, or we mess up, we can delete all of your virtual goods.
We don’t have to give them back. We might even discontinue some virtual goods entirely, but we’ll
give you 60 days advance notice if that happens.
Source Apple Website Terms of Service - Nov. 20, 2009
Original Text Any feedback you provide at this site shall be deemed to be non-confidential. Apple shall be free to
use such information on an unrestricted basis.
Summary Apple may use your feedback without restrictions (e.g. share it publicly.)

Table 2: Examples of original text and summary pairs from TL;DRLegal.
Original Text When you upload, submit, store, send or receive content to or through our Services, you give Google
(and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative
works (such as those resulting from translations, adaptations or other changes we make so that your
content works better with our Services), communicate, publish, publicly perform, publicly display
and distribute such content.
Summary1 (best) The copyright license you grant is for the limited purpose of operating, promoting, and improving
existing and new Google Services. However, please note that the license does not end if you stop
using the Google services.
Summary2 The copyright license that users grant this service is limited to the parties that make up the service’s
broader platform.
Summary3 Limited copyright license to operate and improve all Google Services
Original Text We may share information with vendors, consultants, and other service providers (but not with advertisers and ad partners) who need access to such information to carry out work for us. The partners' use of personal data will be subject to appropriate confidentiality and security measures.
Summary1 (best) Reddit shares data with third parties
Summary2 Third parties may be involved in operating the service
Summary3 (rejected) Third parties may be involved in operating the service
Table 3: Examples from TOS;DR. Contract sections from TOS;DR included up to three summaries. In each case,
the summaries were inspected for quality. Only the best summary was included in the analysis in this paper.
4 Analysis

4.1 Levels of abstraction and compression

To understand the level of abstraction of the proposed dataset, we first calculate the number of n-grams in the reference summaries that do not appear in the corresponding original texts (Figure 1).
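The paper does not spell out this computation; as an illustration, the percentage of novel n-grams in a summary can be computed as below (a minimal sketch with naive whitespace tokenization; the function names are ours):

```python
from typing import List

def ngrams(tokens: List[str], n: int) -> set:
    """Set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_rate(original: str, summary: str, n: int) -> float:
    """Fraction of summary n-grams that never occur in the original text."""
    orig_grams = ngrams(original.lower().split(), n)
    summ_grams = ngrams(summary.lower().split(), n)
    if not summ_grams:
        return 0.0
    return sum(1 for g in summ_grams if g not in orig_grams) / len(summ_grams)

# A highly abstractive summary (from Table 3) shares few bigrams with its source:
original = ("We may share information with vendors, consultants, and other "
            "service providers who need access to such information to carry "
            "out work for us.")
summary = "Reddit shares data with third parties"
print(novel_ngram_rate(original, summary, n=2))  # close to 1.0
```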
4.2 Readability

        F-K     C-L     SMOG    ARI     Avg
Ref     12.66   15.11   14.14   12.98   13.29
Orig    20.22   16.53   19.58   22.24   19.29

Table 4: Reading grade levels of the reference summaries (Ref) and original texts (Orig) under Flesch-Kincaid (F-K), Coleman-Liau (C-L), SMOG, and the Automated Readability Index (ARI).
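For reference, grade-level scores like those above can be computed with the off-the-shelf textstat package; this is an illustration of the metrics, not necessarily the implementation used in the paper:

```python
# pip install textstat
import textstat

def readability_profile(text: str) -> dict:
    """Grade-level estimates under four common readability formulas."""
    scores = {
        "F-K": textstat.flesch_kincaid_grade(text),
        "C-L": textstat.coleman_liau_index(text),
        "SMOG": textstat.smog_index(text),  # only reliable for multi-sentence texts
        "ARI": textstat.automated_readability_index(text),
    }
    scores["Avg"] = sum(scores.values()) / 4
    return scores

legalese = ("Any feedback you provide at this site shall be deemed to be "
            "non-confidential. Apple shall be free to use such information "
            "on an unrestricted basis.")
print(readability_profile(legalese))
```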
Original Text: arise, unless, receive, whether, example, signal, b, technology, identifier, expressly, transmit, visit, perform, search, partner, understand, conduct, server, child, support, regulation, base, similar, purchase, automatically, mobile, agent, derivative, either, commercial, reasonable, cause, functionality, advertiser, act, ii, thereof, arbitrator, attorney, modification, locate, c, individual, form, following, accordance, hereby, cookie, apps, advertisement

Reference Summary: fingerprint, fit, header, targeted, involve, pixel, advance, quality, track, want, stuff, even, guarantee, maintain, beacon, ban, month, prohibit, allow, defend, notification, ownership, acceptance, delete, user, prior, reason, hold, notify, govern, keep, class, change, might, illegal, old, harmless, indemnify, see, assume, deletion, waive, stop, operate, year, enforce, target, many, constitute, posting

Table 5: The 50 words most associated with the original text or reference summary, as measured by the log odds ratio.

For each word w, we compare its probability in the reference summaries (S) against its probability in the original texts (D) using the log odds ratio:

log(Odds(w, S) / Odds(w, D)) ≈ log(P(w | S) / P(w | D))

The list of words with the highest log odds ratios for the reference summaries (Ws) and original texts (Wd) can be found in Table 5. We calculate the differences (in years) of ARI and F-K scores between Ws and Wd:

ARI(Wd) − ARI(Ws) = 5.66
FK(Wd) − FK(Ws) = 6.12

Hence, there is a ∼6-year reading level difference between the two sets of words, an indication that lexical difficulty is paramount in legal text.
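A minimal sketch of this word-association score, with add-one smoothing (a detail of our own choosing; the paper does not specify its smoothing), follows:

```python
import math
from collections import Counter
from typing import Dict, List

def log_odds_ratio(summary_tokens: List[str],
                   original_tokens: List[str]) -> Dict[str, float]:
    """Approximate log(Odds(w,S)/Odds(w,D)) by log(P(w|S)/P(w|D))."""
    s_counts, d_counts = Counter(summary_tokens), Counter(original_tokens)
    vocab = set(s_counts) | set(d_counts)
    s_norm = len(summary_tokens) + len(vocab)   # add-one smoothing
    d_norm = len(original_tokens) + len(vocab)
    return {w: math.log(((s_counts[w] + 1) / s_norm) /
                        ((d_counts[w] + 1) / d_norm))
            for w in vocab}

# Positive scores mark words characteristic of the reference summaries;
# negative scores mark words characteristic of the original contracts.
scores = log_odds_ratio(
    "you mess up we can delete your stuff".split(),
    "we may cancel suspend or terminate your account without notice".split())
print(sorted(scores, key=scores.get, reverse=True)[:5])
```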
5 Summarization baselines

We present our legal dataset as a test set for contract summarization. In this section, we report baseline performances of unsupervised, extractive methods, as most recent supervised abstractive summarization methods, e.g., Rush et al. (2015) and See et al. (2017), would not have enough training data in this domain. We chose to look at the following common baselines:

• TextRank Proposed by Mihalcea and Tarau (2004), TextRank harnesses the PageRank algorithm to choose the sentences with the highest similarity scores to the original document. For this paper, we utilized the TextRank package from Summa NLP: https://github.com/summanlp/textrank.

• KLSum An algorithm introduced by Haghighi and Vanderwende (2009) which greedily selects the sentences that minimize the Kullback-Leibler (KL) divergence between the original text and the proposed summary (a sketch of this greedy procedure appears after this list).

• Lead-1 A common baseline in news summarization is to select the first 1-3 sentences of the original text as the summary (See et al., 2017). With this dataset, we include the first sentence as the summary, as it is the closest to the average number of sentences per reference (1.2).

• Lead-K A variation of Lead-1, this baseline selects the first k sentences until a word limit is satisfied.

• Random-K This baseline selects a random sentence until a word limit is satisfied. For this baseline, the reported numbers are an average of 10 runs on the entire dataset.
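To make the KLSum baseline concrete, here is a simplified sketch of the greedy selection (our smoothing and tokenization choices, not the authors' exact implementation):

```python
import math
from collections import Counter
from typing import List

def kl_divergence(p: Counter, q: Counter, vocab: set) -> float:
    """KL(P || Q) over a shared vocabulary, with add-one smoothing."""
    p_total = sum(p.values()) + len(vocab)
    q_total = sum(q.values()) + len(vocab)
    return sum(((p[w] + 1) / p_total) *
               math.log(((p[w] + 1) / p_total) / ((q[w] + 1) / q_total))
               for w in vocab)

def greedy_kl_summary(sentences: List[List[str]], budget: int) -> List[List[str]]:
    """Greedily add the sentence minimizing KL(document || summary),
    following Haghighi and Vanderwende (2009)."""
    doc_counts = Counter(w for sent in sentences for w in sent)
    vocab = set(doc_counts)
    summary: List[List[str]] = []
    summ_counts: Counter = Counter()
    remaining = list(sentences)
    while remaining and sum(summ_counts.values()) < budget:
        best = min(remaining, key=lambda s: kl_divergence(
            doc_counts, summ_counts + Counter(s), vocab))
        summary.append(best)
        summ_counts += Counter(best)
        remaining.remove(best)
    return summary
```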
Settings We employ lowercasing and lemmatization, and remove stop words and punctuation during pre-processing (NLTK was used for lemmatization and identification of stop words). For TextRank, KLSum, Lead-K, and Random-K, we produce summaries budgeted at the average number of words among all summaries (Rush et al., 2015). For the sentence which causes the summary to exceed the budget, we keep or discard the full sentence depending on which resulting summary is closer to the budgeted length.
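The budget rule in the settings above can be made explicit as follows (a sketch; names are ours):

```python
from typing import List

def apply_word_budget(sentences: List[str], budget: int) -> List[str]:
    """Keep whole sentences until the word budget is met. The sentence that
    crosses the budget is kept or dropped, whichever result lands closer
    to the budgeted length."""
    kept: List[str] = []
    n_words = 0
    for sent in sentences:
        length = len(sent.split())
        if n_words + length <= budget:
            kept.append(sent)
            n_words += length
            continue
        # This sentence crosses the budget: compare both options.
        if abs(n_words + length - budget) < abs(n_words - budget):
            kept.append(sent)
        break
    return kept
```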
Results To gain a quantitative understanding of the baseline results, we employed ROUGE (Lin, 2004), a standard metric for evaluating summaries based on the lexical overlap between a generated summary and gold/reference summaries. The ROUGE scores for the unsupervised summarization baselines in this paper can be found in Table 6.

In the same table, we also tabulate ROUGE scores of the same baselines run on DUC 2002 (Over et al., 2007), 894 documents with summary lengths of 100 words, following the same settings. Note that our numbers differ somewhat from those reported in Mihalcea and Tarau (2004), as we performed different pre-processing and the summary lengths were not processed in the same way.
TLDRLegal TOS;DR Combined DUC 2002
R-1 R-2 R-L R-1 R-2 R-L R-1 R-2 R-L R-1 R-2 R-L
TextRank 25.60 8.05 18.62 23.88 6.96 16.96 24.03 7.16 17.10 40.94 18.89 36.70
KLSum 24.98 7.84 18.08 23.25 6.76 16.67 23.56 6.94 16.93 40.06 16.94 35.85
Lead-1 23.09 8.23 17.10 24.05 7.30 17.22 23.87 7.47 17.19 29.66 13.76 19.46
Lead-K 24.04 8.14 17.46 24.47 7.40 17.66 24.38 7.52 17.63 43.57 21.69 39.49
Random-K 21.94 6.19 15.84 22.39 6.17 16.01 22.32 6.33 16.09 35.75 14.12 31.91
Table 6: Performance of each baseline on each dataset, measured using ROUGE-1 (R-1), ROUGE-2 (R-2), and ROUGE-L (R-L).
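The paper does not name its ROUGE implementation; as an illustration, scores of this kind can be reproduced with the rouge-score package:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

reference = "Apple may use your feedback without restrictions."
generated = ("Any feedback you provide at this site shall be deemed "
             "to be non-confidential.")

for name, score in scorer.score(reference, generated).items():
    print(name, round(score.fmeasure, 4))
```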
Crucially, ROUGE scores are much higher on DUC 2002 than on our legal dataset. We speculate that this is due to the highly abstractive nature of this data, in addition to the divergent styles of the summaries and original texts.

In general, Lead-K performed best on both TOS;DR and DUC 2002. The performance gap between TextRank and Lead-K is much larger on DUC 2002 than on our dataset. On the legal datasets, TextRank outperformed Lead-K on TLDRLegal and is very close to the performance of Lead-K on TOS;DR. Additionally, Random-K performed only about 2 ROUGE points lower than Lead-K on our dataset, while it scored almost 8 points lower on the DUC 2002 dataset. We attribute this to the structure of the original texts: news articles (i.e., DUC 2002) follow the inverted pyramid structure, where the first few sentences give an overview of the story and the rest of the article content is diverse, whereas in contracts the sentences in each section are more similar to each other lexically.

Qualitative Analysis We examined some of the results of the unsupervised extractive techniques to get a better understanding of what methods might improve the results. Select examples can be found in Table 7.

As shown by example (1), the extractive systems performed well when the reference summaries were either an extract or a compressed version of the original text. However, examples (2-4) show various ways in which the extractive systems were not able to perform well.

In (2), the extractive systems were able to select an appropriate sentence, but the sentence is much more complex than the reference summary. Utilizing text simplification techniques may help in these circumstances.

In (3), we see that the reference summary is much better able to abstract over a larger portion of the original text than the selected sentences. (3a) shows that, by having much shorter sentences, the reference summary is able to cover more of the original text. The reference in (3b) restates a 651-word original text in 11 words.

Finally, in (4), the sentences from the original text are extremely long, and thus the automated summaries, while containing only one sentence, are 711 and 136 words respectively. Here, we also see that the reference summaries have a much different style than the original text.

6 Discussion

Our preliminary experiments and analysis show that summarizing legal contracts in plain English is challenging, and they point to the potential usefulness of a simplification or style transfer system in the summarization pipeline. Yet this, too, is challenging. First, there may be a substantial domain gap between legal documents and the texts that existing simplification systems are trained on (e.g., Wikipedia, news). Second, popular supervised approaches such as treating sentence simplification as monolingual machine translation (Specia, 2010; Zhu et al., 2010; Woodsend and Lapata, 2011; Xu et al., 2016; Zhang and Lapata, 2017) would be difficult to apply due to the lack of sentence-aligned parallel corpora. Possible directions include unsupervised lexical simplification utilizing distributed representations of words (Glavaš and Štajner, 2015; Paetzold and Specia, 2016), unsupervised sentence simplification using rich semantic structure (Narayan and Gardent, 2016), or unsupervised style transfer techniques (Shen et al., 2017; Yang et al., 2018; Li et al., 2018). However, there is not currently a dataset in this domain large enough even for these unsupervised methods, nor are there corpora unaligned but comparable in semantics across legal and plain English, which we see as a call for future research.
(1a) Reference Summary: librarything will not sell or give personally identifiable information to any third party.
TextRank, Lead-K: no sale of personal information. librarything will not sell or give personally identifiable information to any third party.
KLSum: this would be evil and we are not evil.

(1b) Reference Summary: you are responsible for maintaining the security of your account and for the activities on your account
TextRank, KLSum, Lead-K: you are responsible for maintaining the confidentiality of your password and account if any and are fully responsible for any and all activities that occur under your password or account

(2a) Reference Summary: if you offer suggestions to the service they become the owner of the ideas that you give them
TextRank, KLSum, Lead-K: if you provide a submission whether by email or otherwise you agree that it is non confidential unless couchsurfing states otherwise in writing and shall become the sole property of couchsurfing

(2b) Reference Summary: when the service wants to change its terms users are notified a month or more in advance.
TextRank: in this case you will be notified by e mail of any amendment to this agreement made by valve within 60 sixty days before the entry into force of the said amendment.

(2c) Reference Summary: you cannot delete your account for this service.
TextRank, KLSum, Lead-K: please note that we have no obligation to delete any of stories favorites or comments listed in your profile or otherwise remove their association with your profile or username.

(3a) Original Text: by using our services you are agreeing to these terms our trainer guidelines and our privacy policy. if you are the parent or legal guardian of a child under the age of 13 the parent you are agreeing to these terms on behalf of yourself and your child ren who are authorized to use the services pursuant to these terms and in our privacy policy. if you don t agree to these terms our trainer guidelines and our privacy policy do not use the services.
Reference Summary: by playing this game you agree to these terms. if you re under 13 and playing your parent guardian agrees on your behalf.
TextRank: if you don t agree to these terms our trainer guidelines and our privacy policy do not use the services.
KLSum, Lead-K: by using our services you are agreeing to these terms our trainer guidelines and our privacy policy.

(3b) Original Text: subject to your compliance with these terms niantic grants you a limited nonexclusive nontransferable non sublicensable license to download and install a copy of the app on a mobile device and to run such copy of the app solely for your own personal noncommercial purposes. [...] by using the app you represent and warrant that i you are not located in a country that is subject to a u s government embargo or that has been designated by the u s government as a terrorist supporting country and ii you are not listed on any u s government list of prohibited or restricted parties.
Reference Summary: don t copy modify resell distribute or reverse engineer this app.
TextRank: in the event of any third party claim that the app or your possession and use of the app infringes that third party s intellectual property rights niantic will be solely responsible for the investigation defense settlement and discharge of any such intellectual property infringement claim to the extent required by these terms.
KLSum: if you accessed or downloaded the app from any app store or distribution platform like the apple store google play or amazon appstore each an app provider then you acknowledge and agree that these terms are concluded between you and niantic and not with app provider and that as between us and the app provider niantic is solely responsible for the app.

(4a) Reference Summary: don t be a jerk. don t hack or cheat. we don t have to ban you but we can. we ll also cooperate with law enforcement.
KLSum: by way of example and not as a limitation you agree that when using the services and content you will not defame abuse harass harm stalk threaten or otherwise violate the legal rights including the rights of privacy and publicity of others [...] lease the app or your account collect or store any personally identifiable information from the services from other users of the services without their express permission violate any applicable law or regulation or enable any other individual to do any of the foregoing.

(4b) Reference Summary: don t blame google.
TextRank, KLSum, Lead-K: the indemnification provision in section 9 of the api tos is deleted in its entirety and replaced with the following you agree to hold harmless and indemnify google and its subsidiaries affiliates officers agents and employees or partners from and against any third party claim arising from or in any way related to your misuse of google play game services your violation of these terms or any third party s misuse of google play game services or actions that would constitute a violation of these terms provided that you enabled such third party to access the apis or failed to take reasonable steps to prevent such third party from accessing the apis including any liability or expense arising from all claims losses damages actual and consequential suits judgments litigation costs and attorneys fees of every kind and nature.
Table 7: Examples of reference summaries and results from various extractive summarization techniques. The text
shown here has been pre-processed. To conserve space, original texts were excluded from most examples.
7 Conclusion

In this paper, we propose the task of summarizing legal documents in plain English and present an initial evaluation dataset for this task. We gather our dataset from online sources dedicated to explaining sections of contracts in plain English and manually verify the quality of the summaries. We show that our dataset is highly abstractive and that the summaries are much simpler to read than the original texts. This task is challenging: popular unsupervised extractive summarization methods do not perform well on this dataset and, as discussed in Section 6, current methods that address the change in register are mostly supervised as well. We call for the development of resources for unsupervised simplification and style transfer in this domain.

Acknowledgments

We would like to personally thank Katrin Erk for her help in the conceptualization of this project. Additional thanks to May Helena Plumb, Barea Sinno, and David Beavers for their aid in the revision process. We are grateful to the anonymous reviewers and to the TLDRLegal and TOS;DR communities for their pursuit of transparency.

References

Filippo Galgani, Paul Compton, and Achim Hoffmann. 2012. Combining different summarization techniques for legal text. In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data.

General Data Protection Regulation. 2018. Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (Data Protection Directive). L119, 4 May 2016, pages 1–88.

Goran Glavaš and Sanja Štajner. 2015. Simplifying lexical simplification: Do we need simplified corpora? In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol 2: Short Papers), pages 63–68.

Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 708–719.

Ben Hachey and Claire Grover. 2006. Extractive summarisation of legal texts. Artificial Intelligence and Law, 14(4):305–345.

Aria Haghighi and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 362–370.

Kokil Jaidka, Muthu Kumar Chandrasekaran, Sajal Rustagi, and Min-Yen Kan. 2016. Overview of the CL-SciSumm 2016 shared task. In Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries, pages 93–102.

Joseph Kimble. 2006. Lifting the Fog of Legalese: Essays on Plain Language. Carolina Academic Press.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1865–1874.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.

G Harry Mc Laughlin. 1969. SMOG grading: A new readability formula. Journal of Reading, 12(8):639–646.

David Mellinkoff. 2004. The Language of the Law. Wipf and Stock Publishers.
Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.

Shashi Narayan and Claire Gardent. 2016. Unsupervised sentence simplification using deep semantics. In The 9th International Natural Language Generation Conference, pages 111–120.

Ani Nenkova, Kathleen McKeown, et al. 2011. Automatic summarization. Foundations and Trends in Information Retrieval, 5(2–3):103–233.

Benjamin Nye and Ani Nenkova. 2015. Identification and characterization of newsworthy verbs in world news. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1440–1445.

Jonathan A Obar and Anne Oeldorf-Hirsch. 2018. The biggest lie on the internet: Ignoring the privacy policies and terms of service policies of social networking services. Information, Communication & Society, pages 1–20.

Paul Over, Hoa Dang, and Donna Harman. 2007. DUC in context. Information Processing & Management, 43(6):1506–1520.

Gustavo H. Paetzold and Lucia Specia. 2016. Unsupervised lexical simplification for non-native speakers. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 3761–3767.

Plain English law. 1978. Title 7: Requirements for use of plain language in consumer transactions. The Laws of New York, Consolidated Laws, General Obligations, Article 5: Creation, Definition and Enforcement of Contractual Obligations.

Plain Writing Act. 2010. An act to enhance citizen access to government information and services by establishing that government documents issued to the public must be written clearly, and for other purposes. House of Representatives 946; Public Law No. 111-274; 124 Statutes at Large 2861.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389.

Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol 1: Long Papers), pages 1073–1083.

RJ Senter and Edgar A Smith. 1967. Automated readability index. Technical report, Cincinnati Univ OH.

Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2017. Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems, pages 6830–6841.

Lucia Specia. 2010. Translating from complex to simplified sentences. In Proceedings of the International Conference on Computational Processing of the Portuguese Language, pages 30–39.

Michael E Sykuta, Peter G Klein, and James Cutts. 2007. CORI K-Base: Data overview.

TAC. 2014. https://tac.nist.gov/2014/BiomedSumm/.

Jie Tang, Bo Wang, Yang Yang, Po Hu, Yanting Zhao, Xinyu Yan, Bo Gao, Minlie Huang, Peng Xu, Weichang Li, et al. 2012. PatentMiner: Topic-driven patent analysis and mining. In Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining, pages 1366–1374.

Peter M Tiersma. 2000. Legal Language. University of Chicago Press.

Yuen-Hsien Tseng, Chi-Jen Lin, and Yu-I Lin. 2007. Text mining techniques for patent analysis. Information Processing & Management, 43(5):1216–1247.

Kristian Woodsend and Mirella Lapata. 2011. Learning to simplify sentences with quasi-synchronous grammar and integer programming. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 409–420.

Wei Xu, Chris Callison-Burch, and Courtney Napoles. 2015. Problems in current text simplification research: New data can help. Transactions of the Association for Computational Linguistics, 3:283–297.

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing statistical machine translation for text simplification. Transactions of the Association for Computational Linguistics, 4:401–415.

Zichao Yang, Zhiting Hu, Chris Dyer, Eric P Xing, and Taylor Berg-Kirkpatrick. 2018. Unsupervised text style transfer using language models as discriminators. In Advances in Neural Information Processing Systems, pages 7298–7309.

Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R Fabbri, Irene Li, Dan Friedman, and Dragomir R Radev. 2019. ScisummNet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence.

Xingxing Zhang and Mirella Lapata. 2017. Sentence simplification with deep reinforcement learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 584–594.
Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1353–1361.