Detecting Offensive Language in Social Media to Protect Adolescent Online Safety

Ying Chen (1), Yilu Zhou (2), Sencun Zhu (1, 3), Heng Xu (3)

(1) Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
(2) Department of Information Systems and Technology Management, George Washington University, Washington, DC, USA
(3) College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, USA
[email protected], [email protected], {sxz16, hxx4}@psu.edu

Abstract—Since the textual content on online social media is highly unstructured, informal, and often misspelled, existing research on message-level offensive language detection cannot accurately detect offensive content. Meanwhile, user-level offensiveness detection seems a more feasible approach, but it is an under-researched area. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potential offensive users in social media. We distinguish the contributions of pejoratives/profanities and obscenities in determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. In particular, we incorporate a user's writing style, structure, and specific cyberbullying content as features to predict the user's potential to send out offensive content. Results from experiments show that our LSF framework performs significantly better than existing methods in offensive content detection. It achieves a precision of 98.24% and a recall of 94.34% in sentence offensiveness detection, as well as a precision of 77.9% and a recall of 77.8% in user offensiveness detection. Meanwhile, the processing speed of LSF is approximately 10 ms per sentence, suggesting the potential for effective deployment in social media.

Keywords – cyberbullying; adolescent safety; offensive language; social media

I. INTRODUCTION

With the rapid growth of social media, users, and especially adolescents, are spending significant amounts of time on various social networking sites to connect with others, to share information, and to pursue common interests. In 2011, 70% of teens used social media sites on a daily basis [1], and nearly one in four teens hit their favorite social media sites 10 or more times a day [2]. While adolescents benefit from their use of social media by interacting with and learning from others, they are also at risk of being exposed to large amounts of offensive online content. ScanSafe's monthly "Global Threat Report" [3] found that up to 80% of blogs contained offensive content and 74% included porn in the form of images, video, or offensive language. In addition, cyberbullying occurs via offensive messages posted on social media: 19% of teens report that someone has written or posted mean or embarrassing things about them on social networking sites [1]. As adolescents are more likely than adults to be negatively affected by biased and harmful content, detecting offensive online content to protect adolescent online safety has become an urgent task.

To address concerns about children's access to offensive content on the Internet, administrators of social media often manually review online content to detect and delete offensive material. However, the manual task of identifying offensive content is labor intensive and time consuming, and thus neither sustainable nor scalable in practice. Some automatic content-filtering software packages, such as Appen and Internet Security Suite, have been developed to detect and filter offensive online content. Most of them simply block webpages and paragraphs that contain dirty words. These word-based approaches not only affect the readability and usability of websites, but also fail to identify subtle offensive messages. For example, under these conventional approaches the sentence "you are such a crying baby" will not be identified as offensive, because none of its words is included in general offensive lexicons. In addition, the false positive rate of these word-based detection approaches is often high, due to the word ambiguity problem, i.e., the same word can have very different meanings in different contexts. Moreover, existing methods treat each message as an independent instance without tracing the source of the offensive content.

To address these limitations, we propose a more powerful solution that remedies the deficiencies of existing offensive content detection approaches. Specifically, we propose the Lexical Syntactic Feature-based (LSF) language model to effectively detect offensive language in social media and protect adolescents. LSF provides high accuracy in detecting subtle offensive messages and can reduce the false positive rate. Moreover, LSF examines not only the messages, but also the person who posts them and his or her posting patterns. LSF can be implemented as a client-side application for individuals and groups who are concerned about adolescent online safety. It is able to detect whether online users and websites push recognizable offensive content to adolescents, trigger applications that alert the senders to regulate their behavior, and eventually block a sender if the pattern continues. Users are also allowed to adjust the threshold for the acceptable level of offensive content. Our language model may not make adolescents completely immune to offensive content, because it is hard to fully determine what is "offensive." However, we aim to provide an improved automatic tool for detecting offensive content in social media, to help school teachers and parents gain better control over the content adolescents are viewing.
While there is no universal definition of "offensive," in this study we employ Jay and Janschewitz's [4] definition of offensive language as vulgar, pornographic, and hateful language. Vulgar language refers to coarse and rude expressions, which include explicit and offensive references to sex or bodily functions. Pornographic language refers to the portrayal of explicit sexual subject matter for the purposes of sexual arousal and erotic satisfaction. Hateful language includes any communication outside the law that disparages a person or a group on the basis of characteristics such as race, color, ethnicity, gender, sexual orientation, nationality, and religion. All of these are generally immoral and harmful for adolescents' mental health.

II. RELATED WORK

In this section, we review existing methods for offensive content filtering in social media, and then focus on text-mining-based offensive language detection research.

A. Offensive Content Filtering Methods in Social Media

Popular online social networking sites apply several mechanisms to screen offensive content. For example, YouTube's safety mode, once activated, can hide all comments containing offensive language from users; but pre-screened content will still appear, with the pejoratives replaced by asterisks, if users simply click "Text Comments." On Facebook, users can add comma-separated keywords to the "Moderation Blacklist." When people include blacklisted keywords in a post and/or a comment on a page, the content is automatically identified as spam and thus screened. The Twitter client "Tweetie 1.3" was rejected by Apple for allowing foul language to appear in users' tweets. Currently, Twitter does not pre-screen users' posted content, claiming that if users encounter offensive content, they can simply block and unfollow the people who post it.

In general, the majority of popular social media use a simple lexicon-based approach to filter offensive content. Their lexicons are either predefined (such as YouTube's) or composed by the users themselves (such as Facebook's). Furthermore, most sites rely on users to report offensive content before taking action. Because of their use of a simple lexicon-based automatic filtering approach to block offensive words and sentences, these systems have low accuracy and may generate many false positive alerts. In addition, when these systems depend on users and administrators to detect and report offensive content, they often fail to take action in a timely fashion. For adolescents, who often lack cognitive awareness of risks, these approaches are hardly effective in preventing exposure to offensive content. Therefore, parents need more sophisticated software and techniques to efficiently detect offensive content and protect their adolescents from potential exposure to vulgar, pornographic, and hateful language.

B. Using Text Mining Techniques to Detect Online Offensive Content

Offensive language identification in social media is a difficult task because the textual content in such environments is often unstructured, informal, and even misspelled. While the defensive methods adopted by current social media are not sufficient, researchers have studied intelligent ways to identify offensive content using text mining. Implementing text mining techniques to analyze online data requires the following phases: 1) data acquisition and pre-processing, 2) feature extraction, and 3) classification. The major challenge of using text mining to detect offensive content lies in the feature selection phase, which is elaborated in the following sections.

a) Message-level Feature Extraction

Most offensive content detection research extracts two kinds of features: lexical and syntactic features.

Lexical features treat each word and phrase as an entity. Word patterns, such as the appearance of certain keywords and their frequencies, are often used to represent the language model. Early research used Bag-of-Words (BoW) in offensiveness detection [5]. The BoW approach treats a text as an unordered collection of words and disregards syntactic and semantic information. However, using the BoW approach alone not only yields low accuracy in subtle offensive language detection, but also brings in a high false positive rate, especially during heated arguments, defensive reactions to others' offensive posts, and even conversations between close friends. The N-gram approach is considered an improvement in that it brings a word's nearby context into consideration when detecting offensive content [6]. N-grams represent subsequences of N contiguous words in a text. Bigrams and trigrams are the most popular N-grams used in text mining. However, N-grams have difficulty capturing related words separated by long distances in a text. Simply increasing N can alleviate the problem, but it slows down processing and brings in more false positives.
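To make the two lexical baselines just described concrete, the sketch below gives a minimal Python implementation of a keyword check (BoW) and a fixed-window check (N-gram). The tiny lexicons and the window logic are illustrative assumptions only, not the lexicons or implementations used in the studies cited above.

# Minimal sketch of the lexical baselines (illustrative toy lexicons, not the cited systems').
OFFENSIVE_WORDS = {"stupid", "idiot", "loser"}      # assumed toy lexicon
USER_IDENTIFIERS = {"you", "your", "u"}             # assumed toy identifier list

def bow_offensive(sentence: str) -> bool:
    """Bag-of-words: flag a sentence if it contains any user identifier
    and any offensive word, regardless of order or distance."""
    tokens = set(sentence.lower().split())
    return bool(tokens & USER_IDENTIFIERS) and bool(tokens & OFFENSIVE_WORDS)

def ngram_offensive(sentence: str, n: int = 2) -> bool:
    """N-gram: flag a sentence only if an identifier and an offensive word
    co-occur inside the same window of n consecutive tokens."""
    tokens = sentence.lower().split()
    for i in range(max(len(tokens) - n + 1, 1)):
        window = set(tokens[i:i + n])
        if window & USER_IDENTIFIERS and window & OFFENSIVE_WORDS:
            return True
    return False

if __name__ == "__main__":
    print(bow_offensive("you are such a stupid person"))        # True
    print(ngram_offensive("you are such a stupid person", n=2)) # False: words not adjacent
    print(ngram_offensive("you stupid person", n=2))            # True

The example output illustrates the trade-off discussed above: BoW fires whenever both word types appear anywhere in the sentence, while a small window misses related words that sit far apart.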
Syntactic features: Although lexical features perform well in detecting offensive entities, without considering the syntactic structure of the whole sentence they fail to distinguish the offensiveness of sentences that contain the same words but in different orders. Therefore, to consider syntactic features, natural language parsers [7] have been introduced to parse sentences into grammatical structures before feature selection. Equipping a system with a parser can help avoid selecting unrelated word sets as features in offensiveness detection.
b) User-level Offensiveness Detection

Most contemporary research on detecting offensive language online focuses only on sentence-level and message-level constructs. Since no detection technique is 100% accurate, if users keep connecting with the sources of offensive content (e.g., particular online users or websites), they remain at high risk of continuous exposure to offensive content.


However, user-level detection is a more challenging task, and studies at the user level of analysis are largely missing. There are some limited efforts at the user level. For example, Kontostathis et al. [8] propose a rule-based communication model to track and categorize online predators. Pendar [6] uses lexical features with machine learning classifiers to differentiate victims from predators in online chatting environments. Pazienza and Tudorache [9] propose utilizing user-profiling features to detect aggressive discussions. They use users' online behavior histories (e.g., presence and conversations) to predict whether or not users' future posts will be offensive. Although their work points out an interesting direction for incorporating user information in detecting offensive content, more advanced user information, such as users' writing styles, posting trends, or reputations, has not been included to improve the detection rate.

III. RESEARCH QUESTIONS

Based on our review, we identify the following research questions for preventing adolescents from exposure to offensive textual content:

• How can we design an effective framework that incorporates both message-level and user-level features to detect and prevent offensive content in social media?

• What strategy is effective in detecting and evaluating the level of offensiveness in a message? Will advanced linguistic analysis improve the accuracy and reduce false positives in detecting message-level offensiveness?

• What strategy is effective in detecting and predicting user-level offensiveness? Besides using information from message-level offensiveness, could user profile information further improve the performance?

• Is the proposed framework efficient enough to be deployed on real-time social media?

IV. DESIGN FRAMEWORK

In order to tackle these challenges, we propose a Lexical Syntactic Feature (LSF) based framework to detect offensive content and identify offensive users in social media. We propose two phases of offensiveness detection: Phase 1 aims to detect offensiveness at the sentence level, and Phase 2 derives offensiveness at the user level. In Phase 1, we apply advanced text mining and natural language processing techniques to derive lexical and syntactic features of each sentence. Using these features, we derive an offensiveness value for each sentence. In Phase 2, we further incorporate user-level features, for which we leverage research on authorship analysis. The framework is illustrated in Fig. 1.

Figure 1. Framework of LSF-based offensive language detection

The system consists of pre-processing and two major components: sentence offensiveness prediction and user offensiveness estimation. During the pre-processing stage, users' conversation histories are chunked into posts, and then into sentences. During sentence offensiveness prediction, each sentence's offensiveness is derived from two features: its words' offensiveness and the context. We use lexical features to represent words' offensiveness in a sentence, and syntactic features to represent the context in a sentence. The offensive nature of words is measured using two lexicons. For the context, we grammatically parse sentences into dependency sets to capture all dependency types between a word and the other words in the same sentence, and mark some of its related words as intensifiers. The intensifiers are effective in detecting whether offensive words are used to describe users or other offensive words. During the user offensiveness estimation stage, sentence offensiveness values and users' language patterns are used to predict users' likelihood of being offensive.

Sentence Offensiveness Calculation

To address the limitations of previous methods for sentence offensiveness detection [10-13], we propose a new method of sentence-level analysis based on offensive word lexicons and sentence syntactic structures.
Firstly, we construct two offensive word dictionaries based on different strengths of offensiveness. Secondly, the concept of a syntactic intensifier is introduced to adjust words' offensiveness levels based on their context. Lastly, for each sentence, an offensiveness value is generated by aggregating its words' offensiveness. Since we already use intensifiers to adjust words' offensiveness, no extra weights are assigned to words during the aggregation.

a) Lexical Features: Offensiveness Dictionary Construction

Offensive sentences always contain pejoratives, profanities, or obscenities. Strong profanities, such as "f***" and "s***", are always undoubtedly offensive when directed at users or objects; but there are many other weak pejoratives and obscenities, such as "stupid" and "liar," that may also be offensive. This research differentiates between these two levels of offensiveness based on their strength. The offensive word lexicon used in this research includes the lexicon used in Xu and Zhu's study [14] and a lexicon, based on Urban Dictionary, established during the coding process. All profanities are labeled as strongly offensive. Pejoratives and obscenities are also labeled as strongly offensive if more than 80% of their uses in our dataset are offensive; the dataset is collected from YouTube comment boards (details are described in the experiment section). Otherwise, known pejoratives and obscenities are labeled as weakly offensive. Word offensiveness is defined as follows: for each offensive word w in a sentence s, its offensiveness is

  O_w = a1, if w is a strongly offensive word
        a2, if w is a weakly offensive word
        0, otherwise                                   (1)

where 1 ≥ a1 > a2, because the offensiveness of strongly offensive words is higher than that of weakly offensive words.

b) Syntactic Features: Syntactic Intensifier Detection

Once pejoratives or obscenities are directed at online users, or semantically associated with another pejorative or obscenity, they become more offensive from users' perspectives. For example, "you stupid" and "f***ing stupid" are much more insulting than "This game is stupid." In addition, the dataset from the Content Analysis for the Web 2.0 Workshop (http://caw2.barcelonamedia.org/) shows that most offensive sentences include not only offensive words but also user identifiers, i.e., second-person pronouns, victims' screen names, and other terms referring to people. Table I lists some examples of this type of sentence.

TABLE I. LANGUAGE FEATURES OF OFFENSIVE SENTENCES
- Second-person pronoun (or victim's screen name) + pejorative (e.g., JK, gay, wtf, emo, fag, loner, loser). Example: <You, gay>
- Offensive adjective (e.g., stupid, foolish, sissy) + people-referring term (e.g., emo, bitch, whore, boy, girl). Examples: <stupid, bitch>, <sissy, boy>

When offensive words are grammatically related to user identifiers or other offensive words in a sentence, their offensiveness level requires adjusting. This study uses the natural language parser from the Stanford Natural Language Processing Group to capture the grammatical dependencies within a sentence. The parsing result of a sentence is a set of typed dependencies over word pairs of the form "(governor, dependent)." For example, the typed dependency "appos(you, idiot)" in the sentence "You, by any means, an idiot." means that "idiot", the dependent, is an appositional modifier of the pronoun "you," the governor. The governor and dependent can be any syntactic elements of the sentence. Some selected dependency types capture the possible grammatical relations between an offensive word and a user identifier (or another offensive word) in a sentence. The study also proposes the syntactic intensifier detection rules listed in Table II (A represents a user identifier, and B represents an offensive word).

TABLE II. SYNTACTIC INTENSIFIER DETECTION RULES
- Descriptive modifiers and complements, A(noun, verb, adj) B(adj, adv, noun): B is used to define or modify A. Examples: "you f***ing"; "you who f***ing"; "you…the one…f***ing". Dependency types: abbrev (abbreviation modifier), acomp (adjectival complement), amod (adjectival modifier), appos (appositional modifier), nn (noun compound modifier), partmod (participial modifier).
- Object, B(noun, verb) A(noun): A is B's direct or indirect object. Examples: "F*** yourselves"; "shut the f** up"; "f*** you idiot"; "you are an idiot"; "you say that f***…". Dependency types: dobj (direct object), iobj (indirect object), nsubj (nominal subject).
- Subject, A(noun) B(noun, verb): A is B's subject or passive subject. Examples: "you f***…"; "you are **ed…"; "…f***ed by you…". Dependency types: nsubj (nominal subject), nsubjpass (passive nominal subject), xsubj (controlling subject), agent (passive verb's subject).
- Close phrase, coordinating conjunction (A and B; …A, B…; …B, B…): A and B, or two Bs, are close to each other in a sentence but separated by a comma or semicolon. Examples: "F** and stupid"; "you, idiot." Dependency types: conj (conjunct), parataxis (from the Greek for "place side by side").
- Possession modifiers, A(noun) B(noun): A is a possessive determiner of B. Examples: "your f*** …"; "s*** falls out of your mouth." Dependency type: poss (holds between a noun and its possessive determiner).
- Rhetorical questions, A(noun) B(noun): B is used to describe a clause with A as its root (main object). Example: "Do you have a point, f***?" Dependency type: rcmod (relative clause modifier).

The offensiveness levels of offensive words and other inappropriate words are adjusted by multiplying their prior offensiveness levels by an intensifier [15]. In a sentence s, the words syntactically related to an offensive word w are categorized into an intensifier set i_{w,s} = {c1, ..., ck}; for each word c_j (1 ≤ j ≤ k), its intensifier value d_j is defined as

  d_j = b1, if c_j is a user identifier
        b2, if c_j is an offensive word
        1, otherwise                                   (2)

where b1 > b2 > 1, because offensive words used to describe users are more offensive than words used to describe other offensive words. The value of the intensifier I_w for the offensive word w is then calculated as the sum of the d_j, for j = 1, ..., k.

c) Sentence-Level Offensiveness Value Generation

Consequently, the offensiveness value of a sentence s is a linear combination of its words' offensiveness: O_s = Σ O_w · I_w.
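The following Python sketch shows one way equations (1) and (2) and the aggregation O_s = Σ O_w · I_w could fit together. The toy lexicons, the hand-built dependency pairs, the helper names, and the default intensifier of 1 when no related word is captured are illustrative assumptions; the study itself relies on the Stanford parser and its own lexicons rather than the stand-ins used here. The parameter values match those later reported for Experiment 1 (a1 = 1, a2 = 0.5, b1 = 2, b2 = 1.5).

# Sketch of sentence-level scoring under the stated assumptions.
A1, A2 = 1.0, 0.5      # strongly / weakly offensive word values (Experiment 1 settings)
B1, B2 = 2.0, 1.5      # intensifier values: user identifier / other offensive word
THRESHOLD = 1.0        # sentence labeled offensive when O_s >= 1 (Experiment 1)

STRONG = {"f***"}                       # assumed toy lexicons, not the paper's
WEAK = {"stupid", "liar", "idiot"}
USER_IDENTIFIERS = {"you", "your", "u"}

def word_offensiveness(word):
    """Equation (1): prior offensiveness O_w of a single word."""
    if word in STRONG:
        return A1
    if word in WEAK:
        return A2
    return 0.0

def intensifier(word, dependencies):
    """Equation (2): I_w = sum of d_j over words syntactically related to `word`.
    `dependencies` is a list of (governor, dependent) pairs already filtered to the
    dependency types of Table II; defaults to 1 when nothing is related (assumption)."""
    related = [g if d == word else d for (g, d) in dependencies if word in (g, d)]
    if not related:
        return 1.0
    total = 0.0
    for c in related:
        if c in USER_IDENTIFIERS:
            total += B1
        elif c in STRONG or c in WEAK:
            total += B2
        else:
            total += 1.0
    return total

def sentence_offensiveness(words, dependencies):
    """O_s = sum over offensive words of O_w * I_w."""
    return sum(word_offensiveness(w) * intensifier(w, dependencies)
               for w in words if word_offensiveness(w) > 0)

# Worked examples mirroring the discussion above:
print(sentence_offensiveness(["you", "stupid"], [("you", "stupid")]))   # 0.5 * 2.0 = 1.0 -> offensive
print(sentence_offensiveness(["this", "game", "is", "stupid"],
                             [("game", "stupid")]))                      # 0.5 * 1.0 = 0.5 -> not offensive

Under these settings, "you stupid" reaches the offensiveness threshold because the weak pejorative is intensified by a user identifier, while "This game is stupid" stays below it, which is the behavior the syntactic intensifier is designed to capture.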
User Offensiveness Estimation

In the user offensiveness estimation stage, our design has two major steps: aggregating users' sentence offensiveness, and extracting extra features from users' language styles. We incorporate sentence offensiveness values and user language features to classify users' offensiveness.

a) Sentence Offensiveness Aggregation

While there are few studies on user-level offensiveness analysis, studies on document-level sentiment analysis share some similarity with this research [15-18]. Document-level sentiment analysis predicts the overall polarity of a document by aggregating the polarity scores of individual sentences. Since the importance of each sentence varies within a document, weights are assigned to sentences to adjust their contributions to the overall polarity.
Similarly, we cannot simply sum up the offensiveness values of all sentences to compute a user's offensiveness, because the strength of sentence offensiveness depends on its context. For example, one may post "Stupid guys need more care. You are one of them." If we calculate the offensiveness level of this post without considering the context, its offensiveness will not be detected even using natural language parsers. To bypass this limitation of current parsers, we modify each post by combining its sentences, replacing the periods with commas, before feeding it to the parser. The parser then generates different phrase sets for further calculation of the offensiveness level of the modified post. However, since the modified post may sometimes miss the original meaning, we have to balance between using the sum of the sentence offensiveness values and using the offensiveness of the modified post to represent the post's offensiveness. In this case, the greater of the two values is chosen to represent the final offensiveness level of the post. The details of the scheme are as follows.

Given a user u, we retrieve his/her conversation history, which contains several posts {p1, ..., pm}, and each post p_i (1 ≤ i ≤ m) contains sentences {s1, ..., sn}. The sentence offensiveness values are denoted {O_s1, ..., O_sn}. The original offensiveness value of a post p is O_p = Σ O_s. The offensiveness value of the modified post is denoted O_{p→s}. The final post offensiveness O'_p of post p is then calculated as O'_p = max(O_p, O_{p→s}) = max(Σ O_s, O_{p→s}). Hence, the offensiveness value O_u of user u is O_u = (1/m) Σ O'_p. We normalize the offensiveness value because users who have more posts are not necessarily more offensive than others; O_u should be no less than 0.
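A minimal sketch of this aggregation step is given below. The function names, the (per-sentence scores, modified-post score) input format, and the example values are assumptions about how the pieces might be wired together, not the authors' implementation; the per-sentence scores would come from the sentence-level scorer and the modified-post score from re-scoring the comma-joined post.

# Sketch of user-level aggregation: O'_p = max(sum of O_s, O_{p->s}) and O_u = (1/m) * sum of O'_p.
def post_offensiveness(sentence_scores, modified_post_score):
    """Final offensiveness O'_p of one post: the larger of the summed sentence
    scores and the score of the comma-joined (modified) post."""
    return max(sum(sentence_scores), modified_post_score)

def user_offensiveness(posts):
    """O_u: average of O'_p over a user's m posts, so users with more posts are
    not automatically rated as more offensive; never below 0."""
    if not posts:
        return 0.0
    total = sum(post_offensiveness(scores, modified) for scores, modified in posts)
    return max(total / len(posts), 0.0)

# Example: a user with two posts; each entry is (per-sentence scores, modified-post score).
posts = [([0.5, 1.0], 2.0),    # modified post scores higher -> O'_p = 2.0
         ([0.0, 0.0], 0.0)]    # inoffensive post            -> O'_p = 0.0
print(user_offensiveness(posts))   # (2.0 + 0.0) / 2 = 1.0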
b) Additional Features Extracted from Users' Language Profiles

Other characteristics, such as the punctuation used, the sentence structure, and the organization of sentences within posts, could also affect others' perceptions of a poster's offensiveness level. Consider the following cases:

Sentence styles. Users may use punctuation and words with all uppercase letters to indicate feelings or speaking volume. Punctuation, such as exclamation marks, can emphasize the offensiveness of posts (e.g., both "You are stupid!" and "You are STUPID." are stronger than "You are stupid."). Some users tend to post short insulting comments, such as "Holy s***." and "You idiot." Consequently, compared to those who post the same number of offensive words but in longer sentences, the former users appear more offensive because of their intensive use of pejoratives and obscenities. Users may also use offensive words to defend themselves when arguing with others who are offensive, but it is costly to detect whether their conversation partners are offensive or not. Instead, we observe that arguments tend to happen within relatively short periods of time. For example, if user u's conversation history spans 100 active days within two years, while the period in which he or she uses offensive words is only 5 days, then no matter how many offensive words he or she uses, he or she should not be considered an offensive user. Thus, checking whether a user's offensiveness values are evenly distributed over the span of the conversation history is a reasonable way to differentiate generally offensive users from occasional ones.
Sentence structures. Users who frequently use imperative sentences tend to be more insulting, because imperative sentences deliver stronger sentiments. For example, a user who always posts messages such as "F***ing u" and "Slap your face" gives the impression of being more offensive and aggressive than one posting "you are f***ing" and "your face gets slapped."

Cyberbullying-related content. O'Neill and Zinga [19] described seven types of children who, due to differences from their peers, may be easy targets for online bullies, including children from minority races, with religious beliefs, or with non-typical sexual orientations. Detecting online conversations referring to these individual differences therefore also provides clues for identifying offensive users.

Based on the above observations, three types of features are developed to identify the level of offensiveness, leveraging authorship analysis research on cybercrime investigation [20-25]: style features, structural features, and content-specific features. Style features and structural features capture users' language patterns, while content-specific features help to identify abnormal content in users' conversations. The style features in our study infer users' offensiveness levels from their language patterns, including whether or not they frequently or recently use offensive words and intensifiers such as uppercase letters and punctuation. The structural features capture the way users construct their posts, checking whether or not users frequently use imperative sentences; they also try to infer users' writing styles by checking whether offensive words are used as nouns, verbs, adjectives, or adverbs. The content-specific features check whether or not users post suspicious content that would probably be identified as cyberbullying messages. In this study, we identify cyberbullying content by checking whether it contains cyberbullying-related words (e.g., religious words). The details of these features are summarized in Table III.

TABLE III. ADDITIONAL FEATURE SELECTION FOR USER OFFENSIVENESS ANALYSIS
- Style features: ratio of short sentences; appearance of punctuation; appearance of words with all uppercase letters.
- Structural features: ratio of imperative sentences; appearance of offensive words as nouns, verbs, adjectives, and adverbs.
- Content-specific features: race; religion; violence; sexual orientation; clothes; accent; appearance; intelligence; special needs or disabilities.

c) Overall User Offensiveness Estimation

Besides the style features, structural features, and content-specific features, the sentence offensiveness values are treated as one further type of user language feature. Using these features, machine learning techniques can be adopted to classify users' offensiveness levels.
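The sketch below illustrates how the style, structural, and content-specific cues of Table III could be turned into a numeric feature vector per user. The regular expressions, the tiny cyberbullying word list, and the crude imperative test are illustrative assumptions only; a real implementation would rely on the lexicons and POS tagging described elsewhere in this paper.

import re

# Assumed toy list of cyberbullying-related terms (content-specific features).
CYBERBULLYING_WORDS = {"gay", "emo", "fag", "whore"}

def user_language_features(sentences):
    """Map a user's sentences to style / structural / content-specific features (Table III)."""
    n = max(len(sentences), 1)
    short = sum(1 for s in sentences if len(s.split()) <= 3)
    punctuated = sum(1 for s in sentences if "!" in s or "?" in s)
    uppercase = sum(1 for s in sentences
                    if any(w.isupper() and len(w) > 1 for w in s.split()))
    # Crude imperative test (assumption): sentence does not start with a subject pronoun.
    imperative = sum(1 for s in sentences if s.split() and
                     s.split()[0].lower() not in {"i", "you", "he", "she", "we", "they", "it"})
    bullying = sum(1 for s in sentences
                   for w in re.findall(r"[a-z*]+", s.lower()) if w in CYBERBULLYING_WORDS)
    return {
        "ratio_short_sentences": short / n,     # style
        "ratio_punctuated": punctuated / n,     # style
        "ratio_all_caps": uppercase / n,        # style
        "ratio_imperative": imperative / n,     # structural
        "cyberbullying_terms": bullying,        # content-specific
    }

print(user_language_features(["You idiot.", "Holy s***!", "Slap your face"]))

Together with the user's aggregated sentence offensiveness value, a vector like this is what would be handed to the classifiers discussed next.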
V. EXPERIMENT

This section describes several experiments we conducted to examine LSF's ability to detect offensive language in social media.

Dataset Description

The experimental dataset, retrieved from YouTube comment boards, is a selection of text comments posted in reaction to the top 18 videos. The videos fall into thirteen categories: Music, Autos, Comedy, Education, Entertainment, Film, Gaming, Style, News, Nonprofits, Animals, Science, and Sports. Each text comment includes a user id, a timestamp, and text content. The user id identifies the author who posted the comment, the timestamp records when the comment was posted, and the text content contains the user's comment. The dataset includes comments from 2,175,474 distinct users.

Pre-processing

Before feeding the dataset to the classifier, an automatic pre-processing procedure assembles the comments for each user and chunks them into sentences. For each sentence in the sample dataset, an automatic spelling and grammar correction step is applied before the sample is introduced to the classifier. With the help of the WordNet corpus and a spell-correction algorithm (http://norvig.com/spell-correct.html), spelling and grammar mistakes in the raw sentences are corrected through tasks such as deleting repeated letters in words, deleting meaningless symbols, splitting long words, transposing substituted letters, and replacing incorrect or missing letters in words. As a result, words with missing letters, such as "speling," are corrected to "spelling," and misspelled words, such as "korrect," are changed to "correct."
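A small sketch of the lighter normalization steps mentioned here is shown below: it collapses repeated letters and strips meaningless symbols. The full dictionary-backed correction (the WordNet and Norvig spell-correction components) is only hinted at and would be layered on top; the helper name and regular expressions are assumptions.

import re

def normalize(sentence):
    """Light pre-processing sketch: strip stray symbols and collapse letters
    repeated three or more times, as a stand-in for the fuller spelling and
    grammar correction pipeline described above."""
    sentence = re.sub(r"[^A-Za-z0-9'!?.,\s]", " ", sentence)  # drop meaningless symbols
    sentence = re.sub(r"(.)\1{2,}", r"\1\1", sentence)        # collapse long repeats
    return re.sub(r"\s+", " ", sentence).strip()

print(normalize("yooooou are sooooo #### stupiiiid!!!"))      # "yoou are soo stupiid!!"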
Experiment Settings in Sentence Offensiveness Prediction

The experiment compares six approaches to sentence offensiveness prediction:

a) Bag-of-words (BoW): The BoW approach disregards grammar and word order and detects offensive sentences by checking whether or not they contain both user identifiers and offensive words. This approach acts as a benchmark.

b) 2-gram: The N-gram approach detects offensive sentences by selecting all sequences of N words in a given sentence and checking whether or not any sequence includes both a user identifier and an offensive word. In this approach N equals 2; it also acts as a benchmark.

c) 3-gram: The N-gram approach selecting all sequences of 3 words in a given sentence. It also acts as a benchmark.

d) 5-gram: The N-gram approach selecting all sequences of 5 words in a given sentence. It also acts as a benchmark.

e) Appraisal approach: The appraisal approach was proposed for sentiment analysis [26]; here we use it for sentence offensiveness detection for comparison. It detects offensive sentences by going through all types of dependency sets and checking whether or not certain offensive words and user identifiers are grammatically related in a given sentence.
The major differences between applying the appraisal approach to sentence offensiveness detection and our approach are that the appraisal approach cannot differentiate offensive words based on their strength, and that it generally considers two words as "related" if they appear in any type of dependency set, even though some dependency types do not really indicate that one word is acting on the other. For instance, the "parataxis" dependency (from the Greek for "place side by side") is a relation between the main verb of a clause and other sentential elements, such as a sentential parenthetical or a clause after a ":" or ";". An example sentence for the typed dependency "parataxis(left, said)" is "The guy, John said, left early in the morning." Here "said" and "left" are not really used to describe one another.

f) LSF: The sentence offensiveness prediction method proposed in this study.

Evaluation Metrics

In our experiments, the standard evaluation metrics for classification in sentiment analysis [16, 17, 27] (i.e., precision, recall, and f-score) are used to evaluate the performance of LSF. In particular, precision is the percentage of identified posts that are truly offensive messages. Recall measures the overall classification correctness; it is the percentage of actual offensive messages that are correctly identified. The false positive (FP) rate is the percentage of identified posts that are not truly offensive messages. The false negative (FN) rate is the percentage of actual offensive messages that are not identified. The f-score [13] is the weighted harmonic mean of precision and recall, defined as

  f-score = 2 × (precision × recall) / (precision + recall)   (3)

Experiment 1: Sentence Offensiveness Calculation

In this experiment, we randomly select a uniformly distributed sample from the complete dataset, which includes 1,700 sentences. In total, we select 359 strongly offensive words and 251 weakly offensive words as the offensive word lexicons, and the experimental parameters are set as a1 = 1, a2 = 0.5, b1 = 2, b2 = 1.5. We define "1" as the threshold for offensive sentence classification; that is, sentences with offensiveness values greater than or equal to "1" are labeled as offensive, because by our definition an offensive sentence is one containing strongly offensive words, or containing weakly offensive words used to describe another user. After manual labeling, 173 sentences are marked as "offensive." Subsequently, a manual check on the classifier's output produced the results shown in Fig. 2.

Figure 2. Accuracies of sentence-level offensiveness detection (precision, recall, FP rate, FN rate, and f-score for BoW, 2-gram, 3-gram, 5-gram, Appraisal, and LSF)

According to Fig. 2, none of the baseline approaches provides a recall rate higher than 70%, because many of the offensive sentences are imperatives, which omit all user identifiers. Among the baseline approaches, the BoW approach has the highest recall rate, 66%. However, BoW generates a high false positive rate because it captures numbers of unrelated <user identifier, offensive word> sets. The recall of N-gram is low when N is small; however, as N increases, the false positive rate increases as well. Once N equals the length of the sentence, N-gram is equivalent to the bag-of-words approach. To further apply N-grams in this classification, different values of N would have to be tried to balance the trade-off between recall and false positive rate.

The appraisal approach reaches high precision, but its recall rate is poor. LSF obtains the highest f-score, because it best balances the precision-recall tradeoff: it achieves a precision of 98.24% and a recall of 94.34% in sentence offensiveness detection. Unfortunately, the parser sometimes misidentifies noun appositions, in part because of typographical errors in the input, such as "you stupid sympathies," where the sender presumably meant to write "your" instead of "you." This is the major source of false negatives. The false positives arise mainly from multiple appearances of weakly offensive words, for example "fake and stupid," which may only represent a negative opinion of a video clip but is accidentally identified as "offensive" because LSF calculates a value higher than (or equal to) 1.

Experiment 2: User Offensiveness Estimation, with Strongly Offensive Words Present

In this experiment we randomly selected from the dataset 249 users with uniformly distributed offensiveness values calculated from Experiment 1. The selected users have 15 posts on average. Each of the 249 users was rated by three coders (two males and one female) who were not otherwise involved in this research. Coders were told to mark a user as offensive if his or her posts contained insulting or abusive language that would make the recipient feel offended, not merely if the sender expressed disagreement with the recipient. In other words, coders were asked to classify a user as "offensive" or "inoffensive." In terms of inter-coder reliability, a Cohen's Kappa of 0.73 suggested a high level of agreement between the coders. A valid user label was generated when all coders put the same label on that user. After balancing the positive and negative results, we have 99 users in each class.

Machine learning techniques, NaiveBayes (NB) and SVM, are used to perform the classification, and 10-fold cross-validation was conducted in this experiment. To fully evaluate the effectiveness of users' sentence offensiveness values (LSF), style features, structural features, and content-specific features for user offensiveness estimation, we fed them sequentially into the classifiers; the results are shown in Fig. 3.
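As a sketch of this evaluation setup, the snippet below runs 10-fold cross-validation with Naive Bayes and a linear-kernel SVM from scikit-learn on a synthetic feature matrix. The random data, the feature dimensionality, and the scoring choice are placeholders standing in for the study's actual per-user features and settings.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the per-user feature matrix (LSF value + language features).
rng = np.random.default_rng(0)
X = rng.random((198, 6))              # 99 offensive + 99 inoffensive users, 6 features (assumed)
y = np.array([1] * 99 + [0] * 99)

for name, clf in [("NaiveBayes", GaussianNB()), ("SVM", SVC(kernel="linear"))]:
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")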
In Fig. 3, "Strong+Weak" means that offensive words (both strongly and weakly offensive) are simply used as the base feature to detect offensive users, and "LSF" means that the sentence offensiveness value generated by LSF is used as the base feature.

Figure 3. F-score for different feature sets using NB and SVM

According to Fig. 3, offensive words and user language features do not compensate for each other to improve the detection rate, which means they are not independent. In contrast, when user language features are incorporated, the classifiers achieve a better detection rate than when adopting LSF alone. While all three types of features are useful for improving the classification rate, style features and content features are more valuable than structural features in user offensiveness classification. However, LSF is not as useful as using offensive words alone in detecting offensive users. One possible reason is that once the number of strongly offensive words goes beyond a certain amount, the user who posts the comments is considered offensive anyway. In such cases, LSF might be less useful than merely using offensive words. We looked further into this situation and, in Experiment 3, tested the model in a setting where the messages do not contain strongly offensive words and are not obviously offensive.

Experiment 3: User Offensiveness Estimation, without Strongly Offensive Words

In this experiment we only want to test the situation in which the offensiveness of a user is subtle, so we use a dataset without strongly offensive words. Our testing data are randomized selections from the original data, followed by filtering out messages that contain strongly offensive words. We obtained 200 users with uniformly distributed offensiveness values. This dataset does not overlap with the one in Experiment 2. The selected users have 85 posts on average, and none of the posts contains strongly offensive words. After balancing the positive and negative results, we have 81 users in each class. The experimental conditions are identical to Experiment 2. The results are presented in Fig. 4. "Weak" means that (weak) offensive words alone are used as the base feature to detect offensive users, because there are no strongly offensive words in this experiment.

Figure 4. F-score for different feature sets using NB and SVM (without strongly offensive words)

According to Fig. 4, offensive words and user language features still do not compensate well for each other to improve the detection rate, and neither do LSF and user language features. Hence, the dependency between user language features and offensive words, and the dependency between user language features and LSF, are both data driven; they vary from domain to domain. Style features and content features are still more valuable than structural features in user offensiveness classification. However, we did observe that imperative sentences frequently occur in offensive users' conversations. One reason is that the POS tagger is not accurate enough in tagging verbs, and it even marks "Yes", "Youre" and "Im" as verbs in some sentences. In such cases, many imperative sentences are not tagged, and the tagged ones are not necessarily imperative.

In this experiment, LSF performs better than offensive words in detecting offensive users: it achieves a precision of 77.9% and a recall of 77.8% in user offensiveness detection using SVM. This supports our hypothesis that LSF works better than offensive words alone for detecting non-obvious user offensiveness, and that incorporating user language features further improves the detection rate. Therefore, we can further conclude that considering context and the objects being talked about helps to precisely detect offensive language that does not contain dirty words. However, strongly offensive words are still the primary element that annoys general readers. Our experimental results might suggest a possible two-stage offensiveness detection scheme when there are many appearances of strongly offensive words.

Experiment 4: Efficiency

Experiment 4(a): Efficiency of Sentence Offensiveness Calculation

In addition to accuracy measurement, assessment of processing speed on masses of text messages is necessary, because speed is a critical attribute for offensive language detection in real-time online communities. The sentence processing times for each approach appear in Fig. 5.

The average time for reading each word is 0.0002 ms, and it takes 0.0033 ms to compare it with the words in the dictionaries to determine whether it is a user identifier or an offensive word. In our sample, each sentence contains about 10.42 words. Thus, the average processing time for BoW and N-gram can be calculated as the read time plus twice the comparison time for each word in the sentence, which is about 0.07 ms (shown in Fig. 5). However, the appraisal approach takes longer because it grammatically parses sentences before the analysis. In contrast, the LSF method first checks whether a sentence contains offensive words; only if it does will LSF proceed to parse the sentence and search for intensifiers. We list the worst case for the LSF method in Fig. 5; its actual performance depends on the ratio of offensive sentences on the social media site. Even so, LSF is practical for application to online social media and other real-time online communities. Take YouTube as an example: over 80% of its content does not contain offensive words, so the average sentence processing time for LSF can be cut down to 2.6 ms.

Figure 5. Sentence processing time (ms) for different methods (BoW, N-gram, Appraisal, LSF)
Experiment 4(b): Efficiency of User Offensiveness Estimation

In Experiment 2, users have 15 posts on average, and each post contains 2 sentences, for a total of 31 sentences posted by each user. In Experiment 3, users have 85 posts on average, and each post contains 4 sentences, for a total of 339 sentences posted by each user. The feature extraction times for the different feature sets in Experiment 2 and Experiment 3 are presented in Fig. 6.

Figure 6. Feature extraction time (s) for different feature sets in Experiment 2 and Experiment 3 (per user)

From Fig. 6, we find that aggregating users' sentence offensiveness (LSF) takes most of the time, and that this time is positively correlated with the number of sentences a user posts. Other than that, the calculation of structural features also takes much more time than that of style features and content-specific features. Assuming an online user has 100 sentences in his or her conversation history, it takes approximately 1.9 s to extract both the sentence feature and the language features, which will hardly be noticed.

We further examined the classification times for the different feature sets using the NaiveBayes and SVM classifiers. Since the times vary from run to run, we ran each instance 5 times and took the average. The results are shown in Fig. 7. From Fig. 7, we find that the machine learning classification is much faster than the feature extraction in Fig. 6: the longest running time for the machine learning classifiers is only 0.33 s to predict a user's offensiveness. Moreover, the classification time is independent of the number of users and the number of sentences. Generally, NaiveBayes is much faster than SVM in classification, but SVM produces more accurate classification results.

Figure 7. Classification time for different feature sets using NaiveBayes and SVM classifiers in Experiment 2 and Experiment 3

To sum up, for a user who posts 100 sentences on social media, LSF takes approximately 2.2 seconds to predict the user's offensive potential.

VI. CONCLUSION

In this study, we investigate existing text-mining methods for detecting offensive content to protect adolescent online safety. Specifically, we propose the Lexical Syntactic Feature (LSF) approach to identify offensive content in social media, and further to predict a user's potential to send out offensive content. Our research makes several contributions. First, we practically conceptualize the notion of online offensive content, further distinguish the contributions of pejoratives/profanities and obscenities in determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. Second, we improve traditional machine learning methods by not only using lexical features to detect offensive language, but also incorporating style features, structural features, and content-specific features to better predict a user's potential to send out offensive content in social media. Experimental results show that the LSF sentence offensiveness prediction and user offensiveness estimation algorithms outperform traditional learning-based approaches in terms of precision, recall, and f-score. LSF also achieves a processing speed high enough for effective deployment in social media. Besides, LSF tolerates informal and misspelled content and can easily adapt to other styles of English writing. We believe that such a language processing model will greatly help online offensive language monitoring and eventually help build a safer online environment.

ACKNOWLEDGMENT

We thank the reviewers for their valuable comments. Part of this research was supported by the U.S. National Science Foundation under grants CAREER 0643906 and CNS-1018302. Any opinions, findings, and conclusions or recommendations expressed herein are those of the researchers and do not necessarily reflect the views of the U.S. National Science Foundation.

REFERENCES
[1] T. Johnson, R. Shapiro, and R. Tourangeau, "National survey of American attitudes on substance abuse XVI: Teens and parents," The National Center on Addiction and Substance Abuse, 2011.
[2] G. S. O'Keeffe, K. Clarke-Pearson, and the Council on Communications and Media, "Clinical report: the impact of social media on children, adolescents, and families," Pediatrics, 2011.
[3] J. Cheng, "Report: 80 percent of blogs contain "offensive" content," in Ars Technica, 2007.
[4] T. Jay and K. Janschewitz, "The pragmatics of swearing," Journal of Politeness Research: Language, Behaviour, Culture, vol. 4, pp. 267-288, 2008.
[5] A. McEnery, J. Baker, and A. Hardie, "Swearing and abuse in modern British English," in Practical Applications of Language Corpora, Peter Lang, Hamburg, 2000, pp. 37-48.
[6] N. Pendar, "Toward spotting the pedophile: telling victim from predator in text chats," in Proceedings of the First IEEE International Conference on Semantic Computing, 2007, pp. 235-241.
[7] M.-C. de Marneffe, B. MacCartney, and C. D. Manning, "Generating typed dependency parses from phrase structure parses," in LREC, 2006.
[8] A. Kontostathis, L. Edwards, and A. Leatherman, "ChatCoder: Toward the tracking and categorization of internet predators," in Proc. Text Mining Workshop 2009, held in conjunction with the Ninth SIAM International Conference on Data Mining, 2009.
[9] M. Pazienza and A. Tudorache, "Interdisciplinary contributions to flame modeling," in AI*IA 2011: Artificial Intelligence Around Man and Beyond, pp. 213-224, 2011.
[10] E. Spertus, "Smokey: Automatic recognition of hostile messages," in Innovative Applications of Artificial Intelligence (IAAI) '97, 1997.
[11] A. Razavi, D. Inkpen, S. Uritsky, and S. Matwin, "Offensive language detection using multi-level classification," Advances in Artificial Intelligence, vol. 6085/2010, pp. 16-27, 2010.
[12] A. Mahmud, K. Z. Ahmed, and M. Khan, "Detecting flames and insults in text," in Proc. of the 6th International Conference on Natural Language Processing (ICON '08), 2008.
[13] D. Yin, Z. Xue, L. Hong, and B. Davison, "Detection of harassment on Web 2.0," in the Content Analysis in the Web 2.0 Workshop, 2009.
[14] Z. Xu and S. Zhu, "Filtering offensive language in online communities using grammatical relations," in Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS'10), 2010.
[15] C. Zhang, D. Zeng, J. Li, F. Y. Wang, and W. Zuo, "Sentiment analysis of Chinese documents: from sentence to document level," Journal of the American Society for Information Science and Technology, vol. 60, pp. 2474-2487, 2009.
[16] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up?: Sentiment classification using machine learning techniques," in EMNLP '02: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 2002, pp. 79-86.
[17] P. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the Association for Computational Linguistics (ACL), 2002, pp. 417-424.
[18] B. K. Y. Tsou, R. W. M. Yuen, O. Y. Kwong, T. B. Y. Lai, and W. L. Wong, "Polarity classification of celebrity coverage in the Chinese press," presented at the International Conference on Intelligence Analysis, 2005.
[19] T. O'Neill and D. Zinga, Children's Rights: Multidisciplinary Approaches to Participation and Protection. University of Toronto Press, 2008.
[20] R. Zheng, J. Li, H. Chen, and Z. Huang, "A framework for authorship identification of online messages: Writing-style features and classification techniques," Journal of the American Society for Information Science and Technology, vol. 57, pp. 378-393, 2006.
[21] J. V. Hansen, P. B. Lowry, R. D. Meservy, and D. M. McDonald, "Genetic programming for prevention of cyberterrorism through dynamic and evolving intrusion detection," Decision Support Systems, vol. 43, pp. 1362-1374, 2007.
[22] R. Zheng, Y. Qin, Z. Huang, and H. Chen, "Authorship analysis in cybercrime investigation," Intelligence and Security Informatics, pp. 959-959, 2010.
[23] S. Symonenko, E. D. Liddy, O. Yilmazel, R. Del Zoppo, E. Brown, and M. Downey, "Semantic analysis for monitoring insider threats," Intelligence and Security Informatics, pp. 492-500, 2004.
[24] A. Orebaugh and D. J. Allnutt, "Data mining instant messaging communications to perform author identification for cybercrime investigations," Digital Forensics and Cyber Crime, pp. 99-110, 2010.
[25] J. Ma, G. Teng, S. Chang, X. Zhang, and K. Xiao, "Social network analysis based on authorship identification for cybercrime investigation," Intelligence and Security Informatics, pp. 27-35, 2011.
[26] C. Whitelaw, N. Garg, and S. Argamon, "Using appraisal groups for sentiment analysis," in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, New York, NY, USA, 2005, pp. 625-631.
[27] Q. Ye, W. Shi, and Y. Li, "Sentiment classification for movie reviews in Chinese by improved semantic oriented approach," in HICSS '06: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, 2006, pp. 53b-53b.
