


emoji2vec: Learning Emoji Representations from their Description

Ben Eisner, Princeton University, [email protected]
Tim Rocktäschel, University College London, [email protected]
Isabelle Augenstein, University College London, [email protected]
Matko Bošnjak, University College London, [email protected]
Sebastian Riedel, University College London, [email protected]

Abstract

Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly available, pre-trained sets of word embeddings, but they contain few or no emoji representations, even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all Unicode emoji which are learned from their description in the Unicode emoji standard (http://www.unicode.org/emoji/charts/full-emoji-list.html). The resulting emoji embeddings can be readily used in downstream social natural language processing applications alongside word2vec. We demonstrate, for the downstream task of sentiment analysis, that emoji embeddings learned from short descriptions outperform a skip-gram model trained on a large collection of tweets, while avoiding the need for contexts in which emoji must appear frequently in order to estimate a representation.

1 Introduction

First introduced in 1997, emoji, a standardized set of small pictorial glyphs depicting everything from smiling faces to international flags, have seen a drastic increase in usage in social media over the last decade. The Oxford Dictionary named 2015 the year of the emoji, citing an increase in usage of over 800% during the course of the year, and elected the 'Face with Tears of Joy' emoji as the Word of the Year. As of this writing, over 10% of Twitter posts and over 50% of text on Instagram contain one or more emoji (Cruse, 2015; see https://twitter.com/Kyle_MacLachlan/status/765390472604971009 for an extreme example). Due to their popularity and broad usage, they have been the subject of much formal and informal research in language and social communication, as well as in natural language processing (NLP).

In the context of the social sciences, research has focused on emoji usage as a means of expressing emotions on mobile platforms. Interestingly, Kelly and Watts (2015) found that although essentially thought of as a means of expressing emotions, emoji have been adopted as tools to express relationally useful roles in conversation. Lebduska (2014) showed that emoji are culturally and contextually bound, and are open to reinterpretation and misinterpretation, a result confirmed by Miller et al. (2016). These findings have paved the way for many formal analyses of the semantic characteristics of emoji.


Concurrently, we observe an increased interest in natural language processing on social media data (Ritter et al., 2011; Gattani et al., 2013; Rosenthal et al., 2015). Many current NLP systems applied to social media rely on representation learning and word embeddings (Tang et al., 2014; Dong et al., 2014; Dhingra et al., 2016; Augenstein et al., 2016). Such systems often rely on pre-trained word embeddings that can, for instance, be obtained from word2vec (Mikolov et al., 2013a) or GloVe (Pennington et al., 2014). Yet neither resource contains a complete set of Unicode emoji representations, which suggests that many social NLP applications could be improved by the addition of robust emoji representations.

In this paper we release emoji2vec, embeddings for emoji Unicode symbols learned from their description in the Unicode emoji standard. We demonstrate the usefulness of emoji representations trained in this way by evaluating on a Twitter sentiment analysis task. Furthermore, we provide a qualitative analysis by investigating emoji analogy examples and visualizing the emoji embedding space.

2 Related Work

There has been little work on distributional embeddings of emoji. The first research in this direction was an informal blog post by the Instagram Data Team in 2015 (Dimson, 2015). They generated vector embeddings for emoji similar to skip-gram-based vectors by training on the entire corpus of Instagram posts. Their research gave valuable insight into the usage of emoji on Instagram, and showed that distributed representations can help in understanding emoji semantics in everyday usage. The second contribution, closest to ours, was introduced by Barbieri et al. (2016). They trained emoji embeddings from a large Twitter dataset of over 100 million English tweets using the skip-gram method (Mikolov et al., 2013a). These pre-trained emoji representations led to increased accuracy on a similarity task, and a meaningful clustering of the emoji embedding space. While this method is able to learn robust representations for frequently used emoji, representations of less frequent emoji are estimated rather poorly or not available at all. In fact, only around 700 emoji can be found in Barbieri et al. (2016)'s corpus, while over 1600 emoji are supported in the Unicode standard.

Our approach differs in two important aspects. First, since we estimate the representation of emoji directly from their description, we obtain robust representations for all supported emoji symbols, even the long tail of infrequently used ones. Secondly, our method works with much less data. Instead of training on millions of tweets, our representations are trained on only a few thousand descriptions. Still, we obtain higher accuracy results on a Twitter sentiment analysis task.

In addition, our work relates to the work of Hill et al. (2016), who built representations for words and concepts based on their description in a dictionary. Similarly to their approach, we build representations for emoji based on their descriptions and keyword phrases.

Some of the limitations of our work are evident in the work of Park et al. (2013), who showed that different cultural phenomena and languages may co-opt conventional emoji sentiment. Since we train only on English-language definitions and ignore temporal definitions of emoji, our training method might not capture the full semantic characteristics of an emoji.

3 Method

Our method maps emoji symbols into the same space as the 300-dimensional Google News word2vec embeddings. Thus, the resulting emoji2vec embeddings can be used in addition to 300-dimensional word2vec embeddings in any application. To this end we crawl emoji, their names, and their keyword phrases from the Unicode emoji list, resulting in 6088 descriptions of 1661 emoji symbols. Figure 1 shows an example for an uncommon emoji.

Figure 1: Example description of U+1F574. We also use business, man and suit keywords for training.

3.1 Model

We train emoji embeddings using a simple method. For every training example consisting of an emoji and a sequence of words $w_1, \ldots, w_N$ describing that emoji, we take the sum of the individual word vectors in the descriptive phrase as found in the Google News word2vec embeddings,

$$v_j = \sum_{k=1}^{N} w_k$$

where $w_k$ is the word2vec vector for word $w_k$ if that vector exists (otherwise we drop the summand) and $v_j$ is the vector representation of the description. We define a trainable vector $x_i$ for every emoji in our training set and model the probability of a match between the emoji representation $x_i$ and its description representation $v_j$ using the sigmoid of the dot product of the two representations, $\sigma(x_i^\top v_j)$. For training we use the logistic loss

$$\mathcal{L}(i, j, y_{ij}) = -\log\left(\sigma\left(y_{ij}\, x_i^\top v_j - (1 - y_{ij})\, x_i^\top v_j\right)\right)$$

where $y_{ij}$ is 1 if description $j$ is valid for emoji $i$ and 0 otherwise.
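To make the scoring function and loss concrete, the following is a minimal NumPy sketch of the quantities defined above. The released implementation is written in TensorFlow (see Section 3.2); the dictionary-style word2vec lookup, the function names, and the 300-dimensional fallback are illustrative assumptions.

```python
import numpy as np

def description_vector(words, word2vec, dim=300):
    # v_j: sum of the word2vec vectors of the descriptive phrase,
    # dropping words that have no pre-trained vector.
    vecs = [word2vec[w] for w in words if w in word2vec]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

def match_probability(x_i, v_j):
    # sigma(x_i^T v_j): probability that description j matches emoji i.
    return 1.0 / (1.0 + np.exp(-np.dot(x_i, v_j)))

def logistic_loss(x_i, v_j, y_ij):
    # -log(sigma(y_ij * x_i^T v_j - (1 - y_ij) * x_i^T v_j)), as above.
    s = np.dot(x_i, v_j)
    signed = y_ij * s - (1 - y_ij) * s
    return -np.log(1.0 / (1.0 + np.exp(-signed)))
```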
3.2 Optimization
Our model is implemented in TensorFlow (Abadi et al., 2015) and optimized using stochastic gradient descent with Adam (Kingma and Ba, 2015) as the optimizer. As we do not observe any negative training examples (invalid descriptions of emoji do not appear in the original training set), we randomly sample descriptions for emoji as negative instances (i.e., induce a mismatched description) to increase generalization performance. One of the parameters of our model is the ratio of negative samples to positive samples; we found that having one positive example per negative example produced the best results. We perform early stopping on a held-out development set and found 80 epochs of training to give the best results. As we are only training on emoji descriptions and our method is simple and cheap, training takes less than 3 minutes on a 2013 MacBook Pro.
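A rough sketch of this training regime is given below. It keeps the 1:1 negative-sampling ratio and the 80 epochs reported above, but substitutes plain stochastic gradient descent updates derived from the logistic loss for the TensorFlow/Adam setup of the released code; all names, the learning rate, and the initialization scale are assumptions.

```python
import random
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_emoji_vectors(emoji_ids, desc_ids, desc_vectors, dim=300,
                        epochs=80, lr=0.01, neg_ratio=1, seed=0):
    # desc_ids[e] lists the indices of the valid descriptions of emoji e
    # in desc_vectors (the summed word vectors from Section 3.1).
    rng = random.Random(seed)
    X = {e: 0.01 * np.random.randn(dim) for e in emoji_ids}
    for _ in range(epochs):
        for e in emoji_ids:
            for j in desc_ids[e]:
                samples = [(j, 1)]
                # Negative examples are not observed, so sample random
                # (almost always mismatched) descriptions instead.
                for _ in range(neg_ratio):
                    samples.append((rng.randrange(len(desc_vectors)), 0))
                for idx, y in samples:
                    v = desc_vectors[idx]
                    z = (2 * y - 1) * np.dot(X[e], v)   # signed score
                    grad = (sigmoid(z) - 1.0) * (2 * y - 1) * v
                    X[e] -= lr * grad                   # SGD step on the loss
    return X
```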
4 Evaluation

We quantitatively evaluate our approach on an intrinsic (emoji-description classification) and an extrinsic (Twitter sentiment analysis) task. Furthermore, we give a qualitative analysis by visualizing the learned emoji embedding space and investigating emoji analogy examples.

4.1 Emoji-Description Classification

To analyze how well our method models the distribution of correct emoji descriptions, we created a manually labeled test set containing pairs of emoji and phrases, as well as a correspondence label. For instance, our test set includes an example pairing an emoji with the description "crying" and the label True, as well as an example pairing an emoji with the description "fish" and the label False. We calculate $\sigma(x_i^\top v_j)$ for each example in the test set, measuring the similarity between the emoji vector and the sum of word vectors in the phrase.

When a classifier thresholds the above prediction at 0.5 to determine a positive or negative correlation, we obtain an accuracy of 85.5% for classifying whether an emoji-description pair is valid or not. By varying the threshold used for this classifier, we obtain a receiver operating characteristic curve (Figure 2) with an area under the curve of 0.933, which demonstrates the high quality of the learned emoji representations.

Figure 2: Receiver operating characteristic curve for learned emoji vectors evaluated against the test set.
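The evaluation can be reproduced from such scores roughly as follows, assuming aligned arrays of emoji vectors, summed description vectors, and binary labels for the test pairs; the names and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_pairs(emoji_vecs, desc_vecs, labels, threshold=0.5):
    # Score each (emoji, description) pair with sigma(x_i^T v_j), then
    # report thresholded accuracy and the area under the ROC curve.
    scores = np.array([1.0 / (1.0 + np.exp(-np.dot(x, v)))
                       for x, v in zip(emoji_vecs, desc_vecs)])
    labels = np.asarray(labels)
    accuracy = np.mean((scores >= threshold) == labels)
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)   # points of the ROC curve
    return accuracy, auc, (fpr, tpr)
```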

4.2 Sentiment Analysis on Tweets

As a downstream task we compare the accuracy of sentiment classification of tweets for various classifiers with three different sets of pre-trained word embeddings: (1) the original Google News word2vec embeddings, (2) word2vec augmented with the emoji embeddings trained by Barbieri et al. (2016), and (3) word2vec augmented with emoji2vec trained from Unicode descriptions. We use the recent dataset by Kralj Novak et al. (2015), which consists of over 67k English tweets labelled manually for positive, neutral, or negative sentiment. In both the training set and the test set, 46% of tweets are labeled neutral, 29% are labeled positive, and 25% are labeled negative. To compute the feature vectors for training, we summed the vectors corresponding to each word or emoji in the text of the tweet. The goal of this simple sentiment analysis model is not to produce state-of-the-art results in sentiment analysis; it is simply to show that including emoji adds discriminating information to a model, which could potentially be exploited in more advanced social NLP systems.

Because the labels are rather evenly distributed, accuracy is an effective metric for determining performance on this classification task. Results are reported in Table 1. We find that augmenting word2vec with emoji embeddings improves overall classification accuracy on the full corpus, and substantially improves classification performance for tweets that contain emoji. This suggests that emoji embeddings could improve performance for other social NLP tasks as well. Furthermore, we find that emoji2vec generally outperforms the emoji embeddings trained by Barbieri et al. (2016), despite being trained on much less data using a simple model.
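A minimal sketch of this experimental setup with scikit-learn is shown below. It assumes pre-tokenized tweets and dictionary-like lookups for the word and emoji embeddings; tokenization details and classifier hyperparameters are not given in the paper, so the defaults here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

def tweet_vector(tokens, word2vec, emoji2vec, dim=300):
    # Feature vector: sum of the vectors of every word or emoji in the tweet.
    total = np.zeros(dim)
    for tok in tokens:
        if tok in emoji2vec:
            total += emoji2vec[tok]
        elif tok in word2vec:
            total += word2vec[tok]
    return total

def run_classifiers(train_X, train_y, test_X, test_y):
    # Three-way sentiment classification with the two classifier
    # families reported in Table 1.
    results = {}
    for name, clf in [("Random Forest", RandomForestClassifier()),
                      ("Linear SVM", LinearSVC())]:
        clf.fit(train_X, train_y)
        results[name] = clf.score(test_X, test_y)
    return results
```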
4.3 t-SNE Visualization

To gain further insights, we project the learned emoji embeddings into two-dimensional space using t-SNE (Maaten and Hinton, 2008). This method projects high-dimensional embeddings into a lower-dimensional space while attempting to preserve relative distances.

From Figure 3 we see a number of notable semantic clusters, indicating that the vectors we trained have accurately captured some of the semantic properties of the emoji. For instance, all flag symbols are clustered at the bottom, and many smiley faces appear in the center. Other prominent emoji clusters include fruits, astrological signs, animals, vehicles, and families. On the other hand, symbolic representations of numbers are not properly disentangled in the embedding space, indicating limitations of our simple model. A two-dimensional projection is convenient from a visualization perspective, and it certainly shows that some intuitively similar emoji are close to each other in vector space.
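A sketch of the projection step with scikit-learn's t-SNE implementation follows; the paper does not state which t-SNE hyperparameters were used, so the defaults below are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(emoji_vectors):
    # Project the 300-dimensional emoji embeddings to 2-D, attempting
    # to preserve relative distances between points.
    emojis = list(emoji_vectors.keys())
    X = np.stack([emoji_vectors[e] for e in emojis])
    coords = TSNE(n_components=2, random_state=0).fit_transform(X)
    return dict(zip(emojis, coords))
```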
4.4 Analogy Task

A well-known property of word2vec is that embeddings trained with this method to some extent capture meaningful linear relationships between words directly in the vector space. For instance, it holds that the vector representation of 'king' minus 'man' plus 'woman' is closest to 'queen' (Mikolov et al., 2013b). Word embeddings have commonly been evaluated on such word analogy tasks (Levy and Goldberg, 2014). Unfortunately, it is difficult to build such an analogy task for emoji due to the small number and semantically distinct categories of emoji. Nevertheless, we collected a few intuitive examples in Figure 4. For every query we retrieved the closest five emoji. Though the correct answer is sometimes not the top one, it is often contained in the top three.

Figure 4: Emoji analogy examples. Notice that the seemingly "correct" emoji often appears in the top three closest vectors, but not always in the top spot (furthest to the left).
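Such analogy queries amount to a nearest-neighbour search under cosine similarity in the shared embedding space. The sketch below is illustrative; the commented query mirrors the word analogy discussed above rather than any specific emoji query from Figure 4.

```python
import numpy as np

def closest_emoji(query_vec, emoji_vectors, top_n=5):
    # Return the top_n emoji whose vectors have the highest cosine
    # similarity to an analogy query vector such as a - b + c.
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    sims = [(e, cos(query_vec, v)) for e, v in emoji_vectors.items()]
    return sorted(sims, key=lambda pair: -pair[1])[:top_n]

# Example query in the spirit of the word analogy above:
# query = word2vec["king"] - word2vec["man"] + word2vec["woman"]
# print(closest_emoji(query, emoji_vectors))
```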

Classification accuracy on the entire dataset, N = 12920
Word Embeddings                          Random Forest    Linear SVM
Google News                              57.5             58.5
Google News + (Barbieri et al., 2016)    58.2*            60.0*
Google News + emoji2vec                  59.5*            60.5*

Classification accuracy on tweets containing emoji, N = 2295
Word Embeddings                          Random Forest    Linear SVM
Google News                              46.0             47.1
Google News + (Barbieri et al., 2016)    52.4*            57.4*
Google News + emoji2vec                  54.4*            59.2*

Classification accuracy on the 90% most frequent emoji, N = 2186
Word Embeddings                          Random Forest    Linear SVM
Google News                              47.3             45.1
Google News + (Barbieri et al., 2016)    52.8*            56.9*
Google News + emoji2vec                  55.0*            59.5*

Classification accuracy on the 10% least frequent emoji, N = 308
Word Embeddings                          Random Forest    Linear SVM
Google News                              44.7             43.2
Google News + (Barbieri et al., 2016)    53.9*            52.9*
Google News + emoji2vec                  54.5*            55.2*

Table 1: Three-way classification accuracy on the Twitter sentiment analysis corpus using Random Forest (Ho, 1995) and Linear SVM (Fan et al., 2008) classifiers with different word embeddings. "*" denotes results with significance of p < 0.05 as calculated by McNemar's test, with respect to classification with Google News embeddings for each classifier and dataset.
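The significance marks in Table 1 compare each augmented model against the Google News baseline on the same test items. A sketch of how such a comparison could be computed with statsmodels follows; whether the exact or the asymptotic variant of McNemar's test was used is not stated, so exact=True is an assumption.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_pvalue(baseline_preds, new_preds, gold):
    # Build the 2x2 table of (baseline correct, new model correct) counts
    # over the same test items and run McNemar's test on it.
    base_ok = np.asarray(baseline_preds) == np.asarray(gold)
    new_ok = np.asarray(new_preds) == np.asarray(gold)
    table = [[np.sum(base_ok & new_ok), np.sum(base_ok & ~new_ok)],
             [np.sum(~base_ok & new_ok), np.sum(~base_ok & ~new_ok)]]
    return mcnemar(table, exact=True).pvalue
```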

Figure 3: Emoji vector embeddings, projected down into a 2-dimensional space using the t-SNE technique. Note the clusters of
similar emoji like flags (bottom), family emoji (top left), zodiac symbols (top left), animals (left), smileys (middle), etc.

5 Conclusion

Since existing pre-trained word embeddings such as the Google News word2vec embeddings or GloVe fail to provide emoji embeddings, we have released emoji2vec, embeddings of 1661 emoji symbols. Instead of running word2vec's skip-gram model on a large collection of emoji and their contexts appearing in tweets, emoji2vec is directly trained on Unicode descriptions of emoji. The resulting emoji embeddings can be used to augment any downstream task that currently uses word2vec embeddings, and might prove especially useful in social NLP tasks where emoji are used frequently (e.g., Twitter and Instagram). Despite the fact that our model is simpler and trained on much less data, we outperform Barbieri et al. (2016) on the task of Twitter sentiment analysis.

As our approach works directly on Unicode descriptions, it is not restricted to emoji symbols. In the future we want to investigate the usefulness of our method for other Unicode symbol embeddings. Furthermore, we plan to improve emoji2vec by also reading the full-text emoji descriptions from Emojipedia (emojipedia.org) and by using a recurrent neural network instead of a bag-of-word-vectors approach for encoding descriptions. In addition, since our approach does not capture the context-dependent definitions of emoji (such as sarcasm, or appropriation via other cultural phenomena), we would like to explore mechanisms for efficiently capturing these nuanced meanings.

Data Release and Reproducibility

Pre-trained emoji2vec embeddings as well as the training data and code are released at https://github.com/uclmr/emoji2vec. Note that the emoji2vec format is compatible with word2vec and can be loaded into gensim (https://radimrehurek.com/gensim/models/word2vec.html) or similar libraries.
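For instance, the released vectors could be loaded and combined with the Google News embeddings roughly as follows; the file names and the use of the raw emoji character as the lookup key are assumptions, so the repository README should be checked for the exact format.

```python
from gensim.models import KeyedVectors

# File names below are placeholders for the released artifacts.
e2v = KeyedVectors.load_word2vec_format("emoji2vec.bin", binary=True)
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)

vec = e2v["😂"]  # a 300-dimensional emoji vector in the word2vec space
print(w2v.most_similar(positive=[vec], topn=5))  # nearby words in that space
```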
References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. Software available from tensorflow.org, 1.

Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. 2016. Stance Detection with Bidirectional Conditional Encoding. In Proceedings of EMNLP.

Francesco Barbieri, Francesco Ronzano, and Horacio Saggion. 2016. What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis. In Proceedings of LREC, May.

Joe Cruse. 2015. Emoji usage in TV conversation.

Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, and William Cohen. 2016. Tweet2Vec: Character-Based Distributed Representations for Social Media. In Proceedings of ACL, pages 269–274.

Thomas Dimson. 2015. Machine Learning for Emoji Trends. http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji. Accessed: 2016-09-05.

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In Proceedings of ACL, pages 49–54.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9(Aug):1871–1874.

Abhishek Gattani, Digvijay S Lamba, Nikesh Garera, Mitul Tiwari, Xiaoyong Chai, Sanjib Das, Sri Subramaniam, Anand Rajaraman, Venky Harinarayan, and AnHai Doan. 2013. Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach. In Proceedings of the VLDB Endowment, 6(11):1126–1137.

Felix Hill, Kyunghyun Cho, Anna Korhonen, and Yoshua Bengio. 2016. Learning to Understand Phrases by Embedding the Dictionary. TACL.

Tin Kam Ho. 1995. Random Decision Forests. In Proceedings of the Third International Conference on Document Analysis and Recognition, volume 1, pages 278–282. IEEE.

Ryan Kelly and Leon Watts. 2015. Characterising the inventive appropriation of emoji as relationally meaningful in mediated close personal relationships. Experiences of Technology Appropriation: Unanticipated Users, Usage, Circumstances, and Design.

Diederik Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of ICLR.

Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of Emojis. PLoS ONE, 10(12):1–22.

Lisa Lebduska. 2014. Emoji, Emoji, What for Art Thou? Harlot: A Revealing Look at the Arts of Persuasion, 1(12).

Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In Proceedings of CoNLL, pages 171–180.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013a. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, pages 3111–3119.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL-HLT, pages 746–751.

Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. 2016. "Blissfully happy" or "ready to fight": Varying Interpretations of Emoji. In Proceedings of ICWSM.

Jaram Park, Vladimir Barash, Clay Fink, and Meeyoung Cha. 2013. Emoticon Style: Interpreting Differences in Emoticons Across Cultures. In ICWSM.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of EMNLP, pages 1532–1543, October.

Alan Ritter, Sam Clark, Mausam, and Oren Etzioni. 2011. Named Entity Recognition in Tweets: An Experimental Study. In Proceedings of EMNLP, pages 1524–1534.

Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif Mohammad, Alan Ritter, and Veselin Stoyanov. 2015. SemEval-2015 Task 10: Sentiment Analysis in Twitter. In Proceedings of SemEval, pages 451–463.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. In Proceedings of ACL, pages 1555–1565.


