Cyberbullying detection using Emolex lexicon to analyse tweets. This template will be used for an hackaton organized by [AIInAfrica}(https://aiinafrica.org)
- Installation
- Project Analysis
- Metric
- File Descriptions
- Results
- Licensing, Authors, and Acknowledgements
The code in this project is written in Python 3.6.6 :: Anaconda custom (64-bit). The following additional libraries have been used:
- pandas
- numpy
- matplotlib
- seaborn
- sklearn
- warnings
- time
- mpl_toolkits.mplot3d
We present an analysis of emotions linked to tweets in order to detect instances of cyberbulling. The tweets dataset has been manually collected using twitter APIs by Margarita Bugueño, Fabián Fernandez and Francisco Mena.
The NRC Emotion Lexicon (aka Emolex) is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). It has been developed by Saif Mohammad and is a lexic tag based on the Plutchick wheel of emotions. The annotations were manually done by crowdsourcing.
The metric used to compare all models is RMSE. The loss function used is MSE.
The Jupyter notebooks included in this project are:
- Cyberbullying.ipynb
Data files (under data directory):
- tweets : directory with repository of csv files
We load a dataset of tweets and score them using Emolex. The aggregated data ais then displayed using Visualizations. We want to try find a metric to detect potential "bullies" "or bullied" twitter users.
For licensing see LICENSE file. The tweets dataset has been manually collected using twitter APIs by Margarita Bugueño, Fabián Fernandez and Francisco Mena. Emolex has been developed by Saif Mohammad