
A Hands-On Guide To Text Classification With
Transformer Models (XLNet, BERT, XLM,
RoBERTa)
A step-by-step tutorial on using Transformer Models for Text Classification tasks. Learn how to
load, fine-tune, and evaluate text classification models with the Pytorch-Transformers library.
Includes ready-to-use code for BERT, XLNet, XLM, and RoBERTa models.

Thilina Rajapakse


Sep 3, 2019 · 8 min read

Photo by Arseny Togulev on Unsplash


Update Notice
Please consider using the Simple Transformers library as it is easy to use, feature-
packed, and regularly updated. The article still stands as a reference to BERT models
and is likely to be helpful with understanding how BERT works. However, Simple
Transformers offers a lot more features, much more straightforward tuning options, all
the while being quick and easy to use! The links below should help you get started
quickly.

1. Binary Classification

2. Multi-Class Classification

3. Multi-Label Classification

4. Named Entity Recognition (Part-of-Speech Tagging)

5. Question Answering

6. Sentence-Pair Tasks and Regression

7. Conversational AI

8. Language Model Fine-Tuning

9. ELECTRA and Language Model Training from Scratch

10. Visualising Model Training

The Pytorch-Transformers (now Transformers) library has moved on quite a bit
since this article was written. I recommend using SimpleTransformers as it is kept
up to date with the Transformers library and is significantly more user-friendly.
While the ideas and concepts in this article still stand, the code and the Github
repo are no longer actively maintained.



I highly recommend cloning the Github repo for this article and running the code while you
follow the guide. It should help you understand both the guide and the code better. Reading
is great, but coding is better. 😉

Special thanks to Hugging Face for their Pytorch-Transformers library for making
Transformer Models easy and fun to play with!

1. Introduction
Transformer models have taken the world of Natural Language Processing by storm,
transforming (sorry!) the field by leaps and bounds. New, bigger, and better models
seem to crop up almost every month, setting new benchmarks in performance across a
wide variety of tasks.

This post is intended as a straightforward guide to utilizing these awesome models for
text classification tasks. As such, I won’t be talking about the theory behind the
networks, or how they work under the hood. If you are interested in diving into the nitty-
gritty of Transformers, my recommendation is Jay Alammar’s Illustrated Guides here.

This also serves as an update to my earlier guide on Using BERT for Binary Text
Classification. I’ll be using the same dataset (Yelp Reviews) that I used the last time to
avoid having to download a new dataset because I’m lazy and I have terrible internet.
The motivation behind the update is down to several reasons, including the update to
the HuggingFace library I used for the previous guide, as well as the release of multiple
new Transformer models which have managed to knock BERT off its perch.

With the background set, let’s take a look at what we’ll be doing.

1. Setting up the development environment with the Pytorch-Transformers library by HuggingFace.

2. Converting .csv datasets to .tsv format used by the HuggingFace library.

3. Setting up pre-trained models.

4. Converting data into features.

5. Fine-tuning models.


6. Evaluation.

I’ll be using two Jupyter Notebooks, one for data preparation, and one for training and
evaluation.

2. On your marks
Let’s set up the environment.
1. It’s highly recommended to use a virtual environment when installing and working
with various Python libraries. My personal favourite is Anaconda, but you can use
anything you wish.
conda create -n transformers python pytorch pandas tqdm jupyter

conda activate transformers

conda install -c anaconda scikit-learn

pip install pytorch-transformers

pip install tensorboardX

Please note that there may be additional packages used in the guide that are not
installed here. If you run into missing packages, simply install them via conda or pip.

2. Linux users can use the shell script here to download and extract the Yelp Reviews
Polarity dataset. Others can download it manually from fast.ai, or use the direct
download link.
I placed the train.csv and test.csv files in a directory named data.

<starting_directory>/data/

3. Get set
Time to get the data ready for Transformer models.
Most online datasets will typically be in .csv format. Following the norm, the Yelp
dataset contains two csv files, train.csv and test.csv.

Kicking off our first (data preparation) notebook, let’s load the csv files in with Pandas.

However, the labels used here break the norm by being 1 and 2 instead of the usual 0
and 1. I’m all for a bit of rebellion, but this just puts me off. Let’s fix this so that the labels
are 0 and 1, indicating a bad review and good review respectively.
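
As a minimal sketch, and assuming the standard fast.ai Yelp Reviews Polarity layout (no header row, label in the first column, review text in the second), the loading and relabelling could look like this:

import pandas as pd

# The Yelp Polarity csv files have no header; the first column is the label (1 or 2),
# the second column is the review text.
train_df = pd.read_csv('data/train.csv', header=None, names=['label', 'text'])
test_df = pd.read_csv('data/test.csv', header=None, names=['label', 'text'])

# Remap the labels from 1/2 to 0/1 (0 = bad review, 1 = good review).
train_df['label'] = train_df['label'] - 1
test_df['label'] = test_df['label'] - 1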

We need to do a final bit of retouching before our data is ready for the Pytorch-
Transformers models. The data needs to be in tsv format, with four columns and no
header.

guid: An ID for the row.

label: The label for the row (should be an int).

alpha: A column of the same letter for all rows. Not used in classification but still
needed.

text: The text for the row.

So, let’s get the data in order, and save it in tsv format.
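
Assuming the train_df and test_df DataFrames from the sketch above, the reshaping and saving could look roughly like this (column order matches the list above; file names are illustrative):

# Build DataFrames with the four required columns, in order: guid, label, alpha, text.
train_out = pd.DataFrame({
    'guid': range(len(train_df)),
    'label': train_df['label'],
    'alpha': ['a'] * len(train_df),
    'text': train_df['text'].replace(r'\n', ' ', regex=True),
})
dev_out = pd.DataFrame({
    'guid': range(len(test_df)),
    'label': test_df['label'],
    'alpha': ['a'] * len(test_df),
    'text': test_df['text'].replace(r'\n', ' ', regex=True),
})

# Write tab-separated files with no header row and no index column.
train_out.to_csv('data/train.tsv', sep='\t', index=False, header=False)
dev_out.to_csv('data/dev.tsv', sep='\t', index=False, header=False)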

This marks the end of the data preparation Notebook, and we’ll continue with the
training Notebook from the next section.

4. Go! (Almost)
From text to features.
Before we can start the actual training, we need to convert our data from text into
numerical values that can be fed into neural networks. In the case of Transformer
models, the data will be represented as InputFeature objects.

To make our data Transformer-ready, we’ll be using the classes and functions in the file
utils.py . (Brace yourself, a wall of code incoming!)

Let’s look at the important bits.

The InputExample class represents a single sample of our dataset (a minimal sketch of the class follows the list of fields):

guid : a unique ID

text_a : Our actual text


text_b : Not used in classification

label : The label of the sample
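
The class itself is just a small container; a minimal sketch with the fields listed above would be:

class InputExample(object):
    """A single training/test example for sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid        # unique ID for the example
        self.text_a = text_a    # the text we want to classify
        self.text_b = text_b    # second sequence; unused for single-sentence classification
        self.label = label      # the label of the example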

The DataProcessor and BinaryProcessor classes are used to read in the data from tsv
files and convert it into InputExamples.

The InputFeature class represents the pure, numerical data that can be fed to a
Transformer.

The three functions convert_example_to_feature, convert_examples_to_features, and
_truncate_seq_pair are used to convert InputExamples into InputFeatures, which will
finally be sent to the Transformer model.

The conversion process includes tokenization and converting all sentences to a given
sequence length (truncating longer sequences, and padding shorter sequences). During
tokenization, each word in the sentence is broken apart into smaller and smaller tokens
(word pieces) until every token is part of the Transformer's vocabulary.

As a contrived example, let’s say we have the word understanding. The Transformer we
are using does not have a token for understanding but it has separate tokens for
understand and ing. Then, the word understanding would be broken into the tokens
understand and ing. The sequence length is the number of such tokens in the sequence.
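
You can see this behaviour directly with the tokenizer. The snippet below is only an illustration; the exact pieces depend on the vocabulary of the pretrained model, and a common word may well already be a single token:

from pytorch_transformers import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')

# A word that is not in the vocabulary is split into known word pieces.
print(tokenizer.tokenize('understanding'))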

The convert_example_to_feature function takes a single sample of data and converts it
into an InputFeature. The convert_examples_to_features function takes a list of
examples and returns a list of InputFeatures by using the convert_example_to_feature
function. The reason behind there being two separate functions is to allow us to use
multiprocessing in the conversion process. By default, I've set the process count to
cpu_count() - 2, but you can change it by passing a value for the process_count
parameter in the convert_examples_to_features function.
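
The multiprocessing wiring follows a standard pattern, sketched below (this is not the exact code from utils.py; convert_example_to_feature is the per-example function described above):

from functools import partial
from multiprocessing import Pool, cpu_count

def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer,
                                 process_count=cpu_count() - 2):
    # Bind the arguments that are identical for every example, then map the
    # per-example conversion across a pool of worker processes.
    convert_fn = partial(convert_example_to_feature, label_list=label_list,
                         max_seq_length=max_seq_length, tokenizer=tokenizer)
    with Pool(process_count) as p:
        features = p.map(convert_fn, examples)
    return features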

Now, we can go to our training notebook and import the stuff we’ll use and configure
our training options.


Go through the args dictionary carefully and note all the different settings you can
configure for training. In my case, I am using fp16 training to lower memory usage and
speed up training. If you don’t have Nvidia Apex installed, you will have to turn off fp16
by setting it to False.

In this guide, I am using the XLNet model with a sequence length of 128. Please refer to
the Github repo for the full list of available models.
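
The args dictionary contains entries along the following lines. The values shown here are representative of this guide (XLNet, sequence length 128, fp16 on); check the repo for the full set of keys actually used:

from multiprocessing import cpu_count

args = {
    'data_dir': 'data/',
    'model_type': 'xlnet',
    'model_name': 'xlnet-base-cased',
    'output_dir': 'outputs/',
    'max_seq_length': 128,
    'train_batch_size': 8,
    'eval_batch_size': 8,
    'num_train_epochs': 1,
    'learning_rate': 4e-5,
    'fp16': True,                      # requires Nvidia Apex; set to False if it is not installed
    'reprocess_input_data': False,     # set to True to rebuild cached features
    'process_count': cpu_count() - 2,  # worker processes for feature conversion
}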

Now, we are ready to load our model for training.

The coolest thing about the Pytorch-Transformers library is that you can use any of the
MODEL_CLASSES above, just by changing the model_type and model_name in the arguments
dictionary. The process for fine-tuning and evaluating is basically the same for all the
models. All hail HuggingFace!
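
To give a feel for how this works: MODEL_CLASSES maps each model_type string to a (config, model, tokenizer) triple, so loading boils down to a few lines once the args are set. A sketch using the classes from Pytorch-Transformers:

from pytorch_transformers import (
    BertConfig, BertForSequenceClassification, BertTokenizer,
    XLNetConfig, XLNetForSequenceClassification, XLNetTokenizer,
    XLMConfig, XLMForSequenceClassification, XLMTokenizer,
    RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer,
)

MODEL_CLASSES = {
    'bert': (BertConfig, BertForSequenceClassification, BertTokenizer),
    'xlnet': (XLNetConfig, XLNetForSequenceClassification, XLNetTokenizer),
    'xlm': (XLMConfig, XLMForSequenceClassification, XLMTokenizer),
    'roberta': (RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer),
}

config_class, model_class, tokenizer_class = MODEL_CLASSES[args['model_type']]

config = config_class.from_pretrained(args['model_name'], num_labels=2)
tokenizer = tokenizer_class.from_pretrained(args['model_name'])
model = model_class.from_pretrained(args['model_name'], config=config)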

Next, we have functions defining how to load data, train a model, and evaluate a
model.
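
To make the structure concrete, a stripped-down version of a training function might look like the sketch below. This is not the notebook's exact code: fp16/Apex, gradient accumulation, and logging are left out for brevity, and it assumes a TensorDataset built from the InputFeatures in the order input_ids, attention_mask, token_type_ids, labels.

import torch
from torch.utils.data import DataLoader, RandomSampler
from pytorch_transformers import AdamW, WarmupLinearSchedule

def train(train_dataset, model, device='cuda'):
    # Shuffle and batch the cached features.
    dataloader = DataLoader(train_dataset, sampler=RandomSampler(train_dataset),
                            batch_size=args['train_batch_size'])
    t_total = len(dataloader) * args['num_train_epochs']
    optimizer = AdamW(model.parameters(), lr=args['learning_rate'], eps=1e-8)
    scheduler = WarmupLinearSchedule(optimizer, warmup_steps=0, t_total=t_total)

    model.to(device)
    model.train()
    for _ in range(args['num_train_epochs']):
        for batch in dataloader:
            batch = tuple(t.to(device) for t in batch)
            inputs = {'input_ids': batch[0], 'attention_mask': batch[1],
                      'token_type_ids': batch[2], 'labels': batch[3]}
            outputs = model(**inputs)
            loss = outputs[0]    # when labels are passed, the first output is the loss
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
    return model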

Finally, we have everything ready to tokenize our data and train our model.

5. Go! (Really)
Training.
It should be fairly straightforward from here.

This will convert the data into features and start the training process. The converted
features will be automatically cached, and you can reuse them later if you want to run
the same experiment. However, if you change something like the max_seq_length, you
will need to reprocess the data. Same goes for changing the model used. To reprocess
the data, simply set reprocess_input_data to True in the args dictionary.
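
The caching itself is just serializing the list of InputFeatures to disk and loading it back on the next run; conceptually it works something like this simplified sketch:

import os
import torch

cached_features_file = os.path.join(
    args['data_dir'], 'cached_train_{}_{}'.format(args['model_type'], args['max_seq_length']))

if os.path.exists(cached_features_file) and not args['reprocess_input_data']:
    # Reuse the previously converted features.
    features = torch.load(cached_features_file)
else:
    # Convert from scratch and cache the result for the next run.
    features = convert_examples_to_features(examples, label_list,
                                            args['max_seq_length'], tokenizer)
    torch.save(features, cached_features_file)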

For comparison, training on this dataset took about 3 hours on my RTX 2080.

Once training completes, we can save everything.
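
Saving amounts to a couple of calls on the model and the tokenizer (a sketch; the notebook may also save the training arguments alongside them):

import os

os.makedirs(args['output_dir'], exist_ok=True)
model.save_pretrained(args['output_dir'])       # writes pytorch_model.bin and config.json
tokenizer.save_pretrained(args['output_dir'])   # writes the vocabulary files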


6. Looking back
Evaluation.
Evaluation is quite easy as well.

Without any parameter tuning, and with one training epoch, my results are as follows.

INFO:__main__:***** Eval results *****
INFO:__main__: fn = 1238
INFO:__main__: fp = 809
INFO:__main__: mcc = 0.8924906867291726
INFO:__main__: tn = 18191
INFO:__main__: tp = 17762

Not too shabby!
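
For reference, these numbers come straight from the confusion matrix and can be reproduced with scikit-learn along these lines (a sketch, assuming labels and preds arrays collected during the evaluation loop):

from sklearn.metrics import confusion_matrix, matthews_corrcoef

def get_eval_report(labels, preds):
    mcc = matthews_corrcoef(labels, preds)
    tn, fp, fn, tp = confusion_matrix(labels, preds).ravel()
    return {'mcc': mcc, 'tn': tn, 'fp': fp, 'fn': fn, 'tp': tp}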

7. Wrap up
Transformer models have displayed incredible prowess in handling a wide variety of
Natural Language Processing tasks. Here, we’ve looked at how we can use them for one
of the most common tasks, which is Sequence Classification.

The Pytorch-Transformers library by HuggingFace makes it almost trivial to harness the
power of these mammoth models!

8. Final Thoughts
When working with your own datasets, I recommend editing the data_prep.ipynb
notebook to save your data files as tsv files. In most cases, you should be able to
get things running by simply making sure that the correct columns containing the
text and the labels are used when building the train_df and dev_df DataFrames. You
could also define your own class that inherits from the DataProcessor class in the
utils.py file (a sketch of this follows below), but I feel the first approach is simpler.
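
If you do go the second route, a custom processor might look roughly like the sketch below. It assumes the DataProcessor and InputExample definitions in utils.py (including the _read_tsv helper) and a dataset saved in the four-column tsv layout described earlier; the class name is purely illustrative.

import os

class MyClassificationProcessor(DataProcessor):
    """Reads four-column tsv files (guid, label, alpha, text) for a custom dataset."""

    def get_train_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, 'train.tsv')), 'train')

    def get_dev_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, 'dev.tsv')), 'dev')

    def get_labels(self):
        return ['0', '1']

    def _create_examples(self, lines, set_type):
        examples = []
        for i, line in enumerate(lines):
            guid = '{}-{}'.format(set_type, i)
            # Columns: guid, label, alpha, text
            examples.append(InputExample(guid=guid, text_a=line[3], text_b=None, label=line[1]))
        return examples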

Please do use the Github repo as opposed to copying and pasting from the post here.
Any fixes or extra features will be added to the Github repo and are unlikely to be
added here unless it is a breaking change. Code is embedded in Medium using Gists
and, as such, is not automatically synced with the repo code.

If you need support, or you spot a bug, opening an issue on the Github repo will
probably get a quicker response than a comment on this article. It's easy to miss
comments here, and the lack of comment/chat threads makes them difficult to
follow. As a bonus, other people struggling with the same issue will probably find
the answer more easily if it is on Github rather than in a Medium response.
