Zero-Shot Text Classification using HuggingFace Model

Last Updated : 06 Jun, 2024

Zero-shot text classification assigns text to candidate labels that the model was never explicitly trained on. This is particularly useful when labeled data is scarce or unavailable. With the HuggingFace Transformers library, zero-shot classification takes only a few lines of code using a pre-trained model. In this article, we'll use the HuggingFace pipeline for zero-shot classification and build an interactive web interface with Gradio.

Understanding Zero-Shot Classification

Zero-shot classification relies on pre-trained language models that can be applied to new tasks at inference time. You supply the text and a list of candidate labels; the model scores the text against each label and returns one score per label. By default these scores sum to 1 across the candidate labels, so they express how well each label fits relative to the others rather than an absolute confidence.
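
To make the mechanism concrete: NLI-based zero-shot classifiers, such as the model used in this article, treat the input text as a premise and turn each candidate label into a hypothesis sentence, then compare entailment scores across labels. The sketch below mimics only that final scoring step in plain Python. The hypothesis template matches the Transformers pipeline default, but the entailment logits are made-up numbers for illustration, not model output.

```python
import math

# Default hypothesis template used by the Transformers zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."


def build_hypotheses(labels):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]


def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["finance", "sports", "politics"]
hypotheses = build_hypotheses(labels)

# Hypothetical entailment logits, one per (text, hypothesis) pair.
entailment_logits = [3.1, -1.2, 0.4]

scores = softmax(entailment_logits)
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Because the scores are normalized across the candidates, adding or removing a label changes every other label's score.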

HuggingFace Transformers

The HuggingFace Transformers library provides an easy-to-use interface for various natural language processing tasks, including zero-shot classification. One of the most popular models for this task is facebook/bart-large-mnli, which is based on the BART model and fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset.

Implementing Zero-Shot Classification

Step 1: Install HuggingFace Transformers

First, install the HuggingFace Transformers library (the pipeline also needs a deep learning backend such as PyTorch, installed separately if it is not already present):

pip install transformers

Step 2: Initialize the Zero-Shot Classification Pipeline

Next, we initialize the zero-shot classification pipeline using the facebook/bart-large-mnli model:

from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

Step 3: Perform Classification

We can now classify a sample text into predefined labels. Here’s an example:

text = "The company's quarterly earnings increased by 20%, exceeding market expectations."
candidate_labels = ["finance", "sports", "politics", "technology"]

result = classifier(text, candidate_labels)
print(result)

Complete Code for Zero-Shot Classification

Python

from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The company's quarterly earnings increased by 20%, exceeding market expectations."
candidate_labels = ["finance", "sports", "politics", "technology"]

result = classifier(text, candidate_labels)

print(result)

Output:

{'sequence': "The company's quarterly earnings increased by 20%, exceeding market expectations.", 'labels': ['finance', 'technology', 'sports', 'politics'], 'scores': [0.6282334327697754, 0.22457945346832275, 0.08779555559158325, 0.05939162150025368]}
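
The pipeline returns a dictionary in which labels and scores are parallel lists sorted from most to least likely. As a small sketch, the helper below picks the top label and returns nothing when the best score falls under a threshold. The name `top_prediction` is hypothetical (it is not part of Transformers), and the 0.5 cutoff is an arbitrary assumption to tune per task.

```python
def top_prediction(result, min_score=0.5):
    """Return (label, score) for the best label, or None if it scores below min_score."""
    label, score = result["labels"][0], result["scores"][0]
    return (label, score) if score >= min_score else None


# Result copied from the output above, with scores rounded to four places.
result = {
    "sequence": "The company's quarterly earnings increased by 20%, exceeding market expectations.",
    "labels": ["finance", "technology", "sports", "politics"],
    "scores": [0.6282, 0.2246, 0.0878, 0.0594],
}

print(top_prediction(result))                 # best label clears the default cutoff
print(top_prediction(result, min_score=0.9))  # stricter cutoff rejects the prediction
```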

Evaluating Zero-Shot Classification

To evaluate performance, compare the predicted labels with the true labels using metrics such as precision, recall, and F1-score. The example below uses scikit-learn (pip install scikit-learn) and reuses the classifier created earlier on a small dataset:

Python

from sklearn.metrics import classification_report

texts = ["The stock market is up today.", "The new movie is a great thriller.", "The football match was exciting."]
true_labels = ["finance", "entertainment", "sports"]
predicted_labels = []

for text in texts:
    result = classifier(text, candidate_labels=["finance", "entertainment", "sports"])
    predicted_labels.append(result['labels'][0])

print(classification_report(true_labels, predicted_labels))

Output:

               precision    recall  f1-score   support

entertainment       0.50      1.00      0.67         1
      finance       1.00      1.00      1.00         1
       sports       0.00      0.00      0.00         1

     accuracy                           0.67         3
    macro avg       0.50      0.67      0.56         3
 weighted avg       0.50      0.67      0.56         3
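
The numbers in the report can be reproduced by hand, which helps when reading it. The sketch below computes per-class precision and recall in plain Python. The `predicted_labels` list is an assumption: it is the only set of predictions consistent with the report above (the sports sentence classified as entertainment), not something the article prints.

```python
def precision_recall(true_labels, predicted_labels, label):
    """Compute precision and recall for one class, using 0.0 when undefined."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == label and p == label)
    predicted = sum(1 for _, p in pairs if p == label)  # times the class was predicted
    actual = sum(1 for t, _ in pairs if t == label)     # times the class truly occurs
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


true_labels = ["finance", "entertainment", "sports"]
# Assumed predictions, inferred from the report rather than taken from a model run.
predicted_labels = ["finance", "entertainment", "entertainment"]

for label in sorted(set(true_labels)):
    p, r = precision_recall(true_labels, predicted_labels, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

With only three examples these figures are illustrative: a single misclassification moves accuracy by a third, so a larger labeled sample is needed before drawing conclusions.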

Creating an Interactive Interface with Gradio

Gradio provides an easy way to create web interfaces for machine learning models. We can use Gradio to build an interactive interface for zero-shot classification.

Step 1: Install Gradio

First, install Gradio:

pip install gradio

Step 2: Define the Classification Function

Create a function that takes text and labels as inputs and returns the classification results:

import gradio as gr
from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define the classification function
def classify_text(text, labels):
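    # Split on commas, trim whitespace, and drop empty entries
    labels = [label.strip() for label in labels.split(",") if label.strip()]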
    result = classifier(text, candidate_labels=labels)
    return {label: score for label, score in zip(result["labels"], result["scores"])}
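
Because the function only needs a callable that returns labels and scores, its parsing and output shape can be checked without downloading the model. The sketch below is a variant written for that purpose: `fake_classifier` is a hypothetical stand-in that returns uniform scores, and `parse_labels` is an illustrative helper. Neither is part of Transformers or Gradio.

```python
def parse_labels(raw):
    """Split a comma-separated string into clean, non-empty labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def fake_classifier(text, candidate_labels):
    """Hypothetical stand-in for the pipeline: equal score for every label."""
    score = 1.0 / len(candidate_labels)
    return {
        "sequence": text,
        "labels": list(candidate_labels),
        "scores": [score] * len(candidate_labels),
    }


def classify_text(text, labels, classifier=fake_classifier):
    """Return a {label: score} dict, the shape a Gradio Label output accepts."""
    candidates = parse_labels(labels)
    if not candidates:
        raise ValueError("Provide at least one label.")
    result = classifier(text, candidate_labels=candidates)
    return dict(zip(result["labels"], result["scores"]))


# Stray spaces and an empty entry are cleaned up before classification.
print(classify_text("Rates rose again.", "finance, sports , ,politics"))
```

Passing the real pipeline object as the `classifier` argument would exercise the same code path with model scores.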

Step 3: Create the Gradio Interface

Set up the Gradio interface with text inputs for the sentence and labels, and a label output:

# Create the Gradio interface
interface = gr.Interface(
    fn=classify_text,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter text here..."),
        gr.Textbox(lines=1, placeholder="Enter comma-separated labels here...")
    ],
    outputs=gr.Label(num_top_classes=3),
    title="Zero-Shot Text Classification",
    description="Classify text into labels without training data.",
)

# Launch the interface
interface.launch()

Complete Code for Creating an Interactive Interface with Gradio

Python

import gradio as gr
from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define the classification function
def classify_text(text, labels):
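    # Split on commas, trim whitespace, and drop empty entries
    labels = [label.strip() for label in labels.split(",") if label.strip()]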
    result = classifier(text, candidate_labels=labels)
    return {label: score for label, score in zip(result["labels"], result["scores"])}

# Create the Gradio interface
interface = gr.Interface(
    fn=classify_text,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter text here..."),
        gr.Textbox(lines=1, placeholder="Enter comma-separated labels here...")
    ],
    outputs=gr.Label(num_top_classes=3),
    title="Zero-Shot Text Classification",
    description="Classify text into labels without training data.",
)

# Launch the interface
interface.launch()

Output:

Figure: Gradio interface for interactive zero-shot classification

Conclusion

Zero-shot text classification with the HuggingFace Transformers library offers a flexible way to categorize text without labeled training data. Models like facebook/bart-large-mnli can give useful results on many classification tasks, although, as the small evaluation above shows, predictions are not always correct and should be checked against labeled examples where possible. Wrapping the classifier in a Gradio interface makes it easy to share as an interactive web app, which makes it accessible to a wider audience. This approach suits real-world applications where labeled data is not readily available.

mohammedraziullahansari
