
Zero-Shot Text Classification using HuggingFace Model

Last Updated : 06 Jun, 2024

Zero-shot text classification assigns text to a set of labels that the model was never explicitly trained on. This makes it useful when labeled data is scarce or unavailable. The HuggingFace Transformers library provides pre-trained models and a ready-made pipeline for this task. In this article, we'll use the HuggingFace pipeline for zero-shot classification, evaluate it on a few examples, and build an interactive web interface with Gradio.

Understanding Zero-Shot Classification

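Zero-shot classification relies on pre-trained language models that can relate a piece of text to labels written in natural language. Instead of training a classifier, you pass the text together with a list of candidate labels, and the model scores how well each label fits the text. The HuggingFace pipeline does this with a natural language inference (NLI) model: the text is treated as a premise, each label is inserted into a hypothesis such as "This example is finance.", and the model's entailment score for each hypothesis becomes that label's score.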
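To make the NLI formulation concrete, here is a minimal plain-Python sketch of the default (single-label) scoring step. It assumes the pipeline's default hypothesis template, "This example is {}.", and uses hypothetical entailment logits in place of real model output. In this mode the entailment logits of all candidate labels are passed through one softmax, so the scores sum to 1.

```python
import math

# Default hypothesis template of the transformers zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."


def build_pairs(text, labels, template=HYPOTHESIS_TEMPLATE):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, template.format(label)) for label in labels]


def single_label_scores(entailment_logits):
    """Softmax over the entailment logits of all labels, so the scores sum to 1."""
    m = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]


text = "The company's quarterly earnings increased by 20%."
labels = ["finance", "sports", "politics"]
for premise, hypothesis in build_pairs(text, labels):
    print(hypothesis)

# Hypothetical entailment logits, one per (premise, hypothesis) pair. A real NLI
# model such as facebook/bart-large-mnli would produce these values.
logits = [3.1, -1.2, 0.4]
scores = single_label_scores(logits)
for label, score in sorted(zip(labels, scores), key=lambda p: p[1], reverse=True):
    print(f"{label}: {score:.3f}")
```

Because every label shares one softmax, adding or removing a candidate label changes the scores of all the others.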

HuggingFace Transformers

The HuggingFace Transformers library provides an easy-to-use interface for various natural language processing tasks, including zero-shot classification. One of the most popular models for this task is facebook/bart-large-mnli, which is based on the BART model and fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset.
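An MNLI model has a three-way output head (contradiction, neutral, entailment). When the pipeline is called with `multi_label=True`, or with only one candidate label, it scores each label independently by comparing the entailment logit against the contradiction logit and ignoring neutral. The sketch below mimics that step; the label order and the logit values are assumptions for illustration, so check `model.config.id2label` for the model you actually load.

```python
import math

# Assumed label order of an MNLI classification head; verify it against
# model.config.id2label for the checkpoint you use.
NLI_LABELS = ("contradiction", "neutral", "entailment")


def independent_label_score(nli_logits):
    """Probability of entailment vs. contradiction for a single hypothesis.

    The neutral logit is ignored, and each candidate label is scored on its own,
    so scores for different labels need not sum to 1.
    """
    logit = dict(zip(NLI_LABELS, nli_logits))
    c, e = logit["contradiction"], logit["entailment"]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c, exp_e = math.exp(c - m), math.exp(e - m)
    return exp_e / (exp_c + exp_e)


# Hypothetical 3-way logits for "This example is finance." and
# "This example is technology." (not real model output).
example_logits = {
    "finance": [-2.0, 0.5, 3.0],
    "technology": [0.8, 1.0, 0.2],
}
for label, nli_logits in example_logits.items():
    print(f"{label}: {independent_label_score(nli_logits):.3f}")
```

This independent scoring suits cases where a text can belong to several labels at once.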

Implementing Zero-Shot Classification

Step 1: Install HuggingFace Transformers

First, install the HuggingFace Transformers library together with a deep-learning backend such as PyTorch, which the pipeline needs to run the model:

pip install transformers torch
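If you are unsure whether the installation succeeded, a quick check like the one below reports which packages can be imported. It only looks for the packages with `importlib.util.find_spec`, so it does not load any model; the package list is an assumption you can adjust (for example, to check for TensorFlow instead of PyTorch).

```python
import importlib.util


def check_install(packages=("transformers", "torch")):
    """Report which of the given packages can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


for name, ok in check_install().items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```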

Step 2: Initialize the Zero-Shot Classification Pipeline

Next, we initialize the zero-shot classification pipeline using the facebook/bart-large-mnli model:

from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
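The first call downloads the model weights (over 1 GB for facebook/bart-large-mnli) and runs on the CPU by default. The `pipeline` function accepts a `device` argument, where `-1` means CPU and `0` the first CUDA GPU. The helper below is a hypothetical sketch that picks a device index, assuming PyTorch is the backend; it falls back to CPU if PyTorch is not installed.

```python
import importlib.util


def pick_device():
    """Return a device index for transformers.pipeline: 0 for the first CUDA GPU, -1 for CPU."""
    # Import torch only if it is installed, so the helper also works in a bare environment.
    if importlib.util.find_spec("torch") is None:
        return -1
    import torch

    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Using device:", "cuda:0" if device == 0 else "cpu")

# With transformers installed, the index can then be passed to the pipeline:
# classifier = pipeline("zero-shot-classification",
#                       model="facebook/bart-large-mnli", device=device)
```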

Step 3: Perform Classification

We can now classify a sample text into predefined labels. Here’s an example:

text = "The company's quarterly earnings increased by 20%, exceeding market expectations."
candidate_labels = ["finance", "sports", "politics", "technology"]

result = classifier(text, candidate_labels)
print(result)

Complete Code for Zero-Shot Classification

Python
from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The company's quarterly earnings increased by 20%, exceeding market expectations."
candidate_labels = ["finance", "sports", "politics", "technology"]

result = classifier(text, candidate_labels)

print(result)

Output:

{'sequence': "The company's quarterly earnings increased by 20%, exceeding market expectations.", 'labels': ['finance', 'technology', 'sports', 'politics'], 'scores': [0.6282334327697754, 0.22457945346832275, 0.08779555559158325, 0.05939162150025368]}
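The result is a dictionary with parallel `labels` and `scores` lists, sorted from most to least likely. In practice you often want a single prediction, or no prediction when the model is unsure. The sketch below picks the highest-scoring label and applies a confidence threshold; the threshold of 0.5 is a hypothetical choice, not a value recommended by the library, and the input is the output shown above.

```python
def top_label(result, threshold=0.5):
    """Return the highest-scoring label, or None if its score is below threshold.

    Expects the dict returned by the zero-shot pipeline for a single input,
    with parallel 'labels' and 'scores' lists.
    """
    label, score = max(zip(result["labels"], result["scores"]), key=lambda p: p[1])
    return label if score >= threshold else None


result = {
    "sequence": "The company's quarterly earnings increased by 20%, exceeding market expectations.",
    "labels": ["finance", "technology", "sports", "politics"],
    "scores": [0.6282334327697754, 0.22457945346832275, 0.08779555559158325, 0.05939162150025368],
}
print(top_label(result))                 # finance
print(top_label(result, threshold=0.7))  # None
```

A suitable threshold depends on your data and is best chosen on a small labeled sample.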

Evaluating Zero-Shot Classification

To evaluate performance, compare the predicted labels with the true labels using metrics such as precision, recall, and F1-score. Here's an example on a tiny three-sentence dataset, which is only large enough to illustrate the workflow:

Python
from sklearn.metrics import classification_report

texts = ["The stock market is up today.", "The new movie is a great thriller.", "The football match was exciting."]
true_labels = ["finance", "entertainment", "sports"]
predicted_labels = []

for text in texts:
    result = classifier(text, candidate_labels=["finance", "entertainment", "sports"])
    predicted_labels.append(result['labels'][0])

print(classification_report(true_labels, predicted_labels))

Output:

               precision    recall  f1-score   support

entertainment       0.50      1.00      0.67         1
      finance       1.00      1.00      1.00         1
       sports       0.00      0.00      0.00         1

     accuracy                           0.67         3
    macro avg       0.50      0.67      0.56         3
 weighted avg       0.50      0.67      0.56         3
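The numbers in this report can be reproduced by hand. They are consistent only with the football sentence being labeled "entertainment" (two "entertainment" predictions, one correct, and no "sports" predictions), so the sketch below assumes those predictions rather than re-running the model. For "sports" the precision is undefined because it was never predicted; like scikit-learn's default behavior, the sketch reports 0.0 there (scikit-learn also emits an `UndefinedMetricWarning`).

```python
from collections import Counter

true_labels = ["finance", "entertainment", "sports"]
# Predictions inferred from the report above, not produced by running the model.
predicted_labels = ["finance", "entertainment", "entertainment"]


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 per class, using 0.0 where a ratio is undefined."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(true_labels, predicted_labels)
for label, (p, r, f) in metrics.items():
    print(f"{label:>13}  {p:.2f}  {r:.2f}  {f:.2f}")

# Macro average: unweighted mean over classes (each class has support 1 here,
# so the weighted average is the same).
macro = [sum(m[i] for m in metrics.values()) / len(metrics) for i in range(3)]
print("    macro avg", "  ".join(f"{x:.2f}" for x in macro))
```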

Creating an Interactive Interface with Gradio

Gradio provides an easy way to create web interfaces for machine learning models. We can use Gradio to build an interactive interface for zero-shot classification.

Step 1: Install Gradio

First, install Gradio:

pip install gradio

Step 2: Define the Classification Function

Create a function that takes text and labels as inputs and returns the classification results:

import gradio as gr
from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define the classification function
def classify_text(text, labels):
    # Split the comma-separated input and strip surrounding whitespace from each label
    labels = [label.strip() for label in labels.split(",") if label.strip()]
    result = classifier(text, candidate_labels=labels)
    # Return a {label: score} mapping, the format the gr.Label output displays
    return {label: score for label, score in zip(result["labels"], result["scores"])}
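To check the label parsing and the returned dictionary without downloading the model, the same logic can take the classifier as a parameter. The sketch below is hypothetical: `parse_labels`, `classify_text_with`, and `fake_classifier` are illustrative names, and `fake_classifier` only mimics the shape of the pipeline's output with uniform scores.

```python
def parse_labels(raw):
    """Split a comma-separated string into clean, non-empty, de-duplicated labels."""
    labels = []
    for part in raw.split(","):
        label = part.strip()
        if label and label not in labels:
            labels.append(label)
    return labels


def classify_text_with(classifier, text, raw_labels):
    """Same logic as classify_text, but with the classifier passed in explicitly."""
    labels = parse_labels(raw_labels)
    if not labels:
        return {}  # nothing to score, so skip the model call
    result = classifier(text, candidate_labels=labels)
    return dict(zip(result["labels"], result["scores"]))


def fake_classifier(text, candidate_labels):
    """Hypothetical stand-in for the pipeline that returns uniform scores."""
    n = len(candidate_labels)
    return {"sequence": text, "labels": list(candidate_labels), "scores": [1.0 / n] * n}


print(parse_labels(" finance, sports,,finance "))  # ['finance', 'sports']
print(classify_text_with(fake_classifier, "Stocks rallied.", "finance, sports"))
# {'finance': 0.5, 'sports': 0.5}
```

In the app itself you would pass the real `classifier` object in place of `fake_classifier`.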

Step 3: Create the Gradio Interface

Set up the Gradio interface with two text boxes (one for the sentence, one for the labels) and a Label output that shows the top three classes. Recent Gradio versions no longer provide the old gr.inputs and gr.outputs namespaces, so use the top-level components gr.Textbox and gr.Label:

# Create the Gradio interface
interface = gr.Interface(
    fn=classify_text,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter text here..."),
        gr.Textbox(lines=1, placeholder="Enter comma-separated labels here..."),
    ],
    outputs=gr.Label(num_top_classes=3),
    title="Zero-Shot Text Classification",
    description="Classify text into labels without training data.",
)

# Launch the interface
interface.launch()

Complete Code for Creating an Interactive Interface with Gradio

Python
import gradio as gr
from transformers import pipeline

# Initialize the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Define the classification function
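def classify_text(text, labels):
    # Split the comma-separated input and strip surrounding whitespace from each label
    labels = [label.strip() for label in labels.split(",") if label.strip()]
    result = classifier(text, candidate_labels=labels)
    return {label: score for label, score in zip(result["labels"], result["scores"])}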

# Create the Gradio interface
interface = gr.Interface(
    fn=classify_text,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter text here..."),
        gr.Textbox(lines=1, placeholder="Enter comma-separated labels here...")
    ],
    outputs=gr.Label(num_top_classes=3),
    title="Zero-Shot Text Classification",
    description="Classify text into labels without training data.",
)

# Launch the interface
interface.launch()

Output:

[Figure: Gradio interface for interactive zero-shot classification]

Conclusion

Zero-shot text classification with the HuggingFace Transformers library offers a flexible way to categorize text without labeled training data. Models like facebook/bart-large-mnli can give useful results out of the box, but accuracy depends on the domain and on how clearly the candidate labels describe the text, so it is worth evaluating on a small labeled sample first (in the example above, one of the three sentences was misclassified). Integrating the classifier with Gradio makes it easy to deploy as an interactive web interface, making it accessible to a wider audience. This approach opens up many real-world applications where labeled data is not readily available.

