Code-Along: Text Classification for Customer Support Ticket Routing

Step 1: Set Up the Project Environment

To set up the project environment, install the required libraries, import the modules and functions used throughout this walkthrough, and load the customer support ticket dataset into a pandas DataFrame.

# The Trainer API runs on PyTorch and relies on the accelerate package
!pip install transformers torch accelerate
!pip install pandas
!pip install scikit-learn
!pip install flask

from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments, pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd
from flask import Flask, request, jsonify

# Load the customer support ticket dataset into a pandas DataFrame
df = pd.read_csv('customer_support_tickets.csv')
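
The later steps assume the CSV has a 'text' column and a 'label' column. As a hedged sketch, a quick check like the one below can catch a mismatched file early. The helper name check_ticket_frame and the inline sample rows are hypothetical and stand in for the real customer_support_tickets.csv.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for customer_support_tickets.csv
sample_csv = io.StringIO(
    "text,label\n"
    "My invoice is wrong,billing\n"
    "The app crashes on login,technical\n"
    "I was charged twice,billing\n"
)
sample_df = pd.read_csv(sample_csv)


def check_ticket_frame(frame, required=("text", "label")):
    """Verify the required columns exist and drop rows missing either value."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    cleaned = frame.dropna(subset=list(required))
    # Count tickets per category to spot heavily imbalanced labels
    return cleaned, cleaned["label"].value_counts().to_dict()


checked_df, label_counts = check_ticket_frame(sample_df)
print(label_counts)
```

Running the same helper on df before Step 2 would surface missing columns or empty rows before any cleaning is applied.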

Step 2: Preprocess the Text Data

In this step, you will clean the text data by removing unnecessary characters, numbers, and special symbols. Then, you will split the dataset into training and testing sets. Finally, you will encode the text labels into numerical values.

import re

# Clean the text data
def clean_text(text):
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'\d+', '', text)  # Remove numbers
    text = re.sub(r'[^\w\s]', '', text)  # Remove special characters
    return text

df['cleaned_text'] = df['text'].apply(clean_text)
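
To see what the cleaning function does to a single ticket, the sketch below runs it on a made-up sentence. The sample text is an assumption for illustration only. Note that removing characters can leave doubled spaces behind, so an optional whitespace-normalization step is shown as well.

```python
import re


def clean_text(text):
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'\d+', '', text)  # Remove numbers
    text = re.sub(r'[^\w\s]', '', text)  # Remove special characters
    return text


# Hypothetical ticket text
sample = "Order #4521 hasn't ARRIVED!!"
cleaned = clean_text(sample)

# Optional extra step (not part of the original function): collapse repeated whitespace
normalized = " ".join(cleaned.split())

print(repr(cleaned))
print(repr(normalized))
```

The order number disappears entirely in this example, so if identifiers like that carry signal for routing, the digit-removal step may be worth reconsidering.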

# Split the dataset into training and testing sets
train_texts, test_texts, train_labels, test_labels = train_test_split(df['cleaned_text'], df['label'], test_size=0.2, random_state=42)

# Encode the text labels into numerical values
label_encoder = LabelEncoder()
train_labels = label_encoder.fit_transform(train_labels)
test_labels = label_encoder.transform(test_labels)
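
As a rough illustration of what the label encoding step produces, the following pure-Python sketch mimics the mapping: category names are sorted and each one gets an integer id, which can later be mapped back to the name. The function name fit_label_mapping and the sample categories are hypothetical. This is a simplified stand-in, not scikit-learn's implementation.

```python
def fit_label_mapping(labels):
    """Build a sorted list of class names and a name-to-id lookup."""
    classes = sorted(set(labels))
    name_to_id = {name: idx for idx, name in enumerate(classes)}
    return classes, name_to_id


# Hypothetical ticket categories
sample_train_labels = ["billing", "technical", "account", "billing"]
classes, name_to_id = fit_label_mapping(sample_train_labels)

# Encode names as integers, then decode them back to names
encoded = [name_to_id[name] for name in sample_train_labels]
decoded = [classes[idx] for idx in encoded]

print(classes)
print(encoded)
print(decoded)
```

A category that appears only in the test split would have no id in this mapping. LabelEncoder behaves similarly and raises an error when asked to transform a label it did not see during fitting.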

Step 3: Initialize the Text Classification Model

In this step, you will load a pre-trained language model with a classification head sized to the number of ticket categories. Then, you will load the tokenizer that matches the model. Finally, you will create a data processing function that tokenizes the text data for model training.

# Load a pre-trained language model suitable for text classification tasks
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(label_encoder.classes_))

# Load the tokenizer that matches the pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a data processing function to prepare the text data for model training
def preprocess_function(examples):
    return tokenizer(examples['text'], padding=True, truncation=True)
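
To make the padding=True and truncation=True arguments concrete, here is a toy sketch that operates on made-up token ids. It is not the tokenizer's implementation: the function name pad_and_truncate, the pad id of 0, and the max_length value are illustrative assumptions, and the real tokenizer also takes care of special tokens.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Cut each sequence to max_length, then pad all to the longest one."""
    truncated = [seq[:max_length] for seq in sequences]
    longest = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [
        [1] * len(seq) + [0] * (longest - len(seq)) for seq in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for three tickets of different lengths
batch = pad_and_truncate(
    [[5, 6, 7, 8], [5, 9, 7], [5, 1, 2, 3, 4, 6, 7]],
    max_length=5,
)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Every row ends up the same length, which is what allows a batch of tickets to be stacked into a single tensor.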

Step 4: Train the Text Classification Model

In this step, you will fine-tune the pre-trained model using the training dataset. You will set up the training parameters, such as the number of epochs and batch size. Finally, you will evaluate the model’s performance on the testing dataset using accuracy as the metric.

# Fine-tune the pre-trained model using the training dataset
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy='epoch'  # Named eval_strategy in newer transformers releases
)

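import numpy as np
import torch

# Wrap the tokenized texts and encoded labels in a PyTorch dataset for the Trainer
class TicketDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.encodings = preprocess_function({'text': list(texts)})
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item['labels'] = torch.tensor(int(self.labels[idx]))
        return item

train_dataset = TicketDataset(train_texts, train_labels)
test_dataset = TicketDataset(test_texts, test_labels)

# Compute accuracy from the logits and true labels returned by the Trainer
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {'accuracy': float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)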

trainer.train()
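
The warmup_steps=500 setting above is worth comparing against the total number of optimization steps, since a small dataset can finish training before warmup ends. The arithmetic below is a sketch under stated assumptions: 1,000 training tickets (a made-up figure), a single device, and no gradient accumulation. The helper name total_training_steps is hypothetical.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimization steps for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


num_train_examples = 1000  # Hypothetical dataset size
total_steps = total_training_steps(num_train_examples, batch_size=16, num_epochs=3)
warmup_steps = 500

# If warmup is longer than training, the learning rate never reaches its peak
warmup_exceeds_training = warmup_steps > total_steps
print(total_steps, warmup_exceeds_training)
```

In a case like this one, lowering warmup_steps to a small fraction of the total step count would likely be a more sensible starting point.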

# Evaluate the model's performance on the testing dataset using accuracy as the metric
results = trainer.evaluate()
print(f"Accuracy: {results['eval_accuracy']}")
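
Accuracy here means the fraction of tickets whose highest-scoring category matches the true label. The pure-Python sketch below works through that calculation on made-up logits. The function names and the numbers are illustrative assumptions, not output from the model.

```python
def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose top-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(1 for pred, true in zip(predictions, labels) if pred == true)
    return correct / len(labels)


# Hypothetical scores for four tickets over three categories
sample_logits = [
    [2.0, 0.1, -1.0],
    [0.2, 0.3, 1.5],
    [0.9, 1.1, 0.0],
    [1.2, 0.4, 0.3],
]
sample_labels = [0, 2, 0, 0]

sample_accuracy = accuracy_from_logits(sample_logits, sample_labels)
print(sample_accuracy)
```

Three of the four predictions match, so the script prints 0.75. With imbalanced ticket categories, accuracy alone can look good while rare categories are routed poorly, so per-category metrics may be worth adding.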

Step 5: Implement Ticket Classification Service

In this step, you will create a Flask web application and define a route for the ticket classification service. The route reuses the fine-tuned model from Step 4 to make predictions: it preprocesses the incoming ticket text the same way as during training and returns the predicted category for the ticket.

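import torch

app = Flask(__name__)

@app.route('/classify', methods=['POST'])
def classify_ticket():
    ticket_text = request.form['text']

    # Preprocess the incoming ticket text using the same cleaning and data processing function as during training
    processed_input = preprocess_function({'text': [clean_text(ticket_text)]})
    inputs = {key: torch.tensor(values).to(model.device) for key, values in processed_input.items()}

    # Make a prediction using the trained model (no gradients are needed at inference time)
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_id = logits.argmax(dim=1).cpu().numpy()
    predicted_label = label_encoder.inverse_transform(predicted_id)[0]

    # Convert to a plain string so the label is JSON-serializable
    return jsonify({'predicted_label': str(predicted_label)})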

Step 6: Deploy the Ticket Classification Service

To deploy the ticket classification service, run the Flask application locally. By default, app.run() starts Flask's development server on port 5000.

if __name__ == '__main__':
    app.run()

You can test the ticket classification service using sample ticket texts by sending a POST request to the ‘/classify’ route with the text data. Here’s an example using the requests library:

import requests

text = "I'm having trouble with my internet connection."
response = requests.post('http://localhost:5000/classify', data={'text': text})
print(response.json())

Remember to replace ‘http://localhost:5000’ with the appropriate URL if you deploy the Flask application to a remote server.
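
The route can also be exercised without starting a server by using Flask's built-in test client. The sketch below is self-contained and makes several assumptions: fake_classify is a hypothetical stand-in for the fine-tuned model, demo_app is a separate app created only for this example, and the 400 response for a missing 'text' field is an added validation step that the route above does not include.

```python
from flask import Flask, jsonify, request

demo_app = Flask(__name__)


def fake_classify(text):
    """Hypothetical stand-in for the fine-tuned model's prediction."""
    return "technical" if "internet" in text.lower() else "general"


@demo_app.route('/classify', methods=['POST'])
def classify_ticket():
    # Reject requests that omit the text field or send only whitespace
    ticket_text = request.form.get('text', '').strip()
    if not ticket_text:
        return jsonify({'error': 'Missing "text" field'}), 400
    return jsonify({'predicted_label': fake_classify(ticket_text)})


# The test client sends requests directly to the app, no server needed
client = demo_app.test_client()
ok_response = client.post(
    '/classify',
    data={'text': "I'm having trouble with my internet connection."},
)
bad_response = client.post('/classify', data={})

print(ok_response.status_code, ok_response.get_json())
print(bad_response.status_code, bad_response.get_json())
```

Swapping fake_classify for the real prediction code would give the same kind of check against the trained model.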