In the transformers library, auto classes are a key design feature that lets you use pre-trained models without worrying about the underlying model architecture. They make your code more concise and easier to maintain. For example, you can switch between different model architectures by changing only the model name, even when the code needed to run each architecture directly would be vastly different. In this post, you will learn how auto classes work and how to use them in your code.
Let’s get started!
Using Auto Classes in the Transformers Library
Photo by Erik Mclean. Some rights reserved.
Overview
This post is divided into three parts; they are:
- What Are Auto Classes
- How to Use Auto Classes
- Limitations of the Auto Classes
What Are Auto Classes
There is no class called “AutoClass” in the transformers library. Instead, several classes are named with the “Auto” prefix.
In transformer models for natural language processing, you will start with some text. You need to convert the text into tokens and then convert the tokens into token IDs. The token IDs are then fed into the model to get the output. The output should be converted back to text.
In this process, you will need a tokenizer and the main model. Depending on the task, such as text classification or question answering, you may use different variants of the same model. They are the same at the core, but they will use a different “head” to do the task.
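To make the workflow concrete, below is a minimal sketch of the text-to-tokens-to-IDs round trip. It uses the distilbert-base-uncased checkpoint purely for illustration; any tokenizer would follow the same steps.

```python
from transformers import DistilBertTokenizer

# assumed checkpoint, chosen only to illustrate the tokenization steps
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

text = "Machine Learning Mastery is a nice website."
tokens = tokenizer.tokenize(text)                    # text -> tokens
token_ids = tokenizer.convert_tokens_to_ids(tokens)  # tokens -> token IDs
recovered = tokenizer.decode(token_ids)              # token IDs -> (roughly) the original text
print(tokens)
print(token_ids)
print(recovered)
```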
Given that the workflow is standardized at a high level, the only difference is in exactly how each model should be operated. There are dozens of model architectures in the library, and you are not going to know all of them in detail. But if you did, you could write code like the following:
```python
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

model_name = "KernAI/stock-news-distilbert"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
```
First of all, this is not the most verbose way to use a model. In the transformers library, you could define a bare DistilBertTokenizer object and then load the vocabulary from files, define the special tokens, and set other rules, such as whether to force all letters to lowercase. Secondly, creating a DistilBertForSequenceClassification object should first involve creating a config object, DistilBertConfig, that defines the hyperparameters of the model. Then you could load the weights from a checkpoint. As you can imagine, that is a lot of work.

In the code above, you already simplified the workflow by using the from_pretrained() method. It downloads a pre-trained model from the internet, in which the config and the corresponding tokenizer parameters are enclosed. However, the code still sets up the model first and then loads the weights and parameters, so it assumes that the downloaded model files are compatible with the architecture. For example, if the model expects a parameter called hidden_size, the downloaded file must not call it hidden_dim.
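For a sense of what that more manual path looks like, here is a rough sketch (not part of the original example) of building a DistilBERT classifier from a config before loading weights. The config values shown are DistilBERT's defaults and are for illustration only.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# DistilBERT's config names its hyperparameters dim/n_layers/n_heads rather than
# hidden_size -- exactly the kind of architecture-specific detail described above
config = DistilBertConfig(
    dim=768,        # hidden size
    n_layers=6,     # number of transformer blocks
    n_heads=12,     # attention heads per block
    num_labels=2,   # size of the classification head
)
model = DistilBertForSequenceClassification(config)  # randomly initialized weights

# from_pretrained() replaces all of the manual weight loading in one call
model = DistilBertForSequenceClassification.from_pretrained("KernAI/stock-news-distilbert")
```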
Remembering the name of the class for each architecture of the model is not easy. Therefore, the auto classes are designed to hide such complexity.
How to Use Auto Classes
Take DistilBERT as an example: there are multiple variations. Firstly, there are PyTorch, TensorFlow, and Flax implementations of the exact same model. Secondly, DistilBERT is the name of the base model. On top of it, you can add a different “head” for various tasks. You can get:
- the base model (DistilBertModel), which outputs the raw hidden states,
- a model for masked language modeling (DistilBertForMaskedLM), which predicts what the masked token should be,
- a model for sequence classification (DistilBertForSequenceClassification), which labels the entire input with one of several predefined categories,
- a model for question answering (DistilBertForQuestionAnswering), which finds answers to the specified questions in the provided context,
- a model for token classification (DistilBertForTokenClassification), which classifies each token into a category,
- a model for multiple choice tasks (DistilBertForMultipleChoice), which compares multiple candidate answers to a question and scores the likelihood of each.
These are all the same base model but with different heads. This is not an exhaustive list of variants, because some base models may have a head that is not available in DistilBERT, and some base models may lack a head that DistilBERT has.
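As a quick sketch of what the shared base means in practice (this snippet is illustrative and not part of the original example), the same DistilBERT checkpoint can be loaded under different heads; heads whose weights are missing from the checkpoint are randomly initialized, which the library warns about.

```python
from transformers import DistilBertForMaskedLM, DistilBertForQuestionAnswering

base = "distilbert-base-uncased"
mlm_model = DistilBertForMaskedLM.from_pretrained(base)          # MLM head weights are in this checkpoint
qa_model = DistilBertForQuestionAnswering.from_pretrained(base)  # QA head weights are not, so they start random
```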
As long as you know how to use the model for a particular task, you can easily switch to another model. For example, the code below runs fine without any error:
```python
import torch
from transformers import GPT2Tokenizer, OPTForSequenceClassification

model_name = "ArthurZ/opt-350m-dummy-sc"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = OPTForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
```
Disregarding the output, this code only changed the names of the tokenizer and model classes. That is the result of the standardized interfaces of the transformers library. But look at the code above: you need to know that the model stored as “ArthurZ/opt-350m-dummy-sc” uses the OPTForSequenceClassification architecture (which you can probably guess from the name). You also need to know that its tokenizer is GPT2Tokenizer (which you probably cannot guess from the name, but can figure out from the documentation).
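If you do not want to dig through the documentation, the checkpoint's config file already records this information. A small check like the following (an illustrative sketch) shows what the auto classes will read:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ArthurZ/opt-350m-dummy-sc")
print(config.architectures)  # expected to show something like ['OPTForSequenceClassification']
print(config.model_type)     # expected to be 'opt', which is paired with a GPT-2 style tokenizer
```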
It would be much more convenient if you could just change the model name and the code would still work. That is where the auto classes come in. The code becomes the following:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ArthurZ/opt-350m-dummy-sc"  # or "KernAI/stock-news-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
```
You used AutoTokenizer and AutoModelForSequenceClassification instead. Now, when you change the model name, the code still works. This is because the auto classes automatically download the model and check its config file. Then, based on what is specified in the config file, they instantiate the correct tokenizer and model, all without your input.
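If you are curious, you can verify which concrete classes the auto classes resolved to. A small sketch follows; note that AutoTokenizer typically returns the “fast” tokenizer variant when one exists.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ArthurZ/opt-350m-dummy-sc"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

print(type(tokenizer).__name__)  # likely GPT2TokenizerFast for this checkpoint
print(type(model).__name__)      # OPTForSequenceClassification
```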
Note that the example above uses PyTorch: you asked the tokenizer to give you a PyTorch tensor, and the model itself is a PyTorch one. This is the default in the transformers library. But you can create a TensorFlow/Keras equivalent, if the model supports it, with a slight modification of the code:
```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "KernAI/stock-news-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)

text = "Machine Learning Mastery is a nice website."
inputs = tokenizer(text, return_tensors="tf")
logits = model(**inputs).logits
predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])  # argmax over the class dimension
```
You can try the other model, “ArthurZ/opt-350m-dummy-sc”, and you should see an error. This is because the class OPTForSequenceClassification does not have a TensorFlow counterpart, TFOPTForSequenceClassification.
Limitations of the Auto Classes
There are many auto classes in the transformers library. For NLP tasks, some examples are AutoModel, AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSequenceClassification, AutoModelForQuestionAnswering, AutoModelForTokenClassification, AutoModelForMultipleChoice, AutoModelForTextEncoding, and AutoModelForNextSentencePrediction. Note that each of these is for a different task (i.e., a different head on top of a base model), and not every task is supported by every model. For example, in the previous section you learned that there is a DistilBertForMaskedLM class, hence you can create one using AutoModelForMaskedLM and a DistilBERT model name; but you cannot create a DistilBERT model using AutoModelForCausalLM because there is no DistilBertForCausalLM class.
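A quick, illustrative way to see this behavior (not part of the original post) is to request an unsupported head through the auto class and catch the resulting error:

```python
from transformers import AutoModelForMaskedLM, AutoModelForCausalLM

# works: DistilBertForMaskedLM exists for this model family
mlm_model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

# fails: there is no DistilBertForCausalLM, so the auto class cannot resolve it
try:
    clm_model = AutoModelForCausalLM.from_pretrained("distilbert-base-uncased")
except ValueError as err:  # expected to be an "unrecognized configuration class" error
    print(err)
```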
Also, note that you will see a warning with the following code:
```python
from transformers import AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
You will see the following warning:
```
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
This is because the model named “distilbert-base-uncased” contains only the base model. Its config is sufficient to create all kinds of models under the DistilBERT family because their differences are in the heads only. However, a base model checkpoint does not have the weights for any specific head. When you instantiate a model and try to load the weights, the library finds that some layers were never saved, so it can only initialize them with random weights as placeholders. This also means that the model does not yet do what you expect. You either need to train the model on your own dataset, or load the weights from a different checkpoint, such as “KernAI/stock-news-distilbert” in the previous example.
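If you do intend to train the head yourself, you would typically size it for your own labels when loading the base checkpoint. The label count and names below are placeholders, not part of the original example.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,  # placeholder head size for a hypothetical 3-class task
    id2label={0: "negative", 1: "neutral", 2: "positive"},
    label2id={"negative": 0, "neutral": 1, "positive": 2},
)
# the classification head is still randomly initialized; fine-tuning on your
# own dataset is required before its predictions are meaningful
```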
The second limitation of the auto classes is that each one is a wrapper around a deep learning model. That is, it expects numerical tensors as input and outputs numerical tensors. That is why you need a tokenizer in the examples above. If you do not need to manipulate those tensors but just want to use the model for a task, you can further simplify the code by using the pipeline() function:
```python
from transformers import pipeline

model_name = "KernAI/stock-news-distilbert"
classifier = pipeline(model=model_name)

text = "Machine Learning Mastery is a nice website."
prediction = classifier(text)
print(prediction)
```
This example actually does more than any of the examples above: it interprets the result from the model and gives you a human-readable output. Its output looks like this:
```
[{'label': 'positive', 'score': 0.9953118562698364}]
```
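Under the hood, the pipeline does roughly what the earlier auto class example did, plus a softmax and a lookup in the model config's label mapping. Below is a sketch of that last step, using the same checkpoint; the label names come from the checkpoint's own config.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "KernAI/stock-news-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Machine Learning Mastery is a nice website.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
predicted_class_id = probs.argmax(dim=-1).item()
# map the winning index to a human-readable label via the config's id2label table
print({"label": model.config.id2label[predicted_class_id], "score": probs.max().item()})
```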
Summary
In this post, you learned how to use the auto classes in the transformers library. They replace the model-specific classes and let the library figure out the correct classes to use based on the model config. This allows you to switch between different models or checkpoints by changing only the name or path, without any other code changes. Using auto classes is one step more verbose than using the pipeline API, but it saves you from the headache of figuring out the correct classes to use.