Using GPUs with transformers#
This notebook is available on Google Colab at this url: https://colab.research.google.com/drive/1-B5Y_x5TXJLb2qlV-b5h4Ak6m8Ioej95?usp=sharing
Before executing this notebook on Google Colab, make sure to change the runtime to a GPU in the menu Runtime > Change runtime type.
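Once the runtime has been switched, you can check that PyTorch actually sees the GPU. A minimal check (the device name you get depends on the GPU Colab assigns you):
import torch
print(torch.cuda.is_available())  # True if a CUDA GPU is attached
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a Tesla T4 on Colab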
Example: Zero-Shot classification#
Let’s measure the inference time, first on the CPU and then on a GPU.
from transformers import pipeline
pipe = pipeline("zero-shot-classification")
No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
In a Jupyter notebook, we can measure the execution time of a cell with the magic command %%timeit. It runs the cell several times and reports the mean and the standard deviation across the runs.
%%timeit
example = "I have a problem with my iphone that needs to be resolved asap!"
pipe(example, candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer", "Mac"], multi_label=False)
The `multi_class` argument has been deprecated and renamed to `multi_label`. `multi_class` will be removed in a future version of Transformers.
2.95 s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On average, it took around 3 seconds per example…
To use the GPU, we need to load the pipeline again, specifying the argument device='cuda'. This will move all of the model’s parameters to the GPU’s VRAM.
pipe = pipeline("zero-shot-classification", device='cuda')
No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
%%timeit
example = "I have a problem with my iphone that needs to be resolved asap!"
pipe(example, candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer", "Mac"], multi_label=False)
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py:1157: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
warnings.warn(
218 ms ± 57.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now it only takes around 0.2 seconds per example, which is more than a 10X speed-up ⚡️
By using a GPU, inference (and training) times typically speed up by a factor of 10-20X.
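Note the warning printed above: calling the pipeline one example at a time keeps the GPU under-utilized. A minimal sketch of the batched usage it suggests (the extra example sentences and the batch_size value are made up for illustration):
examples = [
    "I have a problem with my iphone that needs to be resolved asap!",
    "My tablet screen is cracked, can you help?",
    "When does the new Mac come out?",
]
# Passing a list lets the pipeline batch the forward passes on the GPU
results = pipe(
    examples,
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer", "Mac"],
    batch_size=8,
)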
Example: Text to Speech#
When the model is not used through a pipeline, we need to move both the inputs and the model to the GPU with the to('cuda') method:
from transformers import AutoProcessor, AutoModel
processor = AutoProcessor.from_pretrained("suno/bark-small")
model = AutoModel.from_pretrained("suno/bark-small").to('cuda')
inputs = processor(
text=["Hello, my name is Suno. And, uh — and I like pizza [laughs] But I also have other interests such as playing tic tac toe."],
return_tensors="pt",
).to('cuda')
speech_values = model.generate(**inputs, do_sample=True)
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
from IPython.display import Audio
sampling_rate = model.generation_config.sample_rate
Audio(speech_values.cpu().numpy().squeeze(), rate=sampling_rate)
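Besides playing the waveform inline, you can also write it to disk. A minimal sketch using scipy (the output filename is arbitrary):
import scipy.io.wavfile
# The generated tensor lives on the GPU, so move it to the CPU before converting to numpy
audio_array = speech_values.cpu().numpy().squeeze()
scipy.io.wavfile.write("bark_sample.wav", rate=sampling_rate, data=audio_array)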
inputs = processor(
text=["Hello, my name is Suno. And, uh — and I like pizza [laughs] But I also have other interests such as playing tic tac toe."],
return_tensors="pt",
voice_preset="v2/en_speaker_3"
).to('cuda')
speech_values = model.generate(**inputs, do_sample=True)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
from IPython.display import Audio
sampling_rate = model.generation_config.sample_rate
Audio(speech_values.cpu().numpy().squeeze(), rate=sampling_rate)
Exercise: podcast summarizer 🎙️#
Develop a podcast summarizer, which takes an mp3 file as input and generates a text summary.
As an example, you can use files/podcast_sample.mp3 from the course materials.
import torch
from transformers import pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
"automatic-speech-recognition", model="openai/whisper-small", device=device
)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
import IPython
IPython.display.Audio("./podcast_sample.mp3")
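One possible way to complete the exercise (a sketch, not the only solution): transcribe the mp3 with the Whisper pipeline defined above, then feed the transcript to a summarization pipeline. The summarization model and the length limits below are arbitrary choices for illustration:
# 1. Transcribe the podcast; chunk_length_s lets Whisper handle audio longer than 30 s
transcription = pipe("./podcast_sample.mp3", chunk_length_s=30)
text = transcription["text"]

# 2. Summarize the transcript (truncation=True guards against transcripts longer than the model's input limit)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
summary = summarizer(text, max_length=150, min_length=30, truncation=True)
print(summary[0]["summary_text"])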