What I want to achieve
Prerequisites
When I tried to initialize the transformers Trainer, an error occurred.
I do not know how to resolve this error, so I am asking here.
Problem / error message
```
ImportError                               Traceback (most recent call last)
<ipython-input-5-54b0704cc2d8> in <cell line: 96>()
     94 from transformers import Trainer, TrainingArguments
     95
---> 96 training_args = TrainingArguments(
     97     output_dir="./KantaiBERT",
     98     overwrite_output_dir=True,

4 frames
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py in _setup_devices(self)
   1839         if not is_sagemaker_mp_enabled():
   1840             if not is_accelerate_available(min_version="0.20.1"):
-> 1841                 raise ImportError(
   1842                     "Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`"
   1843                 )

ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`

NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
```
Relevant source code
```python
1 #@title Step 1: Loading the Dataset
2 #1.Load kant.txt using the Colab file manager
3 #2.Downloading the file from GitHub
4 !curl -L https://raw.githubusercontent.com/PacktPublishing/Transformers-for-Natural-Language-Processing/master/Chapter03/kant.txt --output "kant.txt"
5 !pip install accelerate -U
6 #@title Step 2: Installing Hugging Face Transformers
7 # We won't need TensorFlow here
8 !pip uninstall -y tensorflow
9 # Install `transformers` from master
10 !pip install git+https://github.com/huggingface/transformers
11 !pip list | grep -E 'transformers|tokenizers'
12 # transformers version at notebook update --- 2.9.1
13 # tokenizers version at notebook update --- 0.7.0
14
15 #@title Step 3: Training a Tokenizer
16 from pathlib import Path
17
18 from tokenizers import ByteLevelBPETokenizer
19
20 paths = [str(x) for x in Path(".").glob("**/*.txt")]
21 # Initialize a tokenizer
22 tokenizer = ByteLevelBPETokenizer()
23
24 # Customize training
25 tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
26     "<s>",
27     "<pad>",
28     "</s>",
29     "<unk>",
30     "<mask>",])
31
32 #@title Step 4: Saving the files to disk
33 import os
34 token_dir = '/content/KantaiBERT'
35 if not os.path.exists(token_dir):
36     os.makedirs(token_dir)
37 tokenizer.save_model('KantaiBERT')
38
39 #@title Step 5: Loading the Trained Tokenizer Files
40 from tokenizers.implementations import ByteLevelBPETokenizer
41 from tokenizers.processors import BertProcessing
42
43 tokenizer = ByteLevelBPETokenizer(
44     "./KantaiBERT/vocab.json",
45     "./KantaiBERT/merges.txt",
46 )
47
48 #@title Step 6: Checking Resource Constraints: GPU and NVIDIA
49 !nvidia-smi
50 #@title Checking that PyTorch Sees CUDA
51 import torch
52 torch.cuda.is_available()
53
54 #@title Step 7: Defining the configuration of the Model
55 from transformers import RobertaConfig
56
57 config = RobertaConfig(
58     vocab_size=52_000,
59     max_position_embeddings=514,
60     num_attention_heads=12,
61     num_hidden_layers=6,
62     type_vocab_size=1,
63 )
64
65 print(config)
66
67 #@title Step 8: Re-creating the Tokenizer in Transformers
68 from transformers import RobertaTokenizer
69 tokenizer = RobertaTokenizer.from_pretrained("./KantaiBERT", max_length=512)
70
71 #@title Step 9: Initializing a Model From Scratch
72 from transformers import RobertaForMaskedLM
73
74 model = RobertaForMaskedLM(config=config)
75 print(model)
76
77 #@title Step 10: Building the Dataset
78 from transformers import LineByLineTextDataset
79
80 dataset = LineByLineTextDataset(
81     tokenizer=tokenizer,
82     file_path="./kant.txt",
83     block_size=128,
84 )
85
86 #@title Step 11: Defining a Data Collator
87 from transformers import DataCollatorForLanguageModeling
88
89 data_collator = DataCollatorForLanguageModeling(
90     tokenizer=tokenizer, mlm=True, mlm_probability=0.15
91 )
92
93 #@title Step 12: Initializing the Trainer
94 from transformers import Trainer, TrainingArguments
95
96 training_args = TrainingArguments(
97     output_dir="./KantaiBERT",
98     overwrite_output_dir=True,
99     num_train_epochs=1,
100     per_device_train_batch_size=64,
101     save_steps=10_000,
102     save_total_limit=2,
103 )
104
105 trainer = Trainer(
106     model=model,
107     args=training_args,
108     data_collator=data_collator,
109     train_dataset=dataset,
110 )
111 #@title Step 13: Pre-training the Model
112 trainer.train()
```
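Incidentally, the failure does not seem to depend on the earlier steps: the traceback points at the constructor itself, whose `_setup_devices` step performs the accelerate version check. The following is a minimal sketch of my own (not from the notebook) that should reproduce the same ImportError on this runtime:

```python
# Minimal reproduction: merely constructing TrainingArguments triggers
# _setup_devices, and therefore the accelerate>=0.20.1 check in the traceback.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="./KantaiBERT")
```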
What I tried
The error occurs at line 96, at `training_args = TrainingArguments(`.
Because the message said to run `pip install transformers[torch]` or `pip install accelerate -U`, I installed both of them, but it still did not work.
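In case it is useful for diagnosis, here is a small check (a sketch assuming the Colab runtime above) showing which versions the running kernel actually sees. Note that on Colab a pip upgrade only takes effect after the runtime is restarted, so these values can lag behind what was just installed:

```python
# Print the versions visible to the current kernel. If accelerate was
# upgraded after the session started, a runtime restart is required
# before the new version shows up here.
import accelerate
import transformers

print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
```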
Supplementary information (framework/tool versions, etc.)
I am using Google Colab on a MacBook.
Sorry for the long post; I would appreciate any guidance.
