gpt-2

tokenizer.save_pretrained TypeError: Object of type property is not JSON serializable

Question: I am trying to save the GPT2 tokenizer as follows:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import pandas as pd

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = GPT2Tokenizer.eos_token
dataset_file = "x.csv"
df = pd.read_csv(dataset_file, sep=",")
input_ids = tokenizer.batch_encode_plus(list(df["x"]), max_length=1024, padding="max_length", truncation=True)["input_ids"]
# saving the tokenizer
tokenizer.save_pretrained("tokenfile")

I am getting the following error: TypeError: …
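A minimal sketch of the usual fix, assuming the TypeError comes from assigning the class-level property GPT2Tokenizer.eos_token instead of the instance attribute tokenizer.eos_token (which is a plain string and serializes cleanly); the input text below is a placeholder, not the asker's CSV column:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Assign the instance attribute (a str), not the class property object.
tokenizer.pad_token = tokenizer.eos_token

encoded = tokenizer.batch_encode_plus(
    ["some example text"],   # placeholder input
    max_length=1024,
    padding="max_length",
    truncation=True,
)
input_ids = encoded["input_ids"]

# pad_token is now a string, so the tokenizer config serializes without the TypeError.
tokenizer.save_pretrained("tokenfile")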

Total answers: 1

Hugging face – Efficient tokenization of unknown token in GPT2

Question: I am trying to train a dialog system using GPT2. For tokenization, I am using the following configuration for adding the special tokens.

from transformers import (
    AdamW,
    AutoConfig,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)

SPECIAL_TOKENS = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endoftext|>",
    "pad_token": "[PAD]",
    "additional_special_tokens": …
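A minimal sketch of how such special tokens are typically registered, assuming the standard add_special_tokens / resize_token_embeddings pattern; the additional_special_tokens below are hypothetical placeholders, not the asker's actual list:

from transformers import AutoModelForCausalLM, AutoTokenizer

SPECIAL_TOKENS = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endoftext|>",
    "pad_token": "[PAD]",
    "additional_special_tokens": ["<speaker1>", "<speaker2>"],  # hypothetical placeholders
}

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the special tokens so each maps to a single dedicated id
# instead of being split into byte-level subword pieces.
num_added = tokenizer.add_special_tokens(SPECIAL_TOKENS)

# Grow GPT-2's embedding matrix to cover the newly added token ids.
model.resize_token_embeddings(len(tokenizer))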

Total answers: 2

Huggingface Transformer – GPT2 resume training from saved checkpoint

Question: Resuming the GPT2 fine-tuning, implemented from run_clm.py. Does the Hugging Face GPT2 implementation have a parameter to resume training from a saved checkpoint, instead of training again from the beginning? Suppose the Python notebook crashes while training; the checkpoints will be saved, but when I train the model …
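A minimal sketch of resuming with the Trainer API (which run_clm.py builds on), assuming the crashed run already wrote checkpoint-* folders into output_dir; the paths and the toy dataset below are placeholders:

from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Toy dataset: a list of tokenized examples; a real run would tokenize its corpus.
texts = ["hello world", "gpt2 resume training example"]
train_dataset = [tokenizer(t, truncation=True, max_length=32) for t in texts]
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2-finetuned",   # same directory the crashed run wrote checkpoints to
    overwrite_output_dir=False,    # keep the existing checkpoint-* folders
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=collator,
)

# resume_from_checkpoint=True picks up the latest checkpoint-* in output_dir;
# a specific path such as "gpt2-finetuned/checkpoint-1000" also works.
trainer.train(resume_from_checkpoint=True)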

Total answers: 2