huggingface

Import error in training arguments in Colaboratory

Question: I am using Google Colaboratory for my NLP project. I installed transformers and other libraries, but I got an error.
    from transformers import Trainer, TrainingArguments
    batch_size = 64
    logging_steps = len(stationary_dataset_encoded["train"]) // batch_size
    model_name = f"{model_ckpt}-finetuned-stationary-update"
    training_args = TrainingArguments(output_dir=model_name,
                                      num_train_epochs=10,
                                      learning_rate=2e-5,
                                      per_device_train_batch_size=batch_size,
                                      per_device_eval_batch_size=batch_size,
                                      weight_decay=0.01,
                                      evaluation_strategy="epoch",
                                      disable_tqdm=False,
                                      logging_steps=logging_steps, …

Total answers: 1

How to interpret the model_max_len attribute of the PreTrainedTokenizer object in Huggingface Transformers

Question: I’ve been trying to check the maximum length allowed by emilyalsentzer/Bio_ClinicalBERT, and after these lines of code:
    model_name = "emilyalsentzer/Bio_ClinicalBERT"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer
I’ve obtained the following:
    PreTrainedTokenizerFast(name_or_path='emilyalsentzer/Bio_ClinicalBERT', vocab_size=28996, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': …

Total answers: 1
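
One possible reading of the value in the question above, offered as an assumption rather than a confirmed answer: the number printed for model_max_len is exactly what Python produces for int(1e30), which suggests a "no limit configured" placeholder rather than a real sequence limit. The sketch below only checks that arithmetic. It does not load the tokenizer, and the variable names are hypothetical.

```python
# Value printed for model_max_len in the question above.
reported_max_len = 1000000000000000019884624838656

# 1e30 is a float; converting it to int exposes the float's exact binary value,
# which is why the digits after the leading 1 are not all zeros.
sentinel = int(1e30)

# True means the reported value matches the placeholder, not a real limit.
is_placeholder = reported_max_len == sentinel
print(is_placeholder)
```

If the value is indeed a placeholder, the model's own configuration (for BERT-style models, typically max_position_embeddings) is a more likely place to find the usable maximum length.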

Loading Hugging Face model is taking too much memory

Question: I am trying to load a large Hugging Face model with code like the following:
    model_from_disc = AutoModelForCausalLM.from_pretrained(path_to_model)
    tokenizer_from_disc = AutoTokenizer.from_pretrained(path_to_model)
    generator = pipeline("text-generation", model=model_from_disc, tokenizer=tokenizer_from_disc)
The program crashes shortly after the first line because it runs out of memory. Is there a way …

Total answers: 1

Sending large file to gcloud worked on another internet connection but not mine

Question: I am doing the following to send my 400-megabyte AI model to the cloud:
    model_file = pickle.dumps(model)
    blob = bucket.blob("models/{user_id}.pickle")
    blob.upload_from_string(model_file)
It takes a long time to process, then I get three errors: ssl.SSLWantWriteError: The operation did not complete (write) …

Total answers: 1

tokenizer.push_to_hub(repo_name) is not working

Question: I’m trying to push my tokenizer to my Hugging Face repo. It consists of the model's vocab.json (I’m making a speech recognition model). My code:
    vocab_dict["|"] = vocab_dict[" "]
    del vocab_dict[" "]
    vocab_dict["[UNK]"] = len(vocab_dict)
    vocab_dict["[PAD]"] = len(vocab_dict)
    len(vocab_dict)
    import json
    with open('vocab.json', 'w') as vocab_file:
        json.dump(vocab_dict, vocab_file)
    from transformers import …

Total answers: 3

"nvidia/tts_en_fastpitch" not getting installed in python

Question: I am trying to create TTS with the Nvidia NeMo tts_en_fastpitch model in Python, but I cannot install the FastPitch model. These are the errors:
    from nemo.collections.tts.models import HifiGanModel
    from nemo.collections.tts.models import FastPitchModel
    spec_generator = FastPitchModel.from_pretrained("nvidia/tts_en_fastpitch")
    model = HifiGanModel.from_pretrained(model_name="nvidia/tts_hifigan")
    [NeMo W 2023-01-21 18:49:02 optimizers:55] Apex was not found. Using the …

Total answers: 1

Specifying Huggingface model as project dependency

Question: Is it possible to install Hugging Face models as a project dependency? Currently the model is downloaded automatically by the SentenceTransformer library, but this means that in a Docker container it is downloaded every time the container starts. This is the model I am trying to use: https://huggingface.co/sentence-transformers/all-mpnet-base-v2 I have tried specifying the …

Total answers: 1

AttributeError: module 'dill._dill' has no attribute 'log'

Question: I am using a Python nlp module to train a dataset and ran into the following error:
    File "/usr/local/lib/python3.9/site-packages/nlp/utils/py_utils.py", line 297, in save_code
        dill._dill.log.info("Co: %s" % obj)
    AttributeError: module 'dill._dill' has no attribute 'log'
I noticed similar posts where no attribute 'extend' and no attribute 'stack' where …

Total answers: 1

How to split input text into equal size of tokens, not character length, and then concatenate the summarization results for Hugging Face transformers

Question: I am using the method below to summarize texts longer than 1024 tokens. The current method splits the text in half. I took this from another user’s post and modified …

Total answers: 1
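
A minimal sketch of the idea raised in the question above: split by token count rather than character length, then join the per-chunk results. It is not taken from the original post or its answer. Whitespace splitting stands in for a real tokenizer, and summarize_chunk and first_three_words are hypothetical placeholders for the actual summarization call.

```python
from typing import Callable, List


def split_into_token_chunks(tokens: List[str], max_tokens: int) -> List[List[str]]:
    """Split tokens into consecutive chunks of at most max_tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def summarize_long_text(text: str, max_tokens: int,
                        summarize_chunk: Callable[[str], str]) -> str:
    """Summarize each token chunk separately and concatenate the results."""
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    chunks = split_into_token_chunks(tokens, max_tokens)
    return " ".join(summarize_chunk(" ".join(chunk)) for chunk in chunks)


def first_three_words(chunk_text: str) -> str:
    """Hypothetical stand-in for a summarizer: keep the first three words."""
    return " ".join(chunk_text.split()[:3])


example = summarize_long_text("a b c d e f g h i j", max_tokens=4,
                              summarize_chunk=first_three_words)
print(example)
```

With this scheme every chunk except possibly the last has exactly max_tokens tokens. With a real model, the split would need to use that model's tokenizer so the count matches what the model actually sees.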