nlp

readline only prints half of the results in a CSV file

Question: As titled, I have a CSV file with 6 columns. For NLP processing I need to extract the 6th column (which is a review comment column) and transform it into a list of lists of words using NLP. The code below is given by the …

Total answers: 1

Trying to find human names in a file using NLTK

Question: I’d like to extract human names from a text file. I’m getting a blank line as output for some reason. Here is my code: import nltk import re nltk.download('names') nltk.download('punkt') from nltk.corpus import names # Create a list of male and female names from …

Total answers: 1

Can stop phrases be removed while doing text processing in python?

Question: The task I’m working on involves finding the cosine similarity, using TF-IDF, between a base transcript and other sample transcripts. I am removing stop words for this, but I would also like to remove certain stop phrases that are unique to …

Total answers: 1
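
One way to approach the stop-phrase question above is to delete the phrases from each transcript before any stop-word removal or TF-IDF vectorization. The following is a minimal sketch using only the standard library, not the accepted answer; the function name `remove_stop_phrases`, the phrase list, and the sample transcript are all hypothetical and chosen for illustration.

```python
import re


def remove_stop_phrases(text, stop_phrases):
    """Delete each stop phrase (case-insensitive, whole words) from text."""
    if not stop_phrases:
        return text
    # Try longer phrases first so a short phrase cannot clip a longer one.
    ordered = sorted(stop_phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    cleaned = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()


# Hypothetical stop phrases and transcript, for illustration only.
STOP_PHRASES = ["thank you for calling", "how may i help you"]
sample = "Thank you for calling support, how may I help you today"
print(remove_stop_phrases(sample, STOP_PHRASES))  # support, today
```

The cleaned strings can then go through whatever stop-word removal and TF-IDF step is already in place. The sketch assumes every phrase begins and ends with a word character, since it relies on `\b` word boundaries.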

Create different dataframe inside of a 'for' loop

Question: I have a dataset that looks something like the following. I would like to create dataframes that each contain only the texts of one author; for example, as you can see, df1 contains only texts from author0, etc. Is there any way to do that for …

Total answers: 1

How to do inference with fine-tuned huggingface models?

Question: I have fine-tuned a Huggingface model using the IMDB dataset, and I was able to use the trainer to make predictions on the test set by doing trainer.predict(test_ds_encoded). However, when doing the same thing with the inference set that has a dummy label feature (all -1s …

Total answers: 1

Create datasets based on authors from another dataset

Question: I have a dataset in the following format text author title ————————————- dt = text0 author0 title0 text1 author1 title1 . . . . . . . . . and I would like to create separate datasets that each contain only the texts of one author. For …

Total answers: 1

Compare each string with all other strings in a dataframe

Question: I have this dataframe: mylist = [ "₹67.00 to Rupam Sweets using Bank Account XXXXXXXX5343<br>11 Feb 2023, 20:42:25", "₹66.00 to Rupam Sweets using Bank Account XXXXXXXX5343<br>10 Feb 2023, 21:09:23", "₹32.00 to Nagori Sajjad Mohammed Sayyed using Bank Account XXXXXXXX5343<br>9 Feb 2023, 07:06:52", "₹110.00 to …

Total answers: 2

Search DataFrame column for words in list

Question: I am trying to create a new DataFrame column that contains words that match between a list of keywords and strings in a df column… data = { 'Sandwich Opinions':['Roast beef is overrated','Toasted bread is always best','Hot sandwiches are better than cold'] } df = pd.DataFrame(data) keywords …

Total answers: 2

How can I optimize KNN, GNB and SVC sklearn algorithms to reduce exec time?

Question: I’m currently evaluating which classifier has the best performance for a movie review sentiment analysis task. So far I have evaluated Logistic Regression, Linear Regression, Random Forest and Decision Tree, but I also want to consider KNN, GNB and SVC models …

Total answers: 1

python text parsing to split list into chunks including preceding delimiters

Question: What I have: after OCR’ing some public deposition PDFs that are in Q&A form, I have raw text like the following: text = """nannQ So I do first want to bring up exhibit No. 46, which is in the binder in front …

Total answers: 3