data-mining

Dataframe drop rows where multiple columns have the same value

Question: My dataframe has the columns A, B, C, label1, label2, label3. I just want to drop the rows where label1 = label2 = label3. The label value can be 0, 1, 2, 3 or NaN. The best solution I’ve found so far is this …

Total answers: 1
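
A minimal sketch of one possible approach to the question above. It is not the asker's own solution, which is cut off in the excerpt. The sample data is made up, and the sketch assumes that a row whose three labels are all NaN should also count as "equal"; pass `dropna=True` instead if such rows should be handled differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the label columns matter for the filter.
df = pd.DataFrame({
    "A": [10, 20, 30, 40],
    "label1": [0, 1, np.nan, 2],
    "label2": [0, 2, np.nan, 2],
    "label3": [0, 1, np.nan, 3],
})

label_cols = ["label1", "label2", "label3"]

# Count distinct values per row. With dropna=False, NaN counts as a value,
# so a row of three NaNs has exactly one distinct value (an assumption here).
all_equal = df[label_cols].nunique(axis=1, dropna=False) == 1

# Keep only the rows whose labels are not all identical.
result = df[~all_equal]
print(result)
```

A plain comparison such as `df.label1 == df.label2` would treat NaN as unequal to NaN, which is why the sketch counts distinct values instead.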

Finding string with multiple condition between two data frame in python

Question: I have two dataframes, df1 and df2. df1 has 4 columns. >df1 Neighborhood Street Begin Street End Street 8th Ave 6th St Church St Mlk blvd ….. >df2 Intersection Roadway Mlk blvd Hue St. I want to add a new column Count in …

Total answers: 1

Python Sentiment Analysis given a dataset with Facebook Posts

Question: I have a dataset containing raw Facebook posts and comments. What I would like to do is perform sentiment analysis with Python 3 (NLTK?) in order to label each post and each comment against some categories (a sort of clustering in unsupervised mode). …

Total answers: 2

How can I get the text in Selenium?

Question: I want to get the text of an element in Selenium. First I did this: team1_names = WebDriverWait(driver, 10).until( EC.presence_of_element_located((By.CSS_SELECTOR, ".home span")) ) for kir in team1_names: print(kir.text) It didn’t work out. So I tried this: team1_name = driver.find_elements_by_css_selector('.home span') print(team1_name.getText()) so team1_name.text doesn’t work either. …

Total answers: 1

PCA For categorical features?

Question: My understanding was that PCA can be performed only on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post at the following link: When to use One Hot Encoding vs LabelEncoder vs DictVectorizor? It states that one hot encoding …

Total answers: 7

Scikit-learn: How to run KMeans on a one-dimensional array?

Question: I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with a multidimensional array and not with one-dimensional …

Total answers: 2
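
A minimal sketch of the usual workaround for the question above, assuming scikit-learn and NumPy are installed. The random data, the array length, and the choice of three clusters are illustrative assumptions, not values from the question. The idea is to reshape the one-dimensional vector into a single-feature column so that it has the two-dimensional shape `(n_samples, n_features)` that `KMeans` expects.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the asker's vector of values between 0 and 1.
values = np.random.default_rng(0).random(200)

# Turn shape (200,) into shape (200, 1): one sample per row, one feature.
X = values.reshape(-1, 1)

# n_clusters=3 is an arbitrary choice for illustration.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = km.labels_                    # cluster index for each value
centers = km.cluster_centers_.ravel()  # one center per cluster, back to 1-D
print(centers)
```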

scikit-learn DBSCAN memory usage

Question: UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI’s DBSCAN implementation to do my clustering rather than scikit-learn’s. It can be run from the command line and, with proper indexing, performs this task within …

Total answers: 5

Python tools for out-of-core computation/data mining

Question: I am interested in using Python to mine data sets too big to fit in RAM but small enough to fit on a single hard drive. I understand that I can export the data as HDF5 files, using PyTables. Also, numexpr allows for some basic out-of-core computation. What would come next? Mini-batching when …

Total answers: 4

Mixing categorial and continuous data in Naive Bayes classifier using scikit-learn

Question: I’m using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Amongst others, I want to use the Naive Bayes classifier, but my problem is that I have a mix of categorical data (e.g. “Registered online”, “Accepts …

Total answers: 6