data-analysis

Mean centering before PCA

Question: I am unsure whether this kind of question (related to PCA) is acceptable here. As is well known, mean centering is recommended before PCA. I have 2 different classes, each with different participants. My aim is to distinguish and classify those 2 …

Total answers: 3
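
For readers unfamiliar with the step the question refers to, here is a minimal sketch of column-wise mean centering followed by PCA via SVD. The data matrix `X` is randomly generated and purely hypothetical; it does not come from the question, and the sketch does not address how the asker's two classes should be handled.

```python
import numpy as np

# Hypothetical data: 6 samples (rows) x 3 features (columns).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(6, 3))

# Mean centering: subtract each feature's (column's) mean.
X_centered = X - X.mean(axis=0)

# PCA via SVD of the centered matrix; rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project the centered samples onto the principal axes.
scores = X_centered @ Vt.T

# Variance captured by each component (sample variance, ddof=1).
explained_variance = S**2 / (X.shape[0] - 1)
```

Without the centering step, the first component tends to point toward the overall mean of the data rather than along the direction of greatest spread.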

How to sum rows that start with the same string

Question: I used pandas to clean a CSV: import pandas as pd import numpy as np df = pd.read_csv(r'C:\Users\Leo90\Downloads\data-export.csv', encoding='utf-8', header=None, sep='\n') df = df[0].str.split(',', expand=True) df = df.iloc[:, [0, 1, 2, 3, 4, 5, 6, 7]] df = df.replace(to_replace='None', value=np.nan).dropna() df = df.reset_index(drop=True) columnNames = df.iloc[0] df = df[1:] df.columns = columnNames df.groupby('path').head() The processed data looks like the screenshot below …

Total answers: 2
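
One possible approach to the task in the title, sketched under assumptions: the excerpt only confirms a `path` column, so the `views` column, the sample rows, and the rule "the prefix is the first path segment" are all hypothetical.

```python
import pandas as pd

# Hypothetical data; only the 'path' column name appears in the question.
df = pd.DataFrame({
    "path": ["/blog/a", "/blog/b", "/shop/x", "/shop/y", "/about"],
    "views": [10, 5, 3, 7, 2],
})

# Assumed rule: rows belong together when their first path segment matches.
# "/blog/a".split("/") gives ["", "blog", "a"], so element 1 is the prefix.
prefix = df["path"].str.split("/").str[1]

# Sum the numeric column within each prefix group.
totals = df.groupby(prefix)["views"].sum()
```

If the values were produced by `str.split`, as in the question's code, they would be strings and would likely need `pd.to_numeric` before summing.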

How to amend defined function to calculate wanted output (Pandas)

Question: I am trying to calculate the following 'new_field' column by triple looping through the 'name', 'val_id' and 'fac_id' columns with the following conditions. 1. Within each 'val_id' loop, if 'product' == 'CL', then take the min of 'val_against' and 'our_val_amt', e.g. min(val_against (134), our_val_amt (424)), therefore …

Total answers: 1

"Found input variables with inconsistent numbers of samples" Have I done something wrong during the train_test_split?

Question: I am trying to build a logistic regression model and run some tests, but I keep getting this error. I am not really sure what I have done differently from everyone else. from sklearn import preprocessing X = df.iloc[:, :len(df.columns)-1] y = df.iloc[:, len(df.columns)-1] …

Total answers: 1

Why does the program return a different value when I order the list differently?

Question: I'm trying to learn how to analyze large data better, and I wanted to make a program where, by inputting a CSV of keywords, you can look for the occurrence of each in a second data CSV. I set up this code …

Total answers: 2

Creating neat .csv file from a giant dictionary

Question: I have created a dictionary from ESPN's API with a bunch of data I need, and it is all neat and organized. I need this data in .csv for my friend to do machine learning on it, but I do not know where to begin. Here is …

Total answers: 1
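
A minimal sketch of one way to start, using the standard library's `csv.DictWriter`. The shape of the asker's dictionary is cut off in the excerpt, so the `players` dictionary and its field names below are hypothetical stand-ins, not ESPN's actual data layout.

```python
import csv
import io

# Hypothetical stand-in for the dictionary: one inner dict per record.
players = {
    "p1": {"name": "A. Example", "team": "XYZ", "points": 21},
    "p2": {"name": "B. Sample", "team": "ABC", "points": 17},
}

# Column order for the CSV; "id" holds the outer dictionary key.
fieldnames = ["id", "name", "team", "points"]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("players.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for player_id, stats in players.items():
    # Merge the outer key into the row so it becomes its own column.
    writer.writerow({"id": player_id, **stats})

csv_text = buffer.getvalue()
```

A more deeply nested dictionary would need flattening first, for example by building one flat row dict per record before calling `writerow`.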

Display all files in a data frame using Python pandas

Question: I am trying to create a data frame from a data set of 1000 .txt files, then loop through the files and get the title, author, language, etc. to form a single data frame. from glob import glob files = glob('dataset/*.txt') files.sort() files for n …

Total answers: 1