data-cleaning

Error when trying to create a list of files

Question: I have a folder that contains 20 csv files. Each file has about 10 columns and thousands of rows. The csv files look something like the following:

gene  p-value  xyz
acan  0.05     123
mmp2  0.02     456
mmp9  0.07     789
nnos  0.09     123
gfap  0.01     456
…

Total answers: 1
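For the question above, here is a minimal sketch of collecting every csv file in a folder and reading each one with pandas. It assumes the files are comma-separated with a header row. The function name `load_csv_folder` and the folder name in the usage comment are hypothetical. Keying the result by file name keeps track of which gene table came from which file.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every .csv file in `folder` into its own DataFrame.

    Files are visited in sorted name order so the result is reproducible.
    Assumes comma-separated files with a header row; pass a different
    `sep` to pd.read_csv if yours are tab- or space-separated.
    """
    paths = sorted(Path(folder).glob("*.csv"))
    return {path.name: pd.read_csv(path) for path in paths}


# Usage (the folder name "data" is hypothetical):
# frames = load_csv_folder("data")
# combined = pd.concat(frames, names=["file", "row"])
```

`glob("*.csv")` skips non-csv files in the same folder, and `pd.concat` on the returned dict stacks all files into one DataFrame with the file name as the outer index level.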

Removing Specific Lines in Json File

Question: I am trying to clean the JSON file below. I want to remove every key-value pair whose key is "Company" from the dicts in the "Stores" list.

{ "Company": "Apple", "Stores": [ {"Company" : "Apple", "Location" : "-", "Sales": "-", "Total_Employees": "-" }, {"Company" : …

Total answers: 2
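One way to approach the question above is to parse the JSON and rebuild each store dict without the unwanted key, rather than deleting lines from the text. This is a minimal sketch using only the standard `json` module; the function name `drop_store_key` is hypothetical, and the sample input reuses the first store from the question.

```python
import json


def drop_store_key(data, key="Company"):
    """Return a copy of `data` with `key` removed from every dict in data["Stores"].

    The top-level "Company" entry is left untouched; only the per-store
    dictionaries are cleaned.
    """
    cleaned = dict(data)
    cleaned["Stores"] = [
        {k: v for k, v in store.items() if k != key}
        for store in data.get("Stores", [])
    ]
    return cleaned


raw = ('{"Company": "Apple", "Stores": [{"Company": "Apple", "Location": "-", '
       '"Sales": "-", "Total_Employees": "-"}]}')
result = drop_store_key(json.loads(raw))
print(json.dumps(result))

# For a file on disk (file names are hypothetical):
# with open("stores.json") as f:
#     data = json.load(f)
# with open("stores_clean.json", "w") as f:
#     json.dump(drop_store_key(data), f, indent=2)
```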

Outlier detection of time-series data

Question: I have a pandas dataframe where I want to detect outliers on a single column. Please bear in mind that I am not experienced when it comes to data handling/cleaning. The dataframe looks like this:

Time                 MW
2019-01-01 00:00:00  1234.0
2019-01-01 01:00:00  1234.5
2019-01-01 02:00:00  1235.2
2019-01-01 03:00:00  1235.1
…

Total answers: 2
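A common starting point for the question above is the interquartile-range (IQR) rule: flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The sketch below applies it to the whole `MW` column. The function name `iqr_outliers` is hypothetical, and the last reading is an invented spike added only to show what gets flagged. This is one technique among several, not necessarily the best fit for time-series data.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# The first four readings come from the question; the fifth is a
# hypothetical spike added for illustration.
times = pd.to_datetime([
    "2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00",
    "2019-01-01 03:00:00", "2019-01-01 04:00:00",
])
df = pd.DataFrame({"MW": [1234.0, 1234.5, 1235.2, 1235.1, 9999.0]}, index=times)
df["outlier"] = iqr_outliers(df["MW"])
print(df[df["outlier"]])
```

Because this uses one global threshold, a series with a trend or daily cycle may have legitimate peaks flagged. In that case, applying the same rule to the difference between `MW` and a rolling median (`df["MW"].rolling(...).median()`) is often a better fit.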

Label conflict in classification machine learning problem

How can I remove label conflicts in a classification problem? Question: I have identical samples with different labels, which has occurred due to mislabeled data. If the data is mislabeled, it can confuse the model and lower its performance. It's a binary classification problem. If my input table is …

Total answers: 1
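For the question above, a first step is to find rows whose feature values are identical but whose labels differ. The sketch below assumes every column except the label is a feature. The function name `split_label_conflicts` and the toy data are hypothetical. Dropping the conflicting rows is only one option; keeping the majority label per group is another.

```python
import pandas as pd


def split_label_conflicts(df, label_col):
    """Split df into (consistent rows, conflicting rows).

    A row is "conflicting" when another row has exactly the same feature
    values but a different label. Rows with NaN in a feature column are
    excluded from grouping by pandas and therefore end up as "consistent".
    """
    feature_cols = [c for c in df.columns if c != label_col]
    # Number of distinct labels among rows sharing the same features,
    # broadcast back to every row so the mask aligns with df's index.
    n_labels = df.groupby(feature_cols)[label_col].transform("nunique")
    conflicted = n_labels > 1
    return df[~conflicted], df[conflicted]


# Hypothetical toy data: rows 0 and 1 share features but disagree on the label.
df = pd.DataFrame({
    "f1": [1, 1, 3, 3, 5],
    "f2": [2, 2, 4, 4, 6],
    "label": [0, 1, 0, 0, 1],
})
clean, conflicts = split_label_conflicts(df, "label")
print(conflicts)
```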

Kaggle Data Clean Up

Question: I am trying to clean unwanted values from my dataset. I am currently trying to clean the gender column, and there are a lot of ‘joke’ answers that I wish to remove, but I only know how to remove them one by one. Is there a more efficient way …

Total answers: 1
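Instead of removing the unwanted answers one by one as described above, a common approach is to normalize the text and map known spellings to the categories you want to keep, so everything else becomes NaN in one step. This is a minimal sketch; `GENDER_MAP`, `clean_gender`, and the sample answers are hypothetical and should be adapted to the values actually present in the dataset.

```python
import pandas as pd

# Hypothetical mapping from raw answers (lower-cased, stripped) to the
# categories to keep; extend it with the spellings found in your data.
GENDER_MAP = {
    "male": "Male", "m": "Male", "man": "Male",
    "female": "Female", "f": "Female", "woman": "Female",
    "non-binary": "Non-binary", "nonbinary": "Non-binary",
}


def clean_gender(series):
    """Normalize free-text answers; anything not in GENDER_MAP becomes NaN."""
    normalized = series.astype(str).str.strip().str.lower()
    return normalized.map(GENDER_MAP)


raw = pd.Series([" Male", "F", "banana", "Woman"])
cleaned = clean_gender(raw)
# Afterwards the unmapped rows can be dropped, e.g. df = df[cleaned.notna()]
```

Running `series.str.lower().value_counts()` first is a quick way to see which spellings are common enough to add to the mapping.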

Remove words from list but keep the ones only made up from the list

Question: I have one dataframe containing strings and one list of words that I want to remove from the dataframe. However, I would like to also keep the strings from the df which are entirely made up of words from the …

Total answers: 3
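The rule described in the question above can be expressed as: remove listed words, but if that would remove every word, keep the original string. This is a minimal sketch under that reading of the (truncated) question. The function name `remove_words`, the word set, and the data are hypothetical, and tokens are split on whitespace, so punctuation attached to a word (e.g. `dog,`) is not matched.

```python
import pandas as pd


def remove_words(text, words):
    """Drop tokens found in `words` (compared in lower case).

    If every token of `text` is in `words`, the string is returned
    unchanged instead of becoming empty.
    """
    tokens = text.split()
    kept = [t for t in tokens if t.lower() not in words]
    return " ".join(kept) if kept else text


# Hypothetical word list and data; the set should hold lower-case words.
stop = {"the", "big", "red"}
df = pd.DataFrame({"text": ["the big dog", "big red", "cat"]})
df["cleaned"] = df["text"].apply(lambda s: remove_words(s, stop))
print(df)
```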

How to change values in specific rows/columns to NaN based on condition?

Question: I've got some strange values in the date column of my dataset, and I'm trying to change these unexpected values to NaN. I don't know what these unexpected values will be, which is why I made df 2, where I'm searching for months …

Total answers: 1
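When, as in the question above, the bad values in a date column are not known in advance, one option is to let pandas try to parse everything and mark failures as missing. This is a sketch with a hypothetical column. Note that if valid dates appear in several different formats, some valid ones may also be coerced, so passing an explicit `format=` is safer when the expected layout is known.

```python
import pandas as pd

# Hypothetical column: two valid dates and two unexpected values.
df = pd.DataFrame({"date": ["2021-03-01", "2021-04-15", "not a date", "??"]})

# errors="coerce" turns anything that cannot be parsed into NaT
# (pandas' missing-value marker for datetimes), so the bad values
# do not have to be listed beforehand.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
print(df)

# For non-date columns the same idea works with a boolean condition:
# df["col"] = df["col"].where(condition) keeps values where `condition`
# is True and sets the rest to NaN.
```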

Fixing IndexingError to clean the data

Question: I'm trying to identify outliers in each housing type category, but I'm encountering an issue. Whenever I run the code, I receive the following error: "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)."

    grouped = df.groupby('Type')
    q1 …

Total answers: 2
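The code in the question above is truncated, so the exact cause cannot be confirmed, but this error commonly appears when per-group statistics (indexed by `Type`) are compared with row-level values, producing a boolean Series whose index does not match `df`. Using `transform` broadcasts each group's quartiles back to every row, so the resulting mask shares `df`'s index. This is a minimal sketch; the function name `flag_outliers_by_group`, the `Price` column, and the data are hypothetical.

```python
import pandas as pd


def flag_outliers_by_group(df, group_col, value_col, k=1.5):
    """Mark IQR outliers within each group, aligned to df's own index."""
    grouped = df.groupby(group_col)[value_col]
    # transform returns one value per original row, so q1/q3 line up with df.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df[value_col] < q1 - k * iqr) | (df[value_col] > q3 + k * iqr)


# Hypothetical housing data with two types interleaved.
df = pd.DataFrame({
    "Type": ["h", "u", "h", "h", "u", "h", "h", "u"],
    "Price": [100, 50, 110, 120, 55, 130, 1000, 60],
})
mask = flag_outliers_by_group(df, "Type", "Price")
print(df[mask])          # boolean indexing works because indexes align
cleaned = df[~mask]
```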

Creating Consistent Time Format with Pandas

Question: My overall goal is to pull the hour from each data point to list each beginning time. To do this, I know I need to clean my data so that it is all in a consistent format. I have been trying to use to_datetime and df[time].dt.hour to pull …

Total answers: 2
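When a time column mixes several layouts, as in the question above, one explicit approach is to try a list of known formats for each value and keep the first that parses. This is a sketch, not the only approach; `TIME_FORMATS`, `parse_hour`, and the sample values are hypothetical, and the format list should be replaced with the layouts that actually occur in the data. Values matching none of the formats return `None` so they can be inspected rather than silently misparsed.

```python
from datetime import datetime

import pandas as pd

# Hypothetical list of layouts; add whichever formats appear in your column.
TIME_FORMATS = ("%H:%M", "%H:%M:%S", "%I:%M %p", "%I %p", "%Y-%m-%d %H:%M:%S")


def parse_hour(value):
    """Return the hour (0-23) of `value`, or None if no format matches."""
    text = str(value).strip()
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).hour
        except ValueError:
            continue  # this layout did not match; try the next one
    return None


df = pd.DataFrame({"time": ["13:05", "1:30 PM", "9 AM", "2019-01-01 08:15:00"]})
df["hour"] = df["time"].map(parse_hour)
print(df)
```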