data-cleaning

How can I fill null values with a mean using Pandas?

Question: Having a hard time understanding why the apply function isn’t working here. I’m trying to fill the null values for SalePrice with the mean sale price of their corresponding quality ratings (OverallQual). I expected the function to iterate through each row and return …

Total answers: 2
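
One common way to approach this kind of fill, without `apply`, is a grouped `transform`. The sketch below is only an illustration: the column names `SalePrice` and `OverallQual` come from the question, but the sample values are made up, and it is not necessarily what the accepted answers propose.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "OverallQual": [5, 5, 5, 7, 7, 7],
    "SalePrice": [100.0, 200.0, np.nan, 300.0, np.nan, 500.0],
})

# Mean SalePrice per OverallQual, broadcast back to one value per row.
# NaN values are skipped when the mean is computed.
group_mean = df.groupby("OverallQual")["SalePrice"].transform("mean")

# Fill only the missing prices with the mean of their quality group.
df["SalePrice"] = df["SalePrice"].fillna(group_mean)
```

Because `transform` returns a Series aligned to the original index, `fillna` can match each missing row to its own group mean without an explicit loop.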

String content becomes random integer after using append()

Question: I’m writing a function to filter tweet data that contains a search word. Here’s my code: def twitter_filter(df, search): coun = 0 date_ls = [] id_ls = [] content_ls = [] lan_ls = [] name_ls = [] retweet_ls = [] cleaned_tweet_ls = [] for i, row in …

Total answers: 1

Delete rows where any column contains a certain string

Question: I am dealing with a dataset that uses ".." as a placeholder for null values. These null values span across all of my columns. My dataset looks as follows: Country Code Year GDP growth (%) GDP (constant) AFG 2010 3.5 .. AFG 2011 .. 2345 …

Total answers: 3
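
A minimal sketch of one way to drop such rows, assuming the placeholder is exactly the string ".." and that every row containing it should be removed. The first two rows mirror the excerpt above; the third row is invented so that something survives the filter.

```python
import pandas as pd

# First two rows follow the question's excerpt; the 2012 row is hypothetical.
df = pd.DataFrame({
    "Country Code": ["AFG", "AFG", "AFG"],
    "Year": [2010, 2011, 2012],
    "GDP growth (%)": ["3.5", "..", "2.1"],
    "GDP (constant)": ["..", "2345", "2400"],
})

# True for every row in which at least one cell equals the placeholder.
has_placeholder = df.isin([".."]).any(axis=1)

# Keep only the rows without the placeholder.
cleaned = df[~has_placeholder]
```

If the rows should be kept and only the placeholder treated as missing, converting ".." to NaN instead of dropping would be the alternative; which one is wanted depends on the rest of the question, which is truncated here.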

Removing rows where there is a value match

Question: def remove_low_data_states(column_name): items = df[column_name].value_counts().reset_index() items.columns = ['place', 'value'] print(f'Items in column: [{column_name}] with low data') return list(items[items['value'].apply(lambda val: val < items.value.median())].place) remove_low_data_states('col1') --> returns ['hello', 'bye'] Original table col1 col2 col3 hello 2 4 world 2 4 bye 2 4 Updated table col1 col2 …

Total answers: 1
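
A short sketch of the filtering step, assuming the goal is to drop every row whose `col1` value appears in the list returned by `remove_low_data_states`. The table and the list `['hello', 'bye']` are taken from the excerpt; the variable names `to_remove` and `updated` are made up for illustration.

```python
import pandas as pd

# Original table as shown in the question.
df = pd.DataFrame({
    "col1": ["hello", "world", "bye"],
    "col2": [2, 2, 2],
    "col3": [4, 4, 4],
})

# Values the question's function is reported to return for 'col1'.
to_remove = ["hello", "bye"]

# Keep rows whose col1 value is NOT in the list, then renumber the index.
updated = df[~df["col1"].isin(to_remove)].reset_index(drop=True)
```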

How to join multiple dataframe columns based on row index to specified column?

Question: PROBLEM STATEMENT: I’m trying to join multiple pandas data frame columns, based on row index, to a single column already in the data frame. Issues seem to happen when the data in a column is read in as np.nan. EXAMPLE: Original …

Total answers: 2

Generate a dataframe from a string

Question: Inspired by this solution I have been using the following code to clean up some data that I obtain using Beautiful Soup: nfl = soup.findAll('li', "player") lines = ("{}. {}\n".format(ind,span.get_text(strip=True).rstrip("+")) for ind, span in enumerate(nfl,1)) print("".join(lines)) The problem is that the output of this comes in the format of …

Total answers: 1

How to convert a list with special delimiter to a python dataframe?

Question: I have the following list where double brackets indicate a new element: [[[33.79277702, -104.3900481], [35.79415582, -104.39016576], [38.7939, -107.31792], [31.792589, -188.38847], [36.79221, -108.388367], [36.79238003, -108.38905313]], [[38.1726905, -54.85042496], [30.179095, -84.88893], [36.17621409, -84.78], [39.17534035, -84.8481921], [31.17427369, -84.8499793], [50.17466907, -84.8578298]], [[46.71949073, -109.69390116], [46.72091429, -109.69484574], [46.72077, -107.69432], …

Total answers: 1
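
The desired output is cut off in the excerpt, so the following is only a guess at one plausible shape: one row per coordinate pair, with a column recording which outer element the pair came from. The column names `group`, `x` and `y` are hypothetical, and only the first few pairs from the question are reproduced.

```python
import pandas as pd

# Shortened copy of the nested list from the question (first pairs only).
nested = [
    [[33.79277702, -104.3900481], [35.79415582, -104.39016576]],
    [[38.1726905, -54.85042496], [30.179095, -84.88893]],
]

# Flatten to one record per pair, tagging each with its outer-list position.
rows = [
    {"group": g, "x": x, "y": y}
    for g, pairs in enumerate(nested)
    for x, y in pairs
]

df = pd.DataFrame(rows)
```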