exploratory-data-analysis

I want to create a new frequency column for each column in a pandas dataframe

Question: Let's say I have a dataframe like this:

    colors  animals
    yellow  cat
    yellow  cat
    red     cat
    red     cat
    blue    cat

I want to create a column for each column showing how often each value occurs: colors colors_frequency …

Total answers: 2
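
A minimal sketch, assuming "frequency" means a raw count of each value within its own column (pass `normalize=True` to `value_counts` for proportions instead). The `<column>_frequency` naming follows the question. New columns are appended at the end, so reorder them if they must sit next to their source column.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "colors": ["yellow", "yellow", "red", "red", "blue"],
    "animals": ["cat"] * 5,
})

# Snapshot the original columns so the new ones are not processed in turn.
for col in list(df.columns):
    # value_counts() gives count per distinct value; map() looks up each row's count.
    df[f"{col}_frequency"] = df[col].map(df[col].value_counts())

print(df)
```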

filter rows from data where column salary has string datatype

Question:

       id  name   salary
    0   1  shyam  10000
    1   2  ram    20000
    2   3  ravi   abc
    3   4  abhay  30000
    4   5  karan  fgh

Expected:

       id  name   salary
    2   3  ravi   abc
    4   5  karan  fgh

Asked by: Rupesh chauhan. Answers: We can …

Total answers: 1
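
One possible approach, sketched here and not necessarily the answer's own method: when the column was read from a file, every cell may already be a string, so testing `isinstance(value, str)` would match all rows. Coercing with `pd.to_numeric(..., errors="coerce")` and keeping the rows that fail to parse avoids that. Note that rows whose salary was already missing would also be selected.

```python
import pandas as pd

# Sample data from the question, stored as text the way read_csv might load it
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["shyam", "ram", "ravi", "abhay", "karan"],
    "salary": ["10000", "20000", "abc", "30000", "fgh"],
})

# to_numeric turns anything it cannot parse into NaN,
# so NaN after coercion marks the non-numeric salaries.
non_numeric = pd.to_numeric(df["salary"], errors="coerce").isna()
result = df[non_numeric]
print(result)
```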

Pandas Groupby and Compare rows to find maximum value

Question: I've a dataframe:

    a      b   c
    one    6   11
    one    7   12
    two    8   23
    two    9   14
    three  10  15
    three  20  25

I want to apply groupby on column a and then find the highest value in column c, so that the highest …

Total answers: 3
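
A short sketch of two common readings of the question, since the snippet is cut off: `groupby(...).max()` returns only the highest `c` per group, while `idxmax()` plus `loc` keeps the whole row that holds it.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["one", "one", "two", "two", "three", "three"],
    "b": [6, 7, 8, 9, 10, 20],
    "c": [11, 12, 23, 14, 15, 25],
})

# Highest c per group, as a Series indexed by the group key
max_c = df.groupby("a")["c"].max()

# Full rows holding that maximum: idxmax returns the index label of the
# maximum within each group, and loc selects those rows from df.
rows_with_max_c = df.loc[df.groupby("a")["c"].idxmax()]
print(rows_with_max_c)
```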

The edge color of the histogram is not changing even though I declared it

Question: I can't change the border color even though I declared it as black.

    fig = plt.figure(figsize=(25, 10), tight_layout=True, edgecolor='black')
    plt.title('Distribution of Item Type')
    plt.xlabel('Item Type')
    plt.hist(BigMart_Data_Encoded['Item_Type'], bins=15)

Asked by: YamenAly. Answers: Try defining the edgecolor …

Total answers: 1
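
A minimal sketch of the fix the answer points toward. `edgecolor` given to `plt.figure()` applies to the figure's own background patch, which has zero line width by default, so it never outlines the bars. Passing `edgecolor` to `plt.hist()` styles every bar instead. The `item_type` list below is hypothetical and stands in for `BigMart_Data_Encoded['Item_Type']`. The Agg backend is used only so the sketch runs without a display; call `plt.show()` in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical encoded item types standing in for the BigMart column
item_type = [0, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

fig = plt.figure(figsize=(25, 10), tight_layout=True)
plt.title("Distribution of Item Type")
plt.xlabel("Item Type")
# edgecolor here is forwarded to each bar patch, giving every bar a black outline
counts, bin_edges, patches = plt.hist(item_type, bins=15, edgecolor="black")
```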

Pandas Groupby Operation For Condition Based Feature Creation

Question: I'm having difficulty creating a feature based on some groupby + conditions. The data I have looks similar to:

       ir_id  pli  pli_missing  err_type
    0  name1  1.0  no           UNKNOWN
    1  name1  2.0  no           NaN
    2  name1  3.0  no           NaN
    3  name1  NaN  yes          UNKNOWN
    4  name2 …

Total answers: 2
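
The actual condition is cut off in the snippet, so this is only a sketch of the general groupby + condition pattern under a hypothetical rule: flag every row whose `ir_id` group contains at least one missing `pli`. Rows 0 to 3 come from the question. The `name2` rows and the `group_has_missing_pli` column name are hypothetical. `transform` broadcasts each group's result back onto its rows, so the feature lines up with the original index.

```python
import numpy as np
import pandas as pd

# Rows 0-3 from the question; the name2 rows are hypothetical
df = pd.DataFrame({
    "ir_id": ["name1", "name1", "name1", "name1", "name2", "name2"],
    "pli": [1.0, 2.0, 3.0, np.nan, 1.0, 2.0],
    "pli_missing": ["no", "no", "no", "yes", "no", "no"],
    "err_type": ["UNKNOWN", np.nan, np.nan, "UNKNOWN", np.nan, np.nan],
})

# Hypothetical rule: True for every row of a group that has any pli_missing == "yes"
df["group_has_missing_pli"] = (
    df["pli_missing"].eq("yes").groupby(df["ir_id"]).transform("any")
)
print(df)
```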

how to find amount of users when one user could had chosen many options?

Question: I have to answer two questions: How many SQL users are there? How many of the users use MySQL only? The hard part is that any respondent could have chosen many options, so we can have …

Total answers: 1
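
The survey layout is not shown in the snippet, so this sketch assumes each respondent's choices are stored as one `;`-separated string. The `databases` column and the `SQL_OPTIONS` set are hypothetical. Converting each answer to a set means a respondent counts once however many SQL options they picked, and "MySQL only" becomes an exact set comparison.

```python
import pandas as pd

# Hypothetical survey: one row per respondent, options separated by ";"
df = pd.DataFrame({
    "respondent": [1, 2, 3, 4],
    "databases": ["MySQL", "MySQL;PostgreSQL", "MongoDB", "SQLite;MySQL"],
})

# Hypothetical set of options that count as "SQL"
SQL_OPTIONS = {"MySQL", "PostgreSQL", "SQLite"}

# Each answer string becomes a set of chosen options
choices = df["databases"].str.split(";").apply(set)

# Respondents who picked at least one SQL option (each counted once)
sql_users = choices.apply(lambda s: bool(s & SQL_OPTIONS)).sum()

# Respondents whose only choice was MySQL
mysql_only = choices.apply(lambda s: s == {"MySQL"}).sum()

print(sql_users, mysql_only)
```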

Drop rows based on group by of another column in pandas

Question: I have a dataset as below: I need to build the Cartesian product of products for each month and location, with output as below: I created a new dataframe with the unique values of product, then cross-merged that dataframe with the dataset. I need to drop …

Total answers: 2
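
The question's tables did not survive, so the column names and values below (`month`, `location`, `product`, `sales`) are hypothetical. This sketch uses an alternative to dropping rows after the fact: cross-merge only the distinct `(month, location)` pairs with the distinct products, so no duplicates are created. It then left-joins the original data back. `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical data; the question's real columns were in images
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb"],
    "location": ["A", "A", "B"],
    "product": ["p1", "p2", "p1"],
    "sales": [10, 5, 7],
})

# Cross only the unique keys, not the full dataset, to avoid duplicate rows
pairs = df[["month", "location"]].drop_duplicates()
products = df[["product"]].drop_duplicates()
grid = pairs.merge(products, how="cross")

# Bring back the measurements; combinations missing from df get NaN
result = grid.merge(df, on=["month", "location", "product"], how="left")
print(result)
```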

Grouping of subset of multiple Column in pandas

Question: I have a dataset as below. I need to group a subset of the columns and fill the missing values using the mode. I need to group the value 'Tom' from Name and 'UK' from Country and fill the missing values in Value using the mode. Name Country Value …

Total answers: 3
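
A minimal sketch, assuming only the rows matching both `Name == 'Tom'` and `Country == 'UK'` should be filled, using the mode of that subset's non-missing values. The sample rows are hypothetical. `Series.mode()` skips NaN and returns its results sorted, so on a tie `iloc[0]` picks the smallest value. The `empty` check guards against a subset with no observed values.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the question's Name/Country/Value layout
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Ann"],
    "Country": ["UK", "UK", "UK", "US", "UK"],
    "Value": [20.0, 20.0, np.nan, np.nan, np.nan],
})

# Restrict the fill to the Tom/UK subset only
mask = (df["Name"] == "Tom") & (df["Country"] == "UK")
modes = df.loc[mask, "Value"].mode()  # NaN is ignored by default

if not modes.empty:
    df.loc[mask, "Value"] = df.loc[mask, "Value"].fillna(modes.iloc[0])
print(df)
```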

Grouping of subset of a Column in pandas

Question: I have a dataset as below. I need to group a subset of the column and fill the missing values using the mode. I need to group the value 'Tom' from Name and fill the missing values in Value using the mode.

       Name  Value
    0  Tom   20.0
    1  Tom …

Total answers: 2
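
A sketch using `groupby` + `transform`, which fills each name's gaps from that name's own mode. This covers Tom, but it also touches every other name, so combine it with a boolean mask if other names must be left alone. The sample values and the helper name `fill_with_mode` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the question's Name/Value layout
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Jerry", "Jerry"],
    "Value": [20.0, 20.0, 30.0, np.nan, 5.0, np.nan],
})

def fill_with_mode(values: pd.Series) -> pd.Series:
    """Fill NaNs in one group with that group's most frequent value."""
    modes = values.mode()
    return values if modes.empty else values.fillna(modes.iloc[0])

# transform keeps the original index, so the result aligns row by row
df["Value"] = df.groupby("Name")["Value"].transform(fill_with_mode)
print(df)
```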

How can I use bamboolib in Databricks?

Question: I would like to do exploratory data analysis automatically in Azure Databricks, and I have seen the potential it has, as shown for example in this post: https://towardsdatascience.com/the-easy-way-to-do-data-exploration-22b4b8e1dc20 But when I follow the same steps in Databricks, the extension is not enabled. I have tested something like this: …

Total answers: 3