Create a dataframe from returned values from a function

Question

I have a function that returns series.index and series.values. How can I write the returned results to a dataframe?

Generate random data

import string
import random
import pandas as pd

# 20 random letters drawn from 'a', 'b', 'c', 'd'
text = [random.choice(string.ascii_letters[:4]) for _ in range(20)]

# 10 'True' and 10 'False' values, stored as strings (not Python booleans)
boolean = ['True', 'False'] * 10
bool1 = random.sample(boolean, 20)
bool2 = random.sample(boolean, 20)
bool3 = random.sample(boolean, 20)
bool4 = random.sample(boolean, 20)

d = {'c1':text, 'c2':bool1, 'c3':bool2, 'c4':bool3, 'y':bool4}
dd = pd.DataFrame(data=d)

dd.head(2)

    c1  c2  c3  c4  y
0   b   False   False   False   True
1   a   True    True    False   True

The function

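def relative_frequency(df, col):
    # Share of each 'y' value within every group of `col`
    series = df.groupby(col)['y'].value_counts(normalize=True)
    true_cnt = series.xs('True', level=1)  # a series with single layer index
    # Group with the highest share of 'True'
    max_index = true_cnt.index[true_cnt.argmax()]
    max_val = true_cnt[max_index]
    # Ratio of the highest share to the share of every other group
    true_cnt_dropped = true_cnt.drop(max_index)
    ans = max_val / true_cnt_dropped
    # Label each ratio as '<col> <max group>/<other group>'
    ans.index = [(col + ' ' + max_index + '/' + idx) for idx in ans.index]
    return ans.index, ans.values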

Run the function

for i in dd.columns[:-1]:
    print(relative_frequency(dd, i))

It returns

(Index(['c1 c/a', 'c1 c/b', 'c1 c/d'], dtype='object'), array([1.8 , 1.05, 1.2 ]))
(Index(['c2 False/True'], dtype='object'), array([1.5]))
(Index(['c3 True/False'], dtype='object'), array([2.33333333]))
(Index(['c4 False/True'], dtype='object'), array([1.5]))

I would like to build a single dataframe from these results, with one row per label and its value.

Asked By: Osca

Answer 1

In the last part (where you run the function), use the code below instead. It works as follows:

pd.DataFrame() converts the output of the function into a dataframe
.T transposes it (swaps rows and columns)
dfs.append() appends it to a list called dfs, which starts out empty
pd.concat() combines the dataframes vertically, as rows
Column names are added at the end

dfs = []

for i in dd.columns[:-1]:
    dfs.append(pd.DataFrame(relative_frequency(dd, i)).T)
    
result = pd.concat(dfs)
result.columns = ['features', 'relative_freq']
result
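
As a variation on the same idea, here is a minimal, self-contained sketch that builds each piece from a dict instead of transposing. The list `outputs` is a hypothetical stand-in for the tuples returned by relative_frequency, with values copied from the printed output in the question. Naming the columns up front should keep relative_freq numeric (transposing a frame that mixes strings and floats can leave it as object dtype), and ignore_index=True gives the combined frame a clean 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for [relative_frequency(dd, c) for c in dd.columns[:-1]];
# the labels and numbers are copied from the output shown in the question.
outputs = [
    (pd.Index(['c1 c/a', 'c1 c/b', 'c1 c/d']), np.array([1.8, 1.05, 1.2])),
    (pd.Index(['c2 False/True']), np.array([1.5])),
]

# One small dataframe per (labels, values) pair, with named columns
frames = [
    pd.DataFrame({'features': labels, 'relative_freq': values})
    for labels, values in outputs
]

# Stack the pieces as rows and renumber the index from 0
result = pd.concat(frames, ignore_index=True)
print(result)
```

With the real data, the list could be built by calling relative_frequency(dd, i) for each column in the same loop as above.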

Answered By: Akshay Sehgal