Create a dataframe from returned values from a function

Question:

I have a function that returns series.index and series.values. How can I write the returned results to a dataframe?

Generate random data

import string
import random
import pandas as pd

text = [random.choice(string.ascii_letters[:4]) for _ in range(20)]

boolean = ['True', 'False'] * 10
bool1 = random.sample(boolean, 20)
bool2 = random.sample(boolean, 20)
bool3 = random.sample(boolean, 20)
bool4 = random.sample(boolean, 20)

d = {'c1':text, 'c2':bool1, 'c3':bool2, 'c4':bool3, 'y':bool4}
dd = pd.DataFrame(data=d)

dd.head(2)

    c1  c2  c3  c4  y
0   b   False   False   False   True
1   a   True    True    False   True

The function

def relative_frequency(df, col):
    # Share of each 'y' value within every group of `col`
    series = df.groupby(col)['y'].value_counts(normalize=True)
    true_cnt = series.xs('True', level=1)  # a series with single layer index
    # Label of the group with the largest share of 'True'
    max_index = true_cnt.index[true_cnt.argmax()]
    max_val = true_cnt[max_index]
    true_cnt_dropped = true_cnt.drop(max_index)
    # Ratio of the largest share to each remaining group's share
    ans = max_val / true_cnt_dropped
    ans.index = [(col + ' ' + max_index + '/' + idx) for idx in ans.index]
    return ans.index, ans.values

Run the function

for i in dd.columns[:-1]:
    print(relative_frequency(dd, i))

It returns

(Index(['c1 c/a', 'c1 c/b', 'c1 c/d'], dtype='object'), array([1.8 , 1.05, 1.2 ]))
(Index(['c2 False/True'], dtype='object'), array([1.5]))
(Index(['c3 True/False'], dtype='object'), array([2.33333333]))
(Index(['c4 False/True'], dtype='object'), array([1.5]))

I would like to build a dataframe like this

[image: desired dataframe]

Asked By: Osca


Answers:

In the last part (where you run the function), do this instead:

  1. pd.DataFrame() converts the output of the function into a DataFrame
  2. .T transposes it (swaps rows and columns)
  3. dfs.append() appends it to a list called dfs, which starts out empty
  4. pd.concat() combines the collected frames vertically, as rows
  5. The column names are assigned

dfs = []

for i in dd.columns[:-1]:
    dfs.append(pd.DataFrame(relative_frequency(dd, i)).T)
    
result = pd.concat(dfs)
result.columns = ['features', 'relative_freq']
result

[image: resulting dataframe]
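
As a possible variation on the steps above, here is a minimal sketch that builds each frame from named columns, so no transpose is needed and the numeric column should keep a float dtype. The `pairs` list is hypothetical: it hard-codes two of the (index, values) results printed in the question in place of calling relative_frequency, and `frames` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the function's return values,
# copied from the output printed in the question.
pairs = [
    (pd.Index(['c1 c/a', 'c1 c/b', 'c1 c/d']), np.array([1.8, 1.05, 1.2])),
    (pd.Index(['c2 False/True']), np.array([1.5])),
]

# One small frame per (index, values) pair, with the column names set up front.
frames = [
    pd.DataFrame({'features': idx, 'relative_freq': vals})
    for idx, vals in pairs
]

# Stack the frames as rows and renumber the row index from 0.
result = pd.concat(frames, ignore_index=True)
print(result)
```

With the real data, `pairs` could be replaced by `[relative_frequency(dd, c) for c in dd.columns[:-1]]`, assuming the function keeps returning an (index, values) tuple.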

Answered By: Akshay Sehgal