Storing results from a function directly into a DataFrame with return

Question:

I’m new to programming and Python.

I’m trying to write a function that iterates over a DataFrame and stores its results directly in the DataFrame. Here is what I’ve done so far:

def principal_loop2(df, col1, col2, col3):
    for i, row in df.iterrows():
        balance = row[col1]
        Suku_bunga = row[col2]
        terms = int(row[col3])
        periode = range(1, terms+1)
        if balance > 0:
            p = npf.ppmt(
                rate=Suku_bunga/12, per=periode, nper=terms, pv=-balance
            )
            return (p)

After running it, I can get the NumPy array from p, store it in a variable, and turn it into a DataFrame. But that only works for the first data point, because return exits the function as soon as the first row satisfies the condition. What can I do instead so that I get the results for every row, either as NumPy arrays or saved directly to the DataFrame?

Thank you.

Asked By: Edo


Answers:

When making the transition to DataFrames, it’s important not to hold too tightly to programming patterns you use with things like lists and dicts.

Here you’re iterating over the rows of a DataFrame as if it were a list. That’s not illegal or anything, but with DataFrames you generally want to express an operation like this as a single call applied to the whole frame.
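For comparison, if you do keep the explicit loop, the early exit goes away once each row's result is appended to a list and the return happens only after the loop has finished. This is a minimal sketch under assumptions: `collect_per_row` is a hypothetical name, the sample data is made up, and the first-month interest calculation is only a placeholder for the real `npf.ppmt` call.

```python
import pandas as pd

def collect_per_row(df, col1, col2, col3):
    """Loop over every row and collect one result per row."""
    results = []
    for _, row in df.iterrows():
        balance = row[col1]
        monthly_rate = row[col2] / 12
        terms = int(row[col3])  # unused by the placeholder, kept to mirror the question
        if balance > 0:
            # Placeholder calculation (first-month interest);
            # the real code would call npf.ppmt(...) here.
            results.append(balance * monthly_rate)
        else:
            results.append(None)
    # Returning after the loop means every row has been processed.
    return results

# Made-up sample data for illustration
df = pd.DataFrame({
    "balance": [1200.0, 0.0, 2400.0],
    "Suku_bunga": [0.12, 0.12, 0.06],
    "terms": [12, 12, 24],
})

results = collect_per_row(df, "balance", "Suku_bunga", "terms")
df["first_interest"] = results
```

Because the list has one entry per row, in row order, it can be assigned straight to a new column.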

There are two things you want to do:

  1. Move the code currently in your loop into a function that takes a single row of the DataFrame as its input.
  2. Use the apply() method on the DataFrame to apply that function to every row, as a single line of code.

So for the function, you’ll have something like this:

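import numpy_financial as npf

def someLambdaFunc(row):
    # col1, col2 and col3 are not parameters here: they must already be defined
    # in the enclosing scope as the names of the balance, rate and term columns.
    balance = row[col1]
    Suku_bunga = row[col2]
    terms = int(row[col3])
    periode = range(1, terms+1)
    if balance > 0:
        p = npf.ppmt(
            rate=Suku_bunga/12, per=periode, nper=terms, pv=-balance
        )
        return p
    return None  # rows without a positive balance produce no schedule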

And your one-liner, which is not obvious to someone new to pandas, is:

df["NewValue"] = df.apply(lambda x: someLambdaFunc(x), axis=1)

This one-liner says: apply this function to the DataFrame and, the key bit, apply it to each row (by specifying axis=1). Passing each row is what lets you refer to columns by name inside your function.
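To see the row-wise pattern run end to end, here is a self-contained sketch on made-up data. It rests on several assumptions: `principal_schedule` and `principal_for_row` are hypothetical names, the column names are taken from the question, and the closed-form helper is a stand-in for `npf.ppmt` (end-of-period payments, non-zero rate) so that the example needs only NumPy and pandas.

```python
import numpy as np
import pandas as pd

def principal_schedule(balance, annual_rate, terms):
    """Principal part of each monthly payment on a fixed-rate loan.

    Hypothetical stand-in for npf.ppmt(rate, per, nper, pv=-balance);
    assumes end-of-period payments and a non-zero rate.
    """
    r = annual_rate / 12
    per = np.arange(1, terms + 1)
    payment = balance * r / (1 - (1 + r) ** -terms)  # level monthly payment
    # Principal repaid in period k grows by a factor (1 + r) each period.
    return (payment - balance * r) * (1 + r) ** (per - 1)

def principal_for_row(row):
    # Rows with no outstanding balance get None instead of a schedule.
    if row["balance"] <= 0:
        return None
    return principal_schedule(row["balance"], row["Suku_bunga"], int(row["terms"]))

# Made-up sample data for illustration
df = pd.DataFrame({
    "balance": [10000.0, 5000.0, 0.0],
    "Suku_bunga": [0.018, 0.024, 0.018],
    "terms": [10, 6, 10],
})

# axis=1 passes each row to the function, so columns can be read by name.
df["principal"] = df.apply(principal_for_row, axis=1)
```

Each cell of the new column holds a whole array (one value per period), so the column has object dtype. The principal payments in a schedule should add up to the original balance, which makes a handy sanity check.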

When you want to just apply a function to a single column, typically you’d do something like:

df["OtherNewValue"] = df["SomeColumn"].apply(lambda x: simpleLambdaFunc(x))

In this simpler case, the simpleLambdaFunc does not have to have any column references or even awareness that it’s dealing with a Pandas Series. It just takes x as its argument, performs some operation(s), then returns a value for each element in the Pandas column.
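A small sketch of the single-column case, assuming a made-up rate column and a hypothetical helper `to_monthly_rate`. The helper only ever sees one plain number at a time:

```python
import pandas as pd

def to_monthly_rate(annual_rate):
    # Receives a single value from the column, not the whole Series.
    return annual_rate / 12

# Made-up sample data for illustration
df = pd.DataFrame({"Suku_bunga": [0.018, 0.024, 0.12]})

df["monthly_rate"] = df["Suku_bunga"].apply(to_monthly_rate)

# For plain arithmetic like this, the vectorized form gives the same values.
vectorized = df["Suku_bunga"] / 12
```

For simple arithmetic the vectorized expression is usually the better choice; apply() earns its keep when the per-element logic cannot be written as a column operation.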

Answered By: James_SO

Here’s what you need:

My data is made up, so the values might not be representative of the problem you are trying to solve, but the code will work with your DataFrame provided its columns are in the same positions as assumed in the solution below.

# payment against loan principal
# numpy_financial.ppmt(rate, per, nper, pv, fv=0, when='end')

import numpy_financial as npf
import pandas as pd

data = {
    "Suku_bunga": [0.018, 0.018, 0.018, 0.018, 0.018, 0.018],
    "periode": [10, 10, 10, 10, 10, 10],
    "terms": [10, 10, 10, 10, 10, 10],
    "balance": [10000, 9000, 8000, 7000, 6000, 0]
}

data = pd.DataFrame(data)

# Columns are read by position with .iloc, where
# x.iloc[0] = Suku_bunga, x.iloc[1] = periode, x.iloc[2] = terms, x.iloc[3] = balance
get_principal = lambda x: npf.ppmt(rate=x.iloc[0]/12, per=x.iloc[1], nper=x.iloc[2], pv=-x.iloc[3]) if x.iloc[3] > 0 else None

# axis=1 applies the function to each row
data["principal"] = data.apply(get_principal, axis=1)

data

# Output

# Suku_bunga    periode terms   balance principal
# 0.018           10    10      10000   1006.758411
# 0.018           10    10      9000    906.082570
# 0.018           10    10      8000    805.406729
# 0.018           10    10      7000    704.730888
# 0.018           10    10      6000    604.055047
# 0.018           10    10      0       NaN

Answered By: Prashant