How to sum number strings extracted from a Python DataFrame column

Question:

The DataFrame was read from a CSV file, and one of its columns is converted into a string with the following code:

import pandas as pd
import re

input_csv_file = "./CSV/Officers_and_Shareholders.csv"

df = pd.read_csv(input_csv_file, skiprows=10, on_bad_lines='skip', names= ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total'])
df.fillna('', inplace=True)
# df.drop([0, 3], inplace=True)
df.columns = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']

pattern_shareholding_numbers = re.compile(r'[\d.]*\d+')

shareholding_percentage_list = df["Jumlah Lembar Saham"].astype(str)
# regex=False treats '.' as a literal dot rather than a regex wildcard
shareholding_percentage_thousand_separator_removed = shareholding_percentage_list.str.replace('.', '', regex=False)
shareholding_percentage_string = ' '.join(shareholding_percentage_thousand_separator_removed)
matches = pattern_shareholding_numbers.findall(shareholding_percentage_string)

print(matches)

The code above produces the following output from the CSV file:

['3200000', '2900000', '2900000', '1000000']

The numbers above come from the "Jumlah Lembar Saham" column, each from a different row of the DataFrame. Is there a way to add them all up into a single number, such as:

['10000000']
Asked By: htm_01


Answers:

A list like ['3200000', '2900000', '2900000', '1000000'] doesn’t contain numbers. It contains text (strings) that represent numbers. A list like [3200000, 2900000, 2900000, 1000000] contains numbers.

If you want to sum the numbers represented in a list of strings, you need to turn the strings into numbers, sum those, and then turn the result back into a string, if you need strings in your list.

So:

strings = ['3200000', '2900000', '2900000', '1000000']
numbers = list(map(int, strings))
total = sum(numbers)
total_string = str(total)
result = [total_string]

print(strings, result)

Of course, you can do the whole thing in one step:

strings = ['3200000', '2900000', '2900000', '1000000']
result = [str(sum(map(int, strings)))]

print(strings, result)

In both cases, the output is:

['3200000', '2900000', '2900000', '1000000'] ['10000000']
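
If the strings still carry the dot thousands separators from the CSV, or some rows are blank, the same idea works with a little cleaning first. This is only a sketch: the helper name sum_share_strings and the sample values are made up for illustration and are not taken from your file.

```python
def sum_share_strings(values):
    """Sum strings such as '3.200.000', skipping entries that are not plain digits."""
    total = 0
    for value in values:
        # Drop the dot thousands separators and surrounding whitespace.
        digits = value.replace(".", "").strip()
        # Blank or non-numeric entries are ignored rather than raising ValueError.
        if digits.isdigit():
            total += int(digits)
    return total


# Hypothetical raw values, including one blank row
raw = ["3.200.000", "2.900.000", "", "2.900.000", "1.000.000"]
result = [str(sum_share_strings(raw))]

print(result)  # ['10000000']
```

Skipping non-numeric entries silently is a design choice; if a bad value should be an error instead, call int(digits) unconditionally and let it raise.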

Having said that, since you’re using a pandas DataFrame, you probably want those values to be numbers in the first place, not strings.

There are many ways to skin a cat here, but pandas can easily convert a column of strings into numbers, allowing you to sum the result:

df['a'].astype(float).sum()

This takes column 'a', turns the values into floating point numbers, and sums them. Similarly, you could turn an entire DataFrame into numbers (float, int, etc.) or not define the data to be of type str to begin with.
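
Applied to a column like yours, that could look roughly like the sketch below. The small DataFrame here is a made-up stand-in for your CSV data, and I'm assuming the values use '.' as a thousands separator with some blank rows, as your code suggests.

```python
import pandas as pd

# Hypothetical sample standing in for the column read from the CSV
df = pd.DataFrame(
    {"Jumlah Lembar Saham": ["3.200.000", "2.900.000", "", "2.900.000", "1.000.000"]}
)

# Remove the dot thousands separators as literal characters (regex=False).
cleaned = df["Jumlah Lembar Saham"].astype(str).str.replace(".", "", regex=False)

# Anything that is not a number (such as '') becomes NaN, which sum() skips by default.
shares = pd.to_numeric(cleaned, errors="coerce")

total = int(shares.sum())
print(total)  # 10000000
```

pd.to_numeric with errors="coerce" is used instead of astype(float) here only because astype would raise on the blank strings; if the column is already clean, either works.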

Answered By: Grismar