How to sum number strings from the rows of a pandas DataFrame column
Question:
The DataFrame is read from a CSV file, and one of its columns is converted into a list of strings with the following code:
import pandas as pd
import re
input_csv_file = "./CSV/Officers_and_Shareholders.csv"
df = pd.read_csv(input_csv_file, skiprows=10, on_bad_lines='skip', names= ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total'])
df.fillna('', inplace=True)
# df.drop([0, 3], inplace=True)
df.columns = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
pattern_shareholding_numbers = re.compile(r'[\d.]*\d+')
shareholding_percentage_list = df["Jumlah Lembar Saham"].astype(str)
# regex=False treats '.' as a literal dot rather than "any character"
shareholding_percentage_thousand_separator_removed = shareholding_percentage_list.str.replace('.', '', regex=False)
shareholding_percentage_string = ' '.join(shareholding_percentage_thousand_separator_removed)
matches = pattern_shareholding_numbers.findall(shareholding_percentage_string)
print(matches)
The code above produces the following output from the CSV file:
['3200000', '2900000', '2900000', '1000000']
The numbers shown above are the data under the "Jumlah Lembar Saham" column, taken from different rows of the DataFrame. Is there a way to add all of these numbers together, resulting in a single number such as:
['10000000']
Answers:
A list like ['3200000', '2900000', '2900000', '1000000'] doesn’t contain numbers. It contains text (strings) that represent numbers. A list like [3200000, 2900000, 2900000, 1000000] contains numbers.
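The difference shows up as soon as you try arithmetic on the values. The following is a small illustrative sketch; the variable names are made up for this example and are not from the question:

```python
strings = ['3200000', '2900000']
numbers = [3200000, 2900000]

# "+" on strings concatenates the text instead of adding values.
joined = strings[0] + strings[1]
print(joined)

# "+" on integers performs arithmetic.
added = numbers[0] + numbers[1]
print(added)

# sum() starts from the integer 0, so it cannot add strings to it.
try:
    sum(strings)
except TypeError as error:
    print("TypeError:", error)
```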
If you want to sum the numbers represented in a list of strings, you need to turn the strings into numbers, sum those, and then turn the result back into a string, if you need strings in your list.
So:
strings = ['3200000', '2900000', '2900000', '1000000']
numbers = list(map(int, strings))
total = sum(numbers)
total_string = str(total)
result = [total_string]
print(strings, result)
Of course, you can do the whole thing in one step:
strings = ['3200000', '2900000', '2900000', '1000000']
result = [str(sum(map(int, strings)))]
print(strings, result)
In both cases, the output is:
['3200000', '2900000', '2900000', '1000000'] ['10000000']
Having said that, since you’re using pandas with a DataFrame, you probably want those values to be numbers in the first place, not strings.
There are many ways to do this, but pandas can easily convert a column of strings into numbers, allowing you to sum the result:
df['a'].astype(float).sum()
This takes column 'a', turns the values into floating point numbers, and sums them. Similarly, you could turn an entire DataFrame into numbers (float, int, etc.) or not define the data to be of type str to begin with.
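Applied to the column from the question, that could look roughly like the sketch below. It assumes the raw values use '.' as a thousands separator (for example "3.200.000") and that some rows may be empty. The small DataFrame here is a hypothetical stand-in for the one read from the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the column read from the CSV file;
# the real values may be formatted differently.
df = pd.DataFrame(
    {"Jumlah Lembar Saham": ["3.200.000", "2.900.000", "2.900.000", "1.000.000", ""]}
)

# Remove the thousands separator; regex=False treats "." as a literal dot.
cleaned = df["Jumlah Lembar Saham"].astype(str).str.replace(".", "", regex=False)

# Convert to numbers; anything that is not a number (such as "") becomes NaN.
shares = pd.to_numeric(cleaned, errors="coerce")

# Series.sum() skips NaN by default.
total = int(shares.sum())
result = [str(total)]
print(result)
```

pd.read_csv also accepts a thousands parameter (for example thousands='.'), which may let pandas parse such a column as numbers while reading the file. Whether that works here depends on how the rest of the CSV is formatted.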