Unable to split data
Question:
I have data like below:
data = """1000
2000
3000

4000

5000
6000

7000
8000
9000

10000"""
Now, I want to sum the elements that appear before each empty line and keep track of max_sum, comparing it with the sum of the next group of elements before the following empty line. So for me, it should be the sum of 1000, 2000, 3000 = 6000 compared with the initial max_sum, e.g. 0; then sum the next group, i.e. 4000, and keep comparing with max_sum, which would be max(6000, 4000) = 6000; and keep doing the same, but I need to reset the sum whenever I encounter an empty line.
Below is my code:
max_num = 0
sum = 0
for line in data:
    # print(line)
    sum = sum + int(line)
    if line in ['\n', '\r\n']:
        sum=0
    max_num = max(max_num, sum)
This gives an error:
sum = sum + int(line)
ValueError: invalid literal for int() with base 10: '\n'
Answers:
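You are trying to cast empty lines to int. Also note that for line in data iterates over the characters of the string, not its lines, so split it into lines first:
max_num = 0
sum = 0
for line in data.splitlines():  # iterate over lines, not characters
    if line.strip():
        sum = sum + int(line)
    else:
        sum = 0  # blank line: start a new group
    max_num = max(max_num, sum)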
There are lines that consist only of '\n', which you are trying to convert to int.
You should move your test for the line above the int conversion, and continue without casting to int if the line is '\n' or '\r\n'.
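The reordering described above might look like the following minimal sketch. It assumes the data is first split into lines with splitlines(), since looping over the string directly yields single characters. The grouping of the sample data and the name running_sum are assumptions made for illustration.

```python
# Sample data with blank lines between groups (grouping assumed for illustration).
data = "1000\n2000\n3000\n\n4000\n\n5000\n6000\n\n7000\n8000\n9000\n\n10000"

max_num = 0
running_sum = 0
for line in data.splitlines():
    # Test for a blank line before attempting any int() conversion.
    if not line.strip():
        running_sum = 0  # a blank line ends the current group
        continue
    running_sum += int(line)
    max_num = max(max_num, running_sum)

print(max_num)  # 24000
```

Because splitlines() also handles '\r\n' line endings, the explicit comparison against '\n' or '\r\n' is not needed in this variant.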
Here’s a quick one-liner:
data = """1000
2000
3000

4000

5000
6000

7000
8000
9000

10000"""
max(
    sum(
        int(i) for i in l.split('\n')
    ) for l in data.split('\n\n')
)
which gives 24000
First it splits the data on '\n\n' (blank lines) and then splits each group on '\n'. It sums all elements in each group and then chooses the biggest value.
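To make the two splits easier to follow, here is a sketch that performs the same steps one at a time. The intermediate names groups, numbers, and totals are illustrative, and the grouping of the sample data is an assumption.

```python
# Sample data with blank lines between groups (grouping assumed for illustration).
data = "1000\n2000\n3000\n\n4000\n\n5000\n6000\n\n7000\n8000\n9000\n\n10000"

# Step 1: split on blank lines ("\n\n") to get one string per group.
groups = data.split("\n\n")

# Step 2: split each group on "\n" and convert the pieces to int.
numbers = [[int(i) for i in g.split("\n")] for g in groups]

# Step 3: sum each group, then take the largest sum.
totals = [sum(n) for n in numbers]
print(totals)       # [6000, 4000, 11000, 24000, 10000]
print(max(totals))  # 24000
```

One caveat with this approach: if the data ends with a trailing newline, the last split produces an empty string and int('') raises ValueError, so calling data.strip() first may be worthwhile.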
Don’t use builtin names like sum. Here you need to split the data on '\n', which gives you a list; you can then loop over it and strip whitespace from each line using strip(). If the line contains only digits, it is added to the running sum; otherwise the sum is reset to 0.
max_num = 0
sum_val = 0
for line in data.split("\n"):
    line = line.strip()
    sum_val = int(line) + sum_val if line and line.isdigit() else 0
    max_num = max(max_num, sum_val)
print(max_num)
You can try:
data = """1000
2000
3000

4000

5000
6000

7000
8000
9000

10000
"""
data = data.splitlines()
max_sum = 0
group = []
for single_data in data:
    single_data = single_data.replace(" ","")
    if single_data == "":
        # blank line: compare the finished group, then start a new one
        if max_sum < sum(group):
            max_sum = sum(group)
        group = []
    else:
        group.append(int(single_data))
# the last group is not followed by a blank line, so compare it here
max_sum = max(max_sum, sum(group))
print(max_sum)
Output:
24000
Note that int() ignores leading and trailing whitespace; e.g., int('\n99\n') returns 99 without error. However, a string consisting entirely of whitespace raises ValueError. That’s what is happening here: you’re trying to parse a string that contains only a newline character.
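A quick way to check this behaviour is the small sketch below; the names padded and rejected are only illustrative.

```python
# Surrounding whitespace (including newlines) is ignored by int().
padded = int("\n99\n")
print(padded)  # 99

# Strings that are empty or whitespace-only cannot be parsed.
rejected = []
for text in ["\n", "   ", ""]:
    try:
        int(text)
    except ValueError:
        rejected.append(text)

print(rejected)  # ['\n', '   ', '']
```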
You can take advantage of ValueError for these data as follows:
data = """1000
2000
3000

4000

5000
6000

7000
8000
9000

10000"""
current_sum = 0
max_sum = float('-inf')
for t in data.splitlines():
    try:
        x = int(t)
        current_sum += x
    except ValueError:
        max_sum = max(max_sum, current_sum)
        current_sum = 0
print(f'Max sum = {max(max_sum, current_sum)}')
Output:
Max sum = 24000