Need to sort a numeric file and store last digit of each value in a list or array
Question:
I need to sort a numeric file that contains thousands of lines of numbers such as those below.
I need the last digits to be collected in a list or array next to the first 4 digits they share.
66542
66543
66546
66781
66783
66784
66787
I would like the output to appear as:
6654[236]
6678[1347]
or something similar that shortens the file.
I have tried the following, but I am still way off, as it only outputs the last digits in a single array: [2, 3, 6, 1, 3, 4, 7]
#!/usr/bin/env python3
import re
# Open the file and read the numbers
with open('number-file.txt', 'r') as file:
    numbers = file.readlines()
# Initialize an empty array to store the last digits
last_digits = []
# Loop through the numbers and store the last digit of each number in the array
for number in numbers:
    last_digit = int(number.strip()) % 10
    last_digits.append(last_digit)
print(last_digits)
Answers:
from collections import defaultdict
with open("number-file.txt", "r") as infile:
    number_lines = infile.readlines()
# Map each 4-digit prefix to the list of digits that follow it
results = defaultdict(list)
for line in number_lines:
    k = line[:4]
    v = int(line[4:].strip())
    results[k].append(v)
# pretty print the results
for k, v in results.items():
    print(f"{k}{v}")
For each line, we split the number into its first 4 digits and the remaining digits, use the first part as the key into a defaultdict, and append the remainder to that key's list. Printing the results gives:
6654[2, 3, 6]
6678[1, 3, 4, 7]
If you need your output to appear exactly as you stated, you could instead do something like:
for k, v in results.items():
    print(f"{k}[{''.join([str(x) for x in v])}]")
6654[236]
6678[1347]
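Since the question also asks for sorting, here is a minimal sketch of the same defaultdict idea that does not depend on the order of the input. It assumes every number has exactly 5 digits (so sorting the prefixes as strings matches numeric order) and that blank lines should be ignored; compress_numbers is a name made up for this example.

```python
from collections import defaultdict


def compress_numbers(lines):
    """Group 5-digit numbers by their first 4 digits (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        number = line.strip()
        if not number:
            continue  # skip blank lines, e.g. an empty line at the end of the file
        groups[number[:4]].append(number[4:])
    # Sort the prefixes and the digits inside each group,
    # so the order of the input lines does not matter.
    return [
        f"{prefix}[{''.join(sorted(digits))}]"
        for prefix, digits in sorted(groups.items())
    ]


# Unsorted sample data with a blank line mixed in
sample = ["66783\n", "66542\n", "\n", "66546\n", "66543\n", "66781\n", "66787\n", "66784\n"]
print("\n".join(compress_numbers(sample)))
```

To run it on the real file, the open file object can be passed in directly, for example compress_numbers(infile) inside the with block.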
If your numbers are already in order (groupby only groups consecutive items), you could use itertools.groupby to group by the first 4 digits and collect the last digits for each group:
from itertools import groupby
# simulated input data
numbers = ['66542', '66543', '66546', '66781', '66783', '66784', '66787']
# Split each number into (first 4 digits, 5th digit)
parts = [(n[:4], n[4]) for n in numbers]
results = {k: [v[1] for v in g] for k, g in groupby(parts, key=lambda t: t[0])}
Output:
{'6654': ['2', '3', '6'], '6678': ['1', '3', '4', '7']}
This can be formatted as desired:
print('\n'.join(f"{k}[{''.join(v)}]" for k, v in results.items()))
Output:
6654[236]
6678[1347]
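For input that is not already sorted, the groupby approach needs a sort first, otherwise the same prefix can show up in more than one group. A rough sketch, assuming every number has exactly 5 digits so that sorting the strings gives numeric order; group_with_groupby is a hypothetical name used only here.

```python
from itertools import groupby


def group_with_groupby(numbers):
    """Return lines such as '6654[236]' from 5-digit numbers in any order."""
    # Strip whitespace, drop empty entries, then sort so equal prefixes are adjacent.
    ordered = sorted(n.strip() for n in numbers if n.strip())
    return [
        f"{prefix}[{''.join(n[4:] for n in group)}]"
        for prefix, group in groupby(ordered, key=lambda n: n[:4])
    ]


# Same sample data as above, deliberately shuffled
shuffled = ["66787", "66542", "66781", "66546", "66543", "66784", "66783"]
print("\n".join(group_with_groupby(shuffled)))
```

Sorting costs O(n log n), which should be fine for a file with thousands of lines.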