Python – Iterate over a folder of files, split each name on '_', then group like names and get a total count

Question:

A folder has 50 files whose names follow a convention, so I can split each name on '_' and take the third element of the split to get the name of the produce.

I have the folder path set and the file names in a variable.
A for loop splits each name and puts the third element into a variable.

Now I am having trouble grouping those third elements by name (Books, Papers, Pencils…) and counting the occurrences of each:

[output]
apple 4
orange 9
banana 2
…..

So far I can split the items and print the list, but I still need to group them and get the count per group.

This is as far as I have got:

import os
# local path to files.
path_in = '/Users/username/Downloads/my_list'
for file in os.listdir(path_in):
    # split produce item by name via '_' delimiter.
    split_file = file.lower().split('_') # 3rd Element only since produce.

Asked By: iOS Newbie


Answers:

I added a produce_count dictionary to track how many times each produce name shows up in the third position of the split file name. For each file, the code first checks that the split list has enough parts, which avoids an IndexError. If it does, the code adds one to the count for that produce name.

import os
from collections import defaultdict

# local path to files.
path_in = '/Users/username/Downloads/my_list' 

# Initialize the dictionary for counting occurrences
produce_count = defaultdict(int)

for file in os.listdir(path_in):
    # split produce item by name via '_' delimiter.
    split_file = file.lower().split('_') # 3rd Element only since produce.
    
    # Check if there are enough elements to avoid IndexError
    if len(split_file) > 2:
        produce_name = split_file[2]
        produce_count[produce_name] += 1

# Print the results
for produce, count in produce_count.items():
    print(f'{produce} {count}')

Assuming you have the following files in your directory:

user_01_apple_001.txt
user_02_orange_002.txt
user_03_apple_003.txt
user_04_banana_004.txt
user_05_apple_005.txt
user_06_apple_006.txt
user_07_orange_007.txt
user_08_orange_008.txt
user_09_banana_009.txt
user_10_orange_010.txt

Output:

apple 4
orange 4
banana 2
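
As an alternative sketch, the same tally can be written with collections.Counter from the standard library. The file_names list below is a hypothetical stand-in for os.listdir(path_in), copied from the listing above so the example runs without touching the disk, and count_produce is an illustrative helper name, not part of the original answer.

```python
from collections import Counter

# Hypothetical sample names copied from the listing above; in practice
# these would come from os.listdir(path_in).
file_names = [
    "user_01_apple_001.txt",
    "user_02_orange_002.txt",
    "user_03_apple_003.txt",
    "user_04_banana_004.txt",
    "user_05_apple_005.txt",
    "user_06_apple_006.txt",
    "user_07_orange_007.txt",
    "user_08_orange_008.txt",
    "user_09_banana_009.txt",
    "user_10_orange_010.txt",
]


def count_produce(names, index=2, sep="_"):
    """Count the field at `index` of each name split on `sep`.

    Names with too few fields are skipped instead of raising IndexError.
    """
    parts = (name.lower().split(sep) for name in names)
    return Counter(p[index] for p in parts if len(p) > index)


produce_count = count_produce(file_names)

# most_common() yields (name, count) pairs ordered by descending count.
for produce, count in produce_count.most_common():
    print(f"{produce} {count}")
```

Counter behaves like a dictionary here, so produce_count['apple'] returns the count for a single name.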
Answered By: Alex

You can keep the count in a dictionary with a default value.

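import os
from collections import defaultdict

# local path to files.
path_in = '/Users/username/Downloads/my_list'
# int() returns 0, so every new key starts with a count of 0.
counter = defaultdict(int)
for file in os.listdir(path_in):
    # split produce item by name via '_' delimiter.
    split_file = file.lower().split('_')
    # A list cannot be a dictionary key, so count the 3rd element (the produce name).
    # The length check skips names with fewer than three parts.
    if len(split_file) > 2:
        counter[split_file[2]] += 1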

Each type of produce will be a key of counter, and the number of times it occurs will be the corresponding value.
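
To get from that dictionary to the "name count" lines shown in the question, one option is to sort the items before printing. This is a minimal sketch: the counts below are hypothetical values standing in for the counter built by the loop, and format_counts is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical counts standing in for the `counter` built by the loop above.
counter = defaultdict(int, {"apple": 4, "orange": 9, "banana": 2})


def format_counts(counts):
    """Return 'name count' lines, highest count first, ties broken by name."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{name} {count}" for name, count in ordered]


for line in format_counts(counter):
    print(line)
```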

Answered By: Brener Ramos