Python – Get newest dict value where string = string

Question:

I have this code and it works. But I want to get two different files.

file_type returns either NP or KL. I want to get the NP file with the max date and the KL file with the max date.

The dict looks like this:

{"Blah_Blah_NP_2022-11-01_003006.xlsx": "2022-03-11",
"Blah_Blah_KL_2022-11-01_003006.xlsx": "2022-03-11"}

This is my code. Right now I am just getting the max date without regard to time. Since the dates are formatted as YYYY-MM-DD (so comparing them as strings gives the same order as comparing them as dates) and I don’t care about time, I can just use max().
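Why max() works directly on these date strings can be checked in a few lines. A minimal sketch with illustrative date strings (not taken from real files); it also shows the case where the trick would fail, namely dates without zero padding.

```python
# YYYY-MM-DD strings are zero-padded and ordered from the largest unit
# (year) to the smallest (day), so string order equals date order.
dates = ['2022-11-01', '2023-01-15', '2022-03-11']
print(max(dates))  # 2023-01-15

# Without zero padding this breaks: '2022-9-01' compares greater than
# '2022-11-01' because the character '9' is greater than '1'.
print(max(['2022-9-01', '2022-11-01']))  # 2022-9-01 (wrong as a date)
```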

I’m having trouble expanding the code below to give me the newest NP file and the newest KL file. Again, file_type returns the NP or KL string from the file name.

from pathlib import Path


def get_newest_file():  # hypothetical name; the return below implies a function
    file_dict = {}
    file_path = Path(r'\placeReport')
    for file in file_path.iterdir():
        if file.is_file():
            filename = file.name
            stem = file.stem
            file_type = stem.split("_")[2]  # 'NP' or 'KL'
            file_date = stem.split("_")[3]  # e.g. '2022-11-01'
            file_dict[filename] = file_date
    # newest file overall, regardless of file_type
    newest = max(file_dict, key=file_dict.get)
    return newest

I basically want newest where file_type = NP and also newest where file_type = KL

Asked By: UneRoue


Answers:

You could construct another dict containing only the items you need:

file_dict_NP = {key:value for key, value in file_dict.items() if 'NP' in key}

And then do the same thing on it:

newest_NP = max(file_dict_NP, key=file_dict_NP.get)
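Both types can be handled by running those two steps in a loop. A minimal sketch, assuming the type is always the third '_'-separated field of the name (which is how file_type is computed in the question); the dict contents are illustrative. Comparing that field exactly avoids false matches that a substring test like 'NP' in key could give if "NP" appeared elsewhere in a name (for example inside "INPUT").

```python
# Illustrative dict in the shape used in the question: file name -> date string.
file_dict = {
    "Blah_Blah_NP_2022-11-01_003006.xlsx": "2022-11-01",
    "Blah_Blah_NP_2022-11-02_003051.xlsx": "2022-11-02",
    "Blah_Blah_KL_2022-11-01_003006.xlsx": "2022-11-01",
}

newest_by_type = {}
for file_type in ("NP", "KL"):
    # Keep only names whose third field is exactly this type.
    subset = {name: d for name, d in file_dict.items()
              if name.split("_")[2] == file_type}
    if subset:  # max() raises ValueError on an empty dict
        newest_by_type[file_type] = max(subset, key=subset.get)

print(newest_by_type)
```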

Answered By: Artem Ermakov

You could filter the dictionary into two dictionaries (or however many you need, if there are more types) and then get the max date for each of those.

But the whole operation can be done efficiently in only a few lines:

from pathlib import Path
from datetime import datetime


def get_newest():
    maxs = {}
    for file in Path(r'./examples').iterdir():
        if file.is_file():
            *_, t, d, _ = file.stem.split('_')
            d = datetime(*map(int, d.split('-')))
            maxs[t] = d if t not in maxs else max(d, maxs[t])
    return maxs


print(get_newest())

This:

  • collects the maximum date for each type into a dict maxs
  • loops over the files like you did (but in a location where I created some examples following your pattern)
  • only looks at the files, like your code
  • assumes the files all meet your pattern, and splits each name on '_', keeping only the second-to-last part as the date and the part before it as the type
  • converts the date into a datetime object
  • keeps whichever is greater, the new date or a previously stored one (if any)

Result:

{'KL': datetime.datetime(2023, 11, 1, 0, 0), 'NP': datetime.datetime(2022, 11, 2, 0, 0)}

The files in the folder:

Blah_Blah_KL_2022-11-01_003006.txt
Blah_Blah_KL_2023-11-01_003006.txt
Blah_Blah_NP_2022-11-02_003051.txt
Blah_Blah_NP_2022-11-01_003006.txt
Blah_Blah_KL_2021-11-01_003006.txt
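To check the loop without creating any files, its body can be run on these five names directly. A minimal sketch; PurePath is used only to get .stem from a plain string, and nothing touches the disk. It should print the same dict as the result above.

```python
from datetime import datetime
from pathlib import PurePath

# The five example names listed above.
names = [
    'Blah_Blah_KL_2022-11-01_003006.txt',
    'Blah_Blah_KL_2023-11-01_003006.txt',
    'Blah_Blah_NP_2022-11-02_003051.txt',
    'Blah_Blah_NP_2022-11-01_003006.txt',
    'Blah_Blah_KL_2021-11-01_003006.txt',
]

maxs = {}
for name in names:
    # Same three lines as inside get_newest()
    *_, t, d, _ = PurePath(name).stem.split('_')
    d = datetime(*map(int, d.split('-')))
    maxs[t] = d if t not in maxs else max(d, maxs[t])

print(maxs)
```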

In the comments you asked:

no idea how the above code is getting the diff file types and the max. Is it just looking for all the diff types in general? It’s hard to know what each piece is with names like s, d, t, etc. Really lost on *_, t, d, _ = and also d = datetime(*map(int, d.split('-')))

That’s a fair point. I prefer short names when I think the meaning is clear, but a descriptive name might have been better here. t is for type (and type would be a bad name, since it shadows the built-in type, so perhaps file_type). d is for date, though dt for datetime might have been better. I don’t see s?

The *_, t, d, _ = is called 'extended iterable unpacking'. It takes all the values produced by the expression on the right, keeps only the third-to-last and second-to-last ones as t and d respectively, and throws the rest away. The final _ takes up a position, but the underscore indicates we "don’t care" about whatever is in that position. Similarly, *_ gobbles up all the values at the start, as explained in PEP 3132.
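A short demonstration on one of the example stems. The pattern needs at least three parts; with fewer, Python raises ValueError ("not enough values to unpack").

```python
stem = 'Blah_Blah_NP_2022-11-01_003006'

# *_ collects the leading parts, t and d take the next two, the final _ the last.
*_, t, d, _ = stem.split('_')
print(t, d)  # NP 2022-11-01

# The same two values, picked out by negative indexing instead.
parts = stem.split('_')
assert (t, d) == (parts[-3], parts[-2])

# With exactly three parts, the starred target simply ends up empty.
*rest, t2, d2, last = 'KL_2021-11-01_003006'.split('_')
print(rest, t2, d2, last)  # [] KL 2021-11-01 003006
```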

The d = datetime(*map(int, d.split('-'))) is best read from the inside out. d.split('-') takes a date string like '2022-11-01' and splits it into ['2022', '11', '01']. The map(int, ...) applied to that result calls int() on every part, so it turns ('2022', '11', '01') into (2022, 11, 1). The * in front of map() unpacks those results as positional arguments to datetime, so in this example datetime(2022, 11, 1) is called.
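The same steps, one at a time, on the example date string. The last lines show standard-library parsers that give the same result for this format; they are an alternative to, not part of, the answer's code (datetime.fromisoformat and date.fromisoformat need Python 3.7+).

```python
from datetime import date, datetime

d = '2022-11-01'
parts = d.split('-')             # ['2022', '11', '01']
numbers = list(map(int, parts))  # [2022, 11, 1]; list() only to show the values
dt = datetime(*numbers)          # same as datetime(2022, 11, 1)
print(repr(dt))                  # datetime.datetime(2022, 11, 1, 0, 0)

# Alternatives that parse the string directly.
assert datetime.fromisoformat(d) == dt
assert datetime.strptime(d, '%Y-%m-%d') == dt
print(date.fromisoformat(d))     # 2022-11-01
```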

This is what I both like and hate about Python – as you get better at it, there are very concise (and arguably beautiful – user @ArtemErmakov seems to agree) ways to write clean solutions. But they become hard to read unless you know most of the basics of the language. They’re not easy to understand for a beginner, which is arguably a bad feature of a language.

To answer the broader question: the loop takes each file and gets its type (like 'KL') and its date. It then checks the dictionary: if the type is new, it adds the date; if the type is already in the dictionary, it stores the greater of the new date and the stored one. That is what this line does:

maxs[t] = d if t not in maxs else max(d, maxs[t])
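The same update rule can be written with dict.get, which returns the new date itself as the default for a type not seen yet, so max(d, d) just stores d. A minimal sketch using (type, date) pairs taken from the example file names, in the same order:

```python
from datetime import date

# (type, date) pairs from the example file names, in listing order.
pairs = [('KL', date(2022, 11, 1)), ('KL', date(2023, 11, 1)),
         ('NP', date(2022, 11, 2)), ('NP', date(2022, 11, 1)),
         ('KL', date(2021, 11, 1))]

maxs = {}
for t, d in pairs:
    # First date for a type is stored; later ones replace it only if greater.
    maxs[t] = max(d, maxs.get(t, d))

print(maxs)  # {'KL': datetime.date(2023, 11, 1), 'NP': datetime.date(2022, 11, 2)}
```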

I would recommend you keep asking questions, and whenever you see code like this, try to break it down into all its small parts and see which specific parts you don’t understand. Python is a powerful language.

As a bonus, here is the same solution, but written a bit more clearly to show what is going on:

from pathlib import Path
from datetime import datetime


def get_newest_too():
    maximums = {}
    for file_path in Path(r'./examples').iterdir():
        if file_path.is_file():
            split_file = file_path.stem.split('_')
            file_type = split_file[-3]
            date_time_text = split_file[-2]
            date_time_parts = (int(part) for part in date_time_text.split('-'))
            date_time = datetime(*date_time_parts)  # unpack the parts as positional arguments: year, month, day
            if file_type in maximums:
                maximums[file_type] = max(date_time, maximums[file_type])
            else:
                maximums[file_type] = date_time
    return maximums


print(get_newest_too())

Edit: From the comments, it became clear that you had trouble selecting the actual file, for each type, whose date is the maximum for that type.

Here’s how to do that:

from pathlib import Path
from datetime import datetime


def get_newest():
    maxs = {}
    for file in Path(r'./examples').iterdir():
        if file.is_file():
            *_, t, d, _ = file.stem.split('_')
            d = datetime(*map(int, d.split('-')))
            maxs[t] = (d, file) if t not in maxs else max((d, file), maxs[t])
    return {f: d for _, (d, f) in maxs.items()}


print(get_newest())

Result:

{WindowsPath('examples/Blah_Blah_KL_2023-11-01_003006.txt'): datetime.datetime(2023, 11, 1, 0, 0), WindowsPath('examples/Blah_Blah_NP_2022-11-02_003051.txt'): datetime.datetime(2022, 11, 2, 0, 0)}
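Since the question asked for the newest NP file and the newest KL file, a result keyed by type may be easier to use than one keyed by path. A minimal sketch under the same naming assumption; newest_file_per_type is a hypothetical name, and date.fromisoformat (Python 3.7+) stands in for the split and map.

```python
from datetime import date
from pathlib import PurePath


def newest_file_per_type(paths):
    """Return {file_type: path} holding the newest-dated path for each type.

    Assumes names end in _<TYPE>_<YYYY-MM-DD>_<suffix>, as in the examples;
    names that don't fit will typically raise ValueError.
    """
    newest = {}  # file_type -> (date, path)
    for path in paths:
        *_, file_type, date_text, _ = PurePath(path).stem.split('_')
        file_date = date.fromisoformat(date_text)
        # Keep the first path seen for a type; replace it only with a newer date.
        if file_type not in newest or file_date > newest[file_type][0]:
            newest[file_type] = (file_date, path)
    return {file_type: path for file_type, (_, path) in newest.items()}


# Plain name strings work too, since nothing here touches the disk.
names = [
    'Blah_Blah_KL_2022-11-01_003006.txt',
    'Blah_Blah_KL_2023-11-01_003006.txt',
    'Blah_Blah_NP_2022-11-02_003051.txt',
    'Blah_Blah_NP_2022-11-01_003006.txt',
    'Blah_Blah_KL_2021-11-01_003006.txt',
]
result = newest_file_per_type(names)
print(result)
```

For a real folder, it could be called as newest_file_per_type(p for p in Path('examples').iterdir() if p.is_file()), with Path imported from pathlib; result['NP'] and result['KL'] are then Path objects.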
Answered By: Grismar