Pair files using Python

Question:

I have a folder with several .tif files that I would like to pair to perform some functions inside a for loop.

For example:

smp001_GFP.tif

smp001_mCherry.tif
(this should be a pair)

smp002_GFP.tif

smp002_mCherry.tif
(this is another pair)

I would like the for loop to iterate over each pair and perform some functions. For example:

for pair in folder:
    img_GFP=cv2.imread(pair.__contains__("GFP"))
    img_mCherry=cv2.imread(pair.__contains__("mCherry"))

I’ve been told that I could pair the files using dictionaries, but which strategy would you recommend?

Thanks!

Asked By: Claudia Salat


Answers:

Some additional info/code would be helpful, but to give a general idea, you can create a dictionary, loop through your file names, and create a new key for each numbered pair. Essentially:

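import os

folder = 'some/directory'  # placeholder: the directory holding the .tif files
pairs_dict = {}
for file_name in os.listdir(folder):
    if not file_name.endswith('.tif'):
        continue  # skip anything that is not a .tif file
    # Get the prefix for the pair,
    # assuming the filename format 'smp000_...'
    key = file_name.split('_')[0]  # grabs 'smpXXX'
    # Then create a key in our dictionary for it.
    pairs_dict[key] = []
...
for pair_prefix in pairs_dict:
    # 'get_file()' being whatever function the module
    # you use has for grabbing files by name (e.g. cv2.imread)
    img_GFP = get_file(os.path.join(folder, pair_prefix + '_GFP.tif'))
    img_mCherry = get_file(os.path.join(folder, pair_prefix + '_mCherry.tif'))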
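
A self-contained sketch of the same prefix-keyed idea, assuming all files sit in one directory and follow the `smpXXX_GFP.tif` / `smpXXX_mCherry.tif` pattern. `pair_tif_files` is a hypothetical helper name; it returns full paths and skips any prefix that is missing one of the two channels, so the loop body can hand each path to `cv2.imread`. The demo writes empty placeholder files to a temporary directory only so the snippet runs on its own.

```python
import os
import tempfile


def pair_tif_files(folder):
    """Return a sorted list of (prefix, gfp_path, mcherry_path) tuples.

    Hypothetical helper: groups '<prefix>_<channel>.tif' files by prefix and
    keeps only prefixes that have both a GFP and an mCherry file.
    """
    pairs = {}
    for file_name in os.listdir(folder):
        if not file_name.endswith('.tif'):
            continue  # ignore anything that is not a .tif file
        # 'smp001_GFP.tif' -> prefix 'smp001', channel 'GFP'
        prefix, _, channel = file_name[:-len('.tif')].partition('_')
        if channel in ('GFP', 'mCherry'):
            # Create the inner dict the first time this prefix is seen
            pairs.setdefault(prefix, {})[channel] = os.path.join(folder, file_name)

    return [
        (prefix, channels['GFP'], channels['mCherry'])
        for prefix, channels in sorted(pairs.items())
        if 'GFP' in channels and 'mCherry' in channels
    ]


# Demo with empty placeholder files; smp003 has no mCherry file and is skipped
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('smp001_GFP.tif', 'smp001_mCherry.tif',
                 'smp002_GFP.tif', 'smp002_mCherry.tif', 'smp003_GFP.tif'):
        open(os.path.join(demo_dir, name), 'w').close()
    for prefix, gfp_path, mcherry_path in pair_tif_files(demo_dir):
        # In real use: img_GFP = cv2.imread(gfp_path), and likewise for mCherry
        print(prefix, os.path.basename(gfp_path), os.path.basename(mcherry_path))
```

Filtering out incomplete prefixes up front keeps the processing loop free of existence checks.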

Answered By: Ethan Ray

Nested dicts would work well. The outer dict keys (001, 002, etc.) map to inner dicts that hold {"GFP": filename, "mCherry": filename} items. If you use a defaultdict for the outer dict, it automatically creates each inner dict on first access. Use a regular expression to extract the identifiers from each filename.

import re
from collections import defaultdict
import os

tif_name_re = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif$")
tif_map = defaultdict(dict)

for name in os.listdir("some/directory"):
    m = tif_name_re.match(name)
    if m:
        tif_map[m.group(1)][m.group(2)] = m.group(0)

for key, value in sorted(tif_map.items()):  # os.listdir order is arbitrary
    print(key, value)

Output

001 {'GFP': 'smp001_GFP.tif', 'mCherry': 'smp001_mCherry.tif'}
002 {'GFP': 'smp002_GFP.tif', 'mCherry': 'smp002_mCherry.tif'}
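
To get the loop the question asks for, one option is to walk the keys in sorted order and skip any identifier that lacks one of the two channels. A minimal sketch, assuming the same regex and nested `defaultdict` as above; the hard-coded `names` list is a stand-in for `os.listdir(...)` so the snippet runs on its own, and the `cv2.imread` call is left as a comment.

```python
import re
from collections import defaultdict

tif_name_re = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif")

# Stand-in for os.listdir("some/directory"); smp003 has no mCherry file
names = ["smp002_mCherry.tif", "smp001_GFP.tif", "smp003_GFP.tif",
         "smp001_mCherry.tif", "smp002_GFP.tif", "readme.txt"]

tif_map = defaultdict(dict)
for name in names:
    m = tif_name_re.fullmatch(name)
    if m:
        # Outer key is the number, inner key is the channel
        tif_map[m.group(1)][m.group(2)] = name

complete_pairs = []
for key in sorted(tif_map):
    channels = tif_map[key]
    if "GFP" not in channels or "mCherry" not in channels:
        print("skipping incomplete pair", key)
        continue
    # In real use: img_GFP = cv2.imread(os.path.join(directory, channels["GFP"]))
    complete_pairs.append((channels["GFP"], channels["mCherry"]))

print(complete_pairs)
```
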
Answered By: tdelaney

Here’s a different view. Let’s assume that the GFP and mCherry parts of the filenames are irrelevant and that the common part is whatever precedes the underscore.

If that’s the case then:

from glob import glob
from os.path import basename, join

DIRECTORY = './tifs' # directory contains the tif files
result = dict()

for filename in sorted(map(basename, glob(join(DIRECTORY, '*.tif')))):
    key, _, _ = filename.partition('_')  # text before the first underscore
    result.setdefault(key, []).append(filename)

print(result)

Output:

{'smp001': ['smp001_GFP.tif', 'smp001_mCherry.tif'], 'smp002': ['smp002_GFP.tif', 'smp002_mCherry.tif']}

This gives us a dictionary keyed on the prefix, with each "pair" stored as a list under its key.
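
Because the order inside each list depends on how the names sort, it may be safer to pick each channel by its suffix rather than by position. A minimal sketch, assuming a `result` dict shaped like the one above (hard-coded here, with one list deliberately reversed, so the snippet runs alone); `channel_file` is a hypothetical helper name.

```python
# Example input shaped like the result dict above
result = {
    'smp001': ['smp001_GFP.tif', 'smp001_mCherry.tif'],
    'smp002': ['smp002_mCherry.tif', 'smp002_GFP.tif'],  # order not guaranteed
}


def channel_file(files, channel):
    """Hypothetical helper: return the file ending in '_<channel>.tif', or None."""
    suffix = '_' + channel + '.tif'
    return next((f for f in files if f.endswith(suffix)), None)


pairs = []
for key, files in result.items():
    gfp = channel_file(files, 'GFP')
    mcherry = channel_file(files, 'mCherry')
    if gfp is None or mcherry is None:
        continue  # skip prefixes missing one channel
    # In real use: img_GFP = cv2.imread(join(DIRECTORY, gfp)), and likewise for mCherry
    pairs.append((key, gfp, mcherry))

print(pairs)
```
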

Answered By: Fred