How to remove a specific suffix from a string

Question:

I have file names like

ios_g1_v1_yyyymmdd
ios_g1_v1_h1_yyyymmddhhmmss
ios_g1_v1_h1_YYYYMMDDHHMMSS
ios_g1_v1_g1_YYYY
ios_g1_v1_j1_YYYYmmdd
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1

I would like to remove the suffix (including the underscore before it) only when it matches YYYYMMDDHHMMSS, yyyymmdd, YYYYmmdd or YYYY.

My expected output is:

ios_g1_v1
ios_g1_v1_h1
ios_g1_v1_h1
ios_g1_v1_g1
ios_g1_v1_j1
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1

How can I achieve this in Python using regex? I tried something like the following, but it didn't work:

word_trimmed_stage1 = re.sub('.*[^YYYYMMDDHHMMSS]$', '', filename)
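For reference, inside square brackets the letters form a negated character class (any single character except uppercase Y, M, D, H or S), not a literal word. The attempt therefore either deletes the whole name or leaves it untouched. A quick check on two of the names above:

```python
import re

attempt = '.*[^YYYYMMDDHHMMSS]$'

# The last character '1' is not in the class {Y, M, D, H, S}, so .* swallows the whole name
print(repr(re.sub(attempt, '', 'ios_g1_v1')))          # ''
# The last character 'Y' is excluded by the class, so nothing matches and the name is unchanged
print(repr(re.sub(attempt, '', 'ios_g1_v1_g1_YYYY')))  # 'ios_g1_v1_g1_YYYY'
```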

Asked By: BigD


Answers:

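Try removing everything from the last _ onward, but only when the part after it is a date placeholder; names such as ios_g1_v1 have no such suffix and should be left alone.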
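A minimal sketch of this idea using str.rpartition. It drops the last _-separated part only when that part is one of the placeholders from the question, compared case-insensitively. The PLACEHOLDERS set and the strip_last_part name are illustrative, not part of the original answer.

```python
# Illustrative set of placeholders, stored lowercase for case-insensitive comparison
PLACEHOLDERS = {'yyyymmddhhmmss', 'yyyymmdd', 'yyyy'}

def strip_last_part(name):
    head, sep, tail = name.rpartition('_')
    # Drop "_<tail>" only if an underscore was found and tail is a known placeholder
    if sep and tail.lower() in PLACEHOLDERS:
        return head
    return name

files = ['ios_g1_v1_yyyymmdd', 'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS', 'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd', 'ios_g1_v1',
         'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']

print([strip_last_part(f) for f in files])
```

If a name has no underscore at all, rpartition returns ('', '', name), so the empty sep leaves the name unchanged.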

Answered By: Dawid

You can be explicit and use the exact patterns you have identified, optionally matching case-insensitively with re.I:

import re

files = ['ios_g1_v1_yyyymmdd',
 'ios_g1_v1_h1_yyyymmddhhmmss',
 'ios_g1_v1_h1_YYYYMMDDHHMMSS',
 'ios_g1_v1_g1_YYYY',
 'ios_g1_v1_j1_YYYYmmdd',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

files2 = [re.sub('_(?:YYYYMMDDHHMMSS|yyyymmdd|YYYYmmdd|YYYY)$', '', x, flags=re.I)
          for x in files]

NB. with re.I you only need one of yyyymmdd/YYYYmmdd.

Compressed variant:

files2 = [re.sub('_YYYY(?:MMDD(?:HHMMSS)?)?$', '', x, flags=re.I) for x in files]
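The nested optional groups in the compressed pattern accept exactly three suffix shapes: YYYY, YYYYMMDD and YYYYMMDDHHMMSS. A quick check with re.fullmatch; the candidate strings are illustrative, and the $ anchor is dropped because fullmatch already requires the whole string to match.

```python
import re

compressed = r'_YYYY(?:MMDD(?:HHMMSS)?)?'  # same pattern as above, minus the $ anchor

candidates = ['_YYYY', '_yyyymmdd', '_YYYYmmdd', '_YYYYMMDDHHMMSS', '_YYYYMM', '_YYYYMMDDHH']
results = {c: re.fullmatch(compressed, c, flags=re.I) is not None for c in candidates}
print(results)  # True for the first four, False for '_YYYYMM' and '_YYYYMMDDHH'
```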

Output:

['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']
Answered By: mozway

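To remove a string ending with "YYYYMMDDHHMMSS" or one of the other specified formats, you can use the rstrip method. Note that rstrip treats its argument as a set of characters, not as a literal suffix: it removes every trailing character that appears anywhere in that string. On Python 3.9+, str.removesuffix removes the exact suffix instead.

Here's an example of both:

s = "abcdefgYYYYMMDDHHMMSS"
suffix = "YYYYMMDDHHMMSS"
print(s.rstrip(suffix))        # abcdefg (strips any trailing Y, M, D, H and S characters)
print(s.removesuffix(suffix))  # abcdefg (strips the exact suffix; Python 3.9+)

You can handle the other formats by replacing "YYYYMMDDHHMMSS" with the appropriate format string (prefix it with "_" if the separator should be removed too).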
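Because rstrip works on a character set, an exact suffix check is safer when several formats are possible. A minimal sketch with str.endswith, upper-casing the name so the comparison is case-insensitive; the SUFFIXES tuple and the strip_date_suffix name are hypothetical.

```python
# Hypothetical tuple of suffixes to remove, written in uppercase
SUFFIXES = ('_YYYYMMDDHHMMSS', '_YYYYMMDD', '_YYYY')

def strip_date_suffix(name):
    upper = name.upper()  # compare case-insensitively; ASCII names keep their length
    for suffix in SUFFIXES:
        if upper.endswith(suffix):
            return name[:-len(suffix)]
    return name

files = ['ios_g1_v1_yyyymmdd', 'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS', 'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd', 'ios_g1_v1',
         'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']

print([strip_date_suffix(f) for f in files])
```

Unlike removesuffix, this works on Python versions before 3.9 and ignores the case of the placeholder.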

Answered By: Ashmit Prajapati

Disclaimer: this is a non-regex approach; @mozway posted a good regex approach.

files = ['ios_g1_v1_yyyymmdd',
 'ios_g1_v1_h1_yyyymmddhhmmss',
 'ios_g1_v1_h1_YYYYMMDDHHMMSS',
 'ios_g1_v1_g1_YYYY',
 'ios_g1_v1_j1_YYYYmmdd',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

lst = []
for filename in files:
    k = []
    for x in range(len(filename)):
        # Stop at the first "yy"/"YY" pair, i.e. the start of the date placeholder
        if filename[x] in 'yY' and x + 1 < len(filename) and filename[x + 1] in 'yY':
            break
        k.append(filename[x])
    trimmed = ''.join(k)
    # Drop the underscore that separated the name from the placeholder
    if trimmed.endswith('_'):
        trimmed = trimmed[:-1]
    lst.append(trimmed)

print(lst)

#['ios_g1_v1', 'ios_g1_v1_h1', 'ios_g1_v1_h1', 'ios_g1_v1_g1', 'ios_g1_v1_j1', 'ios_g1_v1', 'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']
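A shorter non-regex variant of the same idea, sketched with str.find: cut the name at the first "_yy" (case-insensitive). This assumes the date placeholder is the only place where "_yy" occurs in a name; trim_at_placeholder is an illustrative name.

```python
def trim_at_placeholder(name):
    # Lowercasing keeps the length of these ASCII names, so the index is valid for the original
    cut = name.lower().find('_yy')
    return name[:cut] if cut != -1 else name

files = ['ios_g1_v1_yyyymmdd', 'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS', 'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd', 'ios_g1_v1',
         'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']

print([trim_at_placeholder(f) for f in files])
```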
Answered By: Talha Tayyab

IIUC, your suffix consists of Year, Month, Day, Hour, Minute and Second characters, each possibly repeated, in that order, preceded by an underscore and matched case-insensitively.

Try this pattern r"_Y+M*D*H*M*S*"

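import re

files = ['ios_g1_v1_yyyymmdd', 'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS', 'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd', 'ios_g1_v1',
         'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']

regex_pattern = r"_Y+M*D*H*M*S*"
result = [re.sub(regex_pattern, '', i, flags=re.IGNORECASE) for i in files]
print(result)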
['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

EXPLANATION

  1. The _ matches the underscore at the start of the pattern.
  2. flags=re.IGNORECASE makes the match case-insensitive.
  3. Y+ matches one or more Y characters.
  4. M*D*H*M*S* then matches zero or more of each of these characters, in that order, after the run of Ys.
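One caveat: the pattern is not anchored, so it also fires on an underscore followed by a y in the middle of a name. Adding $ restricts it to the end of the string. A quick comparison on a hypothetical name, ios_yy_g1, which is not one of the question's names:

```python
import re

unanchored = r"_Y+M*D*H*M*S*"
anchored = r"_Y+M*D*H*M*S*$"
name = "ios_yy_g1"  # hypothetical name with "_yy" in the middle

print(re.sub(unanchored, '', name, flags=re.IGNORECASE))  # ios_g1 (middle part removed)
print(re.sub(anchored, '', name, flags=re.IGNORECASE))    # ios_yy_g1 (left unchanged)
```

On the question's file names both versions give the same result, since every placeholder sits at the end.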
Answered By: Akshay Sehgal

This is another approach:

renamed_files = []
for filename in files:
    if filename.split("_")[-1].lower().startswith("y"):
        renamed_files.append("_".join(filename.split("_")[:-1]))
    else:
        renamed_files.append(filename)
        
print(renamed_files)

You can also pass a generator expression to list() instead of appending one element at a time:

renamed_files = list(
    "_".join(filename.split("_")[:-1])
    if filename.split("_")[-1].lower().startswith("y")
    else filename
    for filename in files
    )

Both approaches produce the same output:

['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']
Answered By: Jamiu Shaibu