Most efficient way to split substrings into individual words

Question:

I have several strings that I want to loop through and tokenize, for example:

ImagesCarrier FreeCatalog #AvailabilitySize / PriceQty240-B-001MG/CF240-B-002/CF240-B-010/CF240-B-500/CFWith CarrierCatalog #AvailabilitySize / PriceQty240-B-002240-B-010Request a Quote

All of the string-splitting methods I have tried so far give me output that isn't quite right. For example:

import re
s = "ImagesCarrier FreeCatalog #AvailabilitySize / PriceQty240-B-001MG/CF240-B-002/CF240-B-010/CF240-B-500/CFWith CarrierCatalog #AvailabilitySize / PriceQty240-B-002240-B-010Request a Quote"
 
# printing original String
print("The original string is : " + str(s))
 
# using sub() to solve the problem, lambda used to add spaces
res = re.sub("[A-Za-z]+", lambda ele: " " + ele[0] + " ", s)
 
# printing result
print("The space added string : " + str(res))


The original string is : ImagesCarrier FreeCatalog #AvailabilitySize / PriceQty240-B-001MG/CF240-B-002/CF240-B-010/CF240-B-500/CFWith CarrierCatalog #AvailabilitySize / PriceQty240-B-002240-B-010Request a Quote

The space added string :  ImagesCarrier   FreeCatalog  # AvailabilitySize  /  PriceQty 240- B -001 MG / CF 240- B -002/ CF 240- B -010/ CF 240- B -500/ CFWith   CarrierCatalog  # AvailabilitySize  /  PriceQty 240- B -002240- B -010 Request   a   Quote 

But as you can see, some words are still not split apart, such as PriceQty, AvailabilitySize, and ImagesCarrier FreeCatalog. Is there a better way to do this, or a way to specify a keyword list that is checked against the string so that it is split wherever a keyword matches? Ideally I would like to end up with something like this:

Images Carrier Free Catalog # Availability Size / Price Qty 240-B-001MG/CF 240-B-002/CF 240-B-010/CF 240-B-500/CF With Carrier Catalog # Availability Size / Price Qty 240-B-002 240-B-010 Request a Quote
Asked By: Munrock


Answers:

You can use lookarounds:

re.sub(r'(?<=\S)(?=[A-Z][a-z])|(?<=[A-Za-z])(?=\d)', ' ', s)

Demo: https://replit.com/@blhsing/DownrightTwinTelephones
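
As a rough, self-contained sketch of the lookaround approach, the snippet below runs a pattern with `\S` and `\d` against the string from the question. The first alternative matches the empty position before a capitalized word that directly follows a non-space character, and the second matches the empty position between a letter and a digit. The names `split_words`, `split_catalog_numbers`, and `CATALOG_BOUNDARY` are made up for illustration. The second pattern is not part of the answer: it assumes catalog numbers always begin with three digits followed by a hyphen, which may not hold for other data.

```python
import re

s = ("ImagesCarrier FreeCatalog #AvailabilitySize / PriceQty240-B-001MG/CF"
     "240-B-002/CF240-B-010/CF240-B-500/CFWith CarrierCatalog "
     "#AvailabilitySize / PriceQty240-B-002240-B-010Request a Quote")

# Alternative 1: empty position preceded by a non-space character and followed
# by an uppercase letter plus a lowercase letter (the start of a new word).
# Alternative 2: empty position between a letter and a digit.
WORD_BOUNDARY = re.compile(r"(?<=\S)(?=[A-Z][a-z])|(?<=[A-Za-z])(?=\d)")

# Assumption (not from the answer): a catalog number starts with three digits
# and a hyphen, so "002240-" is split between "002" and "240-".
CATALOG_BOUNDARY = re.compile(r"(?<=\d{3})(?=\d{3}-)")


def split_words(text):
    """Insert a space at every position matched by the lookaround pattern."""
    return WORD_BOUNDARY.sub(" ", text)


def split_catalog_numbers(text):
    """Hypothetical extra step: separate two catalog numbers that run together."""
    return CATALOG_BOUNDARY.sub(" ", text)


result = split_catalog_numbers(split_words(s))
print(result)
```

On its own, the first pattern leaves 240-B-002240-B-010 joined, because there is no letter or case change between the two numbers. The extra digit-based pattern is one way to cover that case if the catalog numbers really do follow that shape.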

Answered By: blhsing