Delete text and all newline characters between two words in Python

Question:

I have the following text:

\nOUTPUTFORMAT \n
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nLOCATION\n
'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\nTBLPROPERTIES (\n
'spark.sql.create.version'='2.4.0-cdh6.3.2', \n
'spark.sql.sources.schema.numPartCols'='1', \n
'spark.sql.sources.schema.numParts'='1'

I want to delete everything from the word LOCATION up to the beginning of TBLPROPERTIES, including the newline characters in between.
I have been trying to use a regex, but I have been unsuccessful so far. The result should look like this:

\nOUTPUTFORMAT \n
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nTBLPROPERTIES (\n
'spark.sql.create.version'='2.4.0-cdh6.3.2', \n
'spark.sql.sources.schema.numPartCols'='1', \n
'spark.sql.sources.schema.numParts'='1'

Thanks in advance for your suggestions.

Asked By: Shiva

||

Answers:

import re

# Sample input; each \n is a newline character inside the string.
text = "\nOUTPUTFORMAT \n'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nLOCATION\n'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\nTBLPROPERTIES (\n'spark.sql.create.version'='2.4.0-cdh6.3.2', \n'spark.sql.sources.schema.numPartCols'='1', \n'spark.sql.sources.schema.numParts'='1'"
# Match from LOCATION through TBLPROPERTIES and put TBLPROPERTIES back.
# re.DOTALL makes '.' match newline characters as well.
text = re.sub(r'LOCATION.*TBLPROPERTIES', 'TBLPROPERTIES', text, flags=re.DOTALL)
print(text)

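See if this works. The pattern matches everything from LOCATION through TBLPROPERTIES and replaces it with just TBLPROPERTIES; the re.DOTALL flag is what lets '.' match across the newline characters.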
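One caveat: '.*' is greedy, so if TBLPROPERTIES ever appeared more than once, the match would run to the last occurrence. A minimal sketch of a variant that stops at the first TBLPROPERTIES is below. The helper name drop_between and the shortened sample string are made up for illustration; they are not from the question.

```python
import re

def drop_between(text, start="LOCATION", end="TBLPROPERTIES"):
    """Remove everything from `start` up to (but not including) `end`."""
    # .*? is non-greedy, so the match stops at the first `end` after `start`.
    # (?=...) is a lookahead: `end` must follow, but it is not consumed,
    # so there is no need to put it back in the replacement.
    pattern = re.escape(start) + r".*?(?=" + re.escape(end) + r")"
    return re.sub(pattern, "", text, flags=re.DOTALL)

# Shortened, hypothetical sample in the same shape as the question's text.
sample = (
    "\nOUTPUTFORMAT \n  'fmt'\nLOCATION\n  'hdfs://example/path'\n"
    "TBLPROPERTIES (\n  'k'='v')"
)
print(drop_between(sample))
```

If the text contains no LOCATION ... TBLPROPERTIES span, re.sub finds nothing to replace and the string comes back unchanged.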

Answered By: Sifat