Delete text and all newline characters between two words in Python
Question:
I have the following text:
\nOUTPUTFORMAT \n
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nLOCATION\n
'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\nTBLPROPERTIES (\n
'spark.sql.create.version'='2.4.0-cdh6.3.2', \n
'spark.sql.sources.schema.numPartCols'='1', \n 'spark.sql.sources.schema.numParts'='1'
I want to delete everything from the word LOCATION up to the beginning of TBLPROPERTIES.
I have been trying to use a regex, but I have been unsuccessful so far.
Expected output:
\nOUTPUTFORMAT \n
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nTBLPROPERTIES (\n
'spark.sql.create.version'='2.4.0-cdh6.3.2', \n
'spark.sql.sources.schema.numPartCols'='1', \n
'spark.sql.sources.schema.numParts'='1'
Thanks in advance for your suggestions.
Answers:
import re
# Sample input from the question; "\n" marks the line breaks
text = "\nOUTPUTFORMAT \n'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\nLOCATION\n'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'\nTBLPROPERTIES (\n'spark.sql.create.version'='2.4.0-cdh6.3.2', \n'spark.sql.sources.schema.numPartCols'='1', \n'spark.sql.sources.schema.numParts'='1'"
# re.DOTALL lets "." match newlines, so the match can span several lines
text = re.sub(r'LOCATION.*TBLPROPERTIES', 'TBLPROPERTIES', text, flags=re.DOTALL)
print(text)
See if this works.
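If the two words can appear more than once in the text, a non-greedy pattern may be safer. The following is only a sketch, assuming the text contains real newline characters; the helper name remove_between is hypothetical and not part of the re module.

```python
import re

def remove_between(text: str, start: str, end: str) -> str:
    """Delete everything from `start` up to (not including) the next `end`."""
    # re.escape guards against regex metacharacters in the two words.
    # ".*?" is non-greedy, and the lookahead (?=...) leaves `end` in place.
    pattern = re.escape(start) + r".*?(?=" + re.escape(end) + r")"
    # re.DOTALL lets "." match newline characters as well.
    return re.sub(pattern, "", text, flags=re.DOTALL)

# Shortened version of the text from the question.
sample = (
    "\nOUTPUTFORMAT \n'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'"
    "\nLOCATION\n'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge'"
    "\nTBLPROPERTIES (\n'spark.sql.create.version'='2.4.0-cdh6.3.2'"
)
cleaned = remove_between(sample, "LOCATION", "TBLPROPERTIES")
print(cleaned)
```

With a greedy `.*`, the match would run to the last TBLPROPERTIES in the text rather than the first one after LOCATION, which only matters when the word occurs more than once.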