How could I remove newlines from all quoted pieces of text in a file?
Question:
I have exported a CSV file from a database. Certain fields are longer text chunks, and can contain newlines. What would be the simplest way of removing only newlines from this file that are inside double quotes, but preserving all others?
I don’t care if it uses a Bash command line one liner or a simple script as long as it works.
For example,
"Value1", "Value2", "This is a longer piece
of text with
newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"
The newlines inside of the longer piece of text should be removed, but not the newline separating the two rows.
Answers:
Here’s a solution in Python:
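import re
# `text` holds the entire contents of the CSV file as a single string.
# Match each double-quoted field; re.DOTALL lets '.' match newlines too.
pattern = re.compile(r'".*?"', re.DOTALL)
print(pattern.sub(lambda x: x.group().replace('\n', ''), text))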
See it working online: ideone
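The same idea can be written without a regular expression by tracking whether the current character is inside quotes. This is only a minimal sketch: the function name remove_quoted_newlines and its replacement parameter are made up for illustration, and it assumes every double quote in the file either opens or closes a field (a doubled "" flips the state twice, so it is harmless).

```python
def remove_quoted_newlines(text, replacement=""):
    """Drop (or replace) line-break characters that fall inside double quotes."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # every quote character flips the state
        if in_quotes and ch in "\r\n":
            out.append(replacement)    # line break inside a quoted field
        else:
            out.append(ch)             # everything else is copied unchanged
    return "".join(out)


# The sample from the question, as one string.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
cleaned = remove_quoted_newlines(sample)
print(cleaned, end="")
```

Passing replacement=" " substitutes a space for each line-break character instead of deleting it, which keeps the words on either side of the break apart.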
In Python:
import csv
with open("input.csv", newline="") as input, \
     open("output.csv", "w", newline="") as output:
    w = csv.writer(output)
    # csv.reader keeps embedded newlines inside quoted fields, so strip them per field.
    for record in csv.reader(input):
        w.writerow(tuple(s.replace("\n", "") for s in record))
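One thing to watch with the csv module: the sample in the question has a space after each comma, and by default csv.reader only treats a quote as opening a quoted field when it comes directly after the delimiter. The sketch below assumes that layout and passes skipinitialspace=True so the multi-line field is parsed as one field. The function name strip_field_newlines and the choice to re-quote every field on output are my own assumptions, not part of the answer above.

```python
import csv
import io


def strip_field_newlines(text, replacement=" "):
    """Return CSV text with line breaks inside fields replaced by `replacement`."""
    # skipinitialspace=True ignores the space after each comma, so the quote
    # that follows is recognised as the start of a quoted field.
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in reader:
        writer.writerow(
            [
                field.replace("\r\n", replacement)
                     .replace("\n", replacement)
                     .replace("\r", replacement)
                for field in record
            ]
        )
    return out.getvalue()


# The sample from the question, as one string.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
result = strip_field_newlines(sample)
print(result, end="")
```

Because the writer re-quotes every field, the output has no space after the commas; that is a formatting choice of this sketch rather than a requirement.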
This is very simplistic but might work for you:
# cat <<! | sed ':a;/"$/{P;D};N;s/\n//g;ba'
> "Value1", "Value2", "This is a longer piece
> of text with
> newlines in it.", "Value3"
> "Value4", "Value5", "Another value", "value6"
> !
"Value1", "Value2", "This is a longer piece of text with newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"
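For anyone who does not read sed, the logic of that command is roughly: keep appending lines until the accumulated text ends with a double quote, then print it as one record. Here is a rough Python sketch of the same heuristic. The name join_until_closing_quote and the joiner parameter are hypothetical, and it shares the sed version's limitation that every record must end with a quote.

```python
def join_until_closing_quote(lines, joiner=""):
    """Yield records, merging physical lines until one ends with a double quote."""
    buffer = []
    for line in lines:
        buffer.append(line.rstrip("\r\n"))
        if buffer[-1].endswith('"'):
            # The record is complete: emit it and start a new one.
            yield joiner.join(buffer)
            buffer = []
    if buffer:
        # Leftover lines that never reached a closing quote.
        yield joiner.join(buffer)


# The sample from the question, one physical line per list item.
sample_lines = [
    '"Value1", "Value2", "This is a longer piece\n',
    'of text with\n',
    'newlines in it.", "Value3"\n',
    '"Value4", "Value5", "Another value", "value6"\n',
]
records = list(join_until_closing_quote(sample_lines, joiner=" "))
for record in records:
    print(record)
```

With joiner=" " the merged lines are separated by a single space; the default joiner="" mirrors the sed substitution, which deletes the newline outright.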
Here is an adjustment of Sven’s response for Python 3 on Windows:
import csv
# src and dest are the input and output file paths.
with open(src, "rt") as input, open(dest, "wt", newline='', encoding='utf-8') as output:
    w = csv.writer(output)
    for record in csv.reader(input):
        w.writerow(tuple(s.replace('\n', '') for s in record))
How about a Perl one-liner?
perl -pe 's/[^"]\n//' input.csv
Output:
"Value1", "Value2", "This is a longer piec of text wit newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"
Note that [^"] also consumes the character before each removed newline, which is why the output reads "piec" and "wit".
And, not to forget, to reformat the spaces as well:
perl -pe 's/[^"]\n//; s/\s+/ /' input.csv
Output:
"Value1", "Value2", "This is a longer piec of text wit newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"