Remove excess delimiter from CSV file
Question:
I’m using Python 3 to clean up a CSV
file that sometimes has four entries per line. For some reason the datalogger
didn’t insert a new line, and this happened periodically, not sure why.
So I’m trying to remove bad characters from the CSV
and have had success, but the lines with four entries instead of two I wanted to find the delimiter and replace it with a new line.
Sounds simple enough, but I don’t possess the code-fu, and I’m wondering if anyone could help. 馃檪
thanks
import csv
with open('outty1.csv', 'w', newline='') as outcsv:
writer = csv.writer(outcsv)
writer.writerow(["Date", "Temperature", "Humidity"])
text = open("temperature.csv", "r")
text = ''.join([i for i in text])
.replace("脙驴脙驴", ",")
for i in text:
if i.count(',')>1:
text.replace(",", "/n")
x = open("outty1.csv","a")
x.writelines(text)
x.close()
A sample of the temperature log before parsing it 馃檪 .
1629881977,24.27
1629882037,24.28每每1629882097,24.29
1629882157,24.31每每1629882217,23.52
1629882277,23.38每每1629882337,23.72
1629882397,23.87每每1629882457,23.92
1629882517,23.98每每1629882577,24.02
1629882637,24.08每每1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63每每1629883718,25.85
1629883778,26.08每每1629883838,26.31
1629883898,26.53每每1629883958,26.74
1629884018,26.96每每1629884078,27.12
1629884138,27.26每每1629884198,27.38
1629884258,27.48每每1629884318,27.56
1629884378,27.63每每1629884438,27.69
1629884498,27.73.
This is my progress once I run the program
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28,1629882097,24.29
1629882157,24.31,1629882217,23.52
1629882277,23.38,1629882337,23.72
1629882397,23.87,1629882457,23.92
1629882517,23.98,1629882577,24.02
1629882637,24.08,1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63,1629883718,25.85
1629883778,26.08,1629883838,26.31
1629883898,26.53,1629883958,26.74
1629884018,26.96,1629884078,27.12
1629884138,27.26,1629884198,27.38
1629884258,27.48,1629884318,27.56
1629884378,27.63,1629884438,27.69
1629884498,27.73
and the fixed sample output, I saw the answer once I pasted the input and compared the out put LOL 馃檪
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28
1629882097,24.29
1629882157,24.31
1629882217,23.52
1629882277,23.38
1629882337,23.72
1629882397,23.87
1629882457,23.92
1629882517,23.98
1629882577,24.02
1629882637,24.08
1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63
1629883718,25.85
1629883778,26.08
1629883838,26.31
1629883898,26.53
1629883958,26.74
1629884018,26.96
1629884078,27.12
1629884138,27.26
1629884198,27.38
1629884258,27.48
1629884318,27.56
1629884378,27.63
1629884438,27.69
1629884498,27.73
1629884558,27.75
the old code
text = ''.join([i for i in text])
.replace("脙驴脙驴", ",")
the new code
text = ''.join([i for i in text])
.replace("脙驴脙驴", "n")
Answers:
If you’re looking for another option to consider, here’s something to try:
data = """1629881977,24.27
1629882037,24.28每每1629882097,24.29
1629882757,24.15
"""
# spliting on the garbage chars has a side-effect of removing them
a = data.split('每每')
# then simply join() to reassemble the original data
b = 'n'.join(a)
Or, as a one-liner:
fixed_data = 'n'.join(data.split('每每'))
I’m using Python 3 to clean up a CSV
file that sometimes has four entries per line. For some reason the datalogger
didn’t insert a new line, and this happened periodically, not sure why.
So I’m trying to remove bad characters from the CSV
and have had success, but the lines with four entries instead of two I wanted to find the delimiter and replace it with a new line.
Sounds simple enough, but I don’t possess the code-fu, and I’m wondering if anyone could help. 馃檪
thanks
import csv
with open('outty1.csv', 'w', newline='') as outcsv:
writer = csv.writer(outcsv)
writer.writerow(["Date", "Temperature", "Humidity"])
text = open("temperature.csv", "r")
text = ''.join([i for i in text])
.replace("脙驴脙驴", ",")
for i in text:
if i.count(',')>1:
text.replace(",", "/n")
x = open("outty1.csv","a")
x.writelines(text)
x.close()
A sample of the temperature log before parsing it 馃檪 .
1629881977,24.27
1629882037,24.28每每1629882097,24.29
1629882157,24.31每每1629882217,23.52
1629882277,23.38每每1629882337,23.72
1629882397,23.87每每1629882457,23.92
1629882517,23.98每每1629882577,24.02
1629882637,24.08每每1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63每每1629883718,25.85
1629883778,26.08每每1629883838,26.31
1629883898,26.53每每1629883958,26.74
1629884018,26.96每每1629884078,27.12
1629884138,27.26每每1629884198,27.38
1629884258,27.48每每1629884318,27.56
1629884378,27.63每每1629884438,27.69
1629884498,27.73.
This is my progress once I run the program
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28,1629882097,24.29
1629882157,24.31,1629882217,23.52
1629882277,23.38,1629882337,23.72
1629882397,23.87,1629882457,23.92
1629882517,23.98,1629882577,24.02
1629882637,24.08,1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63,1629883718,25.85
1629883778,26.08,1629883838,26.31
1629883898,26.53,1629883958,26.74
1629884018,26.96,1629884078,27.12
1629884138,27.26,1629884198,27.38
1629884258,27.48,1629884318,27.56
1629884378,27.63,1629884438,27.69
1629884498,27.73
and the fixed sample output, I saw the answer once I pasted the input and compared the out put LOL 馃檪
Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28
1629882097,24.29
1629882157,24.31
1629882217,23.52
1629882277,23.38
1629882337,23.72
1629882397,23.87
1629882457,23.92
1629882517,23.98
1629882577,24.02
1629882637,24.08
1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63
1629883718,25.85
1629883778,26.08
1629883838,26.31
1629883898,26.53
1629883958,26.74
1629884018,26.96
1629884078,27.12
1629884138,27.26
1629884198,27.38
1629884258,27.48
1629884318,27.56
1629884378,27.63
1629884438,27.69
1629884498,27.73
1629884558,27.75
the old code
text = ''.join([i for i in text])
.replace("脙驴脙驴", ",")
the new code
text = ''.join([i for i in text])
.replace("脙驴脙驴", "n")
If you’re looking for another option to consider, here’s something to try:
data = """1629881977,24.27
1629882037,24.28每每1629882097,24.29
1629882757,24.15
"""
# spliting on the garbage chars has a side-effect of removing them
a = data.split('每每')
# then simply join() to reassemble the original data
b = 'n'.join(a)
Or, as a one-liner:
fixed_data = 'n'.join(data.split('每每'))