Remove excess delimiter from CSV file

Question:

I’m using Python 3 to clean up a CSV file that sometimes has four entries per line. For some reason the datalogger periodically failed to insert a newline; I’m not sure why.

I’ve managed to strip the bad characters from the CSV, but for the lines with four entries instead of two I want to find the extra delimiter and replace it with a newline.

Sounds simple enough, but I don’t possess the code-fu, and I’m wondering if anyone could help. 🙂
Thanks.

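import csv

# Write the header row to the output file
with open('outty1.csv', 'w', newline='') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(["Date", "Temperature", "Humidity"])

# Read the whole log and replace the garbage characters with a comma
text = open("temperature.csv", "r")
text = ''.join([i for i in text]) \
    .replace("Ã¿Ã¿", ",")

# Note: this loop has no effect. Iterating over a string yields single
# characters, and str.replace() returns a new string that is discarded here.
# "/n" is also not a newline escape; that would be "\n".
for i in text:
    if i.count(',') > 1:
        text.replace(",", "/n")

# Append the cleaned text below the header
x = open("outty1.csv", "a")
x.writelines(text)
x.close()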

A sample of the temperature log before parsing 🙂:

1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882157,24.31ÿÿ1629882217,23.52
1629882277,23.38ÿÿ1629882337,23.72
1629882397,23.87ÿÿ1629882457,23.92
1629882517,23.98ÿÿ1629882577,24.02
1629882637,24.08ÿÿ1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63ÿÿ1629883718,25.85
1629883778,26.08ÿÿ1629883838,26.31
1629883898,26.53ÿÿ1629883958,26.74
1629884018,26.96ÿÿ1629884078,27.12
1629884138,27.26ÿÿ1629884198,27.38
1629884258,27.48ÿÿ1629884318,27.56
1629884378,27.63ÿÿ1629884438,27.69
1629884498,27.73.

This is the output I get when I run the program:

Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28,1629882097,24.29
1629882157,24.31,1629882217,23.52
1629882277,23.38,1629882337,23.72
1629882397,23.87,1629882457,23.92
1629882517,23.98,1629882577,24.02
1629882637,24.08,1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63,1629883718,25.85
1629883778,26.08,1629883838,26.31
1629883898,26.53,1629883958,26.74
1629884018,26.96,1629884078,27.12
1629884138,27.26,1629884198,27.38
1629884258,27.48,1629884318,27.56
1629884378,27.63,1629884438,27.69
1629884498,27.73

And here is the fixed sample output. I saw the answer once I pasted the input and compared it with the output, LOL 🙂

Date,Temperature,Humidity
1629881977,24.27
1629882037,24.28
1629882097,24.29
1629882157,24.31
1629882217,23.52
1629882277,23.38
1629882337,23.72
1629882397,23.87
1629882457,23.92
1629882517,23.98
1629882577,24.02
1629882637,24.08
1629882697,24.12
1629882757,24.15
1629882817,24.19
1629882877,24.24
1629882937,24.31
1629882997,24.36
1629883057,24.40
1629883117,24.44
1629883177,24.38
1629883237,24.50
1629883298,24.60
1629883358,24.72
1629883418,24.88
1629883478,25.05
1629883538,25.23
1629883598,25.42
1629883658,25.63
1629883718,25.85
1629883778,26.08
1629883838,26.31
1629883898,26.53
1629883958,26.74
1629884018,26.96
1629884078,27.12
1629884138,27.26
1629884198,27.38
1629884258,27.48
1629884318,27.56
1629884378,27.63
1629884438,27.69
1629884498,27.73
1629884558,27.75

The old code:

text = ''.join([i for i in text]) \
    .replace("Ã¿Ã¿", ",")

The new code:

text = ''.join([i for i in text]) \
    .replace("Ã¿Ã¿", "\n")
Asked By: user1815179
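
A note on why the loop in the original script changed nothing: iterating over a string yields single characters, so i.count(',') can never exceed 1, and str.replace() returns a new string instead of modifying text in place. Below is a minimal sketch of the same idea applied line by line, assuming the separator has already been turned into a comma as in the progress output above. The helper name split_double_rows is made up for illustration and is not part of the original script.

```python
def split_double_rows(text):
    """Split lines holding two readings (four fields) into two lines."""
    fixed = []
    for line in text.splitlines():
        fields = line.split(",")
        # Regroup the fields two at a time: timestamp, temperature.
        for k in range(0, len(fields), 2):
            fixed.append(",".join(fields[k:k + 2]))
    return "\n".join(fixed)


sample = (
    "1629881977,24.27\n"
    "1629882037,24.28,1629882097,24.29\n"
    "1629882757,24.15"
)
print(split_double_rows(sample))
```

Replacing the garbage characters directly with a newline, as in the new code, is simpler; this version is mainly useful if the file has already been converted to commas.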

Answers:

If you’re looking for another option to consider, here’s something to try:

data = """1629881977,24.27
1629882037,24.28ÿÿ1629882097,24.29
1629882757,24.15
"""

# splitting on the garbage chars has the side effect of removing them
a = data.split('ÿÿ')

# then simply join() with a newline to reassemble the data, one reading per line
b = '\n'.join(a)

Or, as a one-liner:

fixed_data = '\n'.join(data.split('ÿÿ'))
Answered By: Seth
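
For completeness, here is a hedged sketch of how the same replacement might be applied to the files from the question. The function names clean_log and write_clean_csv are hypothetical, the UTF-8 encoding is an assumption about the log file, and the header is reduced to two columns because the sample data has no humidity field. If the file is opened with a different encoding, the marker may show up as "Ã¿Ã¿" as in the question's code, in which case pass that string as garbage instead.

```python
import csv
import io

# The garbage marker "ÿÿ", written with escapes so the source stays ASCII.
GARBAGE = "\u00ff\u00ff"


def clean_log(raw, garbage=GARBAGE):
    """Return a list of [timestamp, temperature] rows from the raw log text."""
    # Turning the marker into a newline puts every reading on its own line.
    text = raw.replace(garbage, "\n")
    # csv.reader yields an empty list for blank lines, so filter those out.
    return [row for row in csv.reader(io.StringIO(text)) if row]


def write_clean_csv(src, dst, encoding="utf-8"):
    """Read src, clean it, and write it to dst with a header row."""
    # The encoding is an assumption; adjust it to match the datalogger output.
    with open(src, "r", encoding=encoding) as f:
        rows = clean_log(f.read())
    with open(dst, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["Date", "Temperature"])
        writer.writerows(rows)


rows = clean_log("1629881977,24.27\n1629882037,24.28\u00ff\u00ff1629882097,24.29\n")
print(rows)
```

With the file names from the question, the call would be write_clean_csv("temperature.csv", "outty1.csv").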