Replace carriage returns in list python

Question:

I have a list of values and need to remove errant carriage returns wherever they occur within those values.

The format of the file I want to clean up is as follows:

field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|val
ue 3|value 4|value 5
value 1|value 2|value 3|va
lue 4|value 5

I am looking to address a situation like the one above where there are errant carriage returns in the 3rd and 4th values for the last 2 rows of data.

I have seen a few posts for how to address this but so far nothing has worked for this situation. I have pasted the code I have attempted so far.

import os
import sys

filetoread = r'C:\temp\test.dat'
filetowrite = r'C:\temp\test_updated.dat'

'''
Attempt 1
'''
with open(filetoread, "r+b") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)


'''
Attempt 2
'''           
for line in filetoread:
    line = line.replace("\n", "")


'''
Attempt 3
'''
with open(filetoread, "r") as inf:
    for line in inf:
        if "\n" in line:
            line = line.replace("\n", "")
Asked By: Tony Nesavich

Answers:

The \n character is a line feed; \r is the carriage return:

http://www.asciitable.com/

http://en.cppreference.com/w/cpp/language/escape

So,

> line.replace("\n", "")

should be

 line.replace("\r", "")

Do check whether it's really \r alone or the \r\n pair. Windows/DOS uses \r\n,
classic Mac OS (before OS X) used \r, and Linux and modern macOS use \n alone.
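
To check which of these line endings a file actually contains, one option is to count them in the raw bytes. This is only a minimal sketch: `count_line_endings` and the sample bytes are hypothetical names and data made up for illustration, not part of the question.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF pair contains one CR and one LF, so subtract the pairs.
        "cr_only": data.count(b"\r") - crlf,
        "lf_only": data.count(b"\n") - crlf,
    }

# Illustrative sample: two CRLF endings, one bare LF, one bare CR.
sample = b"a|b\r\nc|d\ne|f\rg|h\r\n"
print(count_line_endings(sample))
```

For a real file, the bytes would come from something like `open(filetoread, "rb").read()`, so that Python does not translate the line endings before they are counted.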

Answered By: jcoppens

Note: I'm assuming you have extra newlines ('\n'), not carriage returns ('\r').

def remove_newlines_in_fields(data, ncols, sep):
    sep_count = 0
    for c in data:
        if c == sep:
            sep_count += 1
        if c == '\n':
            if sep_count == ncols - 1:
                yield c
                sep_count = 0
        else:
            yield c

Also note that if you have newlines in your rightmost column this won’t work properly. (The partial column will be prepended to the next row.)

Here it is in action:

>>> s = '''field1|field2|field3|field4|field5
... value 1|value 2|value 3|value 4|value 5
... value 1|value 2|value 3|value 4|value 5
... value 1|value 2|val
... ue 3|value 4|value 5
... value 1|value 2|value 3|va
... lue 4|value 5'''
>>> print(''.join(remove_newlines_in_fields(s, 5, '|')))
field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
Answered By: Steven Rumbalski

You have to count the number of fields, to match 5 per line:

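import re
# Read as text so the str pattern applies (binary mode would return bytes).
with open(filetoread, "r") as inf:
    with open(filetowrite, "w") as fixed:
        # Four "field|" groups, then a fifth field up to a newline;
        # re.DOTALL lets "." match newlines embedded inside a field.
        for match in re.finditer(r'(?:.*?\|){4}(?:.*?)\n', inf.read(), re.DOTALL):
            fixed.write(match.group(0).replace('\n', '') + '\n')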
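
To see what the field-counting pattern does without touching any files, here is a small in-memory sketch using the sample data from the question. The names `ROW` and `rejoin_rows` are made up for this illustration, and it assumes the stray characters are line feeds.

```python
import re

# Four lazily matched "field|" groups, then a fifth field ending at a newline.
# re.DOTALL lets "." match newlines embedded inside a field.
ROW = re.compile(r'(?:.*?\|){4}.*?\n', re.DOTALL)

def rejoin_rows(text):
    """Return one cleaned line per five-field record found in text."""
    return [m.group(0).replace('\n', '') for m in ROW.finditer(text)]

sample = (
    "field1|field2|field3|field4|field5\n"
    "value 1|value 2|val\nue 3|value 4|value 5\n"
    "value 1|value 2|value 3|va\nlue 4|value 5\n"
)
for row in rejoin_rows(sample):
    print(row)
```

Note that the pattern requires a newline after the fifth field, so a final record without a trailing newline would not be matched, and a newline inside the last field would end the record early.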
Answered By: Daniel

The following will remove any carriage return characters embedded in each field:

# newline="\n" makes Python split lines on '\n' only and leave '\r' untranslated.
with open(filetoread, "r", newline="\n") as inf:
    with open(filetowrite, "w") as fixed:
        for line in (line.rstrip() for line in inf):
            fields = (field.replace('\r', '') for field in line.split('|'))
            fixed.write('|'.join(fields) + '\n')
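
The same per-field cleanup can be tried on in-memory lines first. This is a sketch under the assumption that the stray characters really are '\r'; `strip_cr_from_fields` and the sample lines are hypothetical, not taken from the question's file.

```python
def strip_cr_from_fields(lines, sep="|"):
    """Yield each line with carriage returns removed from every field."""
    for line in lines:
        # Drop the line terminator first, then clean each field separately.
        line = line.rstrip("\r\n")
        yield sep.join(field.replace("\r", "") for field in line.split(sep))

# Illustrative lines: each has a '\r' embedded in the middle of a field.
sample_lines = [
    "value 1|val\rue 2|value 3\r\n",
    "value 1|value 2|va\rlue 3\n",
]
for cleaned in strip_cr_from_fields(sample_lines):
    print(cleaned)
```

This only removes '\r' characters inside a line. If a field was broken by a '\n', the two halves arrive as separate lines and would still need to be rejoined by counting fields.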
Answered By: martineau

If a line read from a text file is empty except for a ^M at the end, then in that case only, Python reads it as two empty lines:

infile:
Cookie: login=admin; session=oNvChuTLIyFhParkQ0c4UswT^M
^M
{"order":["descending","time"],"where":{"access_logs":{"time":{"<=":1675900799,">=":1673308800}},"users":{},"groups":{},"time_zones":{}},"object":"access_logs","fields":["COUNT(*)"],"join":"LEFT"}

Output of: for line in infile: print('LINE:' + line + '!')

LINE:Cookie: login=admin; session=oNvChuTLIyFhParkQ0c4UswT!
LINE:!
LINE:!
LINE:!
LINE:{"order":["descending","time"],"where":{"access_logs":{"time":{"<=":1675900799,">=":1673308800}},"users":{},"groups":{},"time_zones":{}},"object":"access_logs","fields":["COUNT(*)"],"join":"LEFT"}!
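
One way to check how Python splits input like this is to decode the same bytes with different `newline` settings. This is only a sketch: `read_lines` is a made-up helper and the bytes are a shortened stand-in for the file above, not its actual contents.

```python
import io

def read_lines(raw: bytes, newline):
    """Decode raw bytes as text and return the lines Python yields."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=newline)
    return list(stream)

# A header line ending in CRLF, a line holding only CRLF, then a body line.
raw = b"header\r\n\r\nbody\n"
print(read_lines(raw, None))  # universal newlines: "\r\n" is translated to "\n"
print(read_lines(raw, ""))    # line endings are returned untranslated
```

Keep in mind that each line returned by iteration still carries its line ending, and print() adds one more, so printing the lines directly shows extra blank lines that are not separate lines in the file.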

Answered By: user3706854