issue with *.ics splitting strings with more than one line *Python*

Question:

I have tried as many methods as I could find and always got the same result, but there must be a fix for this?

I am downloading an ICS file from a website, where one of the properties, "SUMMARY", is split across two lines.
When I load this into a string, the two lines are automatically joined into one string, unless there is a "\n".

So I have tried to replace both "\n" and "\r", but that does not change anything.

Code

from icalendar import Calendar, Event
from datetime import datetime
import icalendar
import urllib.request
import re
from clear import clear_screen

cal = Calendar()

def download_ics():
    url = "https://www.pogdesign.co.uk/cat/download_ics/7d903a054695a48977d46683f29384de"
    file_name = "pogdesign.ics"
    urllib.request.urlretrieve(url, file_name)

def get_start_time(time):
    time = datetime.strftime(time, "%A - %H:%M")
    return time

def get_time(time):
    time = datetime.strftime(time, "%H:%M")
    return time

def check_Summary(text):
    #newline = re.sub('[\r\n]', '', text)
    newline = text.translate(str.maketrans("", "", "\r\n"))
    return newline

def main():
    download_ics()
    clear_screen()
    e = open('pogdesign.ics', 'rb')
    ecal = icalendar.Calendar.from_ical(e.read())
    for component in ecal.walk():
        if component.name == "VEVENT":
            summary = check_Summary(component.get("SUMMARY"))
            print(summary)
            print("\t Start : " + get_start_time(component.decoded("DTSTART")) + " - " + get_time(component.decoded("DTEND")))

            print()
    e.close()

if __name__ == "__main__":
    main()

output

Young Sheldon S06E11 – Ruthless, Toothless, and a Week ofBed Rest
Start : Friday – 02:00 – 02:30

The Good Doctor S06E11 – The Good Boy
Start : Tuesday – 04:00 – 05:00

National Treasure: Edge of History S01E08 – Family Tree
Start : Thursday – 05:59 – 06:59

National Treasure: Edge of History S01E09 – A Meeting withSalazar
Start : Thursday – 05:59 – 06:59

The Last of Us S01E03 – Long Long Time
Start : Monday – 03:00 – 04:00

The Last of Us S01E04 – Please Hold My Hand
Start : Monday – 03:00 – 04:00

Anne Rice’s Mayfair Witches S01E04 – Curiouser and Curiouser
Start : Monday – 03:00 – 04:00

Anne Rice’s Mayfair Witches S01E05 – The Thrall
Start : Monday – 03:00 – 04:00

The Ark S01E01 – Everyone Wanted to Be on This Ship
Start : Thursday – 04:00 – 05:00

I have looked through all kinds of solutions, like converting the text to "utf-8" and "ISO-8859-8".
I have tried some functions I found in the icalendar package.
I have even asked ChatGPT for help.

As you can see in these two lines of the output:
Young Sheldon S06E11 – Ruthless, Toothless, and a Week ofBed Rest
and
National Treasure: Edge of History S01E09 – A Meeting withSalazar

In the downloaded ICS file, each of these summaries is on two separate lines, and I cannot manage to keep them split, or to stop them from being joined at all…

Asked By: Vebjørn Endresen


Answers:

So far as the icalendar.Calendar class is concerned, that ical is incorrectly formatted.

icalendar.Calendar.from_ical() calls icalendar.parser.Contentlines.from_ical(), which is

    def from_ical(cls, ical, strict=False):
        """Unfold the content lines in an iCalendar into long content lines.
        """
        ical = to_unicode(ical)
        # a fold is carriage return followed by either a space or a tab
        return cls(uFOLD.sub('', ical), strict=strict)

where uFOLD is re.compile('(\r?\n)+[ \t]')

That means it’s removing each series of newlines that is followed by one space or tab character – not replacing it with a space. The ical file you’re retrieving has e.g.

SUMMARY:Young Sheldon S06E11 - \nRuthless\, Toothless\, and a Week of\r\n Bed Rest\r\n

so when "of\r\n Bed" is matched, the "\r\n " is removed and it becomes "ofBed".
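The effect can be reproduced with the standard library alone. This is a minimal sketch that applies the same pattern quoted above to the folded SUMMARY line; the name FOLD is only a local stand-in for the library's uFOLD, and the sample string is typed in by hand rather than read from the real download.

```python
import re

# Same pattern as the uFOLD regex quoted above: one or more line breaks
# followed by a single space or tab. FOLD is a local stand-in name.
FOLD = re.compile(r'(\r?\n)+[ \t]')

# The folded property as it sits in the file. "\\n" and "\\," are literal
# backslash escapes inside the value; "\r\n " is the actual line fold.
folded = (
    "SUMMARY:Young Sheldon S06E11 - \\nRuthless\\, Toothless\\, "
    "and a Week of\r\n Bed Rest\r\n"
)

# Unfolding deletes the line break AND the single space that follows it.
unfolded = FOLD.sub('', folded)
print(repr(unfolded))
```

The space before "Bed" was the fold marker itself, so nothing is left between "of" and "Bed" once the fold is removed.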

This line-folding format is defined in RFC 2445 (since superseded by RFC 5545, which keeps the same folding rule), which gives the example

For example the line:

DESCRIPTION:This is a long description that exists on a long line.

Can be represented as:

DESCRIPTION:This is a lo
 ng description
  that exists on a long line.

which makes clear that the implementation in from_ical() is correct.
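As a quick check of that claim, the RFC example can be run through the same pattern. This is only a sketch: FOLD is again a local copy of the regex, and the folded text is typed in with explicit CRLF line endings.

```python
import re

# Local copy of the unfolding pattern discussed above.
FOLD = re.compile(r'(\r?\n)+[ \t]')

original = "DESCRIPTION:This is a long description that exists on a long line."

# The folded form from the RFC example. The third line starts with TWO
# spaces: the first is the fold marker, the second belongs to the text.
rfc_folded = (
    "DESCRIPTION:This is a lo\r\n"
    " ng description\r\n"
    "  that exists on a long line."
)

rfc_unfolded = FOLD.sub('', rfc_folded)
print(rfc_unfolded == original)
```

Note that the first fold splits "long" in the middle of the word, so an unfolder that inserted a space at every fold would produce "lo ng". A producer that folds between words has to keep the separating space itself, either before the line break or as a second space after it.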

If you’re quite sure that the source ical will always fold lines on words, you could adjust for that by adding a space after each line fold, like:

    ecal = icalendar.Calendar.from_ical(e.read().replace(b'\r\n ', b'\r\n  '))
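
The effect of that replacement can be sketched without the icalendar package, using the same regex as a stand-in for the library's unfolding step. The input bytes below are hand-written samples, not the real download, and the second sample is there to show the limitation: if the source ever folds in the middle of a word, the workaround adds a space that does not belong.

```python
import re

# Local stand-in for the library's unfolding pattern.
FOLD = re.compile(r'(\r?\n)+[ \t]')

def unfold_with_workaround(raw: bytes) -> str:
    """Double the fold space, then unfold, so one space survives."""
    patched = raw.replace(b'\r\n ', b'\r\n  ')
    return FOLD.sub('', patched.decode('utf-8'))

# Hypothetical sample folded between words, like the feed in the question.
word_fold = b"SUMMARY:A Meeting with\r\n Salazar\r\n"
fixed = unfold_with_workaround(word_fold)
print(repr(fixed))

# Hypothetical sample folded mid-word, as the RFC permits.
mid_word_fold = b"DESCRIPTION:This is a lo\r\n ng description\r\n"
broken = unfold_with_workaround(mid_word_fold)
print(repr(broken))
```

The replacement only touches folds written as CRLF plus a space; folds that use a tab or a bare "\n" pass through unchanged and would still be joined without a space.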
Answered By: Nic Wolff