How to save in *.xlsx long URL in cell using Pandas
Question:
For example I read excel file into DataFrame with 2 columns(id and URL). URLs in input file are like text(without hyperlinks):
input_f = pd.read_excel("input.xlsx")
Watch what inside this DataFrame – everything was successfully read, all URLs are ok in input_f
. After that when I wan’t to save this file to_excel
input_f.to_excel("output.xlsx", index=False)
I got warning.
Pathworksheet.py:836: UserWarning: Ignoring URL ‘http:// here long URL’ with
link or location/anchor > 255 characters since it exceeds Excel’s
limit for URLS force_unicode(url))
And in output.xlsx cells with long URL were empty, and URLs become hyperlinks.
How to fix this?
Answers:
I tried it myself and got the same problem. You could try to create a temp csv file and then use xlsxwriter to create an excel file. Once done then delete the tmp file. xlsxwriter has a write_string method that will override the auto hyperlinking that excel does. This worked for me.
import pandas as pd
import csv
import os
from xlsxwriter.workbook import Workbook
inData = "C:/Users/martbar/Desktop/test.xlsx"
tmp = "C:/Users/martbar/Desktop/tmp.csv"
exFile = "C:/Users/martbar/Desktop/output.xlsx"
#read in data
df = pd.read_excel(inData)
#send to csv
df.to_csv(tmp, index=False)
#convert to excel
workbook = Workbook(exFile)
worksheet = workbook.add_worksheet()
with open(tmp, 'r') as f:
reader = csv.reader(f)
for r, row in enumerate(reader):
for c, col in enumerate(row):
#if you use write instead of write_string you will get the error
worksheet.write_string(r, c, col)
workbook.close()
#delete tmp file
os.remove(tmp)
You can create an ExcelWriter object with the option not to convert strings to urls:
writer = pandas.ExcelWriter(r'file.xlsx', engine='xlsxwriter',options={'strings_to_urls': False})
df.to_excel(writer)
writer.close()
From the docs in the section: "Passing XlsxWriter constructor options to Pandas", 'strings_to_urls': False
is now specified like this:
writer = pd.ExcelWriter('pandas_example.xlsx',
engine='xlsxwriter',
engine_kwargs={'options': {'strings_to_urls': False}})
and then keep doing what the accepted response suggests here:
df.to_excel(writer)
writer.close()
For example I read excel file into DataFrame with 2 columns(id and URL). URLs in input file are like text(without hyperlinks):
input_f = pd.read_excel("input.xlsx")
Watch what inside this DataFrame – everything was successfully read, all URLs are ok in input_f
. After that when I wan’t to save this file to_excel
input_f.to_excel("output.xlsx", index=False)
I got warning.
Pathworksheet.py:836: UserWarning: Ignoring URL ‘http:// here long URL’ with
link or location/anchor > 255 characters since it exceeds Excel’s
limit for URLS force_unicode(url))
And in output.xlsx cells with long URL were empty, and URLs become hyperlinks.
How to fix this?
I tried it myself and got the same problem. You could try to create a temp csv file and then use xlsxwriter to create an excel file. Once done then delete the tmp file. xlsxwriter has a write_string method that will override the auto hyperlinking that excel does. This worked for me.
import pandas as pd
import csv
import os
from xlsxwriter.workbook import Workbook
inData = "C:/Users/martbar/Desktop/test.xlsx"
tmp = "C:/Users/martbar/Desktop/tmp.csv"
exFile = "C:/Users/martbar/Desktop/output.xlsx"
#read in data
df = pd.read_excel(inData)
#send to csv
df.to_csv(tmp, index=False)
#convert to excel
workbook = Workbook(exFile)
worksheet = workbook.add_worksheet()
with open(tmp, 'r') as f:
reader = csv.reader(f)
for r, row in enumerate(reader):
for c, col in enumerate(row):
#if you use write instead of write_string you will get the error
worksheet.write_string(r, c, col)
workbook.close()
#delete tmp file
os.remove(tmp)
You can create an ExcelWriter object with the option not to convert strings to urls:
writer = pandas.ExcelWriter(r'file.xlsx', engine='xlsxwriter',options={'strings_to_urls': False})
df.to_excel(writer)
writer.close()
From the docs in the section: "Passing XlsxWriter constructor options to Pandas", 'strings_to_urls': False
is now specified like this:
writer = pd.ExcelWriter('pandas_example.xlsx',
engine='xlsxwriter',
engine_kwargs={'options': {'strings_to_urls': False}})
and then keep doing what the accepted response suggests here:
df.to_excel(writer)
writer.close()