Get Excel cell background color in pandas read_excel?

Question:

I have an .xls Excel file with cells having background colors. I am reading that file into pandas with read_excel. Is there any way to get the background colors of cells?

Asked By: csaladenes

||

Answers:

Brute-forced it through xlrd, as per Mark‘s suggestion:

from xlrd import open_workbook
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
#create empy colormask matrix
bgcol=np.zeros([sheet.nrows,sheet.ncols])
#cycle through all cells to get colors
for row in range(sheet.nrows):
  for column in range(sheet.ncols):
    cell = sheet.cell(row, column)  
    fmt = wb.xf_list[cell.xf_index]
    bgcol[row,column]=fmt.background.background_colour_index
#return pandas mask of colors
colormask=pd.DataFrame(bgcol) 

Yet, there must be a better way thorugh pandas directly…

Answered By: csaladenes

I edited the code snippet from @csaladenes’s response above based on this link, and it works for my xls file (the original resulted in all cells showing the same color index, though they have different background colors):

import xlrd
import numpy as np
wb = xlrd.open_workbook(file, formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
bgcol=np.zeros([sheet.nrows,sheet.ncols])
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        c = sheet.cell(row, col)
        cif = sheet.cell_xf_index(row, col)
        iif = wb.xf_list[cif]
        cbg = iif.background.pattern_colour_index
        bgcol[row,col] = cbg
Answered By: BossaNova

The Solution suggested above works only for xls file, not for xlsx file. This raises a NotImplementedError: formatting_info=True not yet implemented. Xlrd library is still not updated to work for xlsx files. So you have to Save As and change the format every time which may not work for you.

Here is a solution for xlsx files using openpyxl library. A2 is the cell whose color code we need to find out.

import openpyxl
from openpyxl import load_workbook
excel_file = 'color_codes.xlsx' 
wb = load_workbook(excel_file, data_only = True)
sh = wb['Sheet1']
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color
print ('HEX =',color_in_hex) 
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (0, 2, 4))) # Color in RGB
Answered By: Sumit Pokhrel

Improving on Sumit’s answer (which should be the accepted one in my opinion), you can obtain the color for the whole column by using list comprehension:

import openpyxl
from openpyxl import load_workbook
excel_file = 'my_file.xlsx' 
wb = load_workbook(excel_file, data_only = True)
sh = wb['my_sheet']
# extract color from column A.
color_in_hex = [cell.fill.start_color.index for cell in sh['A:A']]
Answered By: CAPSLOCK
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.