How to copy text with formatting using Python + Selenium?

Question:

I’m using Python and Selenium, and I need to copy text from a webpage to the Windows clipboard with its formatting.

For example, when you copy text from a webpage with Ctrl+C and then paste it into Microsoft Word with Ctrl+V, the text keeps its formatting.

I want to achieve the same result with a Python + Selenium script that navigates to a website and copies the formatted text to the clipboard. Then I want to manually open Microsoft Word, press Ctrl+V, and paste the text with its formatting.

Here’s my current code, but it only puts the raw HTML markup on the clipboard as plain text, so Word pastes the tags rather than formatted text:

import time
import pyperclip
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

# Configure Chrome options
options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# Start Chrome driver
driver = webdriver.Chrome(options=options)

# Go to the webpage
driver.get("https://example.com/")
time.sleep(2) # wait for page to load

# Find the element containing the text
element = driver.find_element("xpath", "/html/body/div")
text = element.get_attribute("innerHTML")

# ----> OR: text = driver.find_element("tag name", "body").text  # also loses the formatting

# Copy text to clipboard
pyperclip.copy(text)

# Quit driver
driver.quit()

Question: How can I copy text with formatting using Selenium?

Note: I know that I can simulate keystrokes:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from time import sleep

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Navigate to the Wikipedia page
driver.get("https://en.wikipedia.org/wiki/List_of_lists_of_lists")

# Find the element that contains the text to copy
element = driver.find_element(By.XPATH, '//*[@id="mw-content-text"]/div[1]/ul[4]')

# Click the element, then select everything (Ctrl+A) and copy it (Ctrl+C)
(
    ActionChains(driver)
    .click(element)
    .key_down(Keys.CONTROL).send_keys("a").key_up(Keys.CONTROL)
    .key_down(Keys.CONTROL).send_keys("c").key_up(Keys.CONTROL)
    .perform()
)

# Quit the driver
driver.quit()

But this approach is not suitable, because Ctrl+A selects the whole document and I do not know how to select only a single element.

Answers:

You can use the klembord package instead of pyperclip to copy rich text,
using its set_with_rich_text(text, html) function.

text (str): Plain text to set selection to.

html (str): HTML formatted rich text to set selection to.

If you don’t care about the plain-text version, you can pass the HTML for both arguments. If you do need the plain text, pass element.text as the text argument: klembord.set_with_rich_text(element.text, text).

import time
import klembord
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
driver.get("https://example.com/")
time.sleep(2) 

element = driver.find_element("xpath", "/html/body/div")
text = element.get_attribute("innerHTML")

# Copy with klembord to keep the formatting:
# element.text is the plain-text version, text is the HTML
klembord.set_with_rich_text(element.text, text)

driver.quit()

Output:
The formatted text pasted into LibreOffice Writer.
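
For the part of the question about copying a single element rather than the whole document, one possible approach is to set the browser selection with JavaScript and then send only Ctrl+C. This is a minimal sketch, not part of the klembord code above: the names SELECT_ELEMENT_JS, select_element and copy_element are made up for illustration, and it assumes a normal (non-headless) Chrome window, since a headless browser may not write to the OS clipboard.

```python
# JavaScript that makes one element's contents the current selection.
# arguments[0] is the WebElement passed in from Python.
SELECT_ELEMENT_JS = """
const range = document.createRange();
range.selectNodeContents(arguments[0]);
const selection = window.getSelection();
selection.removeAllRanges();
selection.addRange(range);
"""


def select_element(driver, element):
    """Select only `element` in the page (hypothetical helper)."""
    driver.execute_script(SELECT_ELEMENT_JS, element)


def copy_element(driver, element):
    """Select `element`, then press Ctrl+C so the browser copies it (hypothetical helper)."""
    # Imported here so select_element stays usable on its own.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    select_element(driver, element)
    # No Ctrl+A here: the selection was already set by the script above.
    ActionChains(driver).key_down(Keys.CONTROL).send_keys("c").key_up(Keys.CONTROL).perform()
```

With the driver and element from the question's second snippet, copy_element(driver, element) would take the place of the Ctrl+A / Ctrl+C chain. In this variant the browser itself should place the formatted content on the clipboard, the same way a manual Ctrl+C does.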

Answered By: Sreyas