Python code to automate desktop activities in Windows

Question:

I want to automate desktop activities in a Windows environment using Python. How can it be done? Some examples would also be helpful.

By desktop activities, I mean actions such as taking control of the mouse and keyboard, accessing the active window's properties, double-clicking an icon on the desktop, minimizing and maximizing windows, entering data into an input popup window through the keyboard, etc.

Asked By: stallion


Answers:

There are different ways of automating user interfaces in Windows that can be accessed via Python (using ctypes or some of the Python windows bindings):

  1. Raw Windows APIs: Get/SetCursorPos for the mouse, and HWND APIs such as GetFocus and GetForegroundWindow

  2. AutoIt: an automation scripting language (see "Calling AutoIt Functions in Python")

  3. Microsoft Active Accessibility (MSAA) / WinEvent: an API for interrogating a UI through the accessibility APIs introduced in Windows 95.

  4. UI Automation (UIA): a replacement for MSAA introduced in Vista (available for XP SP3 IIRC).
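As a rough sketch of option 1, the raw Win32 calls can be reached through ctypes alone. The names `POINT`, `screen_center` and `describe_desktop` are made up for this illustration; the `user32` functions are the real Win32 ones. The Windows-only part is guarded so the module still imports on other platforms:

```python
import ctypes
import sys


class POINT(ctypes.Structure):
    """Mirror of the Win32 POINT struct filled in by GetCursorPos."""
    _fields_ = [("x", ctypes.c_long), ("y", ctypes.c_long)]


def screen_center(width, height):
    """Return the pixel at the middle of a width x height screen."""
    return width // 2, height // 2


def describe_desktop():
    """Print cursor position and foreground window title, then centre the cursor."""
    user32 = ctypes.windll.user32  # only available on Windows
    # Declare handle-sized types so window handles are not truncated on 64-bit
    user32.GetForegroundWindow.restype = ctypes.c_void_p
    user32.GetWindowTextLengthW.argtypes = [ctypes.c_void_p]
    user32.GetWindowTextW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_int]

    pos = POINT()
    user32.GetCursorPos(ctypes.byref(pos))
    print("cursor at", pos.x, pos.y)

    hwnd = user32.GetForegroundWindow()
    length = user32.GetWindowTextLengthW(hwnd)
    title = ctypes.create_unicode_buffer(length + 1)
    user32.GetWindowTextW(hwnd, title, length + 1)
    print("foreground window:", title.value)

    # SM_CXSCREEN = 0 and SM_CYSCREEN = 1 give the primary screen size
    x, y = screen_center(user32.GetSystemMetrics(0), user32.GetSystemMetrics(1))
    user32.SetCursorPos(x, y)


if __name__ == "__main__" and sys.platform == "win32":
    describe_desktop()
```

The pywin32 bindings expose the same functions with less boilerplate; ctypes is used here only to avoid an extra dependency.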

Automating a user interface to test it is a non-trivial task. There are a lot of gotchas that can trip you up.

I would suggest testing your automation framework in an automated way so you can verify that it works on the platforms you are testing (to identify failures in the automation API vs failures in the application).

Another consideration is how to deal with localization. Note also that the names for Minimize/Maximize/… are localized as well, and can be in a different language to the application (system vs. user locale)!

In pseudo-code, an MSAA program to minimize an application would look something like:

window = AccessibleObjectFromWindow(FindWindow("My Window"))
titlebar = [x for x in window.AccessibleChildren if x.accRole == TitleBar]
minimize = [x for x in titlebar[0].AccessibleChildren if x.Name == "Minimize"]
if len(minimize) != 0: # may already be minimized
    minimize[0].accDoDefaultAction()

MSAA accessible items are stored as (object: IAccessible, childId: int) pairs. Care is needed here to get the calls correct (e.g. get_accChildCount only uses the IAccessible, so when childId is not 0 you must return 0 instead of calling get_accChildCount)!
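A minimal sketch of that (object, childId) rule, assuming the IAccessible wrapper exposes its child count as an `accChildCount` attribute. `AccessibleItem` and `FakeAccessible` are hypothetical names used so the example runs without a real COM object:

```python
CHILDID_SELF = 0  # MSAA's child id meaning "the IAccessible object itself"


class AccessibleItem:
    """One MSAA item: an IAccessible-like object plus a child id."""

    def __init__(self, accessible, child_id=CHILDID_SELF):
        self.accessible = accessible
        self.child_id = child_id

    def child_count(self):
        # A simple element (child_id != 0) has no children of its own;
        # asking the IAccessible would report the parent's children instead.
        if self.child_id != CHILDID_SELF:
            return 0
        return self.accessible.accChildCount


class FakeAccessible:
    """Hypothetical stand-in for a real IAccessible with three children."""
    accChildCount = 3


parent = AccessibleItem(FakeAccessible())
simple_child = AccessibleItem(FakeAccessible(), child_id=2)
print(parent.child_count(), simple_child.child_count())  # prints: 3 0
```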

IAccessible calls can return different error codes to indicate "this object does not support this property" — e.g. DISP_E_MEMBERNOTFOUND or E_NOTIMPL.
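One way to cope with this is to treat both codes as "property not supported" and let any other failure propagate. The sketch below assumes the COM layer raises an exception carrying an `hresult` attribute; `FakeComError` and `read_property` are hypothetical names, and only the two HRESULT values come from the Windows headers:

```python
DISP_E_MEMBERNOTFOUND = 0x80020003
E_NOTIMPL = 0x80004001
NOT_SUPPORTED = {DISP_E_MEMBERNOTFOUND, E_NOTIMPL}


class FakeComError(Exception):
    """Hypothetical stand-in for a COM exception that carries an HRESULT."""

    def __init__(self, hresult):
        super().__init__(hex(hresult & 0xFFFFFFFF))
        self.hresult = hresult


def read_property(getter, default=None, error_type=FakeComError):
    """Call getter(); return default if the object lacks the property."""
    try:
        return getter()
    except error_type as exc:
        # HRESULTs are often surfaced as signed 32-bit ints, so normalise first
        if (exc.hresult & 0xFFFFFFFF) in NOT_SUPPORTED:
            return default
        raise


def unsupported():
    # DISP_E_MEMBERNOTFOUND expressed as a signed 32-bit integer
    raise FakeComError(-2147352573)


print(read_property(unsupported, default="n/a"))  # prints: n/a
```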

Be aware of the state of the window. If the window is maximized then minimized, restore will restore the window to its maximized state, so you need to restore it again to get it back to the normal/windowed state.
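The double restore can be sketched with the Win32 functions IsIconic, IsZoomed and ShowWindow. `restore_to_normal` and `FakeUser32` are hypothetical names; the fake only mimics the state transitions described above so the logic can be run anywhere. On Windows you would pass `ctypes.windll.user32` and a real window handle instead:

```python
SW_RESTORE = 9  # Win32 ShowWindow command


def restore_to_normal(user32, hwnd):
    """Restore hwnd until it is neither minimized nor maximized."""
    if user32.IsIconic(hwnd):   # currently minimized
        user32.ShowWindow(hwnd, SW_RESTORE)
    if user32.IsZoomed(hwnd):   # maximized (possibly again), so restore once more
        user32.ShowWindow(hwnd, SW_RESTORE)


class FakeUser32:
    """Hypothetical stand-in that tracks one window's state."""

    def __init__(self, state, previous="normal"):
        self.state, self.previous = state, previous

    def IsIconic(self, hwnd):
        return self.state == "minimized"

    def IsZoomed(self, hwnd):
        return self.state == "maximized"

    def ShowWindow(self, hwnd, command):
        if command == SW_RESTORE:
            # A minimized window returns to whatever it was before minimizing
            self.state = self.previous if self.state == "minimized" else "normal"
        return True


window = FakeUser32(state="minimized", previous="maximized")
restore_to_normal(window, hwnd=0)
print(window.state)  # prints: normal
```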

The MSAA and UIA APIs don’t support right mouse button clicks, so you need to use a Win32 API to trigger them.
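A minimal sketch of such a right click, using the Win32 functions SetCursorPos and mouse_event (the latter is superseded by SendInput but is simpler to show). `right_click` and `RecordingUser32` are hypothetical names; on Windows you would pass `ctypes.windll.user32` rather than the recorder:

```python
MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP = 0x0010


def right_click(user32, x, y):
    """Move the cursor to (x, y) and send a right-button press and release."""
    user32.SetCursorPos(x, y)
    user32.mouse_event(MOUSEEVENTF_RIGHTDOWN, 0, 0, 0, 0)
    user32.mouse_event(MOUSEEVENTF_RIGHTUP, 0, 0, 0, 0)


class RecordingUser32:
    """Hypothetical stand-in that records calls instead of moving the mouse."""

    def __init__(self):
        self.calls = []

    def SetCursorPos(self, x, y):
        self.calls.append(("SetCursorPos", x, y))

    def mouse_event(self, flags, dx, dy, data, extra_info):
        self.calls.append(("mouse_event", flags))


recorder = RecordingUser32()
right_click(recorder, 100, 200)
print(recorder.calls)
```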

The MSAA model does not support treeview hierarchy information: it exposes the tree as a flat list. On the other hand, UIA will only enumerate elements that are visible, so you will not be able to access elements in the UIA tree that are collapsed.

Answered By: reece

Have a look at SIKULI.

Sikuli is a visual technology to automate and test graphical user
interfaces (GUI) using images (screenshots).

SIKULI uses a very clever combination of taking screenshots and embedding them into your Python (it’s Jython, actually) script.


You take screenshots of the UI elements and then use them in your code.

Answered By: sloth

You can try Automa.

It’s a Windows GUI automation tool written in Python which is very simple to use. For example, you can do the following:

# to double click on an icon on the desktop
doubleclick("Recycle Bin")

# to maximize
click("Maximize")

# to input some text and press ENTER
write("Some text", into="Label of the text field")
press(ENTER)

The full list of available commands can be found here.

Disclaimer: I’m one of Automa’s developers.

Answered By: Tytus

You can use PyAutoGUI, which provides a cross-platform way to perform GUI automation in Python.

Mouse Control

Here is a simple example that moves the mouse to the middle of the screen:

import pyautogui
screenWidth, screenHeight = pyautogui.size()
pyautogui.moveTo(screenWidth / 2, screenHeight / 2)

Related question: Controlling mouse with Python.

Keyboard Control

Example:

pyautogui.typewrite('Hello world!')                 # types "Hello world!" instantly
pyautogui.typewrite('Hello world!', interval=0.25)  # types "Hello world!" with a quarter-second delay after each character
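The mouse and keyboard calls can be combined to cover the tasks from the question. This is only a sketch: `open_icon_and_type` is a hypothetical helper, the coordinates are placeholders you would need to look up yourself, and it assumes Win+Up maximizes the active window, as it does on recent Windows versions. The `gui` argument is expected to be the `pyautogui` module:

```python
import time


def open_icon_and_type(gui, icon_x, icon_y, text, wait=1.0):
    """Double-click an icon, maximize the window that opens, then type into it."""
    gui.doubleClick(icon_x, icon_y)      # open the icon at (icon_x, icon_y)
    time.sleep(wait)                     # crude wait for the window to appear
    gui.hotkey("win", "up")              # Win+Up maximizes the active window
    gui.typewrite(text, interval=0.05)   # short delay after each character
    gui.press("enter")
```

On a real desktop you would run `import pyautogui` and then call `open_icon_and_type(pyautogui, 40, 40, "Hello world!")`. Passing the module in as an argument also makes it easy to substitute a recording stub when testing the script without a display.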

Message Box Functions

It provides JavaScript-style message boxes.
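As a small sketch of those message boxes, the hypothetical helper below chains `alert`, `confirm` and `prompt`. The dialog texts are placeholders, and `pyautogui` is imported lazily because it needs a desktop session to load:

```python
def confirm_and_ask(gui=None):
    """Show an alert, a confirm box and a prompt; return the entered text or None."""
    if gui is None:
        import pyautogui as gui  # imported lazily: needs a desktop session

    gui.alert(text="About to start the automation.", title="Demo")
    # confirm() returns the text of the button that was clicked
    choice = gui.confirm(text="Continue?", title="Demo", buttons=["OK", "Cancel"])
    if choice != "OK":
        return None
    # prompt() returns the entered text, or None if the user cancels
    return gui.prompt(text="Enter your name:", title="Demo", default="")
```

Calling `confirm_and_ask()` with no argument uses the real library.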

And more.


For other suggestions, check: Python GUI automation library for simulating user interaction in apps.

Answered By: kenorb

Take a look at BotCity Framework, an open-source RPA framework.

It’s just Python (no intermediary code, no Jython, etc.).

The example below executes SAP and logs in:

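from botcity.core import DesktopBot

# Placeholder credentials for illustration only; load real ones from a secure store
user = "my_user"
password = "my_password"


class Bot(DesktopBot):
    def action(self, execution):
        # Launch the SAP Logon client
        self.execute("saplogon.exe")

        # #{image:"login"}

        # Look for the "user" label image (97% match) for up to 10 seconds
        if not self.find("user", matching=0.97, waiting_time=10000):
            self.not_found("user")
        # Click the input field at an offset from the matched label
        self.click_relative(172, 5)

        self.paste(user)
        self.tab()
        self.paste(password)  # renamed from "pass", which is a reserved word
        self.enter()

    def not_found(self, label):
        # Report a missing UI element (defined here in case the base class lacks it)
        print(f"Element not found: {label}")


if __name__ == '__main__':
    Bot.main()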

As with Sikuli, you have a tool to crop elements and get visual clues about the interface and UI elements. But in this case, the tool edits .py files (not intermediary code), so you can use any Python library in your automation.

Answered By: rickm

You can try ClointFusion.

It is another Python-based RPA platform, which internally uses PyAutoGUI among other packages.

It has a friendly browser-based drag-and-drop bot builder: DOST.

You can find more than 100 easy-to-use functions:

  1. 6 GUI functions to take any input from the user
  2. 4 functions on Mouse Operations
  3. 6 functions on Window Operations (works only in Windows OS)
  4. 5 functions on Window Objects (works only in Windows OS)
  5. 8 functions on Folder Operations
  6. 28 functions on Excel Operations
  7. 3 functions on Keyboard Operations
  8. 5 functions on Screenscraping Operations
  9. 11 functions on Browser Operations
  10. 4 functions on Alert Messages
  11. 3 functions on String Operations
  12. Loads of miscellaneous functions related to emoji, capture photo, flash (pop-up) messages etc

Disclaimer: I’m one of the developers of ClointFusion.

Answered By: Mayur Patil