Python 3.10 script calling a PowerShell script – How to store output after a certain string

Question:

I am hoping someone can help me with this issue, as I am lost.
I am calling a PowerShell script that produces several lines of output; this is an extract:

7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15

Scanning the drive:
7 folders, 21 files, 21544 bytes (22 KiB)

Creating archive: conf.tar
Creating archive: conf2.tar

Removing tar file after upload...
Generating Links:
--------------------------------------------------------------
Link_1
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
--------------------------------------------------------------
Link_2
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..

My Python script calls the PowerShell script this way:

import subprocess, sys
p = subprocess.Popen(["powershell.exe", 
              "script.ps1"], 
              stdout=sys.stdout, shell=True)              
p_out, p_err = p.communicate()
print(p_out)

I can see the output on screen when I run the Python script from a PowerShell CLI.
Is there a way to extract those links from the output and pass them to Python?

Asked By: MBud

||

Answers:

You should have everything in p_out as a string (so you should already have it in Python); now you can use Python’s string functions to extract the links from it. You can split the string into lines and look for lines that start with https, or you can use a regex.

p_out  = '''7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15

Scanning the drive:
7 folders, 21 files, 21544 bytes (22 KiB)

Creating archive: conf.tar
Creating archive: conf2.tar

Removing tar file after upload...
Generating Links:
--------------------------------------------------------------
Link_1
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
--------------------------------------------------------------
Link_2
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..'''

lines = p_out.split('\n')

links = []

for line in lines:
    if line.startswith('http'):
        line = line.strip() # remove '\n' and spaces
        links.append(line)
        
for url in links:        
    print('url:', url)

Result:

url: https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..
url: https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXXXXXXXXXXXXX..

If the links are not in p_out, check whether they ended up in p_err instead.
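The regex alternative mentioned above could look roughly like this. It is only a sketch: the sample text is a shortened stand-in for the real output, and find_links is a name made up for illustration. It assumes each link sits at the very start of its own line and contains no spaces.

```python
import re

# Shortened stand-in for the captured PowerShell output (illustrative only).
sample = """Generating Links:
--------------------------------------------------------------
Link_1
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256
--------------------------------------------------------------
Link_2
https://some-repository.s3.ap-northeast-2.amazonaws.com/test/conf2.tar?X-Amz-Algorithm=AWS4-HMAC-SHA256
"""


def find_links(text):
    """Return every run of non-whitespace characters that starts a line with https://."""
    # re.MULTILINE makes ^ match at the start of each line, not only at the
    # start of the whole string; \S+ stops at the first whitespace character,
    # so a trailing '\r' from Windows line endings is not included.
    return re.findall(r"^https://\S+", text, flags=re.MULTILINE)


links = find_links(sample)
for url in links:
    print("url:", url)
```

Compared with the loop above, this needs no explicit split and strip, at the cost of a pattern that has to be adjusted if the links are ever indented.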

Answered By: furas

  • In order to capture stdout and stderr output, you must replace stdout=sys.stdout with stdout=subprocess.PIPE, stderr=subprocess.PIPE.

    • By contrast, stdout=sys.stdout passes output from the PowerShell call directly through to the console (terminal), so p_out and p_err ended up as None.

  • There is no need for shell=True (calling via the platform’s default shell) in your case – it only slows things down.

  • Adding universal_newlines=True makes Python automatically report the collected stdout and stderr output as strings rather than bytes; in v3.7+, you can use the conceptually clearer alias text=True.

  • While you could extract the lines of interest in Python code afterwards, a small addition to your PowerShell call allows you to do that at the source.

Therefore:

import subprocess

p = subprocess.Popen(
  ['powershell', '-NoProfile', '-Command', "(./script.ps1) -match '^https://'" ], 
  stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True
)

# Wait for the process to terminate and collect its stdout and stderr output.
p_out, p_err = p.communicate()

# Split the single multi-line string that contains the links
# into individual lines.
lines = p_out.splitlines()

print(lines)
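
If you would rather leave the PowerShell call unfiltered and do the extraction in Python, something along these lines may work. This is a sketch, not a tested drop-in: run_powershell_script and links_from_output are hypothetical helper names, and it assumes script.ps1 sits in the current directory and that Python 3.7+ is available (for capture_output and text).

```python
import subprocess


def run_powershell_script(script="./script.ps1"):
    """Run a PowerShell script and return its captured stdout as one string.

    The default script path is an assumption for illustration.
    """
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", script],
        capture_output=True,  # same as stdout=PIPE, stderr=PIPE
        text=True,            # decode output to str instead of bytes
    )
    if result.returncode != 0:
        # Surface whatever PowerShell wrote to stderr.
        raise RuntimeError(f"PowerShell failed: {result.stderr.strip()}")
    return result.stdout


def links_from_output(output):
    """Keep only the lines that start with https://, stripped of whitespace."""
    stripped = (line.strip() for line in output.splitlines())
    return [line for line in stripped if line.startswith("https://")]
```

On a Windows machine, links_from_output(run_powershell_script()) would then return the links as a list of strings, while the rest of the script's output stays available in Python for logging.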

Note:

  • PowerShell CLI parameters used:

    • -NoProfile isn’t strictly necessary, but advisable, because it suppresses loading of PowerShell’s profiles, which both helps performance and makes for a predictable execution environment.
    • -Command isn’t strictly necessary with powershell.exe, the Windows PowerShell CLI, as it is the implied default; however, it is necessary if you call the PowerShell (Core) 7+ CLI, pwsh.exe, which now defaults to -File instead.
  • The PowerShell code used to extract the links:

    • Since your script invokes an external program, 7z.exe, that program’s stdout is reported line by line by PowerShell.
    • When the regex-based -match operator is given an array as its LHS operand, it acts as a filter. Therefore, only those lines that start with (^) the string https:// are returned.
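
Because -Command is only the implied default for powershell.exe, a script that may run against either CLI can pass it explicitly every time. The helper below is a sketch of that idea; build_powershell_command is a hypothetical name, and preferring pwsh when it is found on PATH is an assumption rather than something the answer prescribes.

```python
import shutil


def build_powershell_command(script, exe=None):
    """Build the argument list for a PowerShell call that keeps only https:// lines.

    If exe is not given, prefer pwsh (PowerShell 7+) when it is on PATH and
    fall back to powershell (Windows PowerShell) otherwise.
    """
    if exe is None:
        exe = "pwsh" if shutil.which("pwsh") else "powershell"
    # Run the script and filter its output lines with the -match operator.
    ps_code = f"({script}) -match '^https://'"
    # -Command is always passed explicitly, so the same list works for both CLIs.
    return [exe, "-NoProfile", "-Command", ps_code]
```

The returned list can be passed as the first argument to subprocess.Popen or subprocess.run in place of the hard-coded list shown earlier.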
Answered By: mklement0