How to feed multi-line grep output to xargs with multiple parameters from each line

Question:

I have a CSV file to grep from, where the number of columns may vary, e.g.:

$ grep -v 'Coba' ETFs.csv | grep 'Sector'
CAPE,Ossiam Shiller Barclays Cape Europe Sector Value TR,EUPE,FWB
EUPB1,Ossiam Shiller Barclays Cape Global Sector Value,EUPB,FWB

For each output line of grep, I need to run a Python script, submitting column #3 with the flag -t and column #2 with the flag -n. For simplicity’s sake, let’s assume the script just prints the submitted parameters. Thus, upon receiving the output above, the script should be executed twice like this:

script.py -t EUPE -n Ossiam Shiller Barclays Cape Europe Sector Value TR
script.py -t EUPB -n Ossiam Shiller Barclays Cape Global Sector Value

My problem is that xargs treats the entire grep output as a single parameter. The expected behavior is to process the input line by line, split each line’s content by comma, and submit the fields to the script in any desired order. How can I do that?

Example Python script:

import argparse, sys

# Parse the two flags the pipeline is expected to supply.
aparser = argparse.ArgumentParser()
aparser.add_argument('-t', '--ticker', required=True)
aparser.add_argument('-n', '--name')

args = aparser.parse_args()
# Print the received parameters; end='' suppresses the trailing newline.
print(f"START {sys.argv[0]} for {args.ticker} {args.name}: ", end='')

UPDATE 23.3

What currently works is submitting only one parameter with the ticker flag -t, since the -n name flag is optional. Nevertheless, the real-world scenario requires submitting more than one flag at once, e.g., ETF ticker and name, screener, exchange, etc.

$ grep -v 'Coba' ETFs.csv | grep 'Sector' | cut -d ',' -f3 | xargs -I {} python script.py -t {}

As pointed out in the comments, it can be expressed with awk too:

$ awk -F, '! /Coba/ && /Sector/ {print $3}' ETFs.csv | xargs -I {} python script.py -t {}
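As a side note, the -t flag of xargs echoes each constructed command to stderr before running it, which makes it easy to verify the per-line invocations (a sketch of the same working pipeline; stdout is redirected so only the traces are shown):

$ awk -F, '! /Coba/ && /Sector/ {print $3}' ETFs.csv | xargs -t -I {} python script.py -t {} >/dev/null
python script.py -t EUPE
python script.py -t EUPB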

The solution suggested by @SHERLOCK looks logical: we extract different parts of the same input line. However, submitting the second parameter does not work, resulting in an argparse usage reminder.

awk -F, '! /Coba/ && /Sector/ {printf "%s,%s\n", $3, $2}' ETFs.csv | xargs -t -I {} bash -c 'python script.py -t $(echo "{}" | awk -F "," "{print $1}") -n $(echo "{}" | awk -F "," "{print $2}")'

Harmful code injection such as rm -rf should not be a concern here, as I always inspect the CSV output first and only then pipe it to the script.
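The failure mode can be reproduced in isolation: the awk programs are double-quoted inside the single-quoted bash -c string, so the inner bash expands $1 and $2 (its own positional parameters, which are empty) before awk ever sees them:

$ bash -c 'echo "{print $1}"'
{print }

awk therefore receives {print }, prints each whole line, and the unquoted command substitutions then split that line on spaces, which is what triggers the argparse usage reminder.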

Asked By: Frumda Grayforce


Answers:

You can use the -d flag of xargs to specify ',' as the delimiter and the -I flag to define a placeholder for the input string. You can then use awk to extract the desired columns from each line and pass them as arguments to the Python script.

Here’s what you can do:

grep -v 'Coba' ETFs.csv | grep 'Sector' | xargs -d',' -I {} sh -c 'script.py -t $(echo "{}" | awk -F"," "{print $3}") -n "$(echo "{}" | awk -F"," "{print $2}")"'

This will split each input line by ‘,’ then use awk to extract the third and second columns and pass them as arguments to script.py. The second column is enclosed in double quotes to handle cases where it contains spaces.

You may need to modify the command slightly depending on your shell and the exact format of your CSV file.
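For instance, one variant that sidesteps the nested quoting entirely (a sketch, assuming GNU xargs for the -d flag) hands each line to bash as a positional parameter and lets read split it on commas:

grep -v 'Coba' ETFs.csv | grep 'Sector' | xargs -d '\n' -n1 bash -c 'IFS=, read -r _ name ticker _ <<<"$1"; python script.py -t "$ticker" -n "$name"' _

Here -d '\n' keeps each whole line (spaces included) as a single argument, and the trailing _ fills $0 so the line lands in $1.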

Answered By: SHERLOCK

A BashFAQ #1 while read loop is a better choice than xargs:

while IFS=, read -r col3 col2 _ <&3; do
    python script.py -t "$col3" -n "$col2"
done 3< <(awk -F, '! /Coba/ && /Sector/ {printf "%s,%s\n", $3, $2}' ETFs.csv)

This way we’re not starting a separate shell per line; the same shell that was already running loops over awk’s output and invokes the Python script once per line. Thus, both the security issues and the quoting bugs are addressed.


Some notes:

  • Using file descriptor 3 instead of the default fd 0 (stdin) for the redirection leaves stdin free, so script.py can use it for things like prompting the user without unintentionally reading data from ETFs.csv.

  • The _ following col2 causes subsequent fields to be discarded (well, put into the variable _, which scripts conventionally use for junk that is meant to be ignored) rather than appended to the value in col2.

  • See BashFAQ #24 to understand why the while read ...; done < <(awk ...) idiom is used instead of awk ... | while read ...; done.

  • If the input file that awk is filtering isn’t huge, you could get rid of awk entirely and use bash for the filtering:

    while IFS=, read -r col1 col2 col3 _ <&3; do
       [[ $col2 = *Coba* ]] && continue   # skip lines with Coba in description
       [[ $col2 = *Sector* ]] || continue # skip lines without Sector in description
       python script.py -t "$col3" -n "$col2"
    done 3<ETFs.csv
    
  • The most obvious bug in the original xargs approach given in the question (besides the security issues, which are described as addressed by human review) is the missing quotes around the command substitutions: "$(...)", rather than bare $(...), prevents the resulting string from being split into separate words (and those words from being expanded as globs against the contents of the current working directory). Because the descriptions of your ETFs in the second column contain spaces, that’s critical for this data; see the short demonstration below.
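A short demonstration of that splitting, using a hypothetical variable holding one of the descriptions above:

name='Ossiam Shiller Barclays Cape Europe Sector Value TR'
printf '<%s> ' $name; echo    # unquoted: split into eight separate arguments
printf '<%s> ' "$name"; echo  # quoted: passed through as a single argument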

Answered By: Charles Duffy