grep: write error: Broken pipe with subprocess

Question:

I get a couple of "grep: write error: Broken pipe" messages when I run this code.
What am I missing?

This is only part of it:

     while d <= datetime.datetime(year, month, daysInMonth[month]):
        day = d.strftime("%Y%m%d")
        print day
        results = [day]
        first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, )
        output1=first.communicate()[0]
        d += delta
        day = d.strftime("%Y%m%d")
        second=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True,  stdout=subprocess.PIPE, )
        output2=second.communicate()[0]
        articleList = (output1.split('\n'))
        articleList2 = (output2.split('\n'))
        results.append( len(articleList)+len(articleList2))
        w.writerow(tuple(results))
        d += delta
Asked By: Jiyda Moussa

Answers:

To find the files matching two patterns, the command structure should be:

grep -l pattern1 $(grep -l pattern2 files)

$(command) substitutes the output of the command into the command line.

So your script should be:

first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' $(grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt)", shell=True, stdout=subprocess.PIPE, )

and similarly for second
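
Since the same command is needed for both first and second, it may help to build the string in one place. The following is only a sketch in Python 3: the helper name both_patterns_command and the sample directory are made up for illustration, and shlex.quote is used on the assumption that monthDir might contain spaces.

```python
import shlex

def both_patterns_command(month_dir, day):
    """Build the nested grep command for one day's files (hypothetical helper)."""
    # Quote only the path prefix; the trailing * must stay unquoted
    # so that the shell still expands the glob.
    files = shlex.quote(month_dir + "/" + day) + "*.txt"
    inner = "grep -Eliw 'Protest|protesters' " + files
    # $(...) substitutes the inner grep's file list into the outer command line.
    return "grep -Eliw 'Algeria|Bahrain' $(" + inner + ")"

print(both_patterns_command("news/201102", "20110214"))
```

The resulting string can be passed to subprocess.Popen with shell=True as before. One caveat: if the inner grep matches no files, the outer grep receives no file arguments and reads standard input instead.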

Answered By: Barmar

If you are just looking for whole words, you could split the text into a list of words and use the list's count() method:

# assuming names is a list of filenames
for fn in names:
    with open(fn) as infile:
        text = infile.read().lower()
    # remove punctuation
    text = text.replace(',', '')
    text = text.replace('.', '')
    words = text.split()
    print "Algeria:", words.count('algeria')
    print "Bahrain:", words.count('bahrain')
    print "protesters:", words.count('protesters')
    print "protest:", words.count('protest')

If you want more powerful filtering, use re.
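
As a rough illustration of the re approach, the sketch below counts whole-word, case-insensitive matches in a string. It is written for Python 3; the function name count_words and the sample sentence are assumptions made for this example, not part of the original code.

```python
import re
from collections import Counter

def count_words(text, words):
    """Count whole-word, case-insensitive occurrences of each word in text."""
    # \b anchors the match at word boundaries, much like grep's -w option.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

sample = "Protest in Algeria. Algeria's protesters protest."
counts = count_words(sample, ["algeria", "protest", "protesters"])
print(counts["algeria"], counts["protest"], counts["protesters"])
```

Unlike the split() version, this does not need punctuation stripped first, because the word boundaries handle it.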

Answered By: Roland Smith

When you do

A | B

in a shell, process A’s output is piped into process B as input. If process B shuts down before reading all of process A’s output (e.g. because it found what it was looking for, which is the function of the -l option), then process A may complain that its output pipe was prematurely closed.
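
The same condition can be reproduced without grep. The sketch below (Python 3, POSIX assumed) closes the read end of a pipe before writing to it; the function name is made up for the example. Python ignores SIGPIPE by default, so the failed write shows up as a BrokenPipeError rather than killing the process.

```python
import errno
import os

def write_after_reader_closed():
    """Write to a pipe whose read end is already closed and return the errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "reader" (process B) goes away first
    try:
        os.write(write_fd, b"a matching filename\n")  # the "writer" (process A)
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None

# Compare the reported errno with EPIPE, the "Broken pipe" error code.
print(write_after_reader_closed() == errno.EPIPE)
```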

These errors are basically harmless, and you can work around them by redirecting stderr in the subprocesses to /dev/null.
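
A minimal sketch of that workaround, assuming Python 3.3 or later, where subprocess.DEVNULL is available (on Python 2 you would open os.devnull yourself and pass the file object as stderr). The helper name run_discarding_stderr is hypothetical, and a small Python child process stands in for grep so the example does not depend on any data files.

```python
import subprocess
import sys

def run_discarding_stderr(command, shell=False):
    """Run a command, return its stdout as text, and throw away its stderr."""
    proc = subprocess.run(
        command,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # error messages go to the null device
        universal_newlines=True,
    )
    return proc.stdout

# Stand-in child process that writes to both streams.
child = [sys.executable, "-c", "import sys; sys.stderr.write('noise\\n'); print('ok')"]
print(run_discarding_stderr(child))
```

For the pipeline in the question, the command string would be passed with shell=True. Only the stderr of the shell and the commands it starts is discarded; the list of matching files still arrives on stdout.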

A better approach, though, may simply be to use Python’s powerful regex capabilities to read the files:

import glob
import re

def fileContains(fn, pat):
    # Return True as soon as any line of the file matches the pattern.
    with open(fn) as f:
        for line in f:
            if re.search(pat, line):
                return True
    return False

first = []
for path in glob.glob(monthDir +"/"+day+"*.txt"):
    if fileContains(path, 'Algeria|Bahrain') and fileContains(path, 'Protest|protesters'):
        first.append(path)
Answered By: nneonneo

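Add a stderr argument to the Popen call. The right value depends on the Python version; the following works on Python versions below 3. Note that stderr=subprocess.STDOUT merges grep's error messages into the captured output rather than discarding them: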

first=subprocess.Popen("grep -Eliw 'Algeria|Bahrain' "+ monthDir +"/"+day+"*.txt | grep -Eliw 'Protest|protesters' "+ monthDir +"/"+day+"*.txt", shell=True, stdout=subprocess.PIPE, stderr = subprocess.STDOUT)

Answered By: Manikandan