When I execute the Linux split command from Python, files get split but the data is truncated

Question:

I am trying to execute the Linux split command from Python to split a file by number of lines. Below is the code:

import subprocess

# split master.csv into 1000-line chunks named split_file_aaaa.csv, split_file_aaab.csv, ...
cmd = 'split -a 4 --verbose --lines 1000 --additional-suffix=.csv master.csv split_file_'
p = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

p.stdin.close()
print "Stdout:", p.stdout.read()
print "Stderr:", p.stderr.read()

When I run the command directly in Linux, the files are generated correctly. But when it is executed from Python, I have a problem.

Say I have 2001 records in master.csv: I get 2 files, one with 1000 lines and the other with 756 lines, and even the data in the last line of the second file is truncated.

Say I have 1001 records in master.csv: I get 1 file with 756 lines, and even the data in its last line is truncated.

I am using Python 2.7.

Asked By: chandramohan


Answers:

Popen's pipes have a limited buffer size. It can be adjusted with the bufsize argument, but that isn't recommended.
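As an illustration, communicate() reads both pipes to completion and waits for the process to exit, which avoids losing output to a full pipe buffer; a minimal sketch, reusing the same cmd as in the question:

import subprocess

cmd = 'split -a 4 --verbose --lines 1000 --additional-suffix=.csv master.csv split_file_'
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() drains stdout and stderr to EOF and waits for the
# process to exit, so nothing is lost if it writes a lot of output
out, err = p.communicate()
print "Stdout:", out
print "Stderr:", err
print "Return code:", p.returncode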

You would be better off writing the files to disk and reading them back into your script, or implementing the split yourself in Python.
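For example, a minimal pure-Python splitter along those lines (the 1000-line chunk size and split_file_ prefix mirror the question; the zero-padded numeric suffix is my own substitution for split's alphabetic one):

def split_by_lines(path, lines_per_file=1000, prefix='split_file_'):
    out = None
    part = 0
    with open(path) as src:
        for i, line in enumerate(src):
            # start a new chunk every lines_per_file lines
            if i % lines_per_file == 0:
                if out:
                    out.close()
                out = open('%s%04d.csv' % (prefix, part), 'w')
                part += 1
            out.write(line)
    if out:
        out.close()

split_by_lines('master.csv')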

Answered By: brunson

After a couple of struggles and some googling, I managed to fix the issue.

My setup was simply to execute the .py file from the Linux command line, and when the .py file ran that way, the Linux split command inside it failed or only partially executed.

So in order to solve the issue, I moved the split command into a shell script and called the .py files only from that shell script.

#!/bin/bash
python pythonfile1.py
# split the CSV produced above into 50000-line chunks
split -a 4 --verbose --lines 50000 --additional-suffix=.csv csvfiles/master.csv csvfiles/split/split_file_
python pythonfile1.py
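This way split only runs after the first Python step has exited and its output file is fully flushed to disk, which is presumably why the truncation disappeared.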

Not a perfect solution to the actual issue, but an alternative that solved my problem.

Answered By: chandramohan