Download a chunk of a large file using pysftp in Python

Question:

I have a use case where I want to read only the first 5 rows of a large CSV file stored on my SFTP server, and I don’t want to download the complete file just to read those rows. I am using pysftp in Python to interact with the SFTP server. Is there a way to download only a chunk of the file with pysftp instead of the complete file?

If there are other Python libraries or techniques I can use, please guide me. Thanks.

Asked By: Shubham Bansal


Answers:

Yes, it is possible to read only a portion of a file from an SFTP server using pysftp. The getfo method is not suitable for this, because it downloads the whole file into a file-like object. Instead, use the open method, which returns a file-like object for the remote file, and read only the lines you need from it. If you want to keep those lines for later processing, you can write them to an in-memory file-like object created with the io module’s StringIO class.

Here is an example of how you might use these methods to download the first 5 lines of a CSV file from an SFTP server:

import pysftp
import io

# Connect to the SFTP server
cnopts = pysftp.CnOpts()
# Note: this disables host key verification; avoid it outside of testing
cnopts.hostkeys = None
with pysftp.Connection('sftp.example.com', username='user', password='pass', cnopts=cnopts) as sftp:
    # Open the CSV file on the SFTP server
    with sftp.open('path/to/file.csv', 'r') as f:
        # Create a file-like object in memory
        output = io.StringIO()
        # Read the first 5 lines of the remote file and write them to the file-like object
        for i in range(5):
            line = f.readline()
            output.write(line)
        # Reset the file pointer to the beginning of the file-like object
        output.seek(0)
        # Read the contents of the file-like object
        print(output.read())

This example reads the first 5 lines of the remote file and writes them to a file-like object in memory. You can then read them back using the read method, or process the lines in any other way you like.
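If the goal is parsed CSV rows rather than raw text, the same idea can be sketched with the csv module. This is only an illustration under some assumptions: head_rows is a hypothetical helper name, the file is assumed to be UTF-8, and an io.StringIO object stands in for the object returned by sftp.open so the snippet runs without a server.

```python
import csv
import io
from itertools import islice


def head_rows(remote_file, n=5, encoding="utf-8"):
    """Return the first n parsed CSV rows from a line-iterable file object.

    head_rows is a hypothetical helper, not part of pysftp or Paramiko.
    """
    # Decode lazily, so only the lines that are actually consumed are read.
    lines = (
        line.decode(encoding) if isinstance(line, bytes) else line
        for line in remote_file
    )
    # csv.reader pulls lines on demand; islice stops it after n rows.
    return list(islice(csv.reader(lines), n))


# Local stand-in for the object returned by sftp.open('path/to/file.csv', 'r').
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n")
rows = head_rows(sample, n=5)
print(rows)
```

Because the reader is consumed lazily, lines beyond the requested rows are never requested from the file object. The header line counts as one of the n rows here.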

Answered By: michael sichilongo

First, do not use pysftp. It is a dead, unmaintained project. Use Paramiko instead. See pysftp vs. Paramiko.

If you want to read data from a specific point in the file, you can open a file-like object representing the remote file using the Paramiko SFTPClient.open method (or the equivalent pysftp Connection.open) and then use it as if you were accessing data from any local file:

  • Use .seek to set the read pointer to the desired offset.
  • Use .read to read data.

with sftp.open("/remote/path/file", "r", bufsize=32768) as f:
    f.seek(offset)
    data = f.read(count)
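The seek-then-read pattern above can be tried locally as a minimal sketch. The helper name read_chunk is hypothetical, and an io.BytesIO object stands in for the remote file so the snippet runs without a server. With Paramiko, the object passed in would be the one returned by sftp.open.

```python
import io


def read_chunk(remote_file, offset, count):
    """Read up to count bytes starting at offset from a seekable file object.

    read_chunk is a hypothetical helper, not part of Paramiko or pysftp.
    """
    # Move the read pointer to the desired offset.
    remote_file.seek(offset)
    # Read at most count bytes from that position.
    return remote_file.read(count)


# Local stand-in for the remote file object.
fake_remote = io.BytesIO(b"0123456789abcdef")
chunk = read_chunk(fake_remote, offset=4, count=6)
print(chunk)  # b'456789'
```

Paramiko's read returns bytes, so decode the chunk yourself if you need text.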

For the purpose of bufsize, see:
Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow

Answered By: Martin Prikryl