Faster way to make S3 "folder hierarchy" than parsing of filenames?
Question:
I want to make a relatively basic tool to browse a bucket in S3 as a file hierarchy rather than simply a list of filenames with slashes in them.
Currently, I am using boto to get the list of keynames in a bucket and then parsing the keynames to make a nested dictionary of the “folders” and files. However, that process takes so long! Even just going through each key to get a list of all higher level folders takes 15+ minutes.
How do tools such as Cyberduck give a list of folders so quickly?
Answers:
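Check this link: http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
listObjects() has a parameter called delimiter, which can be set to /. With it set, S3 returns only the keys directly under the requested prefix and rolls everything deeper up into common prefixes (the "folders"), so each response reads like one level of a file tree and you never have to download and parse every key. I think this is what you’re looking for.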
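As a rough sketch of the delimiter approach in Python: the snippet below assumes boto3 and its list_objects_v2 paginator as the counterpart of listObjects(); the helper name list_level and the bucket name are made up for illustration. It lists a single level of the hierarchy per call instead of walking every key in the bucket.

```python
def list_level(client, bucket, prefix=""):
    """Return (folders, files) found directly under `prefix`.

    `client` is expected to be a boto3 S3 client (an assumption of this
    sketch); `list_level` is a hypothetical helper, not part of boto3.
    """
    folders, files = [], []
    paginator = client.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 group deeper keys into CommonPrefixes,
    # so only one "directory level" comes back.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        folders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
        files.extend(obj["Key"] for obj in page.get("Contents", []))
    return folders, files


# Possible usage (needs boto3 and AWS credentials; bucket name is a placeholder):
#   import boto3
#   folders, files = list_level(boto3.client("s3"), "bucketname", prefix="static/")
```

A browser built this way would call the helper again with one of the returned folder prefixes only when the user opens that folder, which keeps each request small.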
Using s3-tree might be helpful.
https://pypi.org/project/s3-tree/
Example:
$ s3-tree bucketname
bucketname
├── asset-manifest.json
├── favicon.ico
├── index.html
├── manifest.json
├── precache-manifest.e8c8442b93de34204de5f9b23fa0174b.js
├── service-worker.js
└── static
    ├── css
    │   ├── main.43b5e879.chunk.css
    │   └── main.43b5e879.chunk.css.map
    ├── js
    │   ├── 1.f6579156.chunk.js
    │   ├── 1.f6579156.chunk.js.map
    │   ├── main.36bbb0f4.chunk.js
    │   ├── main.36bbb0f4.chunk.js.map
    │   ├── runtime~main.229c360f.js
    │   └── runtime~main.229c360f.js.map
    └── media
        ├── her.37588412.png
        ├── me.e69004b8.png
        └── us.f114bc8d.jpg
4 directories, 17 files