How to exclude one or more file types when creating a Quilt data package from a large directory of pre-existing files
Question:
If I’m creating a Quilt data package programmatically via the command line interface (CLI) or Python library for a directory with thousands of different files, and I want to exclude one or more specific unwanted file types from the data package (such as .txt or .DS_Store), how do I do that?
Answers:
[Disclaimer: I currently work at Quilt Data]
Create a .quiltignore file, much as you would create a .gitignore file when using Git to version control your codebase. .quiltignore is a special file which, when placed in a directory, filters out files that would otherwise be included when you call quilt3.Package.set_dir from inside that directory. Here’s a simple example:
> ls -a
.DS_Store foo.txt bar.txt image1.tiff image2.tiff
> python -c "import quilt3; print(quilt3.Package().set_dir('/', './'))"
(local Package)
└─.DS_Store
└─foo.txt
└─bar.txt
└─image1.tiff
└─image2.tiff
> echo .DS_Store >> .quiltignore
> echo '*.txt' >> .quiltignore
> ls -a
.DS_Store foo.txt bar.txt image1.tiff image2.tiff
.quiltignore
> python -c "import quilt3; print(quilt3.Package().set_dir('/', './'))"
(local Package)
└─image1.tiff
└─image2.tiff
└─.quiltignore
This can be used to keep non-data files or hidden OS-level files in the directory out of the data package. This is very useful when, for example, your data and your code live in the same directory [Reference].
The .quiltignore syntax is exactly the same as that of the familiar .gitignore. Refer to the Git documentation for instructions on how to use it.
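To get a rough idea of which files a set of patterns would exclude before building a package, a small preview helper can be sketched in plain Python. This is only an approximation and not how Quilt itself evaluates .quiltignore: the function name preview_quiltignore is hypothetical, and it handles only simple filename globs such as *.txt or .DS_Store, not negation (!), directory patterns, or **.

```python
from fnmatch import fnmatchcase


def preview_quiltignore(filenames, patterns):
    """Split filenames into (kept, ignored) lists using simple glob patterns.

    Hypothetical helper for illustration only. It approximates
    .gitignore-style matching for plain filename globs and does not
    call or reproduce quilt3's own matching logic.
    """
    kept, ignored = [], []
    for name in filenames:
        # A file is ignored if it matches at least one pattern.
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            ignored.append(name)
        else:
            kept.append(name)
    return kept, ignored


# Same files and patterns as in the shell example above.
files = [".DS_Store", "foo.txt", "bar.txt",
         "image1.tiff", "image2.tiff", ".quiltignore"]
patterns = [".DS_Store", "*.txt"]

kept, ignored = preview_quiltignore(files, patterns)
print("kept:   ", kept)
print("ignored:", ignored)
```

For anything beyond simple globs, it is safer to check the result by printing the package returned by quilt3.Package().set_dir, as in the example above.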