How to exclude one or more file types when creating a Quilt data package from a large directory of pre-existing files

Question:

If I’m creating a Quilt data package programmatically via the command line interface (CLI) or Python library for a directory with thousands of different files, and I want to exclude one or more specific unwanted file types from the data package (such as .txt or .DS_Store), how do I do that?

Asked By: tatlar


Answers:

[Disclaimer: I currently work at Quilt Data]

Create a .quiltignore file, much as you would create a .gitignore file when version-controlling a codebase with Git. .quiltignore is a special file which, when placed in a directory, filters out files that would otherwise be included when you call quilt3.Package.set_dir from inside that directory. Here’s a simple example:

> ls -a
.DS_Store         foo.txt         bar.txt         image1.tiff         image2.tiff
> python -c "import quilt3; print(quilt3.Package().set_dir('/', './'))"
(local Package)
 └─.DS_Store
 └─foo.txt
 └─bar.txt
 └─image1.tiff
 └─image2.tiff
> echo .DS_Store >> .quiltignore
> echo '*.txt' >> .quiltignore
> ls -a
.DS_Store         foo.txt         bar.txt         image1.tiff         image2.tiff
.quiltignore
> python -c "import quilt3; print(quilt3.Package().set_dir('/', './'))"
(local Package)
 └─image1.tiff
 └─image2.tiff
 └─.quiltignore

This keeps non-data files and hidden OS-level files in the directory out of the data package, which is especially useful when your data and your code live in the same directory [Reference].
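Since the question also asks about the Python library, here is a minimal sketch of the same steps as a script. The helper names add_ignore_patterns and build_package_from_cwd are hypothetical, made up for this illustration. The only quilt3 call used is the quilt3.Package().set_dir('/', './') call shown above, and it is assumed that the script runs from inside the data directory.

```python
from pathlib import Path


def add_ignore_patterns(directory, patterns):
    """Append patterns to <directory>/.quiltignore, skipping any already listed.

    Hypothetical helper for illustration; it only edits a text file.
    Returns the full list of patterns now in the file.
    """
    ignore_file = Path(directory) / ".quiltignore"
    lines = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    for pattern in patterns:
        if pattern not in lines:
            lines.append(pattern)
    ignore_file.write_text("\n".join(lines) + "\n")
    return lines


def build_package_from_cwd():
    """Build a local package from the current directory.

    Assumes quilt3 is installed and that the current working directory
    is the one containing .quiltignore, as in the shell example.
    """
    import quilt3  # imported lazily so the helper above works without quilt3

    return quilt3.Package().set_dir("/", "./")


if __name__ == "__main__":
    add_ignore_patterns(Path.cwd(), [".DS_Store", "*.txt"])
    print(build_package_from_cwd())
```

Appending instead of overwriting mirrors the `>>` redirects in the shell session, so patterns already in an existing .quiltignore are preserved.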

The .quiltignore syntax is exactly the same as that of the familiar .gitignore. Refer to the Git documentation on .gitignore for the pattern rules.
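Before building a package from thousands of files, it can help to preview which file names a few simple patterns would catch. The sketch below is only an approximation: it uses Python's standard fnmatch module on base file names, and it is not quilt3's own matcher. It does not handle .gitignore features such as negation (!), directory-only patterns, or path anchoring. The function name preview_ignored is hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath


def preview_ignored(paths, patterns):
    """Split paths into (kept, ignored) using simple glob patterns.

    Approximation only: each pattern is matched against the base file
    name with fnmatch, which covers simple cases like '*.txt' or
    '.DS_Store' but not the full .gitignore rule set.
    """
    kept, ignored = [], []
    for path in paths:
        name = PurePosixPath(path).name
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            ignored.append(path)
        else:
            kept.append(path)
    return kept, ignored


if __name__ == "__main__":
    files = [".DS_Store", "foo.txt", "bar.txt", "image1.tiff", "image2.tiff"]
    kept, ignored = preview_ignored(files, [".DS_Store", "*.txt"])
    print("kept:", kept)
    print("ignored:", ignored)
```

For anything beyond simple patterns, treat the printed output of set_dir itself as the authoritative check.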

Answered By: tatlar