Script to read parameters from file and mkdir and cp accordingly?
Question:
I have a bunch of files in directories with a file that includes important data like author and title.
/data/unorganised_texts/a-long-story
Many files in the directories, but most importantly each directory includes Data.yaml with contents like this:
Category:
  Name: Space
Author: Jôëlle Frankschiff
References:
  Title: Historical
  Title: Future
Title: A “long” story!
I need to match these lines as variables $category, $author, $title and make an appropriate structure and copy the directory like so:
/data/organised_texts/$category/$author/$title
Here is my attempt in bash, but probably going wrong in multiple places and as suggested would be better in python.
#!/bin/bash
for dir in /data/unorganised_texts/*/
while IFS= read -r line || [[ $category ]]; do
[[ $category =~ “Category:” ]] && echo "$category" && mkdir /data/organised_texts/$category
[[ $author ]]; do
[[ $author =~ “Author:” ]] && echo "$Author"
[[ $title ]]; do
[[ $title =~ “Title:” ]] && echo "$title" && mkdir /data/organised_texts/$category/$title && cp $dir/* /data/organised_texts/$category/$title/
done <"$dir/Data.yaml"
Here is my bash version, as I was experimenting with readarray and command eval and bash version was important:
ubuntu:~# bash --version
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Thanks!
Answers:
One bash idea:
unset cat auth title

while read -r label value
do
    case "${label}" in
        "Category:") cat="${value}" ;;
        "Author:")   auth="${value}" ;;
        "Title:")    title="${value}" ;;
    esac

    if [[ -n "${cat}" && -n "${auth}" && -n "${title}" ]]
    then
        mkdir -p "${cat}/${auth}/${title}"
        # cp ...   # OP can add the desired `cp` command at this point, or after breaking out of the `while` loop
        break
    fi
done < Data.yaml
NOTE: assumes none of the values include linefeeds
Results:
$ find . -type d
.
./Space
./Space/Jôëlle Frankschiff
./Space/Jôëlle Frankschiff/A “long” story!
- It looks like you have unmatched do-done pairs.
- A bare [[ $varname ]]; do, with no while or for in front of it, will cause a syntax error.
- mkdir -p can create nested directories in one go.
Then would you please try the following:
#!/bin/bash

shopt -s dotglob                                # copy dotfiles in the directories as well
for dir in /data/unorganised_texts/*/; do
    [[ -f $dir/Data.yaml ]] || continue         # skip directories without a Data.yaml
    category= author= title= section=           # reset values left over from the previous directory
    while IFS= read -r line; do                 # read a line of yaml file in "$dir"
        read -r key val <<< "$line"             # split on the 1st space into key and val
        val=${val////_}                         # replace slash with underscore, just in case
        if [[ $line =~ ^[[:space:]] ]]; then    # indented line: only "Name:" under "Category:" is used
            [[ $section = "Category:" && $key = "Name:" ]] && category="$val"
            continue                            # skip every other indented line
        fi
        section=$key                            # remember the current top-level key
        if [[ $key = "Category:" && -n $val ]]; then category="$val"
        elif [[ $key = "Author:" ]]; then author="$val"
        elif [[ $key = "Title:" ]]; then title="$val"
        fi
    done < "$dir/Data.yaml"
    if [[ -z $category || -z $author || -z $title ]]; then
        echo "$dir: category, author or title is missing. skipped."
        continue
    fi
    destdir="/data/organised_texts/$category/$author/$title"    # destination directory
    if [[ -d $destdir ]]; then                  # check the duplication
        echo "$destdir already exists. skipped."
    else
        mkdir -p "$destdir"                     # create the destination directory
        cp -a -- "$dir"/* "$destdir"            # copy the contents to the destination
        # echo "$destdir"                       # remove "#" to see the progress
    fi
done
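Since the question mentions Python as an alternative, here is a minimal sketch of the same line-by-line idea using only the standard library. It is not a real YAML parser: it assumes the layout shown in the question (top-level Author and Title keys, and Name indented under Category), and read_fields is a hypothetical helper name chosen for this illustration.

```python
from pathlib import PurePosixPath


def read_fields(text: str) -> dict:
    """Pick category, author and title out of Data.yaml-style text (hypothetical helper)."""
    fields = {}
    section = None  # most recent top-level key
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, val = line.strip().partition(":")
        val = val.strip().replace("/", "_")  # a slash would add a directory level
        if line[0].isspace():
            # Indented line: only "Name" directly under "Category" is used.
            if section == "Category" and key == "Name":
                fields["category"] = val
            continue
        section = key
        if key in ("Category", "Author", "Title") and val:
            fields[key.lower()] = val
    return fields


sample = """\
Category:
  Name: Space
Author: Jôëlle Frankschiff
References:
  Title: Historical
  Title: Future
Title: A “long” story!
"""

fields = read_fields(sample)
dest = PurePosixPath("/data/organised_texts") / fields["category"] / fields["author"] / fields["title"]
print(dest)
```

The indented Title lines under References are ignored because they belong to a different section, which is the same reason the bash loop skips indented lines.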
Since the OP was interested in a python solution…
First let's make some test dirs:
pushd /tmp
mkdir t
pushd t
mkdir a-long-story
vim a-long-story/Data.yaml # fill in here, or cp.
mkdir irrelevant_dir
mkdir -p irrelevant_dir/subdir
touch notadir
Then a simple python script. Python doesn’t (yet) have an inbuilt yaml parser, so pip install pyyaml is needed before this:
from pathlib import Path
from shutil import copytree

from yaml import Loader, load  # pip install pyyaml


def parse_yaml(f: Path) -> dict:
    # the sample data contains non-ASCII characters, so read it as UTF-8
    with f.open(encoding="utf-8") as fh:
        return load(fh, Loader)


# ROOT = Path("/data/unorganised_texts")
ROOT = Path("/tmp/t")

for subdir in (d for d in ROOT.iterdir() if d.is_dir()):
    yamlf = subdir / "Data.yaml"
    if yamlf.is_file():
        print("Processing", yamlf)
        data = parse_yaml(yamlf)
        other_dirs = ROOT / data["Category"]["Name"] / data["Author"]
        other_dirs.mkdir(exist_ok=True, parents=True)
        outdir = other_dirs / data["Title"]
        if outdir.exists():
            print("skipping as would overwrite.")
        else:
            copytree(subdir, outdir)
This code probably doesn’t need any explanation even for someone new to python. But for completeness:
- we import a stdlib class (Path) and fn (copytree)
- we import a 3rd party fn (load) and class (Loader)
- we define a function to parse yaml. This is probably redundant, but it does add a level of commentary, and lets us easily add more logic here if required later.
- ROOT.iterdir() yields every entry one level down in ROOT. We filter these with a generator expression to strip out bare files.
- if we find the yaml we’re expecting we make the outdirs, and then if we’re not going to overwrite, copy our current directory into the output.
There is nothing remotely wrong with doing this in bash. These days I’d have written this python version instead, because a. I know python much better than my very rusty bash, and b. it solves the problem ‘properly’ (e.g. we parse the YAML with a yaml parser), which sometimes makes things more robust.
Note btw that the type hints are optional and ignored at runtime.
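To try the copy step without touching real data, here is a small sketch that runs in a temporary directory. It is an illustration under assumptions rather than part of the script above: organise and out_root are hypothetical names, the dict stands in for what the YAML parser returns, and the output goes to a separate root (as the question asks) instead of into ROOT.

```python
import tempfile
from pathlib import Path
from shutil import copytree


def organise(src: Path, out_root: Path, data: dict) -> bool:
    """Copy src to out_root/category/author/title; return False if it already exists."""
    parts = [data["Category"]["Name"], data["Author"], data["Title"]]
    # A "/" inside a value would otherwise create an extra directory level.
    outdir = out_root.joinpath(*(str(p).replace("/", "_") for p in parts))
    if outdir.exists():
        return False
    copytree(src, outdir)  # copytree creates missing parent directories itself
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "unorganised" / "a-long-story"
    src.mkdir(parents=True)
    (src / "chapter1.txt").write_text("Once upon a time")

    # Stand-in for the parsed Data.yaml from the question.
    data = {
        "Category": {"Name": "Space"},
        "Author": "Jôëlle Frankschiff",
        "Title": "A “long” story!",
    }
    out_root = Path(tmp) / "organised"

    first = organise(src, out_root, data)   # copies the directory
    second = organise(src, out_root, data)  # destination exists, so nothing is copied
    target = out_root / "Space" / "Jôëlle Frankschiff" / "A “long” story!"
    copied = sorted(p.name for p in target.iterdir())

print(first, second, copied)
```

Returning a bool rather than printing keeps the skip decision visible to the caller, which makes a dry run or a summary at the end easy to add.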