Script to read parameters from file and mkdir and cp accordingly?

Question:

I have a bunch of directories full of files, and each directory contains one file with important data such as author and title.

/data/unorganised_texts/a-long-story

Many files in the directories, but most importantly each directory includes Data.yaml with contents like this:

Category:
  Name: Space
Author: Jôëlle Frankschiff
References:
  Title: Historical
  Title: Future
Title: A “long” story!

I need to read these values into the variables $category, $author and $title, create the matching directory structure, and copy the directory there like so:

/data/organised_texts/$category/$author/$title

Here is my attempt in bash. It is probably wrong in multiple places and, as suggested, this might be better done in Python.

#!/bin/bash
for dir in /data/unorganised_texts/*/
while IFS= read -r line || [[ $category ]]; do
    [[ $category =~ “Category:” ]] && echo "$category" && mkdir /data/organised_texts/$category
[[ $author ]]; do
    [[ $author =~ “Author:” ]] && echo "$Author"
    [[ $title ]]; do
        [[ $title =~ “Title:” ]] && echo "$title" && mkdir /data/organised_texts/$category/$title && cp $dir/* /data/organised_texts/$category/$title/
done <"$dir/Data.yaml"

Here is my bash version, since I was experimenting with readarray and eval, where the bash version matters:

ubuntu:~# bash --version
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)

Thanks!

Asked By: jakethedog

||

Answers:

One bash idea:

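unset cat auth title parent

while IFS= read -r line                             # IFS= keeps the leading indentation
do
    read -r label value <<< "${line}"               # split into label and value

    if [[ "${line}" =~ ^[[:space:]] ]]
    then
        # indented line: only the Name: nested under Category: is of interest,
        # so the Title: lines under References: are ignored
        [[ "${parent}" == "Category:" && "${label}" == "Name:" ]] && cat="${value}"
    else
        parent="${label}"                           # remember the current top-level key
        case "${label}" in
            "Category:")  cat="${value}" ;;         # empty when the value is nested under Name:
            "Author:")    auth="${value}" ;;
            "Title:")     title="${value}" ;;
        esac
    fi

    if [[ -n "${cat}" && -n "${auth}" && -n "${title}" ]]
    then
        mkdir -p "${cat}/${auth}/${title}"
        # cp ...                                # OP can add the desired `cp` command at this point, or after breaking out of the `while` loop
        break
    fi
done < Data.yaml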

NOTE: assumes none of the values include linefeeds

Results:

$ find . -type d
.
./Space
./Space/Jôëlle Frankschiff
./Space/Jôëlle Frankschiff/A “long” story!

Answered By: markp-fuso

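  • It looks like you have unmatched do-done pairs.
  • A bare [[ $varname ]]; do with no preceding while or for is a syntax error. On its own, [[ $varname ]] is valid and only tests that the variable is non-empty.
  • mkdir -p creates a directory together with any missing parent directories in one call.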

Please try the following:

#!/bin/bash

shopt -s dotglob                                                # copy dotfiles in the directories as well
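for dir in /data/unorganised_texts/*/; do
    [[ -f $dir/Data.yaml ]] || continue                         # skip directories without a Data.yaml
    category= author= title= parent=                            # reset the values for each directory
    while IFS= read -r line; do                                 # read a line of yaml file in "$dir"
        read -r key val <<< "$line"                             # split on the 1st space into key and val
        val=${val//\//_}                                        # replace slash with underscore, just in case
        if [[ $line =~ ^[[:space:]] ]]; then                    # indented line: nested under "$parent"
            [[ $parent = "Category:" && $key = "Name:" ]] && category="$val"
            continue                                            # skip all other indented lines
        fi
        parent="$key"                                           # remember the current top-level key
        if [[ $key = "Category:" ]]; then category="$val"
        elif [[ $key = "Author:" ]]; then author="$val"
        elif [[ $key = "Title:" ]]; then title="$val"
        fi
    done < "$dir/Data.yaml"
    [[ -n $category && -n $author && -n $title ]] || continue   # skip if any value is missing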

    destdir="/data/organised_texts/$category/$author/$title"    # destination directory
    if [[ -d $destdir ]]; then                                  # check the duplication
        echo "$destdir already exists. skipped."
    else
        mkdir -p "$destdir"                                     # create the destination directory
        cp -a -- "$dir"/* "$destdir"                            # copy the contents to the destination
#       echo "/data/organised_texts/$category/$author/$title"   # remove "#" to see the progress
    fi
done

Answered By: tshiono

Since the OP was interested in a python solution…

First let's make some test dirs:

pushd /tmp
mkdir t
pushd t
mkdir a-long-story
vim a-long-story/Data.yaml # fill in here, or cp.
mkdir irrelevant_dir
mkdir -p irrelevant_dir/subdir
touch notadir

Then a simple Python script. Python doesn’t (yet) have an inbuilt YAML parser, so pip install pyyaml is needed before this:

from pathlib import Path
from shutil import copytree

from yaml import Loader, load  # pip install pyyaml


def parse_yaml(path: Path) -> dict:
    # open explicitly as UTF-8, since the values contain non-ASCII characters
    with path.open(encoding="utf-8") as fh:
        return load(fh, Loader)


# ROOT = Path("/data/unorganised_texts")
ROOT = Path("/tmp/t")

for subdir in (d for d in ROOT.iterdir() if d.is_dir()):
    yamlf = subdir / "Data.yaml"
    if yamlf.is_file():
        print("Processing", yamlf)
        data = parse_yaml(yamlf)
        other_dirs = ROOT / data["Category"]["Name"] / data["Author"]
        other_dirs.mkdir(exist_ok=True, parents=True)
        outdir = other_dirs / data["Title"]
        if outdir.exists():
            print("skipping as would overwrite.")
        else:
            copytree(subdir, outdir)

This code probably doesn’t need any explanation even for someone new to python. But for completeness:

  • we import a stdlib class (Path) and fn (copytree)
  • we import a 3rd party fn (load) and class (Loader)
  • we define a function to parse yaml. This is probably redundant, but it does add a level of commentary, and lets us easily add more logic here if required later.
  • ROOT.iterdir() yields every entry (files and directories) directly inside ROOT. We filter these with a generator expression to keep only the directories.
  • if we find the yaml we’re expecting we make the outdirs, and then if we’re not going to overwrite, copy our current directory into the output.
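
One gap in the script: a Title or Author containing a slash would be read by pathlib as extra directory levels. Below is a minimal sketch of a guard, borrowing the slash-to-underscore idea from the bash answer above. The names safe_component and dest_for are hypothetical helpers, not part of the script, and the title in the example is made up:

```python
from pathlib import Path


def safe_component(value: str) -> str:
    """Make a value usable as a single path component (hypothetical helper)."""
    cleaned = str(value).replace("/", "_").strip()
    if cleaned in ("", ".", ".."):
        raise ValueError(f"unusable path component: {value!r}")
    return cleaned


def dest_for(root: Path, data: dict) -> Path:
    """Build root/category/author/title from a parsed Data.yaml (hypothetical helper)."""
    parts = (data["Category"]["Name"], data["Author"], data["Title"])
    return root.joinpath(*(safe_component(p) for p in parts))


example = {
    "Category": {"Name": "Space"},
    "Author": "Jôëlle Frankschiff",
    "Title": "Past/Future",  # made-up title containing a slash
}
dest = dest_for(Path("/data/organised_texts"), example)
print(dest.name)  # Past_Future
```

In the script, outdir could then be computed with something like dest_for before the exists() check, assuming the same three keys are present in every Data.yaml.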

There is nothing remotely wrong with doing this in bash. These days I’d have written this python version instead, because a. I know python much better than my very rusty bash, and b. it solves the problem ‘properly’ (e.g. we parse the YAML with a yaml parser), which sometimes makes things more robust.
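
To illustrate the robustness point with the sample Data.yaml from the question: a plain text match on "Title:" also picks up the two titles nested under References, whereas only the unindented line is the document's own title. A standard-library-only sketch (no YAML parser involved; SAMPLE is copied from the question):

```python
SAMPLE = """\
Category:
  Name: Space
Author: Jôëlle Frankschiff
References:
  Title: Historical
  Title: Future
Title: A “long” story!
"""

lines = SAMPLE.splitlines()

# Naive match: any line whose stripped text starts with "Title:".
naive = [line.strip() for line in lines if line.strip().startswith("Title:")]

# Indentation-aware match: only unindented (top-level) lines count.
top_level = [line for line in lines if line.startswith("Title:")]

print(len(naive))      # 3
print(len(top_level))  # 1
```

A YAML parser draws the same distinction from the document structure rather than from an indentation heuristic: the top-level title is data["Title"], and the nested ones sit under data["References"].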

Note btw that the type hints are optional and ignored at runtime.
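
A quick way to see this (shout is a made-up function for illustration): the annotations are stored on the function object but never checked, so a call with the "wrong" type still runs. A static checker such as mypy would be the tool to flag it.

```python
def shout(text: str) -> str:
    """Annotated as str -> str, but nothing enforces that at runtime."""
    return text * 2


print(shout("ab"))  # abab
print(shout(3))     # 6: the int is accepted despite the str annotation

# The hints are only metadata attached to the function.
print(shout.__annotations__["text"] is str)  # True
```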

Answered By: 2e0byo