Python Yaml: Iterating through text values

Question:

I have this docker-compose.yaml file. I want to iterate through the parameters that can be customized by a user, meaning that I’m only interested in values, specifically text values.

Examples of such values would be:

  1. debug,disable-ssl-client-postgres found under the services: callback: environment: SPRING_PROFILES_ACTIVE: key
  2. ./ found under the services: callback: build: context: key
  3. - postgres_volume:/data/postgres under the services: postgres: volumes: key

services:
  callback:
    build:
      context: ./
      dockerfile: ./services/callback/Dockerfile
    depends_on:
      - postgres
      - efgs-fake
    restart: unless-stopped
    ports:
      - "8010:8080"
    environment:
      SPRING_PROFILES_ACTIVE: debug,disable-ssl-client-postgres
      POSTGRESQL_SERVICE_PORT: '5432'
      POSTGRESQL_SERVICE_HOST: postgres
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_PASSWORD_CALLBACK: ${POSTGRES_CALLBACK_PASSWORD}
      POSTGRESQL_USER_CALLBACK: ${POSTGRES_CALLBACK_USER}
      POSTGRESQL_PASSWORD_FLYWAY: ${POSTGRES_FLYWAY_PASSWORD}
      POSTGRESQL_USER_FLYWAY: ${POSTGRES_FLYWAY_USER}
      SSL_CALLBACK_KEYSTORE_PATH: file:/secrets/ssl.p12
      SSL_CALLBACK_KEYSTORE_PASSWORD: 123456
      SSL_FEDERATION_TRUSTSTORE_PATH: file:/secrets/contains_efgs_truststore.jks
      SSL_FEDERATION_TRUSTSTORE_PASSWORD: 123456
      FEDERATION_GATEWAY_KEYSTORE_PATH: file:/secrets/ssl.p12
      FEDERATION_GATEWAY_KEYSTORE_PASS: 123456
      FEDERATION_GATEWAY_BASE_URL: https://efgs-fake:8014
      # for local testing: FEDERATION_GATEWAY_BASE_URL: https://host.docker.internal:8014
    volumes:
      - ./docker-compose-test-secrets:/secrets
  submission:
    build:
      context: ./
      dockerfile: ./services/submission/Dockerfile
    depends_on:
      - postgres
      - verification-fake
    ports:
      - "8000:8080"
      - "8006:8081"
    environment:
      SPRING_PROFILES_ACTIVE: debug,disable-ssl-client-postgres
      POSTGRESQL_SERVICE_PORT: '5432'
      POSTGRESQL_SERVICE_HOST: postgres
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_PASSWORD_SUBMISION: ${POSTGRES_SUBMISSION_PASSWORD}
      POSTGRESQL_USER_SUBMISION: ${POSTGRES_SUBMISSION_USER}
      POSTGRESQL_PASSWORD_FLYWAY: ${POSTGRES_FLYWAY_PASSWORD}
      POSTGRESQL_USER_FLYWAY: ${POSTGRES_FLYWAY_USER}
      VERIFICATION_BASE_URL: http://verification-fake:8004
      SUPPORTED_COUNTRIES: DE,FR
      SSL_SUBMISSION_KEYSTORE_PATH: file:/secrets/ssl.p12
      SSL_SUBMISSION_KEYSTORE_PASSWORD: 123456
      SSL_VERIFICATION_TRUSTSTORE_PATH: file:/secrets/contains_efgs_truststore.jks
      SSL_VERIFICATION_TRUSTSTORE_PASSWORD: 123456
    volumes:
      - ./docker-compose-test-secrets:/secrets
  distribution:
    build:
      context: ./
      dockerfile: ./services/distribution/Dockerfile
    depends_on:
     - postgres
     - objectstore
     - create-bucket
    environment:
      SUPPORTED_COUNTRIES: DE,FR
      SPRING_PROFILES_ACTIVE: debug,signature-dev,testdata,disable-ssl-client-postgres,local-json-stats
      POSTGRESQL_SERVICE_PORT: '5432'
      POSTGRESQL_SERVICE_HOST: postgres
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_PASSWORD_DISTRIBUTION: ${POSTGRES_DISTRIBUTION_PASSWORD}
      POSTGRESQL_USER_DISTRIBUTION: ${POSTGRES_DISTRIBUTION_USER}
      POSTGRESQL_PASSWORD_FLYWAY: ${POSTGRES_FLYWAY_PASSWORD}
      POSTGRESQL_USER_FLYWAY: ${POSTGRES_FLYWAY_USER}
      # Settings for the S3 compatible objectstore
      CWA_OBJECTSTORE_ACCESSKEY: ${OBJECTSTORE_ACCESSKEY}
      CWA_OBJECTSTORE_SECRETKEY: ${OBJECTSTORE_SECRETKEY}
      CWA_OBJECTSTORE_ENDPOINT: http://objectstore
      CWA_OBJECTSTORE_BUCKET: cwa
      CWA_OBJECTSTORE_PORT: 8000
      services.distribution.paths.output: /tmp/distribution
      # Settings for cryptographic artifacts
      VAULT_FILESIGNING_SECRET: ${SECRET_PRIVATE}
      FORCE_UPDATE_KEYFILES: 'false'
      STATISTICS_FILE_ACCESS_KEY_ID: fakeAccessKey
      STATISTICS_FILE_SECRET_ACCESS_KEY: secretKey
      STATISTICS_FILE_S3_ENDPOINT: https://localhost
      DSC_TRUST_STORE: /secrets/dsc_truststore
      DCC_TRUST_STORE: /secrets/dcc_truststore
    volumes:
      - ./docker-compose-test-secrets:/secrets
  download:
    build:
      context: ./
      dockerfile: ./services/download/Dockerfile
    depends_on:
      - postgres
    ports:
      - "8011:8080"
    environment:
      SPRING_PROFILES_ACTIVE: debug,disable-ssl-server,disable-ssl-client-postgres,disable-ssl-client-verification,disable-ssl-client-verification-verify-hostname,disable-ssl-efgs-verification
      POSTGRESQL_SERVICE_PORT: '5432'
      POSTGRESQL_SERVICE_HOST: postgres
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_PASSWORD_CALLBACK: ${POSTGRES_CALLBACK_PASSWORD}
      POSTGRESQL_USER_CALLBACK: ${POSTGRES_CALLBACK_USER}
      POSTGRESQL_PASSWORD_FLYWAY: ${POSTGRES_FLYWAY_PASSWORD}
      POSTGRESQL_USER_FLYWAY: ${POSTGRES_FLYWAY_USER}
      FEDERATION_GATEWAY_KEYSTORE_PATH: file:/secrets/ssl.p12
      FEDERATION_GATEWAY_KEYSTORE_PASS: 123456
      SSL_FEDERATION_TRUSTSTORE_PATH: file:/secrets/contains_efgs_truststore.jks
      SSL_FEDERATION_TRUSTSTORE_PASSWORD: 123456
    volumes:
      - ./docker-compose-test-secrets:/secrets
  upload:
    build:
      context: ./
      dockerfile: ./services/upload/Dockerfile
    depends_on:
      - postgres
    ports:
      - "8012:8080"
    environment:
      SPRING_PROFILES_ACTIVE: disable-ssl-client-postgres, connect-efgs
      POSTGRESQL_SERVICE_PORT: '5432'
      POSTGRESQL_SERVICE_HOST: postgres
      POSTGRESQL_DATABASE: ${POSTGRES_DB}
      POSTGRESQL_PASSWORD_FLYWAY: ${POSTGRES_FLYWAY_PASSWORD}
      POSTGRESQL_USER_FLYWAY: ${POSTGRES_FLYWAY_USER}
      VAULT_EFGS_BATCHIGNING_SECRET: ${SECRET_PRIVATE}
      VAULT_EFGS_BATCHIGNING_CERTIFICATE: file:/secrets/efgs_signing_cert.pem
      SSL_FEDERATION_TRUSTSTORE_PATH: file:/secrets/contains_efgs_truststore.jks
      SSL_FEDERATION_TRUSTSTORE_PASSWORD: 123456
      FEDERATION_GATEWAY_KEYSTORE_PATH: file:/secrets/ssl.p12
      FEDERATION_GATEWAY_KEYSTORE_PASS: 123456
    volumes:
      - ./docker-compose-test-secrets:/secrets
  postgres:
    image: postgres:11.5
    restart: always
    ports:
      - "8001:5432"
    environment:
      PGDATA: /data/postgres
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_volume:/data/postgres
      - ./setup/setup-roles.sql:/docker-entrypoint-initdb.d/1-roles.sql
      - ./local-setup/create-users.sql:/docker-entrypoint-initdb.d/2-users.sql
      - ./local-setup/enable-test-data-docker-compose.sql:/docker-entrypoint-initdb.d/3-enable-testdata.sql
  pgadmin:
    container_name: pgadmin_container
    image: dpage/pgadmin4
    volumes:
       - pgadmin_volume:/root/.pgadmin
    ports:
      - "8002:80"
    restart: unless-stopped
    depends_on:
      - postgres
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_DEFAULT_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_DEFAULT_PASSWORD}
  objectstore:
    image: "zenko/cloudserver"
    restart: always
    volumes:
      - objectstore_volume:/data
    ports:
      - "8003:8000"
    environment:
      ENDPOINT: objectstore
      REMOTE_MANAGEMENT_DISABLE: 1
      SCALITY_ACCESS_KEY_ID: ${OBJECTSTORE_ACCESSKEY}
      SCALITY_SECRET_ACCESS_KEY: ${OBJECTSTORE_SECRETKEY}
  create-bucket:
    image: amazon/aws-cli
    environment:
      - AWS_ACCESS_KEY_ID=${OBJECTSTORE_ACCESSKEY}
      - AWS_SECRET_ACCESS_KEY=${OBJECTSTORE_SECRETKEY}
    entrypoint: [ "/root/scripts/wait-for-it/wait-for-it.sh", "objectstore:8000", "-t", "30", "--" ]
    volumes:
      - ./scripts/wait-for-it:/root/scripts/wait-for-it
    command: aws s3api create-bucket --bucket cwa --endpoint-url http://objectstore:8000 --acl public-read
    depends_on:
      - objectstore
  verification-fake:
    image: roesslerj/cwa-verification-fake:0.0.5
    restart: unless-stopped
    ports:
      - "8004:8004"
  efgs-fake:
    image: roesslerj/cwa-efgs-fake:0.0.5
    restart: unless-stopped
    ports:
      - "8014:8014"
volumes:
  postgres_volume:
  pgadmin_volume:
  objectstore_volume:

Does any existing framework like PyYAML allow me to print/iterate through values in such a way without specifying the keys? Is it possible with the Python dictionary that PyYAML generates?

Motivation:
The idea is to print/count only the parameters that can be modified by a user. With a successful implementation I want to be able to say "X parameters can be modified in this docker-compose.yaml" and then compare that with other docker-compose.yaml files.

EDIT:

Unfortunately simply iterating through keys doesn’t work, because keys are nested.

    for key,value in data.items():
        print(key, " : ", value)

simply prints out:

version  :  3
services  :  {'callback': {'build': {'context': './', 'dockerfile': './se...
volumes  :  {'postgres_volume': None, 'pgadmin_volume': None, 'objectstore_volume': None}

The key services contains further keys, such as callback, submission, etc. When I iterate over the keys in the dictionary generated from my .yaml, I can only print the three "root" keys (version, services, volumes) and their values, which are child trees. It doesn’t deliver the result I’m looking for.

Asked By: vrubayka


Answers:

You can use yaml.parse to iterate over YAML events that are generated from the input. You can then filter out mapping keys, which are usually ScalarEvents, and output the other scalar events. If you need it, you can also track the path to the value.

This generator function implements such iteration; mind that it simplifies a few things, for example it can’t process aliases or complex mapping keys:

import yaml
from yaml import (ScalarEvent, SequenceStartEvent, SequenceEndEvent,
                  MappingStartEvent, MappingEndEvent)

def yaml_values(data):
    path = []   # keys and sequence indices leading to the current node
    state = []  # one entry per open collection: "MapKey", "MapVal" or "Seq"
    for event in yaml.parse(data):
        if isinstance(event, ScalarEvent):
            if state and state[-1] == "MapKey":
                # scalar in key position: remember it as part of the path
                path.append(event.value)
            else:
                # yield a copy, because path keeps being mutated afterwards
                yield (list(path), event.value)
        elif isinstance(event, SequenceStartEvent):
            state.append("Seq")
            path.append(0)
            continue
        elif isinstance(event, MappingStartEvent):
            state.append("MapKey")
            continue
        elif isinstance(event, SequenceEndEvent):
            state.pop()
            path.pop()
        elif isinstance(event, MappingEndEvent):
            state.pop()
        # a node has been completed: advance the enclosing collection
        if state:
            if state[-1] == "MapKey":
                state[-1] = "MapVal"
            elif state[-1] == "MapVal":
                state[-1] = "MapKey"
                path.pop()
            elif state[-1] == "Seq":
                path[-1] += 1

Usage:

input = """
services:
  callback:
    build:
      context: ./
      dockerfile: ./services/callback/Dockerfile
    depends_on:
      - postgres
      - efgs-fake
    restart: unless-stopped
    ports:
      - "8010:8080"
"""

for (path, value) in yaml_values(input):
    print(path, value)

Output:

['services', 'callback', 'build', 'context'] ./
['services', 'callback', 'build', 'dockerfile'] ./services/callback/Dockerfile
['services', 'callback', 'depends_on', 0] postgres
['services', 'callback', 'depends_on', 1] efgs-fake
['services', 'callback', 'restart'] unless-stopped
['services', 'callback', 'ports', 0] 8010:8080
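
To see what the generator above has to work with, it can help to list the raw events that yaml.parse emits for a tiny document. This is only an illustrative sketch: the helper name describe and the sample document are made up for this example and are not part of PyYAML.

```python
import yaml

# A made-up two-key document: one plain scalar value and one sequence.
document = "a: 1\nb:\n  - x\n"

def describe(event):
    """Return the event's class name, plus the text for scalar events."""
    name = type(event).__name__
    if isinstance(event, yaml.ScalarEvent):
        return f"{name}({event.value!r})"
    return name

events = [describe(e) for e in yaml.parse(document)]
for line in events:
    print(line)
```

Mapping keys and mapping values both arrive as ScalarEvent, which is why the generator has to track whether it is currently in key or value position. The value 1 arrives as the string '1', since no type resolution happens at the event level.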
Answered By: flyx

Here’s a different approach for the case when you want YAML to load everything before iterating. This solution is more general since it applies to any dict/list based nested structure, regardless of whether it was loaded from YAML.

This approach is a bit slower, which should be negligible in a typical Python application. You will get values according to YAML type guessing, so true will be a bool, 42 will be an int, etc., while with the other approach you’ll get every value as a string.

import yaml

def gen_values(path, data):
    # Collections are walked recursively; anything else is a leaf value.
    if isinstance(data, list):
        items = enumerate(data)
    elif isinstance(data, dict):
        items = data.items()
    else:
        return [(path, data)]
    result = []
    for key, child in items:
        result.extend(gen_values(path + [key], child))
    return result

def yaml_values(data):
    return gen_values([], yaml.safe_load(data))

Usage identical to the other solution:

input = """
services:
  callback:
    build:
      context: ./
      dockerfile: ./services/callback/Dockerfile
    depends_on:
      - postgres
      - efgs-fake
    restart: unless-stopped
    ports:
      - "8010:8080"
"""

for (path, value) in yaml_values(input):
    print(path, value)

Output:

['services', 'callback', 'build', 'context'] ./
['services', 'callback', 'build', 'dockerfile'] ./services/callback/Dockerfile
['services', 'callback', 'depends_on', 0] postgres
['services', 'callback', 'depends_on', 1] efgs-fake
['services', 'callback', 'restart'] unless-stopped
['services', 'callback', 'ports', 0] 8010:8080
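
Because loaded values keep their guessed types, the question's goal of counting only text values could be approached by filtering on str. The following is a minimal sketch under assumptions: iter_values is a hypothetical helper, and loaded is a hand-written dict standing in for what yaml.safe_load would return for a small fragment of the compose file, so the example runs without PyYAML.

```python
def iter_values(data, path=()):
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(data, dict):
        for key, child in data.items():
            yield from iter_values(child, path + (key,))
    elif isinstance(data, list):
        for index, child in enumerate(data):
            yield from iter_values(child, path + (index,))
    else:
        yield path, data

# Hand-written stand-in for the result of yaml.safe_load on a small fragment.
loaded = {
    "services": {
        "callback": {
            "build": {"context": "./"},
            "ports": ["8010:8080"],
            "environment": {
                "POSTGRESQL_SERVICE_PORT": "5432",         # quoted in YAML: str
                "SSL_CALLBACK_KEYSTORE_PASSWORD": 123456,  # unquoted: int
            },
        }
    },
    "volumes": {"postgres_volume": None},  # empty YAML value loads as None
}

# Keep only leaves that are text; ints, bools and None are dropped.
text_values = [(p, v) for p, v in iter_values(loaded) if isinstance(v, str)]
print(f"{len(text_values)} text parameters can be modified")
```

Whether unquoted numbers such as 123456 should count as user-modifiable parameters is a judgment call; dropping the isinstance filter, or using the event-based approach where everything is a string, would include them.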
Answered By: flyx