How can I parse a YAML file in Python

Question:

How can I parse a YAML file in Python?

Asked By: Szymon Lipiński

||

Answers:

The easiest and purest method without relying on C headers is PyYaml (documentation), which can be installed via pip install pyyaml:

#!/usr/bin/env python

import yaml

with open("example.yaml", "r") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

And that’s it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred to avoid introducing the possibility for arbitrary code execution. So unless you explicitly need the arbitrary object serialization/deserialization use safe_load.

Note the PyYaml project supports versions up through the YAML 1.1 specification. If YAML 1.2 specification support is needed, see ruamel.yaml as noted in this answer.

Also, you could also use a drop in replacement for pyyaml, that keeps your yaml file ordered the same way you had it, called oyaml. View synk of oyaml here

Answered By: Jon

If you have YAML that conforms to the YAML 1.2 specification (released 2009) then you should use ruamel.yaml (disclaimer: I am the author of that package).
It is essentially a superset of PyYAML, which supports most of YAML 1.1 (from 2005).

If you want to be able to preserve your comments when round-tripping, you certainly should use ruamel.yaml.

Upgrading @Jon’s example is easy:

import ruamel.yaml as yaml

with open("example.yaml") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

Use safe_load() unless you really have full control over the input, need it (seldom the case) and know what you are doing.

If you are using pathlib Path for manipulating files, you are better of using the new API ruamel.yaml provides:

from ruamel.yaml import YAML
from pathlib import Path

path = Path('example.yaml')
yaml = YAML(typ='safe')
data = yaml.load(path)
Answered By: Anthon

Read & Write YAML files with Python 2+3 (and unicode)

# -*- coding: utf-8 -*-
import yaml
import io

# Define data
data = {
    'a list': [
        1, 
        42, 
        3.141, 
        1337, 
        'help', 
        u'€'
    ],
    'a string': 'bla',
    'another dict': {
        'foo': 'bar',
        'key': 'value',
        'the answer': 42
    }
}

# Write YAML file
with io.open('data.yaml', 'w', encoding='utf8') as outfile:
    yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)

# Read YAML file
with open("data.yaml", 'r') as stream:
    data_loaded = yaml.safe_load(stream)

print(data == data_loaded)

Created YAML file

a list:
- 1
- 42
- 3.141
- 1337
- help
- €
a string: bla
another dict:
  foo: bar
  key: value
  the answer: 42

Common file endings

.yml and .yaml

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

Answered By: Martin Thoma
#!/usr/bin/env python

import sys
import yaml

def main(argv):

    with open(argv[0]) as stream:
        try:
            #print(yaml.load(stream))
            return 0
        except yaml.YAMLError as exc:
            print(exc)
            return 1

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
Answered By: Wojciech Sciesinski

First install pyyaml using pip3.

Then import yaml module and load the file into a dictionary called ‘my_dict’:

import yaml
with open('filename.yaml') as f:
    my_dict = yaml.safe_load(f)

That’s all you need. Now the entire yaml file is in ‘my_dict’ dictionary.

Answered By: Pal

I use ruamel.yaml. Details & debate here.

from ruamel import yaml

with open(filename, 'r') as fp:
    read_data = yaml.load(fp)

Usage of ruamel.yaml is compatible (with some simple solvable problems) with old usages of PyYAML and as it is stated in link I provided, use

from ruamel import yaml

instead of

import yaml

and it will fix most of your problems.

EDIT: PyYAML is not dead as it turns out, it’s just maintained in a different place.

Answered By: Oleksandr Zelentsov

Example:


defaults.yaml

url: https://www.google.com

environment.py

from ruamel import yaml

data = yaml.safe_load(open('defaults.yaml'))
data['url']
Answered By: Prashanth Sams

To access any element of a list in a YAML file like this:

global:
  registry:
    url: dtr-:5000/
    repoPath:
  dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd

You can use following python script:

import yaml

with open("/some/path/to/yaml.file", 'r') as f:
    valuesYaml = yaml.load(f, Loader=yaml.FullLoader)

print(valuesYaml['global']['dbConnectionString'])
Answered By: rinkush sharda

read_yaml_file function returning all data into a dictionary.

def read_yaml_file(full_path=None, relative_path=None):
    if relative_path is not None:
        resource_file_location_local = ProjectPaths.get_project_root_path() + relative_path
    else:
        resource_file_location_local = full_path

    with open(resource_file_location_local, 'r') as stream:
        try:
            file_artifacts = yaml.safe_load(stream)
        except yaml.YAMLError as exc:
            print(exc)
    return dict(file_artifacts.items())
Answered By: Arpan Saini

Suggestion: Use yq (available via pip)

I’m Not sure how it wasn’t suggested before, but I would highly recommend using yq which is a jq wrapper for YAML.

yq uses jq like syntax but works with yaml files as well as json.


Examples:

1 ) Read a value:

yq e '.a.b[0].c' file.yaml

2 ) Pipe from STDIN:

cat file.yaml | yq e '.a.b[0].c' -

3 ) Update a yaml file, inplace

yq e -i '.a.b[0].c = "cool"' file.yaml

4 ) Update using environment variables:

NAME=mike yq e -i '.a.b[0].c = strenv(NAME)' file.yaml

5 ) Merge multiple files:

yq ea '. as $item ireduce ({}; . * $item )' path/to/*.yml

6 ) Multiple updates to a yaml file:

yq e -i '
  .a.b[0].c = "cool" |
  .x.y.z = "foobar" |
  .person.name = strenv(NAME)
' file.yaml

(*) Read more on how to parse fields from yaml with based on jq filters.


Additional references:

https://github.com/mikefarah/yq/#install

https://github.com/kislyuk/yq

Answered By: RtmY

I made my own script for this. Feel free to use it, as long as you keep the attribution. The script can parse yaml from a file (function load), parse yaml from a string (function loads) and convert a dictionary into yaml (function dumps). It respects all variable types.

# © didlly AGPL-3.0 License - github.com/didlly

def is_float(string: str) -> bool:
    try:
        float(string)
        return True
    except ValueError:
        return False


def is_integer(string: str) -> bool:
    try:
        int(string)
        return True
    except ValueError:
        return False


def load(path: str) -> dict:
    with open(path, "r") as yaml:
        levels = []
        data = {}
        indentation_str = ""

        for line in yaml.readlines():
            if line.replace(line.lstrip(), "") != "" and indentation_str == "":
                indentation_str = line.replace(line.lstrip(), "").rstrip("n")
            if line.strip() == "":
                continue
            elif line.rstrip()[-1] == ":":
                key = line.strip()[:-1]
                quoteless = (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                )

                if len(line.replace(line.strip(), "")) // 2 < len(levels):
                    if quoteless:
                        levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
                    else:
                        levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
                else:
                    if quoteless:
                        levels.append(f"[{line.strip()[:-1]}]")
                    else:
                        levels.append(f"['{line.strip()[:-1]}']")
                if quoteless:
                    exec(
                        f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
                        + " = {}"
                    )
                else:
                    exec(
                        f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
                        + " = {}"
                    )

                continue

            key = line.split(":")[0].strip()
            value = ":".join(line.split(":")[1:]).strip()

            if (
                is_float(value)
                or is_integer(value)
                or value == "True"
                or value == "False"
                or ("[" in value and "]" in value)
            ):
                if (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                ):
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
                    )
                else:
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
                    )
            else:
                if (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                ):
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
                    )
                else:
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
                    )
    return data


def loads(yaml: str) -> dict:
    levels = []
    data = {}
    indentation_str = ""

    for line in yaml.split("n"):
        if line.replace(line.lstrip(), "") != "" and indentation_str == "":
            indentation_str = line.replace(line.lstrip(), "")
        if line.strip() == "":
            continue
        elif line.rstrip()[-1] == ":":
            key = line.strip()[:-1]
            quoteless = (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            )

            if len(line.replace(line.strip(), "")) // 2 < len(levels):
                if quoteless:
                    levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
                else:
                    levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
            else:
                if quoteless:
                    levels.append(f"[{line.strip()[:-1]}]")
                else:
                    levels.append(f"['{line.strip()[:-1]}']")
            if quoteless:
                exec(
                    f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
                    + " = {}"
                )
            else:
                exec(
                    f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
                    + " = {}"
                )

            continue

        key = line.split(":")[0].strip()
        value = ":".join(line.split(":")[1:]).strip()

        if (
            is_float(value)
            or is_integer(value)
            or value == "True"
            or value == "False"
            or ("[" in value and "]" in value)
        ):
            if (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            ):
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
                )
            else:
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
                )
        else:
            if (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            ):
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
                )
            else:
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
                )

    return data


def dumps(yaml: dict, indent="") -> str:
    """A procedure which converts the dictionary passed to the procedure into it's yaml equivalent.

    Args:
        yaml (dict): The dictionary to be converted.

    Returns:
        data (str): The dictionary in yaml form.
    """

    data = ""

    for key in yaml.keys():
        if type(yaml[key]) == dict:
            data += f"n{indent}{key}:n"
            data += dumps(yaml[key], f"{indent}  ")
        else:
            data += f"{indent}{key}: {yaml[key]}n"

    return data


print(load("config.yml"))

Example

config.yml

level 0 value: 0

level 1:
  level 1 value: 1
  level 2:
    level 2 value: 2

level 1 2:
  level 1 2 value: 1 2
  level 2 2:
    level 2 2 value: 2 2

Output

{'level 0 value': 0, 'level 1': {'level 1 value': 1, 'level 2': {'level 2 value': 2}}, 'level 1 2': {'level 1 2 value': '1 2', 'level 2 2': {'level 2 2 value': 2 2}}}
Answered By: Didlly
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.