
Creating Custom Tag in Python PyYAML

Last Updated : 23 Jul, 2024

YAML (YAML Ain't Markup Language) is a human-readable data serialization format, often used alongside JSON and XML for configuration and data interchange. PyYAML is a YAML parser and emitter library for Python: it can both load and dump YAML documents. PyYAML also lets you define custom tags, so you can represent more complex data in YAML than the built-in types allow.

What Are Custom Tags?

In YAML, a tag indicates the data type of a node. For instance, !!str marks a string and !!int marks an integer. PyYAML lets you define custom tags to represent more complex or domain-specific data types. This is especially useful when configuration files or data formats need more than the simple built-in types.
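To make the idea of a tag concrete, here is a small sketch that uses only YAML's standard tags. The document text and key names are made up for illustration.

```python
import yaml

# Hypothetical document: an explicit tag overrides the type PyYAML would infer.
doc = """
plain: 123
as_text: !!str 123
as_number: !!int "42"
"""

data = yaml.safe_load(doc)
print(data)
```

Without a tag, 123 is loaded as an integer. With !!str it stays a string, and !!int turns the quoted "42" into an integer.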

Why Use Custom Tags?

Custom tags are beneficial when you need to:

  • Encode complex data structures.
  • Represent domain-specific concepts.
  • Simplify the data representation and reduce clutter.
  • Keep values consistent and valid.

Creating Custom Tag in Python PyYAML

To create and use custom tags in PyYAML, you need to follow these steps:

  1. Define the custom data structure.
  2. Create a Python class to represent the custom data structure.
  3. Implement the necessary logic to serialize and deserialize the custom data structure.
  4. Register the custom tag with PyYAML.

Step 1: Define the Custom Data Structure

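Let's define a simple custom data structure for a point in 2D space. The mapping carries the custom tag !point (a single !; a double !! would refer to YAML's built-in tag namespace):

!point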
x: 10
y: 20

Step 2: Create a Python Class

Next, create a Python class to represent this custom data structure. Save this class in a file named point.py.

Python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

Step 3: Implement Serialization and Deserialization

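Create a file named custom_tags.py to implement the logic for serializing and deserializing the custom data structure and to register the custom tag with PyYAML. The file imports Point from point.py so that the representer and constructor can refer to it.

Python
import yaml
from point import Point
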
def point_representer(dumper, data):
    return dumper.represent_mapping('!point', {'x': data.x, 'y': data.y})

def point_constructor(loader, node):
    values = loader.construct_mapping(node)
    return Point(values['x'], values['y'])

# Register the representer and constructor with PyYAML
yaml.add_representer(Point, point_representer)
yaml.add_constructor('!point', point_constructor)
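The registration above changes PyYAML's default loaders and dumper for the whole process. As an alternative, here is a minimal sketch that keeps the registration local to subclasses of SafeLoader and SafeDumper. The names PointLoader and PointDumper are hypothetical, and Point is repeated (with an added __eq__) so the snippet runs on its own.

```python
import yaml

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

# Hypothetical subclasses: registrations made on them do not affect
# yaml.SafeLoader or yaml.SafeDumper themselves.
class PointLoader(yaml.SafeLoader):
    pass

class PointDumper(yaml.SafeDumper):
    pass

def point_constructor(loader, node):
    values = loader.construct_mapping(node)
    return Point(values['x'], values['y'])

def point_representer(dumper, data):
    return dumper.represent_mapping('!point', {'x': data.x, 'y': data.y})

PointLoader.add_constructor('!point', point_constructor)
PointDumper.add_representer(Point, point_representer)

# Round trip: object -> YAML text -> object
text = yaml.dump(Point(10, 20), Dumper=PointDumper)
restored = yaml.load(text, Loader=PointLoader)
print(text)
print(restored == Point(10, 20))
```

Building on the safe classes means the loader constructs only standard YAML types plus the tags you register yourself, which is usually what you want for configuration files from untrusted sources.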

Step 4: Using the Custom Tag

Now, you can use the custom tag in your YAML files and load them with PyYAML:

File: example.yaml

!point
x: 10
y: 20

Load the YAML data in the main.py file:

Python
import yaml
from custom_tags import Point

# Load the YAML data
with open('example.yaml', 'r') as file:
    point = yaml.load(file, Loader=yaml.FullLoader)
    print(point)  

# Dump the Point object back to YAML
yaml_string = yaml.dump(point)
print(yaml_string)

Step 5: Run the Python Script

Navigate to the my_project directory in your terminal and run the main.py script:

cd path/to/my_project
python main.py

Output:

Point(x=10, y=20)
!point
x: 10
y: 20
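A custom tag does not have to sit at the top of a document. The sketch below, with a made-up configuration, tags each item of a list inside a larger mapping. It registers the constructor on SafeLoader (an assumption, chosen so that yaml.safe_load recognizes the tag) and repeats the Point class so the snippet runs on its own.

```python
import yaml

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

def point_constructor(loader, node):
    values = loader.construct_mapping(node)
    return Point(values['x'], values['y'])

# Registered on SafeLoader so that yaml.safe_load understands !point.
yaml.add_constructor('!point', point_constructor, Loader=yaml.SafeLoader)

# Hypothetical configuration: each list item carries the custom tag.
config_text = """
shape: triangle
vertices:
  - !point {x: 0, y: 0}
  - !point {x: 4, y: 0}
  - !point {x: 0, y: 3}
"""

config = yaml.safe_load(config_text)
print(config['shape'])
print(config['vertices'])
```

Untagged values such as shape load as ordinary Python types, while every tagged node is passed to point_constructor.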

Advanced PyYAML Custom Tags

Let's consider a more advanced example where we define a custom tag for a 3D point:

File: point3d.py

Python
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return f"Point3D(x={self.x}, y={self.y}, z={self.z})"

File: custom_tags.py

Add the following code to handle the 3D point:

Python
import yaml
from point import Point
from point3d import Point3D

# Existing Point serialization and deserialization
def point_representer(dumper, data):
    return dumper.represent_mapping('!point', {'x': data.x, 'y': data.y})

def point_constructor(loader, node):
    values = loader.construct_mapping(node)
    return Point(values['x'], values['y'])

# New Point3D serialization and deserialization
def point3d_representer(dumper, data):
    return dumper.represent_mapping('!point3d', {'x': data.x, 'y': data.y, 'z': data.z})

def point3d_constructor(loader, node):
    values = loader.construct_mapping(node)
    return Point3D(values['x'], values['y'], values['z'])

# Register the representers and constructors with PyYAML
yaml.add_representer(Point, point_representer)
yaml.add_constructor('!point', point_constructor)

yaml.add_representer(Point3D, point3d_representer)
yaml.add_constructor('!point3d', point3d_constructor)

File: example3d.yaml

!point3d
x: 10
y: 20
z: 30

File: main3d.py

Python
import yaml
from custom_tags import Point3D

# Load the YAML data
with open('example3d.yaml', 'r') as file:
    point3d = yaml.load(file, Loader=yaml.FullLoader)
    print(point3d)  # Output: Point3D(x=10, y=20, z=30)

# Dump the Point3D object back to YAML
yaml_string = yaml.dump(point3d)
print(yaml_string)

Run the Python Script

Navigate to the my_project directory in your terminal and run the main3d.py script:

cd path/to/my_project
python main3d.py

Output:

Point3D(x=10, y=20, z=30)
!point3d
x: 10
y: 20
z: 30
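Writing a constructor and a representer for every class becomes repetitive. PyYAML also provides the yaml.YAMLObject base class, which registers both when a subclass sets yaml_tag. The following is a sketch of that alternative, not the approach used in the files above. Binding it to SafeLoader and SafeDumper is an assumption made here so that safe_load and safe_dump can be used.

```python
import yaml

class Point3D(yaml.YAMLObject):
    # Setting yaml_tag makes PyYAML register this class automatically.
    yaml_tag = '!point3d'
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return f"Point3D(x={self.x}, y={self.y}, z={self.z})"

point = yaml.safe_load("!point3d {x: 10, y: 20, z: 30}")
print(point)

dumped = yaml.safe_dump(point)
print(dumped)
```

Note that loading through YAMLObject fills the instance attributes directly from the mapping and does not call __init__, so any validation placed in __init__ is skipped.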

Advanced Features

Here are some of the key advanced features:

1. Custom Constructors and Representers

Custom constructors and representers allow you to define how YAML nodes are converted to Python objects and vice versa. This feature is particularly useful for handling complex data structures or domain-specific objects.
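Because a constructor is ordinary Python code, it can also validate the node before building the object. The sketch below assumes a !point mapping that must contain x and y. The names strict_point_constructor and StrictLoader are hypothetical. It raises PyYAML's ConstructorError so the failure is reported like any other YAML error, with the position of the offending node.

```python
import yaml
from yaml.constructor import ConstructorError

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def strict_point_constructor(loader, node):
    values = loader.construct_mapping(node)
    missing = [key for key in ('x', 'y') if key not in values]
    if missing:
        # Attach the node's start mark so the message shows line and column.
        raise ConstructorError(
            "while constructing a !point", node.start_mark,
            f"missing required keys: {missing}", node.start_mark)
    return Point(values['x'], values['y'])

# Hypothetical loader subclass that carries the strict constructor.
class StrictLoader(yaml.SafeLoader):
    pass

StrictLoader.add_constructor('!point', strict_point_constructor)

error_message = None
try:
    yaml.load("!point {x: 10}", Loader=StrictLoader)
except yaml.YAMLError as exc:
    error_message = str(exc)
    print(error_message)
```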

Example: Custom Constructor and Representer for a Date

Python
import yaml
from datetime import datetime

class CustomDate(datetime):
    pass

def date_constructor(loader, node):
    value = loader.construct_scalar(node)
    return CustomDate.strptime(value, '%Y-%m-%d')

def date_representer(dumper, data):
    value = data.strftime('%Y-%m-%d')
    return dumper.represent_scalar('!date', value)

yaml.add_constructor('!date', date_constructor)
yaml.add_representer(CustomDate, date_representer)

Usage:

Python
# YAML data with custom date tag
yaml_data = """
!date '2024-07-10'
"""

# Load the YAML data
date_obj = yaml.load(yaml_data, Loader=yaml.FullLoader)
print(date_obj)   

# Dump the date object back to YAML
yaml_string = yaml.dump(date_obj)
print(yaml_string)

Output:

2024-07-10 00:00:00
!date '2024-07-10'
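A constructor can also serve a whole family of tags that share a prefix. The sketch below uses add_multi_constructor, which passes the rest of the tag to the function as an extra argument. The !unit: prefix, the UnitLoader class and the tuple result are assumptions made for illustration.

```python
import yaml

# Hypothetical loader subclass, so the registration stays local.
class UnitLoader(yaml.SafeLoader):
    pass

def unit_constructor(loader, tag_suffix, node):
    # tag_suffix is whatever follows the registered prefix, e.g. 'km'.
    return (float(loader.construct_scalar(node)), tag_suffix)

UnitLoader.add_multi_constructor('!unit:', unit_constructor)

doc = """
distance: !unit:km 5
mass: !unit:kg 2.5
"""

data = yaml.load(doc, Loader=UnitLoader)
print(data)
```

One function handles both !unit:km and !unit:kg, so new units need no extra registration.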

2. Custom Resolver

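A custom resolver lets PyYAML assign a tag to a plain, untagged scalar whenever its text matches a regular expression, so the YAML does not need to spell the tag out. yaml.add_implicit_resolver takes the tag, a compiled regular expression, and a list of the characters a matching value can start with. PyYAML's built-in timestamp resolver already recognizes ISO dates such as 2024-07-10 and is checked before resolvers added later, so this example uses the DD/MM/YYYY format and a separate !dmy_date tag instead.

Example: Custom Resolver for Dates

Python
import re
from datetime import datetime

def dmy_date_constructor(loader, node):
    return datetime.strptime(loader.construct_scalar(node), '%d/%m/%Y').date()

# Tag plain scalars that look like DD/MM/YYYY as !dmy_date
dmy_date_pattern = re.compile(r'^\d{2}/\d{2}/\d{4}$')
yaml.add_implicit_resolver('!dmy_date', dmy_date_pattern, first=list('0123456789'))
yaml.add_constructor('!dmy_date', dmy_date_constructor)

Usage:

Python
# YAML data with implicit date recognition
yaml_data = """
10/07/2024
"""

# Load the YAML data
date_obj = yaml.load(yaml_data, Loader=yaml.FullLoader)
print(date_obj)

Output:

2024-07-10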
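For comparison, PyYAML's default resolvers already tag unquoted ISO dates implicitly, and quoting a value turns that off. A minimal sketch with made-up keys:

```python
import datetime
import yaml

# Hypothetical document: the same text, once plain and once quoted.
doc = """
released: 2024-07-10
label: '2024-07-10'
"""

data = yaml.safe_load(doc)
print(type(data['released']).__name__)
print(type(data['label']).__name__)
```

The plain value becomes a datetime.date, while the quoted one stays a string. The same rule applies to implicit resolvers you add yourself: they only look at plain, untagged scalars.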

3. Multi-Document YAML

PyYAML supports multi-document YAML, which lets you load several documents from a single stream and dump several documents to one. Documents are separated by a line containing ---.

Example: Multi-Document YAML

Python
# Multi-document YAML data
yaml_data = """
---
name: Document 1
value: 123
---
name: Document 2
value: 456
"""

# Load multiple documents
documents = list(yaml.load_all(yaml_data, Loader=yaml.FullLoader))
print(documents)

# Dump multiple documents
yaml_string = yaml.dump_all(documents)
print(yaml_string)

Output:

[{'name': 'Document 1', 'value': 123}, {'name': 'Document 2', 'value': 456}]
name: Document 1
value: 123
---
name: Document 2
value: 456
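The loading functions for multiple documents return a generator, so a large stream can be processed one document at a time instead of being collected into a list. The sketch below uses the safe variants. The stream contents and the explicit_start choice are illustrative assumptions.

```python
import yaml

stream = """\
---
name: Document 1
value: 123
---
name: Document 2
value: 456
"""

# safe_load_all yields documents lazily as the loop consumes them.
names = [doc['name'] for doc in yaml.safe_load_all(stream)]
print(names)

# explicit_start=True writes the '---' marker before every document,
# including the first one.
dumped = yaml.safe_dump_all([{'name': 'A'}, {'name': 'B'}], explicit_start=True)
print(dumped)
```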

Conclusion

Custom tags let you extend YAML with your own data types and domain-specific structures. You define the types as Python classes and register constructors and representers that tell PyYAML how to deserialize and serialize them. This makes PyYAML a flexible choice for managing configuration data and data interchange in Python applications.

