"Respectful" YAML patching in Rust

Patching a YAML file programmatically is straightforward in principle: parse, modify, serialize. Ideally the process should also be respectful — that is, preserve the following properties of the initial file:

  1. Formatting. The same YAML value can be represented in multiple ways: how mappings and lists are indented, whether blank lines separate sections, how strings are quoted, and so on. For example, a list can be represented in block style

    items:
      - 1
      - 2
      - 3
    

    or in flow style

    items: [1, 2, 3]
    

    A general-purpose YAML library typically picks one canonical form when serializing and applies it to the entire document.

  2. Comments. One of YAML’s ergonomic advantages is that a value can carry a note explaining why it is set the way it is. Most parsers discard comments during deserialization, so the comments never reach the serializer and are missing from the output.

Losing either property hurts. A dropped comment effectively loses historical context. Mangled formatting can render the resulting file invalid, or wipe out a layout that was carefully chosen for a specific situation (e.g. turn an intentional flow list into a block list).
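
The comment property can be checked mechanically, at least roughly. The sketch below is my own illustration, not part of any of the libraries discussed: the hypothetical helpers `comment_of` and `lost_comments` extract comments line by line and report which ones from the original are missing in the patched text. It assumes a `#` starts a comment whenever it sits at the start of a line or after whitespace, so a `#` inside a quoted string would be misread.

```rust
/// Returns the comment on a line, if any.
/// Naive assumption: a `#` at line start or after whitespace opens a comment
/// (a `#` inside a quoted string is not handled).
fn comment_of(line: &str) -> Option<&str> {
    let bytes = line.as_bytes();
    for (i, b) in bytes.iter().enumerate() {
        if *b == b'#' && (i == 0 || bytes[i - 1] == b' ' || bytes[i - 1] == b'\t') {
            return Some(line[i..].trim_end());
        }
    }
    None
}

/// Lists comments that occur in `before` but have no counterpart in `after`.
/// Each comment in `after` can match only one comment in `before`.
fn lost_comments<'a>(before: &'a str, after: &str) -> Vec<&'a str> {
    let mut remaining: Vec<&str> = after.lines().filter_map(comment_of).collect();
    let mut lost = Vec::new();
    for comment in before.lines().filter_map(comment_of) {
        match remaining.iter().position(|r| *r == comment) {
            Some(i) => {
                remaining.swap_remove(i);
            }
            None => lost.push(comment),
        }
    }
    lost
}

fn main() {
    let before = "# outer comment\nasset_groups:\n  group_abc:    # group_abc comment\n    - BTC\n";
    let after = "asset_groups:\n  group_abc:    # group_abc comment\n    - BTC\n";
    println!("{:?}", lost_comments(before, after)); // prints ["# outer comment"]
}
```

A check like this only covers lost comments. It says nothing about formatting or about a comment that survives but ends up in the wrong place.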

Reaching for a popular general-purpose YAML library is the obvious move, but none of them preserve both:

So a more niche tool is needed.

The candidates

A search of crates.io and lib.rs for libraries that claim comment preservation turns up four candidates: yamlpath + yamlpatch, yaml-edit, rust-yaml, and yamp.

The experiment

The example below uses a simplified config for a trading bot. The assets are grouped into named groups with a catch-all default group:

# outer comment
asset_groups:
  group_abc:    # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    -  DOGE       # asset comment
    - PEPE
  default:
    # default group inner comment
    - 1INCH
    - ATOM
    - LINK

The toy CLI used here supports two operations: list-assets and delist-assets.

Listing assets

The first test is a single list-assets invocation with four assets, picked to exercise three cases at once:

list-assets 1INCH,BTC,XRP,BNB

The expected output:

# outer comment
asset_groups:
  group_abc:    # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    -  DOGE       # asset comment
    - PEPE
  default:
    # default group inner comment
    - 1INCH
    - ATOM
    - BNB
    - LINK
    - XRP

yamlpath + yamlpatch: exact match

# outer comment
asset_groups:
  group_abc:    # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    -  DOGE       # asset comment
    - PEPE
  default:
    # default group inner comment
    - 1INCH
    - ATOM
    - BNB
    - LINK
    - XRP

yaml-edit: outer comment dropped, "default" misindented

asset_groups:
  group_abc:    # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    -  DOGE       # asset comment
    - PEPE
  default:
    # default group inner comment
                - 1INCH
    - ATOM
    - BNB
    - LINK
    - XRP

rust-yaml: multiple issues, disqualified

# outer comment
asset_groups: 
  group_abc: 
    - BTC
    - ETH
    - SOL
  group_xyz: 
    - DOGE
    - PEPE
  default: 
    - 1
    - 1INCH
    - ATOM
    - BNB
    - INCH
    - LINK
    - XRP
# group_abc comment

# outer comment

# outer comment

# group_abc comment

The comments are scattered (some end up at the bottom of the file, some duplicated), 1INCH is split into two list items (- 1 and - INCH), and the deliberate whitespace and inline comment on DOGE are both lost.

The library’s own comment_preservation_demo.rs exhibits the same comment-scattering behavior when run unmodified.

yamp: parsing issues, disqualified

No output is shown here because some of the comments in the input confuse yamp’s parser.

list-assets is the easier of the two operations since it only touches a single group and only adds. yamlpath + yamlpatch round-trip the file exactly. yaml-edit does violate both properties, but not severely enough to disqualify it on this test alone.

Delisting assets

delist-assets is the more demanding operation: any group can be modified, any asset can be removed, groups can be removed entirely. The test:

delist-assets DOGE,PEPE,BTC,SOL,ATOM,SHIB

That covers every interesting case at once:

The expected output:

# outer comment
asset_groups:
  group_abc:    # group_abc comment
    - ETH
  default:
    # default group inner comment
    - 1INCH
    - LINK
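
Leaving YAML aside, the intended semantics of delist-assets can be modeled on plain data. This is a minimal sketch under my own assumptions, inferred from the command and the expected output above: assets are removed from whichever group holds them, unknown assets such as SHIB are ignored, and a group that ends up empty is dropped. The `Groups` alias and the `delist_assets` and `group` helpers are illustrative names, not the toy CLI's actual code. Whether an empty `default` group should also be dropped is not covered by the test, so the sketch simply treats it like any other group.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the `asset_groups` mapping (illustrative only).
type Groups = BTreeMap<String, Vec<String>>;

/// Removes `assets` from every group, then drops groups left empty.
fn delist_assets(groups: &mut Groups, assets: &[&str]) {
    for members in groups.values_mut() {
        members.retain(|m| !assets.contains(&m.as_str()));
    }
    groups.retain(|_, members| !members.is_empty());
}

/// Small helper to build one (name, members) entry.
fn group(name: &str, members: &[&str]) -> (String, Vec<String>) {
    (name.to_string(), members.iter().map(|m| m.to_string()).collect())
}

fn main() {
    let mut groups = Groups::from([
        group("group_abc", &["BTC", "ETH", "SOL"]),
        group("group_xyz", &["DOGE", "PEPE"]),
        group("default", &["1INCH", "ATOM", "LINK"]),
    ]);
    delist_assets(&mut groups, &["DOGE", "PEPE", "BTC", "SOL", "ATOM", "SHIB"]);
    // prints {"default": ["1INCH", "LINK"], "group_abc": ["ETH"]}
    println!("{:?}", groups);
}
```

The data side is trivial. The hard part, and the subject of the comparison, is producing the same result in the text without disturbing anything else.
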

yamlpath + yamlpatch: almost, a single comment rearranged

# outer comment
asset_groups:
  group_abc:    # group_abc comment
    - ETH  # group_xyz outer comment
  default:
    # default group inner comment
    - 1INCH
    - LINK

When the now-empty group_xyz: key is removed, the standalone comment that was sitting on the line above it doesn’t get removed with it. Instead it migrates onto the nearest surviving content line as an inline comment. The output is valid YAML and no comment is lost, but the comment is now attached to the wrong list item.
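
One conceivable mitigation is to delete the standalone comment lines directly above a key at the text level before asking the library to remove the key itself. The sketch below shows only that text step. `drop_comments_above` is a hypothetical helper of mine, it matches the key by a naive prefix test on the trimmed line, and I have not verified how the result interacts with yamlpatch's removal logic.

```rust
/// Removes the contiguous run of comment-only lines directly above the first
/// line that starts with `key:`. Returns the input unchanged if the key is
/// not found. Note: the result always ends with a newline.
fn drop_comments_above(source: &str, key: &str) -> String {
    let lines: Vec<&str> = source.lines().collect();
    let needle = format!("{key}:");
    let Some(key_idx) = lines
        .iter()
        .position(|l| l.trim_start().starts_with(needle.as_str()))
    else {
        return source.to_string();
    };

    // Walk upwards while the preceding line is a standalone comment.
    let mut start = key_idx;
    while start > 0 && lines[start - 1].trim_start().starts_with('#') {
        start -= 1;
    }

    // Keep everything outside the [start, key_idx) range.
    lines
        .iter()
        .enumerate()
        .filter(|(i, _)| *i < start || *i >= key_idx)
        .map(|(_, line)| format!("{line}\n"))
        .collect()
}

fn main() {
    let input = "asset_groups:\n  group_abc:\n    - ETH\n  # group_xyz outer comment\n  group_xyz:\n    - PEPE\n";
    // The comment line above `group_xyz:` is gone; the key itself is kept.
    print!("{}", drop_comments_above(input, "group_xyz"));
}
```

This trades one compromise for another: the comment no longer attaches to the wrong item, but the patching code now edits text behind the library's back.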

yaml-edit: logical structure changed, disqualified

asset_groups:
  group_abc:    # group_abc comment
                - ETH
  # group_xyz outer comment
    default:
    # default group inner comment
                - 1INCH
    - LINK

The indentation shift on default: is not just cosmetic: default is now nested inside group_abc rather than being a sibling. The two top-level groups have collapsed into one.
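
To make the nesting argument concrete, here is a rough sketch that derives each mapping key's parent from indentation alone. It is not a YAML parser: the hypothetical `key_parents` function skips comments and sequence items, treats everything before the first colon as the key, and assumes block style with space indentation.

```rust
/// Returns (key, parent) pairs for block-style mapping keys,
/// judged purely by indentation depth.
fn key_parents(source: &str) -> Vec<(String, Option<String>)> {
    // Stack of currently open mappings as (indent, key).
    let mut stack: Vec<(usize, String)> = Vec::new();
    let mut out = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim_start();
        // Only `key:` lines matter here; skip blanks, comments, list items.
        if trimmed.is_empty() || trimmed.starts_with('#') || trimmed.starts_with('-') {
            continue;
        }
        let Some((key, _)) = trimmed.split_once(':') else {
            continue;
        };
        let indent = line.len() - trimmed.len();
        // Close every mapping that is at the same depth or deeper.
        while stack.last().is_some_and(|(i, _)| *i >= indent) {
            stack.pop();
        }
        out.push((key.to_string(), stack.last().map(|(_, k)| k.clone())));
        stack.push((indent, key.to_string()));
    }
    out
}

fn main() {
    let broken = "asset_groups:\n  group_abc:    # group_abc comment\n                - ETH\n    default:\n                - 1INCH\n";
    for (key, parent) in key_parents(broken) {
        println!("{key} -> {parent:?}");
    }
}
```

On the expected output, where `default:` is indented by two spaces, the same function reports `asset_groups` as the parent of `default`. On the yaml-edit output it reports `group_abc`.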

yamlpath + yamlpatch produces valid YAML, but the stranded comment is a violation of the “respectfulness” properties — which is probably not a dealbreaker given the state of other libraries.

The winner

yamlpath + yamlpatch is currently the best of the available options. It is not perfect, but it is actively maintained and it can be made to work with some workarounds and compromises. Here are some caveats I encountered while trying to make it work for my actual use case.

Op::Replace doesn’t work on sequences

yamlpatch-replace-list.rs
use std::collections::HashSet;

use anyhow::Context as _;
use clap::Parser;
use yamlpatch::{Op, Patch, apply_yaml_patches};

const INPUT: &str = "\
asset_groups:
  default:
    - 1INCH
    - ATOM
    - LINK
";

#[derive(Parser)]
struct Args {
    /// Comma-separated assets to list (i.e. add to `default` if missing).
    #[arg(long)]
    assets: String,
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();

    let new_assets: Vec<String> = args
        .assets
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();

    // parse old assets
    let parsed: serde_yaml::Value = serde_yaml::from_str(INPUT).unwrap();
    let default_old = parsed
        .get("asset_groups")
        .and_then(|v| v.get("default"))
        .and_then(|v| v.as_sequence())
        .unwrap()
        .iter()
        .map(|v| v.as_str().unwrap().to_string())
        .collect::<Vec<_>>();

    // construct new assets
    let mut default_new = Vec::<String>::from_iter(HashSet::<String>::from_iter(
        default_old.iter().cloned().chain(new_assets),
    ));
    default_new.sort();

    let new_seq: Vec<serde_yaml::Value> = default_new
        .into_iter()
        .map(serde_yaml::Value::String)
        .collect();

    let doc = yamlpath::Document::new(INPUT.to_string()).unwrap();
    let patch = Patch {
        route: yamlpath::route!("asset_groups", "default"),
        operation: Op::Replace(serde_yaml::Value::Sequence(new_seq)),
    };

    let new_doc = apply_yaml_patches(&doc, &[patch]).context("apply patches")?;
    print!("{}", new_doc.source());
    Ok(())
}

$ cargo run --example yamlpatch-replace-list -- --assets 1INCH,BTC,XRP,BNB
Error: apply patches

Caused by:
    0: YAML query error: input is not valid YAML
    1: input is not valid YAML

Updating a list requires a workaround

Since Op::Replace is unusable on sequences, updating a list end-to-end requires a workaround: append the entire desired list to the end first, then remove the original items from the front one at a time:

yamlpatch-rotate-replace-list.rs
use std::collections::HashSet;

use anyhow::Context as _;
use clap::Parser;
use yamlpatch::{Op, Patch, apply_yaml_patches};

const INPUT: &str = "\
asset_groups:
  default:
    - 1INCH
    - ATOM
    - LINK
";

#[derive(Parser)]
struct Args {
    /// Comma-separated assets to list (i.e. add to `default` if missing).
    #[arg(long)]
    assets: String,
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();

    let new_assets: Vec<String> = args
        .assets
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();

    // parse old assets
    let parsed: serde_yaml::Value = serde_yaml::from_str(INPUT).unwrap();
    let default_old = parsed
        .get("asset_groups")
        .and_then(|v| v.get("default"))
        .and_then(|v| v.as_sequence())
        .unwrap()
        .iter()
        .map(|v| v.as_str().unwrap().to_string())
        .collect::<Vec<_>>();

    // construct new assets
    let mut default_new = Vec::<String>::from_iter(HashSet::<String>::from_iter(
        default_old.iter().cloned().chain(new_assets),
    ));
    default_new.sort();

    let mut patches: Vec<Patch> = Vec::new();
    for item in default_new {
        patches.push(Patch {
            route: yamlpath::route!("asset_groups", "default"),
            operation: Op::Append {
                value: serde_yaml::Value::String(item),
            },
        });
    }

    for _ in 0..default_old.len() {
        patches.push(Patch {
            route: yamlpath::route!("asset_groups", "default", 0usize),
            operation: Op::Remove,
        });
    }

    let doc = yamlpath::Document::new(INPUT.to_string()).unwrap();
    let new_doc = apply_yaml_patches(&doc, &patches).context("apply patches")?;
    print!("{}", new_doc.source());
    Ok(())
}

$ cargo run --example yamlpatch-rotate-replace-list -- --assets 1INCH,BTC,XRP,BNB
asset_groups:
  default:
    - 1INCH
    - ATOM
    - BNB
    - BTC
    - LINK
    - XRP
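
Why the append-then-remove order yields exactly the desired list is easy to see on a plain `Vec`. The model below is a simplified stand-in, not yamlpatch itself: `ListOp`, `rotate_replace` and `apply` are names I made up, with `Append` pushing to the end and `RemoveAt(0)` dropping the front element, mirroring the patches built in the listing above.

```rust
/// Simplified stand-ins for the two patch operations used by the workaround.
enum ListOp {
    Append(String),
    RemoveAt(usize),
}

/// Builds the op sequence: append every desired item, then remove the
/// `old_len` original items from the front, one at a time.
fn rotate_replace(old_len: usize, desired: &[&str]) -> Vec<ListOp> {
    let mut ops: Vec<ListOp> = desired
        .iter()
        .map(|s| ListOp::Append(s.to_string()))
        .collect();
    ops.extend((0..old_len).map(|_| ListOp::RemoveAt(0)));
    ops
}

/// Applies the ops in order to an in-memory list.
fn apply(list: &mut Vec<String>, ops: &[ListOp]) {
    for op in ops {
        match op {
            ListOp::Append(value) => list.push(value.clone()),
            ListOp::RemoveAt(index) => {
                list.remove(*index);
            }
        }
    }
}

fn main() {
    let mut list: Vec<String> = ["1INCH", "ATOM", "LINK"].iter().map(|s| s.to_string()).collect();
    let desired = ["1INCH", "ATOM", "BNB", "BTC", "LINK", "XRP"];
    let ops = rotate_replace(list.len(), &desired);
    apply(&mut list, &ops);
    println!("{:?}", list); // prints ["1INCH", "ATOM", "BNB", "BTC", "LINK", "XRP"]
}
```

After the appends the list holds the three old items followed by the six desired ones. Removing index 0 three times leaves only the desired tail. Every original line is rewritten along the way, so any inline comments on the old items would presumably not survive.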

It doesn’t play well with flow-style lists

yamlpatch-flow-list.rs
use std::collections::HashSet;

use anyhow::Context as _;
use clap::Parser;
use yamlpatch::{Op, Patch, apply_yaml_patches};

const INPUT: &str = "\
asset_groups:
  default: [1INCH, ATOM, LINK]
";

#[derive(Parser)]
struct Args {
    /// Comma-separated assets to list (i.e. add to `default` if missing).
    #[arg(long)]
    assets: String,
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();

    let new_assets: Vec<String> = args
        .assets
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();

    // parse old assets
    let parsed: serde_yaml::Value = serde_yaml::from_str(INPUT).unwrap();
    let default_old = parsed
        .get("asset_groups")
        .and_then(|v| v.get("default"))
        .and_then(|v| v.as_sequence())
        .unwrap()
        .iter()
        .map(|v| v.as_str().unwrap().to_string())
        .collect::<Vec<_>>();

    // construct new assets
    let mut default_new = Vec::<String>::from_iter(HashSet::<String>::from_iter(
        default_old.iter().cloned().chain(new_assets),
    ));
    default_new.sort();

    let mut patches: Vec<Patch> = Vec::new();
    for item in default_new {
        patches.push(Patch {
            route: yamlpath::route!("asset_groups", "default"),
            operation: Op::Append {
                value: serde_yaml::Value::String(item),
            },
        });
    }

    let doc = yamlpath::Document::new(INPUT.to_string()).unwrap();
    let new_doc = apply_yaml_patches(&doc, &patches).context("apply patches")?;
    print!("{}", new_doc.source());
    Ok(())
}

$ cargo run --example yamlpatch-flow-list -- --assets 1INCH,BTC,XRP,BNB
Error: apply patches

Caused by:
    Invalid operation: append operation is not permitted against flow sequence route: Route { route: [Key("asset_groups"), Key("default")] }
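
If a flow list has to be extended anyway, one fallback is to edit the line as text. The sketch below is such a fallback under heavy assumptions and has nothing to do with yamlpatch's API: the hypothetical `append_to_flow_list` only handles a single-line flow sequence of plain scalars, with no nested brackets and no brackets inside quotes.

```rust
/// Appends `item` to a single-line flow sequence such as `key: [a, b]`.
/// Returns `None` if the line contains no `[...]` pair.
/// Assumes plain scalars only: no nesting, no quoted brackets.
fn append_to_flow_list(line: &str, item: &str) -> Option<String> {
    let open = line.find('[')?;
    let close = open + line[open..].find(']')?;
    let inner = line[open + 1..close].trim();
    // An empty list needs no separator before the first item.
    let sep = if inner.is_empty() { "" } else { ", " };
    Some(format!(
        "{}{}{}{}",
        line[..close].trim_end(),
        sep,
        item,
        &line[close..]
    ))
}

fn main() {
    let line = "  default: [1INCH, ATOM, LINK]";
    // prints Some("  default: [1INCH, ATOM, LINK, BTC]")
    println!("{:?}", append_to_flow_list(line, "BTC"));
}
```

Anything after the closing bracket, such as a trailing comment, is carried over untouched, which keeps this in the spirit of respectful patching for the simple case.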

Conclusion

yamlpath + yamlpatch is the only option that comes truly close to “respectful” patching as defined here. It is very much usable in practice, even though it doesn’t cover every case out of the box.


Full accompanying source code can be found here. Built with rustc 1.95.0. Library versions used: yamlpath 1.24.1, yamlpatch 1.24.1, yaml-edit 0.2.1, rust-yaml 0.0.5, yamp 0.1.0.