Diffing Pretty-Printed JSON Files
As a lot of tools are now using JSON as their configuration format, we will inevitably need to compare differences between files.
But it can be quite difficult to see what's going on, especially without using an online tool (which may be quite risky depending on the JSON you're comparing).
Let's use two examples for how we can diff two different files.
For these examples, I would strongly recommend using a diff tool that allows for a side-by-side view, such as diff -y or vimdiff.
Simple example
Firstly, let's use a more straightforward example.
1.json
{
"key": [
456,
123
],
"key2": "value"
}
2.json
{
"key": [
456
],
"key2": "value"
}
We can utilise one of the solutions documented in my pretty-print-json series to pretty-print the JSON before diffing it, to try and make it more readable:
$ diff -u <(python -mjson.tool 1.json) <(python -mjson.tool 2.json)
This gives us the following output, which is a little clearer:
--- /proc/self/fd/11 2020-08-24 19:22:51.513741484 +0100
+++ /proc/self/fd/13 2020-08-24 19:22:51.513741484 +0100
@@ -1,7 +1,6 @@
{
"key": [
- 456,
- 123
+ 456
],
"key2": "value"
}
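The same pretty-print-then-diff step can also be sketched in Python using only the standard library, which may be handy on a shell without process substitution. This is a minimal sketch rather than a replacement for the command above: the function name diff_pretty and the sample documents are made up for illustration.

```python
import difflib
import json


def diff_pretty(left_text, right_text, left_name="1.json", right_name="2.json"):
    """Pretty-print two JSON documents and return a unified diff as a string."""
    # Re-serialise both documents with the same indentation, so that
    # whitespace-only differences do not show up in the diff.
    left = json.dumps(json.loads(left_text), indent=4).splitlines()
    right = json.dumps(json.loads(right_text), indent=4).splitlines()
    return "\n".join(
        difflib.unified_diff(left, right, left_name, right_name, lineterm="")
    )


if __name__ == "__main__":
    # Hypothetical sample documents, not the files from this post.
    print(diff_pretty('{"a": [1, 2], "b": "x"}', '{"a": [2], "b": "x"}'))
```

Two documents that differ only in whitespace produce an empty diff, which is the main benefit of pretty-printing both sides first.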
More complex example
However, the above isn't a very realistic example: we generally have a large, nested document, and the same keys often appear in a different order in each file:
1.json
{
"Resources": {
"Ec2Instance": {
"Type": "AWS::EC2::Instance",
"Properties": {
"SecurityGroups": [
{
"Ref": "InstanceSecurityGroup"
},
"MyExistingSecurityGroup"
],
"KeyName": {
"Ref": "KeyName"
},
"ImageId": "ami-7a11e213"
}
},
"InstanceSecurityGroup": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Enable direct HTTP access",
"SecurityGroupIngress": [
{
"IpProtocol": "https",
"FromPort": "443",
"ToPort": "443",
"CidrIp": "0.0.0.0/0"
}
]
}
}
}
}
2.json
{
"Parameters": {
"KeyName": {
"Description": "The EC2 Key Pair to allow SSH access to the instance",
"Type": "AWS::EC2::KeyPair::KeyName"
}
},
"Resources": {
"Ec2Instance": {
"Type": "AWS::EC2::Instance",
"Properties": {
"SecurityGroups": [
{
"Ref": "InstanceSecurityGroup"
},
"MyExistingSecurityGroup"
],
"KeyName": {
"Ref": "KeyName"
},
"ImageId": "ami-7a11e213"
}
},
"InstanceSecurityGroup": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Enable SSH access via port 22",
"SecurityGroupIngress": [
{
"ToPort": "22",
"FromPort": "22",
"CidrIp": "0.0.0.0/0",
"IpProtocol": "tcp",
"Description": "Only found on 2.json"
}
]
}
}
}
}
If we were to run the same pretty-print-and-diff command as above, we'd end up with quite a few lines showing as changed, even though the two documents actually have a lot in common:
--- /proc/self/fd/11 2020-08-24 20:34:48.645452666 +0100
+++ /proc/self/fd/13 2020-08-24 20:34:48.648785937 +0100
@@ -1,4 +1,10 @@
{
+ "Parameters": {
+ "KeyName": {
+ "Description": "The EC2 Key Pair to allow SSH access to the instance",
+ "Type": "AWS::EC2::KeyPair::KeyName"
+ }
+ },
"Resources": {
"Ec2Instance": {
"Type": "AWS::EC2::Instance",
@@ -18,13 +24,14 @@
"InstanceSecurityGroup": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
- "GroupDescription": "Enable direct HTTP access",
+ "GroupDescription": "Enable SSH access via port 22",
"SecurityGroupIngress": [
{
- "IpProtocol": "https",
- "FromPort": "443",
- "ToPort": "443",
- "CidrIp": "0.0.0.0/0"
+ "ToPort": "22",
+ "FromPort": "22",
+ "CidrIp": "0.0.0.0/0",
+ "IpProtocol": "tcp",
+ "Description": "Only found on 2.json"
}
]
}
So instead, I'd recommend reaching for something that can sort the keys of the JSON documents, so the diff reflects changes in content rather than in ordering, such as this Ruby script I've written. As an aside, this relies on the fact that JSON keys should not be order-dependent; if yours are, you'll not have a great time.
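To give a feel for what such a key-sorting step involves, here is a minimal Python sketch. It is not the script referred to above, and the function name sort_keys and the file name sort_keys.py are assumptions made for illustration.

```python
import json
import sys


def sort_keys(text):
    """Return the JSON document in `text` pretty-printed with sorted keys."""
    # sort_keys=True sorts object keys at every nesting level.
    # Array order is left untouched, because it is significant in JSON.
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python sort_keys.py 1.json
    with open(sys.argv[1]) as f:
        print(sort_keys(f.read()))
```

Recent Python 3 versions also accept a --sort-keys flag on python -m json.tool, which has much the same effect. Whichever tool does the sorting, the idea is the same: sort the keys first, then diff.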
This means that we can run the following:
$ diff -u <(ruby sort-keys.rb 1.json) <(ruby sort-keys.rb 2.json)
This gives us the following, slightly easier to understand, output:
--- /proc/self/fd/11 2020-08-24 20:10:11.630671779 +0100
+++ /proc/self/fd/13 2020-08-24 20:10:11.630671779 +0100
@@ -1,4 +1,10 @@
{
+ "Parameters": {
+ "KeyName": {
+ "Description": "The EC2 Key Pair to allow SSH access to the instance",
+ "Type": "AWS::EC2::KeyPair::KeyName"
+ }
+ },
"Resources": {
"Ec2Instance": {
"Properties": {
@@ -17,13 +23,14 @@
},
"InstanceSecurityGroup": {
"Properties": {
- "GroupDescription": "Enable direct HTTP access",
+ "GroupDescription": "Enable SSH access via port 22",
"SecurityGroupIngress": [
{
"CidrIp": "0.0.0.0/0",
- "FromPort": "443",
- "IpProtocol": "https",
- "ToPort": "443"
+ "Description": "Only found on 2.json",
+ "FromPort": "22",
+ "IpProtocol": "tcp",
+ "ToPort": "22"
}
]
},