Exploring XML Serialisation for Arithmetic Expressions in C#

This post explores XML serialisation in C# using the XmlSerializer class and different ways to represent arithmetic expressions in XML.

The arithmetic expressions and the calculator project were introduced in an earlier post, and another post covers the JSON serialisation of these expressions. This post builds on that foundation and focuses on XML.

Here is an example. One way to represent the expression 2 * (3 - 1) * 2 is:

<Expression xsi:type="Operation" Name="Multiplication">
	<Subexpressions>
		<Expression xsi:type="Value">2</Expression>
		<Expression xsi:type="Operation" Name="Subtraction">
			<Subexpressions>
				<Expression xsi:type="Value">3</Expression>
				<Expression xsi:type="Value">1</Expression>
			</Subexpressions>
		</Expression>
		<Expression xsi:type="Value">2</Expression>
	</Subexpressions>
</Expression>

And the same expression in Prefix (Polish) Notation is:

<Elements>
	<Element xsi:type="Operation">
		<Name>Multiplication</Name>
		<Arity>3</Arity>
	</Element>
	<Element xsi:type="Value">2</Element>
	<Element xsi:type="Operation">
		<Name>Subtraction</Name>
		<Arity>2</Arity>
	</Element>
	<Element xsi:type="Value">3</Element>
	<Element xsi:type="Value">1</Element>
	<Element xsi:type="Value">2</Element>
</Elements>

Both representations are implemented and presented in this post.

The calculator project is available on GitHub here. The source code for everything described in this post resides in the pull request here.

Implementation

This section walks through the implementation of the serialisers commit by commit. The first serialiser is named "Nested" because it nests subexpressions within expressions. It is similar to the JSON representation and to the way the Expression class itself encapsulates expressions. The second serialiser is named "PrefixForm" because it serialises expressions into Prefix (Polish) Notation.

Preliminaries

The first commit sets up the Calculator.Serializers and Calculator.Serializers.Tests projects and adds directories for the two serialisers.

Why new projects? This comes down to personal preference. Up to this point, the solution used a single pair of projects for organisation: Calculator.Core and Calculator.Core.Tests. Now, the solution is going to include three serialisers: two new XML serialisers and one existing JSON serialiser. The new functionality is sufficiently distinct from the core functionality to warrant a separate project.

Why directories? This comes down to name conflicts. The Calculator.Serializers project will contain multiple serialisers, each with its own set of related classes. Using separate directories allows the reuse of names for these helper classes.

Nested XML Serialiser

This commit implements the Nested XML Serialiser in full. The following three sections discuss each part of the implementation: data models, serialising logic, and tests.

Modelling

There are two ways to approach modelling:

  1. Start with the XML model and deserialise it into a C# class.
  2. Start with a C# class and serialise it into XML.

The first approach is suitable when the XML model is inflexible, for example when it is already used elsewhere. However, this approach carries the risk of needing to implement a custom deserialiser if the default deserialiser doesn't work with the XML model's peculiarities.

The second approach starts with a C# class, runs the XML Serialiser on it, checks the XML output, and then iterates until the result is satisfactory. This method ensures that the serialisation/deserialisation process is as easy as possible. It is suitable when the goal is to produce an XML output without any restrictions on its form.

This project adopted the second approach and settled on the following model:

using System.Collections.Generic;
using System.Xml.Serialization;

[XmlType(TypeName = "Data")]
public sealed class Data
{
    public ExpressionModel Expression { get; set; }
}

[XmlInclude(typeof(OperationExpressionModel))]
[XmlInclude(typeof(ValueExpressionModel))]
[XmlType(TypeName="Expression")]
public abstract class ExpressionModel
{
}

[XmlType(TypeName = "Operation")]
public sealed class OperationExpressionModel : ExpressionModel
{
    [XmlAttribute]
    public string Name { get; set; }
    public List<ExpressionModel> Subexpressions { get; set; }
}

[XmlType(TypeName = "Value")]
public sealed class ValueExpressionModel : ExpressionModel
{
    [XmlText]
    public double Value { get; set; }
}

Serialisation & Deserialisation Methods

The logic in the NestedXmlSerializer class can be divided into two parts.

The first part involves mapping between XML and C# models using the XmlSerializer. The code is adapted from here and here, but it replaces stream writers/readers with string writers/readers.
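A minimal sketch of this first part, assuming a generic helper: the class name XmlStringConverter and its method names are hypothetical and not taken from the project. It hands XmlSerializer an XmlWriter created over a StringWriter (default XmlWriter settings do not indent, so the output stays on one line) and reads back through a StringReader:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper; the project's NestedXmlSerializer may be organised differently.
public static class XmlStringConverter
{
    // Serialises any XmlSerializer-compatible object into an XML string.
    public static string ToXml<T>(T value)
    {
        XmlSerializer serializer = new(typeof(T));
        using StringWriter stringWriter = new();

        // Dispose the XmlWriter before reading the string so all output is flushed.
        using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter))
        {
            serializer.Serialize(xmlWriter, value);
        }

        return stringWriter.ToString();
    }

    // Deserialises an XML string back into an object of type T.
    public static T FromXml<T>(string xml)
    {
        XmlSerializer serializer = new(typeof(T));
        using StringReader stringReader = new(xml);
        return (T)serializer.Deserialize(stringReader)!;
    }
}
```

Because the target is a StringWriter, the XML declaration reports utf-16, which is consistent with the expected strings used in the tests.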

The second part involves mapping between the Expression class and the ExpressionModel class using the private methods ConvertToModel and ConvertFromModel. This mapping is adapted from the similar mapping used in the previously implemented JSON serialiser.

Testing

The Nested XML Serialiser is tested with six unit tests, which cover each combination of expression type (single-valued, multi-valued, and nested) with each method (serialise and deserialise).

The serialised XML takes the form of a single-line string, which would be unintelligible to humans in the unit tests if used unchanged. This issue is addressed using the raw string literals feature of C# combined with some string manipulation.

The result is human-readable and matches the single-line serialisation exactly:

string expected =
	"""
	<?xml version="1.0" encoding="utf-16"?>
	<Data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
		<Expression xsi:type="Operation" Name="Addition">
			<Subexpressions>
				<Expression xsi:type="Value">1.5</Expression>
				<Expression xsi:type="Value">1.5</Expression>
			</Subexpressions>
		</Expression>
	</Data>
	""".Replace("\r", "").Replace("\n", "").Replace("  ", "");

Prefix Form XML Serialiser

In Prefix (Polish) Notation, all operators precede their arguments. Consider the expression 2+2. In prefix form, it is expressed as +22.

There are multiple ways to handle operators that can apply to a varying number of arguments. One approach is to treat each operator as having an arity of 2. For example, the expression 2*2*2 becomes **222. Another approach is to explicitly specify the arity of each operator. In this case, the expression 2*2*2 becomes *222 with multiplication having an arity of 3; or, specifying the arity in brackets, it becomes *(3)222.

Consider a more complex expression, 2*(3-1)*2, as an example. Using the first approach, it is written as **2-312. Using the second approach, it is written as *(3)2-(2)312.
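The explicit-arity notation can be illustrated with a short sketch. The Node, Num, and Op records and the PrefixNotation class are hypothetical names used only for this illustration; they are not the project's Expression types.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical expression tree used only for this illustration.
public abstract record Node;
public sealed record Num(double Value) : Node;
public sealed record Op(char Symbol, IReadOnlyList<Node> Operands) : Node;

public static class PrefixNotation
{
    // Writes the operator, then its arity in brackets, then each operand in order.
    public static string Write(Node node) => node switch
    {
        Num n => n.Value.ToString(CultureInfo.InvariantCulture),
        Op o => $"{o.Symbol}({o.Operands.Count})" + string.Concat(o.Operands.Select(Write)),
        _ => throw new ArgumentException("Unknown node type.", nameof(node)),
    };
}
```

For a tree that represents 2*(3-1)*2, with a three-operand multiplication whose middle operand is the subtraction, Write returns *(3)2-(2)312, the same string as in the second approach above.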

This project already supports operators with arbitrary arity. Therefore, the implementation proceeds using the second approach. As mentioned in the introduction, the expression 2 * (3 - 1) * 2 in XML form is:

<Elements>
	<Element xsi:type="Operation">
		<Name>Multiplication</Name>
		<Arity>3</Arity>
	</Element>
	<Element xsi:type="Value">2</Element>
	<Element xsi:type="Operation">
		<Name>Subtraction</Name>
		<Arity>2</Arity>
	</Element>
	<Element xsi:type="Value">3</Element>
	<Element xsi:type="Value">1</Element>
	<Element xsi:type="Value">2</Element>
</Elements>

Tests Adaptation

The first commit sets up the interface and models for the new serialiser. The second commit copy-pastes the tests from the Nested XML Serialiser. The third commit adapts these tests for the Prefix Form XML Serialiser.

It's possible to adapt tests in this way due to the similarities between the two serialiser classes. Separating the copy-pasting and adaptation into two distinct commits helps with the review process when examining the differences between commits in Git.

Serialisation

This commit implements the Serialize method, which passes all three serialisation tests.

The Prefix Form XML Serialiser uses the same code for converting a C# class into XML as the Nested XML Serialiser does. However, the mapping from the Expression class to the model is slightly different: it uses the local function feature of C#. The ConvertToModel method initialises the data object, while the local Process function recursively processes the expression and populates the elements variable.

    private Data ConvertToModel(Expression expression)
    {
        List<ExpressionElement> elements = new();
        Data data = new() { Elements = elements };
        Process(expression);
        return data;

        void Process(Expression expression)
        {
            // processing
        }
    }
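One possible shape for such a recursive local function is sketched below. This is an illustration under stated assumptions, not the project's code: the Expr and Element record types and the PrefixFlattener class are hypothetical stand-ins, and the real Process body may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the project's expression classes.
public abstract record Expr;
public sealed record ValueExpr(double Value) : Expr;
public sealed record OperationExpr(string Name, IReadOnlyList<Expr> Operands) : Expr;

// Hypothetical stand-ins for the flat prefix-form elements.
public abstract record Element;
public sealed record ValueElement(double Value) : Element;
public sealed record OperationElement(string Name, int Arity) : Element;

public static class PrefixFlattener
{
    public static List<Element> Flatten(Expr root)
    {
        List<Element> elements = new();
        Process(root);
        return elements;

        // Local function: captures 'elements' and appends in pre-order,
        // so each operation is emitted before its operands.
        void Process(Expr expr)
        {
            switch (expr)
            {
                case ValueExpr v:
                    elements.Add(new ValueElement(v.Value));
                    break;
                case OperationExpr o:
                    elements.Add(new OperationElement(o.Name, o.Operands.Count));
                    foreach (Expr operand in o.Operands)
                    {
                        Process(operand);
                    }
                    break;
                default:
                    throw new ArgumentException("Unknown expression type.", nameof(expr));
            }
        }
    }
}
```

The local function keeps the accumulator out of the recursive signature: the list is created once in the enclosing method and captured, rather than being passed through every call.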

Deserialisation

This commit implements the Deserialize method. The code structure is identical to that of serialisation, with a recursive local function once again put to good use.

Due to the unconstrained nature of prefix notation, not all valid XML documents correspond to valid expressions. For example, +(2)222 is erroneously deserialised as 2+2, while -(3)111 raises an exception because subtraction has an arity of 2.

The following commits deal with these special cases. The first commit handles cases where there are too many or too few elements. The second commit addresses arity mismatches.
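The checks described here could look roughly like the following sketch. All names are hypothetical (Expr, Element, PrefixParser, FixedArity), the exception type is an assumption, and the fixed-arity table lists only subtraction because that is the only constraint stated above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the project's expression and element classes.
public abstract record Expr;
public sealed record ValueExpr(double Value) : Expr;
public sealed record OperationExpr(string Name, IReadOnlyList<Expr> Operands) : Expr;
public abstract record Element;
public sealed record ValueElement(double Value) : Element;
public sealed record OperationElement(string Name, int Arity) : Element;

public static class PrefixParser
{
    // Assumed table of operators that accept exactly one arity.
    private static readonly Dictionary<string, int> FixedArity = new() { ["Subtraction"] = 2 };

    public static Expr Parse(IReadOnlyList<Element> elements)
    {
        int position = 0;
        Expr root = Read();

        // Leftover elements mean the input holds more than one expression.
        if (position != elements.Count)
        {
            throw new FormatException("Too many elements.");
        }
        return root;

        // Local function: consumes one complete expression starting at 'position'.
        Expr Read()
        {
            if (position >= elements.Count)
            {
                throw new FormatException("Too few elements.");
            }

            switch (elements[position++])
            {
                case ValueElement v:
                    return new ValueExpr(v.Value);
                case OperationElement o:
                {
                    if (o.Arity < 1 || (FixedArity.TryGetValue(o.Name, out int required) && required != o.Arity))
                    {
                        throw new FormatException($"Invalid arity {o.Arity} for {o.Name}.");
                    }

                    List<Expr> operands = new(o.Arity);
                    for (int i = 0; i < o.Arity; i++)
                    {
                        operands.Add(Read());
                    }
                    return new OperationExpr(o.Name, operands);
                }
                default:
                    throw new FormatException("Unknown element type.");
            }
        }
    }
}
```

With these checks, the input corresponding to +(2)222 is rejected because one element is left over, and -(3)111 is rejected by the arity table before any operand is read.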

JSON Serialiser Refactor

At this point, the codebase contains the previously implemented JSON serialiser and two new XML serialisers. Now it is time to ensure consistency among all three.

  • The first commit moves the JSON serialiser and its tests from the Core project to the Serializers project.
  • The second commit ensures that all serialisers implement the same interface.

The next commit applies lessons learnt from implementing the XML serialisers to the JSON implementation. It rewrites verbatim strings as raw strings to support multi-line JSON specifications. It generalises variable names to expected, data, and result instead of referring specifically to json. It also adds tests for nested expressions, which became possible due to the support for multi-line JSON specifications.

Conclusion

This blog post introduced XML serialisation in C# by demonstrating two different XML representations of arithmetic expressions. It highlighted several C# techniques, such as raw strings and recursive local functions. It mentioned a few architectural considerations, including consistency, the use of projects and directories, and approaches to C#/XML modelling. Additionally, the post provided reference code for using the XmlSerializer class.
