Exploring XML Serialisation for Arithmetic Expressions in C#
This post explores XML serialisation in C# using the XmlSerializer class and different ways to represent arithmetic expressions in XML.
The arithmetic expressions and the calculator project are introduced in a separate post, and another post covers the JSON serialisation of these expressions. This post builds on that foundation and focuses on XML.
Here is an example. One way to represent the expression 2 * (3 - 1) * 2 is:
<Expression xsi:type="Operation" Name="Multiplication">
  <Subexpressions>
    <Expression xsi:type="Value">2</Expression>
    <Expression xsi:type="Operation" Name="Subtraction">
      <Subexpressions>
        <Expression xsi:type="Value">3</Expression>
        <Expression xsi:type="Value">1</Expression>
      </Subexpressions>
    </Expression>
    <Expression xsi:type="Value">2</Expression>
  </Subexpressions>
</Expression>
And the same expression in Prefix (Polish) Notation is:
<Elements>
  <Element xsi:type="Operation">
    <Name>Multiplication</Name>
    <Arity>3</Arity>
  </Element>
  <Element xsi:type="Value">2</Element>
  <Element xsi:type="Operation">
    <Name>Subtraction</Name>
    <Arity>2</Arity>
  </Element>
  <Element xsi:type="Value">3</Element>
  <Element xsi:type="Value">1</Element>
  <Element xsi:type="Value">2</Element>
</Elements>
Both representations are implemented and presented in this post.
The calculator project is available on GitHub here. The source code for everything described in this post resides in the pull request here.
Implementation
This section walks through the implementation of the serialisers commit by commit. The first serialiser is named "Nested" because it nests subexpressions within expressions. It is similar to the JSON representation and to the way the Expression class itself encapsulates expressions. The second serialiser is named "PrefixForm" because it serialises expressions into Prefix (Polish) Notation.
Preliminaries
The first commit sets up the Calculator.Serializers and Calculator.Serializers.Tests projects and adds directories for the two serialisers.
Why new projects? This comes down to personal preference. Up to this point, the solution used a single pair of projects for organisation: Calculator.Core and Calculator.Core.Tests. Now, the solution is going to include three serialisers: two new XML serialisers and the existing JSON serialiser. This functionality is sufficiently distinct from the core functionality to justify a separate project.
Why directories? This comes down to name conflicts. The Calculator.Serializers project will contain multiple serialisers, each with its own set of related classes. Giving each serialiser its own directory, and therefore its own namespace, allows helper classes such as Data to reuse the same names.
Nested XML Serialiser
This commit implements the Nested XML Serialiser in full. The following three sections discuss each part of the implementation: data models, serialising logic, and tests.
Modelling
There are two ways to approach modelling:
- Start with the XML model and deserialise it into a C# class.
- Start with a C# class and serialise it into XML.
The first approach is suitable when the XML model is fixed, for example because it is already used elsewhere. However, this approach carries the risk of needing a custom deserialiser if the default deserialiser cannot handle the XML model's peculiarities.
The second approach starts with a C# class, runs XmlSerializer on it, checks the XML output, and iterates until the result is satisfactory. This keeps serialisation and deserialisation as simple as possible, because the XML never goes beyond what XmlSerializer can handle without custom code. It is suitable when the goal is to produce XML without any restrictions on its form.
This project adopted the second approach and settled on the following model:
[XmlType(TypeName = "Data")]
public sealed class Data
{
    public ExpressionModel Expression { get; set; }
}

[XmlInclude(typeof(OperationExpressionModel))]
[XmlInclude(typeof(ValueExpressionModel))]
[XmlType(TypeName = "Expression")]
public abstract class ExpressionModel
{
}

[XmlType(TypeName = "Operation")]
public sealed class OperationExpressionModel : ExpressionModel
{
    [XmlAttribute]
    public string Name { get; set; }
    public List<ExpressionModel> Subexpressions { get; set; }
}

[XmlType(TypeName = "Value")]
public sealed class ValueExpressionModel : ExpressionModel
{
    [XmlText]
    public double Value { get; set; }
}
Serialisation & Deserialisation Methods
The logic in the NestedXmlSerializer class can be divided into two parts.
The first part involves mapping between XML and C# models using the XmlSerializer class. The code is adapted from here and here, but it replaces stream writers/readers with string writers/readers.
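The string-based round trip can be sketched as follows. This is a minimal sketch rather than the project's code. The helper name XmlString is hypothetical. Producing single-line output by wrapping the StringWriter in an XmlWriter with default settings is one way to get that result, and is an assumption about how the post's output arises. The model classes are copied from the listing above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Model classes copied from the post.
[XmlType(TypeName = "Data")]
public sealed class Data
{
    public ExpressionModel Expression { get; set; }
}

[XmlInclude(typeof(OperationExpressionModel))]
[XmlInclude(typeof(ValueExpressionModel))]
[XmlType(TypeName = "Expression")]
public abstract class ExpressionModel
{
}

[XmlType(TypeName = "Operation")]
public sealed class OperationExpressionModel : ExpressionModel
{
    [XmlAttribute]
    public string Name { get; set; }
    public List<ExpressionModel> Subexpressions { get; set; }
}

[XmlType(TypeName = "Value")]
public sealed class ValueExpressionModel : ExpressionModel
{
    [XmlText]
    public double Value { get; set; }
}

// Hypothetical helper: XmlSerializer round trip through strings instead of streams.
public static class XmlString
{
    public static string Serialize<T>(T data)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var stringWriter = new StringWriter();
        // XmlWriter.Create with default settings writes no indentation, so the
        // document stays on one line. Passing the StringWriter directly to
        // Serialize would produce indented, multi-line output instead.
        using (var xmlWriter = XmlWriter.Create(stringWriter))
        {
            serializer.Serialize(xmlWriter, data);
        }
        return stringWriter.ToString();
    }

    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var stringReader = new StringReader(xml);
        return (T)serializer.Deserialize(stringReader)!;
    }
}
```

With these helpers, XmlString.Serialize(data) returns the whole document on one line, and XmlString.Deserialize<Data>(xml) reverses it. The declaration reads encoding="utf-16" because a StringWriter reports UTF-16 as its encoding.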
The second part involves mapping between the Expression class and the ExpressionModel class using the private methods ConvertToModel and ConvertFromModel. This mapping is adapted from the similar mapping used in the previously implemented JSON serialiser.
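The shape of that mapping can be sketched as a pair of recursive functions. This sketch assumes a simplified expression type. The Expr, Value and Operation records and the NestedMapping class are hypothetical stand-ins, not the calculator's actual Expression API. The XML attributes are left off the model classes because the mapping does not depend on them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the calculator's Expression class.
public abstract record Expr;
public sealed record Value(double Number) : Expr;
public sealed record Operation(string Name, IReadOnlyList<Expr> Args) : Expr;

// The post's model classes, with the XML attributes omitted for brevity.
public abstract class ExpressionModel { }

public sealed class OperationExpressionModel : ExpressionModel
{
    public string Name { get; set; } = "";
    public List<ExpressionModel> Subexpressions { get; set; } = new();
}

public sealed class ValueExpressionModel : ExpressionModel
{
    public double Value { get; set; }
}

public static class NestedMapping
{
    // Expression tree -> model tree: the structure is mirrored node for node.
    public static ExpressionModel ConvertToModel(Expr expression) => expression switch
    {
        Value value => new ValueExpressionModel { Value = value.Number },
        Operation operation => new OperationExpressionModel
        {
            Name = operation.Name,
            Subexpressions = operation.Args.Select(arg => ConvertToModel(arg)).ToList(),
        },
        _ => throw new ArgumentException($"Unsupported expression: {expression}"),
    };

    // Model tree -> expression tree: the inverse mapping.
    public static Expr ConvertFromModel(ExpressionModel model) => model switch
    {
        ValueExpressionModel value => new Value(value.Value),
        OperationExpressionModel operation => new Operation(
            operation.Name,
            operation.Subexpressions.Select(sub => ConvertFromModel(sub)).ToList()),
        _ => throw new ArgumentException($"Unsupported model: {model.GetType().Name}"),
    };
}
```

Because the nested model mirrors the expression tree, each direction reduces to a single switch over the node types.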
Testing
The Nested XML Serialiser is tested with six unit tests, which cover each combination of expression type (single-valued, multi-valued, and nested) with each method (serialise and deserialise).
The serialised XML takes the form of a single-line string, which would be hard for humans to read if written out unchanged in the unit tests. This issue is addressed using the raw string literals feature of C# combined with some string manipulation.
The result is both human-readable and exactly matches the single-line serialisation:
string expected =
    """
    <?xml version="1.0" encoding="utf-16"?>
    <Data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Expression xsi:type="Operation" Name="Addition">
        <Subexpressions>
          <Expression xsi:type="Value">1.5</Expression>
          <Expression xsi:type="Value">1.5</Expression>
        </Subexpressions>
      </Expression>
    </Data>
    """.Replace("\r", "").Replace("\n", "").Replace("  ", "");
Prefix Form XML Serialiser
In Prefix (Polish) Notation, all operators precede their arguments. Consider the expression 2+2. In prefix form, it is expressed as +22.
There are multiple ways to handle operators that can apply to a varying number of arguments. One approach is to treat each operator as having an arity of 2. For example, the expression 2*2*2 becomes **222. Another approach is to explicitly specify the arity of each operator. In this case, the expression 2*2*2 becomes *222 with multiplication having an arity of 3; or, specifying the arity in brackets, it becomes *(3)222.
Consider a more complex expression, 2*(3-1)*2, as an example. Using the first approach, it is written as **2-312. Using the second approach, it is written as *(3)2-(2)312.
This project already supports operators with arbitrary arity. Therefore, the implementation proceeds using the second approach. As mentioned in the introduction, the expression 2 * (3 - 1) * 2 in XML form is:
<Elements>
  <Element xsi:type="Operation">
    <Name>Multiplication</Name>
    <Arity>3</Arity>
  </Element>
  <Element xsi:type="Value">2</Element>
  <Element xsi:type="Operation">
    <Name>Subtraction</Name>
    <Arity>2</Arity>
  </Element>
  <Element xsi:type="Value">3</Element>
  <Element xsi:type="Value">1</Element>
  <Element xsi:type="Value">2</Element>
</Elements>
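A model that yields this shape might look like the following sketch. Data, Elements and ExpressionElement are the names used in the serialisation code later in this post. OperationElement, ValueElement and the PrefixFormXml.ToXml helper are hypothetical, and the project's real model may differ.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlType(TypeName = "Data")]
public sealed class Data
{
    public List<ExpressionElement> Elements { get; set; } = new();
}

[XmlInclude(typeof(OperationElement))]
[XmlInclude(typeof(ValueElement))]
[XmlType(TypeName = "Element")]
public abstract class ExpressionElement
{
}

// Name and Arity are plain properties, so they become child elements.
[XmlType(TypeName = "Operation")]
public sealed class OperationElement : ExpressionElement
{
    public string Name { get; set; } = "";
    public int Arity { get; set; }
}

// [XmlText] writes the number as the element's text content.
[XmlType(TypeName = "Value")]
public sealed class ValueElement : ExpressionElement
{
    [XmlText]
    public double Value { get; set; }
}

public static class PrefixFormXml
{
    // Serialises to indented XML, which is convenient for inspecting the shape.
    public static string ToXml(Data data)
    {
        var serializer = new XmlSerializer(typeof(Data));
        using var writer = new StringWriter();
        serializer.Serialize(writer, data);
        return writer.ToString();
    }
}
```

Leaving Name and Arity without [XmlAttribute] makes them child elements, which matches the XML above. The Nested model, by contrast, stores the operation name as an attribute.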
Tests Adaptation
The first commit sets up the interface and models for the new serialiser. The second commit copy-pastes the tests from the Nested XML Serialiser. The third commit adapts these tests for the Prefix Form XML Serialiser.
It's possible to adapt tests in this way due to the similarities between the two serialiser classes. Separating the copy-pasting and adaptation into two distinct commits helps with the review process when examining the differences between commits in Git.
Serialisation
This commit implements the Serialize method, which passes all three serialisation tests.
The Prefix Form XML Serialiser uses the same code for converting a C# class into XML as the Nested XML Serialiser does. However, mapping the Expression class to the ExpressionModel is slightly different: it uses the local function feature of C#. The ConvertToModel method initialises the data object, while the local function Process recursively processes the expression and populates the elements variable.
private Data ConvertToModel(Expression expression)
{
    List<ExpressionElement> elements = new();
    Data data = new() { Elements = elements };
    Process(expression);
    return data;

    void Process(Expression expression)
    {
        // processing
    }
}
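One possible body for Process is a pre-order traversal: append the current node, then process each argument from left to right. This is a sketch under assumptions, not the author's implementation. The Expr, Value and Operation records and the OperationElement and ValueElement classes are hypothetical, and the local function's parameter is renamed to node for clarity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the calculator's Expression class.
public abstract record Expr;
public sealed record Value(double Number) : Expr;
public sealed record Operation(string Name, IReadOnlyList<Expr> Args) : Expr;

// Hypothetical prefix-form model (XML attributes omitted).
public abstract class ExpressionElement { }

public sealed class OperationElement : ExpressionElement
{
    public string Name { get; set; } = "";
    public int Arity { get; set; }
}

public sealed class ValueElement : ExpressionElement
{
    public double Value { get; set; }
}

public sealed class Data
{
    public List<ExpressionElement> Elements { get; set; } = new();
}

public static class PrefixFormWriter
{
    public static Data ConvertToModel(Expr expression)
    {
        List<ExpressionElement> elements = new();
        Data data = new() { Elements = elements };
        Process(expression);
        return data;

        // Pre-order traversal: append the node itself, then each argument
        // from left to right, so operators always precede their operands.
        void Process(Expr node)
        {
            switch (node)
            {
                case Value value:
                    elements.Add(new ValueElement { Value = value.Number });
                    break;
                case Operation operation:
                    elements.Add(new OperationElement { Name = operation.Name, Arity = operation.Args.Count });
                    foreach (Expr arg in operation.Args)
                    {
                        Process(arg);
                    }
                    break;
                default:
                    throw new ArgumentException($"Unsupported expression: {node}");
            }
        }
    }
}
```

The local function captures elements, so the recursion appends to one shared list without passing it through every call.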
Deserialisation
This commit implements the Deserialize method. The code structure is identical to that of serialisation, with a recursive local function once again put to good use.
Due to the unconstrained nature of prefix notation, not all valid XML documents correspond to valid expressions. For example, +(2)222 is erroneously deserialised as 2+2, with the final 2 silently ignored, while -(3)111 raises an exception because subtraction has an arity of 2.
The following commits deal with these special cases. The first commit handles cases where there are too many or too few elements. The second commit addresses arity mismatches.
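The checks from those commits can be sketched with a shared position index that the recursive local function advances. This is a hypothetical sketch, not the project's code. The types are stand-ins, and FormatException is an assumed choice of exception. The arity table encodes only the one fixed arity stated in this post: subtraction takes two arguments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the calculator's Expression class.
public abstract record Expr;
public sealed record Value(double Number) : Expr;
public sealed record Operation(string Name, IReadOnlyList<Expr> Args) : Expr;

// Hypothetical prefix-form model (XML attributes omitted).
public abstract class ExpressionElement { }
public sealed class OperationElement : ExpressionElement
{
    public string Name { get; set; } = "";
    public int Arity { get; set; }
}
public sealed class ValueElement : ExpressionElement
{
    public double Value { get; set; }
}
public sealed class Data
{
    public List<ExpressionElement> Elements { get; set; } = new();
}

public static class PrefixFormReader
{
    // Hypothetical arity table: only Subtraction's arity of 2 comes from the post.
    private static readonly Dictionary<string, int> FixedArity = new() { ["Subtraction"] = 2 };

    public static Expr ConvertFromModel(Data data)
    {
        List<ExpressionElement> elements = data.Elements;
        int position = 0;
        Expr result = Process();

        // Too many elements: +(2)222 would otherwise silently drop the last 2.
        if (position != elements.Count)
        {
            throw new FormatException($"Unexpected element at position {position}.");
        }
        return result;

        // Reads one element and, for operations, recursively reads its arguments.
        Expr Process()
        {
            // Too few elements: an operation ran out of arguments.
            if (position >= elements.Count)
            {
                throw new FormatException("Unexpected end of elements.");
            }

            ExpressionElement element = elements[position++];
            switch (element)
            {
                case ValueElement value:
                    return new Value(value.Value);
                case OperationElement operation:
                    // Arity mismatch, e.g. -(3)111.
                    if (operation.Arity < 1 ||
                        (FixedArity.TryGetValue(operation.Name, out int expected) && operation.Arity != expected))
                    {
                        throw new FormatException($"Invalid arity {operation.Arity} for {operation.Name}.");
                    }
                    var args = new List<Expr>();
                    for (int i = 0; i < operation.Arity; i++)
                    {
                        args.Add(Process());
                    }
                    return new Operation(operation.Name, args);
                default:
                    throw new FormatException($"Unsupported element: {element.GetType().Name}");
            }
        }
    }
}
```

Leftover elements after the root expression signal too many elements. Running out of elements mid-expression signals too few. The arity table rejects cases such as -(3)111 before any arguments are read.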
JSON Serialiser Refactor
At this point, the codebase contains the previously implemented JSON serialiser and two new XML serialisers. Now it is time to ensure consistency among all three.
- The first commit moves the JSON serialiser and its tests from the Core project to the Serializers project.
- The second commit ensures that all serialisers implement the same interface.
The next commit applies lessons learnt from implementing the XML serialisers to the JSON implementation. It rewrites verbatim strings as raw strings, which makes multi-line JSON specifications practical: in a verbatim string, every double quote must be doubled and any indentation becomes part of the string. It generalises variable names to expected, data, and result instead of referring specifically to json. It also adds tests for nested expressions, which became practical once JSON could be written across multiple lines.
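The difference between the two literal forms is easiest to see side by side. This is a minimal sketch. The JSON content and the JsonLiterals class are illustrative only and do not reflect the JSON serialiser's actual output format.

```csharp
// Illustrative JSON only; not the project's actual format.
public static class JsonLiterals
{
    // Verbatim string: every double quote inside must be doubled.
    public const string Verbatim = @"{""Name"":""Addition""}";

    // Raw string literal (C# 11): quotes need no escaping, and the whitespace
    // to the left of the closing quotes is stripped from every line.
    public const string Raw = """
        {
          "Name": "Addition"
        }
        """;
}
```

The raw form keeps the JSON readable inside an indented test method, and its content starts at the column of the closing quotes rather than at the left edge of the file.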
Conclusion
This blog post introduced XML serialisation in C# by demonstrating two different XML representations of arithmetic expressions. It highlighted several C# techniques, such as raw strings and recursive local functions. It mentioned a few architectural considerations, including consistency, the use of projects and directories, and approaches to C#/XML modelling. Additionally, the post provided reference code for using the XmlSerializer class.