C# Reflection Disrupting XML Element Serialization Order


Context

I recently ran into some unintended, and perhaps unusual, behaviour when serializing to XML. For context: due to problems within a legacy system, objects were being serialized to XML with “bad” data, data which should not be possible under the business logic. That logic was spread across an enormous number of poorly organised MVC views and controllers, which had been generated by a proprietary framework. It’s a grim system. Rather than try to dig out all that logic, the decision was to use reflection to conditionally amend field values on the object about to be serialized. This would allow ad-hoc sanitisation of object fields based on configured rules.

The Problem

After successfully amending values within the objects using reflection, the XML output changed: elements were no longer serialized in declaration order. Specifically, the fields that had been read via reflection were serialized first. Whether that matters is really up to the system parsing the XML, but we weren’t in a position to take the risk. Onto a minimal example.

Let’s say that this class is our object to modify before serializing to XML:

public class Item
{
    public int LineNo;
    public string Name;
    public string Description;
    public int PriceInPence;
}

Forgive the code to serialize it. It comes from a land of many unknowns.

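// Requires: using System.IO; using System.Xml; using System.Xml.Serialization;
public static string Serialize(Item item)
{
    // An empty prefix/namespace pair suppresses the default xmlns:xsi and xmlns:xsd attributes.
    var nameSpace = new XmlSerializerNamespaces();
    nameSpace.Add("", "");
    var xsSubmit = new XmlSerializer(typeof(Item));

    using var sww = new StringWriter();
    // XMLOptions is an XmlWriterSettings instance defined elsewhere (not shown).
    using XmlWriter writer = XmlWriter.Create(sww, XMLOptions);
    xsSubmit.Serialize(writer, item, nameSpace);
    return sww.ToString();
}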

Previously, the XML looked something like the following snippet, with all elements in declaration order.

<Item>
  <LineNo>0</LineNo>
  <Name>Valid</Name>
  <Description />
  <PriceInPence>999</PriceInPence>
</Item>

But if we read our field values using reflection:

public static Item Reflect(Item item)
{
    // GetFieldValue and SetFieldValue are our own helpers wrapping reflection (not shown).
    var nameValue = GetFieldValue(item, "Name");
    if (nameValue.Item1?.ToString() == "Invalid")
    {
        SetFieldValue(item, "Name", "Valid");
    }
    return item;
}
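The two helpers called above aren't shown, so here is a minimal sketch of what they might look like. The class name FieldAccess, the tuple shape and the binding flags are assumptions for illustration, not the real implementation. The only thing taken from the snippet above is that the first tuple element (Item1) carries the field value.

```csharp
using System;
using System.Reflection;

// Repeated here so the sketch compiles on its own.
public class Item
{
    public int LineNo;
    public string Name = "";
    public string Description = "";
    public int PriceInPence;
}

// Hypothetical helper class; the names and return shapes are assumptions.
public static class FieldAccess
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Returns the field value (Item1) and whether the field exists (Item2).
    public static (object? Value, bool Found) GetFieldValue(object target, string fieldName)
    {
        FieldInfo? field = target.GetType().GetField(fieldName, Flags);
        if (field is null)
        {
            return (null, false);
        }
        return (field.GetValue(target), true);
    }

    // Sets the field if it exists; returns false when no such field is found.
    public static bool SetFieldValue(object target, string fieldName, object? value)
    {
        FieldInfo? field = target.GetType().GetField(fieldName, Flags);
        if (field is null)
        {
            return false;
        }
        field.SetValue(target, value);
        return true;
    }
}
```

The important detail for what follows is the Type.GetField call: looking up a single field by name is the step that touches the reflection cache for Item before the serializer gets to it.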

We get our mutated XML, with Name now appearing before LineNo:

<Item>
  <Name>Valid</Name>
  <LineNo>0</LineNo>
  <Description />
  <PriceInPence>999</PriceInPence>
</Item>
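To try this outside the legacy system, the pieces above can be pulled into one small repro. This is a sketch under assumptions: OrderRepro and its Run method are made-up names, and the writer settings (indented, no XML declaration) are a guess based on the output shown here. Whether the element order actually changes may well depend on the runtime version, so no expected output is given.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Item
{
    public int LineNo;
    public string Name = "";
    public string Description = "";
    public int PriceInPence;
}

// Hypothetical repro harness; names are for illustration only.
public static class OrderRepro
{
    // Assumed settings, chosen to match the shape of the XML shown in this post.
    private static readonly XmlWriterSettings Settings = new XmlWriterSettings
    {
        Indent = true,
        OmitXmlDeclaration = true
    };

    public static string Serialize(Item item)
    {
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");
        var serializer = new XmlSerializer(typeof(Item));

        using var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, Settings))
        {
            serializer.Serialize(writer, item, namespaces);
        }
        return text.ToString();
    }

    // When touchNameFirst is true, look up a single field by name before
    // the serializer has had a chance to enumerate the type's members.
    public static string Run(bool touchNameFirst)
    {
        if (touchNameFirst)
        {
            _ = typeof(Item).GetField("Name");
        }
        return Serialize(new Item { Name = "Valid", PriceInPence = 999 });
    }
}
```

Call OrderRepro.Run(false) in one fresh process and OrderRepro.Run(true) in another, then compare the two strings. The reflection results are cached for the lifetime of the process, so running both in the same process would not be a fair comparison.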

Crackers. We worked around the issue, but it was a strange outcome for such simple code. We can follow this back into the dotnet repo. Looking at the XmlSerializer class, there is reflection-based serialization. Our ghosts may live there. Additionally, it appears that getting fields/properties in declaration order from reflection has only been standardised recently, despite some long-running discussion around it. There’s a reference to the internal caching done with reflection results. I wonder if reading a field adds it to that internal cache ahead of its peers, so that the native order is lost.

So when we get our field (GetField) we can see our first reference to the caching, with the private Cache of type RuntimeTypeCache. If we quickly check the references to the cache in the debugger, we can see that after getting the field it contains only our “Name” field:

(Image: one item in the field info cache)

But after serializing the object to XML, we see the rest of the fields as well, merged into the list after “Name”, which gives the mangled order:

(Image: four items in the field info cache)
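The same thing can be probed without poking at private members in the debugger, by asking reflection for the field list after a single-field lookup. This is a rough sketch: FieldOrderProbe and its methods are hypothetical names. Sorting by MetadataToken is a commonly used way to approximate declaration order, but it leans on the compiler emitting fields in source order, so treat it as an assumption rather than a guarantee.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class Item
{
    public int LineNo;
    public string Name = "";
    public string Description = "";
    public int PriceInPence;
}

// Hypothetical probe; names are for illustration only.
public static class FieldOrderProbe
{
    private const BindingFlags Flags = BindingFlags.Public | BindingFlags.Instance;

    // Optionally looks up one field by name first, then returns the field
    // names in whatever order GetFields hands them back.
    public static string[] NamesAfterTouching(string? fieldName)
    {
        Type type = typeof(Item);
        if (fieldName is not null)
        {
            _ = type.GetField(fieldName, Flags);
        }
        return type.GetFields(Flags).Select(f => f.Name).ToArray();
    }

    // Baseline for comparison: field names sorted by metadata token.
    public static string[] NamesByMetadataToken()
    {
        return typeof(Item).GetFields(Flags)
            .OrderBy(f => f.MetadataToken)
            .Select(f => f.Name)
            .ToArray();
    }
}
```

Printing string.Join(", ", FieldOrderProbe.NamesAfterTouching("Name")) next to the NamesByMetadataToken() result, in a fresh process, should show whether the early lookup changes the order reflection reports on a given runtime.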

I’ll have a proper look when I get time and try to see what actually happens.
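In the meantime, if element order matters to whatever consumes the XML, one way to stop depending on reflection order at all is to state the order explicitly with XmlElementAttribute.Order. This is a minimal sketch that assumes the class can be annotated, and it is not necessarily the workaround we used. OrderedSerializer and the writer settings are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Same shape as Item above, with the element order stated explicitly.
// Once Order is used on one member, XmlSerializer expects it on all of them.
public class Item
{
    [XmlElement(Order = 1)] public int LineNo;
    [XmlElement(Order = 2)] public string Name = "";
    [XmlElement(Order = 3)] public string Description = "";
    [XmlElement(Order = 4)] public int PriceInPence;
}

// Hypothetical helper; the name and settings are assumptions.
public static class OrderedSerializer
{
    public static string ToXml(Item item)
    {
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");
        var serializer = new XmlSerializer(typeof(Item));
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };

        using var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(writer, item, namespaces);
        }
        return text.ToString();
    }
}
```

With the order pinned like this, reading Name through reflection beforehand should no longer move it ahead of LineNo in the output.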