Friday, October 21, 2011

XmlSerializer vs DataContractSerializer: Serialization in Wcf


The XmlSerializer has been in .Net since version 1.0 and has served us well for everything from Remoting and Web Services to serializing to a file. However, in .Net 3.0 the DataContractSerializer came along, and all of a sudden a lot of guidance suggests that we should use it over the old tried and true XmlSerializer. Wcf even uses it as the default mechanism for serialization.  The question is, “Is it really better?”  The verdict is yes, and no.  Like most things, it depends on your implementation and what you need.  For Wcf, you should prefer the DataContractSerializer.  If you need full control over how the xml looks, though, you should go back to the XmlSerializer.
Let’s look at both of these in detail and leave it up to you to decide which is best for your implementation.  Here are a few of the advantages and disadvantages of each:
XmlSerializer
Advantages:
  1. Opt-out rather than opt-in members to serialize. This means you don’t have to specify each and every property to serialize, only those you don’t want to serialize
  2. Full control over how a property is serialized, including whether it should be a node or an attribute
  3. Supports more of the XSD standard
Disadvantages:
  1. Can only serialize public members (public properties and public fields)
  2. Properties must have a get and a set, which can result in some awkward design
  3. Supports a narrower set of types
  4. Does not understand the DataContractAttribute or DataMemberAttribute, so a class designed for the DataContractSerializer is not serialized according to its contract

DataContractSerializer
Advantages:
  1. Opt-in rather than opt-out members to serialize. This means you specify what you want to serialize
  2. Because it is opt-in, you can serialize not only properties, but also fields.  You can even serialize non-public members such as private or protected members. And you don’t need a set on a property either (however, without a setter you can serialize, but not deserialize)
  3. Is about 10% faster than the XmlSerializer, because since you don’t have full control over how the data is serialized, there is a lot that can be done to optimize the serialization/deserialization process
  4. Can understand the SerializableAttribute and know that the type needs to be serialized
  5. More options and control over KnownTypes
Disadvantages:
  1. No control over how the object is serialized outside of setting the name and the order

What is Serialization?

Let’s start with the basics.  Serialization has been a key part of .Net since version 1.  It is basically the process of converting an object instance into a portable and transferable format.  Objects can be serialized into all sorts of formats.  Serializing to Xml is most often used for its interoperability.  Serializing to binary is useful when you want to send the object from one .Net application to another.  .Net even supplies the interfaces and base classes to build your own serializers. There are libraries out there to serialize to comma delimited strings, JSON, etc.
Deserialization is basically the reverse of serialization.  It’s the process of taking some data (Xml, binary, etc.) and converting it back into an object.

What is the XmlSerializer?

For those who may not be familiar with System.Xml.Serialization.XmlSerializer, let’s go over it briefly.  This is the xml serializer that has been around since .Net version 1.0.  To serialize or deserialize an object, you basically just need to create an instance of the XmlSerializer for the type you want to work with, then call Serialize() or Deserialize().  It works with streams, so you can serialize to any stream such as a MemoryStream, FileStream, etc.
// Create serializer for the type
System.Xml.Serialization.XmlSerializer xmlSerializer =
new System.Xml.Serialization.XmlSerializer(typeof(MyType));

// Serialize from an object to a stream
xmlSerializer.Serialize(stream, myInstanceOfMyType);

// Deserialize from a stream to an object
myInstanceOfMyType = (MyType)xmlSerializer.Deserialize(stream);
However, not just any object can be serialized.  The XmlSerializer supports a number of the base types in .Net and most custom types.  Many people think that the SerializableAttribute is required on the class in order for it to be serializable by the XmlSerializer, but this is not the case.  It is good practice to use the SerializableAttribute, but it is not required.  As long as your class contains only types that the serializer understands, it will work.  You need to break out IXmlSerializable to implement your own custom serialization for types that the XmlSerializer cannot understand.  Any public property (or public field) that is of a known serializable type and has a get and set will be serialized by the XmlSerializer.  This can be referred to as an “opt-out” approach, because you choose what you don’t want to include, not what you want to include.
There are a number of attributes you can use in your class to change how it is serialized:
  1. System.Xml.Serialization.XmlIgnoreAttribute: This is used to mark a public property as “not to be serialized”. This is the “opt-out” approach used by the XmlSerializer.  There are no properties on this attribute.
  2. System.Xml.Serialization.XmlRootAttribute: This is used on the class itself to change the name or namespace of the root node. The following properties are supported on the attribute:
    1. ElementName: Gets or sets the name of the XML root element.
    2. DataType: Gets or sets the XSD data type of the XML root element.
    3. IsNullable: Gets or sets a value that indicates whether the XmlSerializer must serialize a root object that is set to null as an element with the xsi:nil attribute set to true.
    4. Namespace: Gets or sets the namespace for the XML root element.
    5. TypeId: When implemented in a derived class, gets a unique identifier for this Attribute.
  3. System.Xml.Serialization.XmlAttributeAttribute: Serialize the property as an xml attribute.  You can specify things such as the name to use (instead of the property name).  The following properties are supported on the attribute:
    1. AttributeName: Gets or sets the name of the XML attribute.
    2. DataType: Gets or sets the XSD data type of the XML attribute generated by the XmlSerializer.
    3. Form: Gets or sets a value that indicates whether the XML attribute name generated by the XmlSerializer is qualified.
    4. Namespace: Gets or sets the XML namespace of the XML attribute.
    5. Type: Gets or sets the complex type of the XML attribute.
    6. TypeId: When implemented in a derived class, gets a unique identifier for this Attribute.
  4. System.Xml.Serialization.XmlElementAttribute: Serialize the property as an xml element.  You can specify things such as the name to use (instead of the property name), whether or not to serialize it if it is null, the order to serialize the property in relative to other properties, etc. The following properties are supported on the attribute:
    1. ElementName: Gets or sets the name of the XML element.
    2. DataType: Gets or sets the XSD data type of the XML element generated by the XmlSerializer.
    3. Form: Gets or sets a value that indicates whether the XML element name generated by the XmlSerializer is qualified.
    4. Namespace: Gets or sets the XML namespace of the XML element.
    5. IsNullable: Gets or sets a value that indicates whether the XmlSerializer must serialize a member that is set to a null reference (Nothing in Visual Basic) as an empty tag with the xsi:nil attribute set to true.
    6. Order: Gets or sets the explicit order in which the elements are serialized or deserialized.
    7. Type: Gets or sets the object type used to represent the XML element.
    8. TypeId: When implemented in a derived class, gets a unique identifier for this Attribute.
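To make the attribute list above a little more concrete, here is a minimal sketch that uses all four attributes on one class.  The Person type, its member names, and the XmlAttributeDemo helper are made up for illustration; they are not part of the framework.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for illustration.
[XmlRoot("person")]                      // rename the root node
public class Person
{
    [XmlAttribute("id")]                 // written as an attribute on <person>
    public int Id { get; set; }

    [XmlElement("first-name", Order = 1)] // written as a renamed child element
    public string FirstName { get; set; }

    // IsNullable = true asks for an xsi:nil="true" element when the value is null.
    [XmlElement("nickname", IsNullable = true, Order = 2)]
    public string Nickname { get; set; }

    [XmlIgnore]                          // opt-out: never written
    public string Password { get; set; }
}

public static class XmlAttributeDemo
{
    // Serializes a Person to an XML string using the attributes above.
    public static string ToXml(Person person)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Person));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Person person = new Person { Id = 42, FirstName = "Ada", Password = "secret" };
        Console.WriteLine(ToXml(person));
    }
}
```

In the printed xml the id should show up as an attribute of the person node, the nickname as an empty element marked xsi:nil, and the password not at all.  Note that once Order is set on one element, the XmlSerializer expects it on the other elements of the class as well.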
Here is an example class set up to use the XmlSerializer.  The only fancy thing here is that I don’t want to serialize the Social Security number:
[System.Serializable]
public class Individual
{
    private string m_FirstName;
    private string m_LastName;
    private int m_SocialSecurityNumber;

    public string FirstName
    {
        get { return m_FirstName; }
        set { m_FirstName = value; }
    }

    public string LastName
    {
        get { return m_LastName; }
        set { m_LastName = value; }
    }

    // Opt-out: this public property is skipped by the XmlSerializer
    [System.Xml.Serialization.XmlIgnore]
    public int SocialSecurityNumber
    {
        get { return m_SocialSecurityNumber; }
        set { m_SocialSecurityNumber = value; }
    }

    // The XmlSerializer needs a parameterless constructor
    public Individual()
    {
    }

    public Individual(string firstName, string lastName)
    {
        m_FirstName = firstName;
        m_LastName = lastName;
    }
}
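
As a rough sketch of how a class like this behaves end to end, the following round trip writes an Individual to a MemoryStream and reads it back.  The Individual here is a trimmed-down copy using auto-properties, and XmlRoundTripDemo is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Trimmed-down copy of the class above, for illustration only.
public class Individual
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    [XmlIgnore]
    public int SocialSecurityNumber { get; set; }
}

public static class XmlRoundTripDemo
{
    // Serializes to a stream, captures the xml text, then deserializes a copy.
    public static Individual RoundTrip(Individual original, out string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Individual));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, original);
            xml = Encoding.UTF8.GetString(stream.ToArray());

            stream.Position = 0; // rewind before reading
            return (Individual)serializer.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Individual original = new Individual
        {
            FirstName = "Ada",
            LastName = "Lovelace",
            SocialSecurityNumber = 123456789
        };

        string xml;
        Individual copy = RoundTrip(original, out xml);

        Console.WriteLine(xml);
        Console.WriteLine(copy.FirstName + " " + copy.LastName);
        Console.WriteLine(copy.SocialSecurityNumber); // ignored member keeps its default
    }
}
```

The ignored property never reaches the xml, so the deserialized copy comes back with the default value for it.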

What is the DataContractSerializer?

The System.Runtime.Serialization.DataContractSerializer is new in .Net 3.0 and was designed for contract-first development and speed.  Specifically it was brought in to be used by Wcf, but can be used for general serialization as well. Using the DataContractSerializer isn’t that much different than using the XmlSerializer.  There are a few more options, but the only real key difference is that you use a WriteObject() method to serialize instead of a Serialize() method and a ReadObject() method to deserialize instead of a Deserialize() method.  It works with the same types of streams, so you can write to memory, files, etc.
DataContractSerializer dataContractSerializer =
new DataContractSerializer(typeof(MyType));

// Serialize from an object to a stream
dataContractSerializer.WriteObject(stream, myInstanceOfMyType);

// Deserialize from a stream to an object
myInstanceOfMyType = (MyType)dataContractSerializer.ReadObject(stream);
One thing to note: before you can use the DataContractSerializer, you must include a reference to System.Runtime.Serialization.  mscorlib gives you some parts of the System.Runtime.Serialization namespace, but you must include this reference to get the DataContractSerializer and the associated attributes.
Again, not just any object can be serialized.  It supports a number of the base types in .Net and most custom types.  One nice advantage that the DataContractSerializer has over the XmlSerializer is that it understands the SerializableAttribute and classes built for the XmlSerializer or ISerializable.  So if your class is declared with a DataContractAttribute and it contains a type that uses the SerializableAttribute, all will be well.  Unlike the XmlSerializer, though, you must put either the SerializableAttribute or the DataContractAttribute on the class in order for it to be serializable by the DataContractSerializer.  (The one exception arrived with .Net 3.5 SP1, which also accepts plain public classes that carry neither attribute and serializes their public read/write properties and fields.)
The DataContractSerializer implements an “opt-in” approach.  This basically means that you have to explicitly say what will be serialized by adding the DataMemberAttribute to it.  The nice thing about this is that the attribute can be applied to fields as well as properties, you can set it on any access modifier (private, protected, etc.), not just public, and you can use it on properties that do not have a “set”.  However, if you label a property that doesn’t have a “set”, then you can only serialize that property.  You won’t be able to deserialize it, since the serializer has no idea how to set the property.  This also means you can’t use it for communication over Wcf.  But you can still use a “private set” to keep your model clean.  On the other hand, you cannot specify that properties should be xml attributes, or control other more complex things about how the xml will look.  It hurts not having this flexibility, but because of this rigidity, the format is highly predictable and the serializer can make some big optimizations.  The DataContractSerializer can serialize and deserialize about 10% faster than the XmlSerializer.  This can be pretty significant if you are working with a lot of data.
There are really only 2 attributes to use in your class:
  1. System.Runtime.Serialization.DataContractAttribute: Declares that the class is serializable and allows you to specify the namespace and name to serialize it as.  This is similar to a combination of the SerializableAttribute and XmlRootAttribute in the XmlSerializer. The following properties are supported on the attribute:
    1. Name: Gets or sets the name of the data contract for the type.
    2. Namespace: Gets or sets the namespace for the data contract for the type.
    3. TypeId: When implemented in a derived class, gets a unique identifier for this Attribute.
  2. System.Runtime.Serialization.DataMemberAttribute: This is used to declare a property or a field to be serialized.  This can work with any access modifier.  The following properties are supported on the attribute:
    1. EmitDefaultValue: Gets or sets a value that specifies whether to serialize the default value for a field or property being serialized.
    2. IsRequired: Gets or sets a value that instructs the serialization engine that the member must be present when reading or deserializing.
    3. Name: Gets or sets a data member name.
    4. Order: Gets or sets the order of serialization and deserialization of a member.  This can be pretty powerful if you have fields that might depend on one another and you really need to define the order that the properties are serialized and deserialized in.
    5. TypeId: When implemented in a derived class, gets a unique identifier for this Attribute.
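The options in the list above can be sketched roughly as follows.  The Customer type, the urn:example:customers namespace, and the DataMemberDemo helper are all invented for this illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical contract used only for illustration.
[DataContract(Name = "Customer", Namespace = "urn:example:customers")]
public class Customer
{
    // A private field can be a data member; Name sets the element name.
    [DataMember(Name = "Id", Order = 1, IsRequired = true)]
    private int m_Id;

    // A private setter keeps the model clean but still allows deserialization.
    [DataMember(Order = 2)]
    public string Name { get; private set; }

    // EmitDefaultValue = false leaves the element out when the value is null.
    [DataMember(Order = 3, EmitDefaultValue = false)]
    public string Notes { get; set; }

    // No DataMember attribute, so this is not part of the contract.
    public string CachedDisplayText { get; set; }

    public int Id { get { return m_Id; } }

    public Customer(int id, string name)
    {
        m_Id = id;
        Name = name;
    }
}

public static class DataMemberDemo
{
    // Writes the customer to a stream, captures the xml, and reads a copy back.
    public static Customer RoundTrip(Customer original, out string xml)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Customer));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, original);
            xml = Encoding.UTF8.GetString(stream.ToArray());

            stream.Position = 0;
            return (Customer)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        Customer original = new Customer(7, "Ada");
        original.CachedDisplayText = "Ada (#7)";

        string xml;
        Customer copy = RoundTrip(original, out xml);
        Console.WriteLine(xml);
        Console.WriteLine(copy.Id + " " + copy.Name);
    }
}
```

Notice that the class has no parameterless constructor and that the id lives in a private field, neither of which the XmlSerializer would accept.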
If you are going to work with the DataMemberAttribute, here is a concise post about best practices around it: http://blogs.msdn.com/drnick/archive/2008/02/22/datamember-best-practices
Below is an example class set up to use the DataContractSerializer.  Notice that I am explicitly setting the DataMemberAttribute on the properties I want to serialize, but not on the others.
[DataContract]
public class Individual
{
    private string m_FirstName;
    private string m_LastName;
    private int m_SocialSecurityNumber;

    [DataMember]
    public string FirstName
    {
        get { return m_FirstName; }
        set { m_FirstName = value; }
    }

    [DataMember]
    public string LastName
    {
        get { return m_LastName; }
        set { m_LastName = value; }
    }

    // Opt-in: no DataMember attribute, so this is not serialized
    public int SocialSecurityNumber
    {
        get { return m_SocialSecurityNumber; }
        set { m_SocialSecurityNumber = value; }
    }

    public Individual()
    {
    }

    public Individual(string firstName, string lastName)
    {
        m_FirstName = firstName;
        m_LastName = lastName;
    }
}

One other important thing to talk about with the DataContractSerializer is the pair of ServiceKnownTypeAttribute and KnownTypeAttribute attributes.  These are similar to the XmlIncludeAttribute used by the XmlSerializer.  When used in Wcf, these identify what types should be represented in the WSDL that is generated.
The KnownTypeAttribute specifies types that should be recognized by the DataContractSerializer when serializing and deserializing a type.  It is applied to a class and basically tells the serializer which other types may show up inside it.  This matters most when the runtime type of a member can differ from its declared type, such as a derived class stored in a base-class member.  You don’t need to specify known .Net types, but any custom classes should be added here.  This attribute can be used multiple times to identify multiple types.
[DataContract]
[KnownType(typeof(MyOtherType))]
public class MyType
{
    [DataMember]
    public MyOtherType TheOtherType;
}

[DataContract]
public class MyOtherType
{
    [DataMember]
    public string MyValue;
}
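Known types earn their keep when a derived class travels where the base class was declared.  Here is a small sketch of that case; Shape, Circle, and KnownTypeDemo are hypothetical names chosen for the example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical base contract; it announces the derived type it may carry.
[DataContract]
[KnownType(typeof(Circle))]
public class Shape
{
    [DataMember]
    public string Name { get; set; }
}

[DataContract]
public class Circle : Shape
{
    [DataMember]
    public double Radius { get; set; }
}

public static class KnownTypeDemo
{
    // The serializer is built for Shape, but the instance may be a Circle.
    public static Shape RoundTrip(Shape shape)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(Shape));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, shape);
            stream.Position = 0;
            return (Shape)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        Shape copy = RoundTrip(new Circle { Name = "wheel", Radius = 2.0 });

        // The copy is still a Circle because the type was declared as known.
        Circle circle = copy as Circle;
        Console.WriteLine(circle != null);
        Console.WriteLine(circle.Radius);
    }
}
```

If you remove the KnownType line, the same call to WriteObject should fail with a SerializationException, because the serializer was only told about Shape.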
The ServiceKnownTypeAttribute specifies known types to be used by a service when serializing or deserializing.  It is applied to a ServiceContract or to an OperationContract and specifies what types are used in the methods.  Again (like the KnownTypeAttribute), you don’t need to specify known .Net types and this attribute can be used multiple times to identify multiple types.
[ServiceContract]
[ServiceKnownType(typeof(MyType))]
[ServiceKnownType(typeof(MyOtherType))]
public interface MyService
{
    [OperationContract]
    [ServiceKnownType(typeof(YetAnotherType))]
    void MyMethod();
}

What about the NetDataContractSerializer?

I haven’t really mentioned this yet because it is just like the DataContractSerializer, but there is also a System.Runtime.Serialization.NetDataContractSerializer.  It differs from the DataContractSerializer in that it includes CLR type information in the serialized xml, whereas the DataContractSerializer does not.  So it can only work if the serializing and deserializing ends share the same CLR types.  The nice thing about this serializer is that since the CLR type information is sent around, you don’t have to apply the ServiceKnownTypeAttribute or KnownTypeAttribute. Like the DataContractSerializer, it can serialize types that use the DataContractAttribute or SerializableAttribute, or that implement ISerializable.
I don’t recommend using this serializer often.  If possible, you should declare your known types explicitly for better understanding of the code and, of course, for greater interoperability.

Why another xml serializer?

So why the need to even create another xml serializer? The XmlSerializer has served us well over the years.  Well, part of it is speed.  DataContracts are faster to serialize because the structure is predictable and can be more highly optimized.  This results in about a 10% performance gain.  If you are working with pure Wcf, the gain in speed is probably worth the trade-off in loss of control over what the Xml looks like.  Sometimes you might need to control what the xml looks like to get it to fit some other schema.  If that is the case, then you will want to switch to the XmlSerializer.
There is more to the story.  Microsoft wants us to think in terms of “Contracts” with Wcf.  We often hear the term “Contract First Development”.  Some of this principle is already in play by forcing all Wcf Services to be defined in an interface (contract).  In the old days of pure WebServices, you could just tag a class as a WebService and you didn’t need a separate interface or contract.  In some respects, the DataContractSerializer is pushing us down this route too.  By declaring a class as a DataContract and explicitly setting the DataMembers, you are building a contract of how the class should look.  One could argue that you could do this through the Serializable attribute; however, you don’t define what goes in the contract. Instead you only specify what isn’t supposed to be serialized and the serializer decides what it should serialize.  There is no way to look at the class or use reflection and really get a good feel for the data contract for the class.  With the DataContract and DataMember attributes, on the other hand, you could use reflection to see exactly what the contract is.
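To illustrate the reflection point, here is one possible sketch that lists the data members a type declares.  ContractInspector is a hypothetical helper, the Individual below is a shortened copy of the earlier class, and the helper only looks at members declared directly on the type, so base classes are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

// Shortened copy of the earlier example class, for illustration only.
[DataContract]
public class Individual
{
    [DataMember]
    public string FirstName { get; set; }

    [DataMember]
    public string LastName { get; set; }

    public int SocialSecurityNumber { get; set; }
}

public static class ContractInspector
{
    // Returns the contract names of all fields and properties marked [DataMember].
    public static List<string> GetDataMemberNames(Type type)
    {
        List<string> names = new List<string>();
        if (!Attribute.IsDefined(type, typeof(DataContractAttribute)))
            return names; // not a data contract, nothing to report

        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                             BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        foreach (MemberInfo member in type.GetMembers(flags))
        {
            if (member.MemberType != MemberTypes.Field &&
                member.MemberType != MemberTypes.Property)
                continue;

            DataMemberAttribute attribute = (DataMemberAttribute)
                Attribute.GetCustomAttribute(member, typeof(DataMemberAttribute));

            // Name is null unless it was set explicitly on the attribute.
            if (attribute != null)
                names.Add(attribute.Name ?? member.Name);
        }

        names.Sort(StringComparer.Ordinal);
        return names;
    }

    public static void Main()
    {
        foreach (string name in GetDataMemberNames(typeof(Individual)))
            Console.WriteLine(name);
    }
}
```

For the class above this should list FirstName and LastName and leave out the Social Security number, which is exactly the contract as declared.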

How to change Wcf to use a different serializer?

By default Wcf uses the DataContractSerializer, so if you want to use it, you need to do nothing else.  If you want to use the XmlSerializer, though, all you need to do is add the System.ServiceModel.XmlSerializerFormatAttribute to the contract interface.  The nice thing about this attribute is that it can be applied to the entire service contract, or just to an operation contract.  So you could keep the service as a whole on the DataContractSerializer, and switch only the methods you choose over to the XmlSerializer.
[ServiceContract]
[XmlSerializerFormat]
public interface MyService
{
    [OperationContract]
    [XmlSerializerFormat]
    void MyMethod();
}
You can also set the service to use the XmlSerializer by default, but specify which methods use the DataContractSerializer with the help of the System.ServiceModel.DataContractFormatAttribute:
[ServiceContract]
[XmlSerializerFormat]
public interface MyService
{
    [OperationContract]
    [DataContractFormat]
    void MyMethod();
}

Now what about changing to use the NetDataContractSerializer?  Or what if I want to use a different serializer or create my own? In order to do this you need to create a custom operation behavior using the IOperationBehavior interface and the Attribute class.
Here is an example of a custom operation behavior written to use the NetDataContractSerializer.  You could use this, build your own, or modify it to work with a different serializer.
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;
using System.Xml;

public class NetDataContractFormatAttribute : Attribute, IOperationBehavior
{
    public void AddBindingParameters(OperationDescription description, BindingParameterCollection parameters)
    {
    }

    public void ApplyClientBehavior(OperationDescription description, ClientOperation proxy)
    {
        ReplaceDataContractSerializerOperationBehavior(description);
    }

    public void ApplyDispatchBehavior(OperationDescription description, DispatchOperation dispatch)
    {
        ReplaceDataContractSerializerOperationBehavior(description);
    }

    public void Validate(OperationDescription description)
    {
    }

    // Swap the default serializer behavior for the NetDataContractSerializer one
    private static void ReplaceDataContractSerializerOperationBehavior(OperationDescription description)
    {
        DataContractSerializerOperationBehavior dcs = description.Behaviors.Find<DataContractSerializerOperationBehavior>();

        if (dcs != null)
            description.Behaviors.Remove(dcs);

        description.Behaviors.Add(new NetDataContractSerializerOperationBehavior(description));
    }

    public class NetDataContractSerializerOperationBehavior : DataContractSerializerOperationBehavior
    {
        private static NetDataContractSerializer serializer = new NetDataContractSerializer();

        public NetDataContractSerializerOperationBehavior(OperationDescription operationDescription) : base(operationDescription) { }

        // Both overloads hand back the shared NetDataContractSerializer
        public override XmlObjectSerializer CreateSerializer(Type type, string name, string ns, IList<Type> knownTypes)
        {
            return NetDataContractSerializerOperationBehavior.serializer;
        }

        public override XmlObjectSerializer CreateSerializer(Type type, XmlDictionaryString name, XmlDictionaryString ns, IList<Type> knownTypes)
        {
            return NetDataContractSerializerOperationBehavior.serializer;
        }
    }
}
Once you have this, you can mark up your service using the new attribute.  Because it is an operation behavior, it goes on each operation that should use the NetDataContractSerializer:
[ServiceContract]
public interface MyService
{
    [OperationContract]
    [NetDataContractFormat]
    void MyMethod();
}

When to use which serializer?

For Wcf, you should prefer the DataContractSerializer.  If you need full control over how the xml looks, though, you should go back to the XmlSerializer.  If you are doing general serialization, it is up to you, but I would weigh the advantages and disadvantages.  I would still prefer the DataContractSerializer for the same reasons I prefer it for Wcf.  I do not recommend using the NetDataContractSerializer unless you have to.  You can lose too much interoperability and it’s not as descriptive.
If you need some custom xml serializer, by all means go ahead and implement it.  Wcf supports any serializer you can throw at it.  Just be careful not to “reinvent the wheel”.  The XmlSerializer is very configurable and may suit your needs. If that doesn’t work, then IXmlSerializable gives you full control over the xml, and ISerializable gives you full control over what data is stored.