SharePoint: Updating Data Structures

One of the most problematic aspects of change requests in SharePoint projects is the update mechanism. You have WebParts written in C# and the accompanying .webpart files. These rely on data stored in lists, which are defined by a declarative XML specification. The lists themselves refer to ContentTypes, also specified in XML. ContentTypes rely on Site Columns, defined in a third kind of XML file. The data is formatted using CSS files, and all of this is activated by an XML-based Feature. Your solution is already deployed, and your users have filled it with actual production data. And then they notice that something has to change. You have to update the whole solution.

SharePoint ships with an update mechanism for solutions, accessible via the PowerShell cmdlet Update-SPSolution. How does it work? I can’t tell for sure. Handling all the different kinds of declarative data structures is difficult, and sometimes counterintuitive. Take feature definitions: when you add a new feature to your solution, it is actually ignored by Update-SPSolution. You have to add a new solution package instead. This leads to solutions with technical feature names, such as “Web” and “Site”, which contain everything used in the respective scope. Most users don’t like these features; they prefer names from their domain language over obscure technical terms. This is something many developers have learned to work around. But there are more complex problems. Take a ContentType with a field called A, which now has to be renamed to B. If you update the respective XML files, how does Update-SPSolution handle this? Does it rename the field? Does it remove A and add B? Does it do anything at all? It’s this kind of uncertainty that leads many developers to one-shot solutions, devoid of any future updates. You could look it up for any particular case, but in general you won’t find a satisfying answer. Retracting and redeploying the solution is no alternative, because this would delete all the existing user data.

SharePoint is not the only platform that has to deal with mutable data structures. Let’s have a look at SQL. SharePoint isn’t a relational database management system, but SharePoint lists and SQL tables are conceptually close enough to draw inspiration from. It is important to note that the declarative list definitions are roughly comparable to CREATE TABLE, while the SharePoint object model can also be used to mimic ALTER TABLE. Tools like LiquiBase and dbDeploy help SQL developers update data structures without losing user data. Their basic concept is the change set. A change set contains all changes from one version of the database to the next. Applied incrementally to an empty database, the change sets eventually produce the current data structure. You can also use them to update existing databases. If your production database is at version 15 and you install an update that includes the change sets up to version 22, only sets 16 to 22 are applied. Change sets explicitly specify the required change, not the resulting structure. This makes it possible to rename fields without losing data, because the change set makes it clear that A is renamed to B, as opposed to removing A and adding B. While the resulting structures would be the same, the first case retains the user data while the second one yields an empty column.
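To make the version arithmetic and the rename case concrete, here is a minimal in-memory sketch. It does not use LiquiBase, dbDeploy or any database; Table, Migrator and MigrationDemo are hypothetical names invented for this illustration, and a "column" is just a list of strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table: column name -> values per row.
public sealed class Table
{
    public int Version;
    public readonly Dictionary<string, List<string>> Columns =
        new Dictionary<string, List<string>>();

    public void RenameColumn(string from, string to)
    {
        Columns[to] = Columns[from];   // the data moves with the column
        Columns.Remove(from);
    }
}

public static class Migrator
{
    // changeSets[i] upgrades the table from version i to version i + 1.
    // Returns the number of change sets that were actually applied.
    public static int Upgrade(Table table, IList<Action<Table>> changeSets)
    {
        var applied = 0;
        for (var v = table.Version; v < changeSets.Count; v++)
        {
            changeSets[v](table);
            table.Version = v + 1;
            applied++;
        }
        return applied;
    }
}

public static class MigrationDemo
{
    public static IList<Action<Table>> ChangeSets()
    {
        return new List<Action<Table>>
        {
            t => t.Columns["A"] = new List<string>(),   // version 0 -> 1: add column A
            t => t.RenameColumn("A", "B"),              // version 1 -> 2: rename A to B
        };
    }

    public static void Main()
    {
        // A "production" table that is already at version 1 and holds data in A.
        var table = new Table { Version = 1 };
        table.Columns["A"] = new List<string> { "x", "y" };

        var applied = Migrator.Upgrade(table, ChangeSets());
        Console.WriteLine(applied + " change set(s) applied, B holds "
            + table.Columns["B"].Count + " values");
    }
}
```

Because the table starts at version 1, only the rename is executed, and column B ends up with the two values that used to live in A. Had the second change set been written as "remove A, add B", the resulting structure would look the same, but B would be empty.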

The concept of change sets can also be implemented in SharePoint. The object model provides everything you need. The most comfortable way for developers is to add extension methods to SPList and similar types, using a fluent syntax:

SPList list;               // the list to update, obtained elsewhere
SPContentType contentType; // the content type to add, obtained elsewhere
list.Edit(0).AddContentType(contentType).Apply();

Edit(0) specifies that the following change set is only to be applied if the current version of the list equals 0. AddContentType specifies an operation in this change set. You could also chain operations. The final Apply() then checks whether the current version of the list is actually 0. If it is, all operations are applied and the version is incremented by one. When you run the same code a second time, Apply() detects that the change set at hand has already been applied and ignores it. This allows for incremental updates:

list.Edit(0).AddContentType(contentType).Apply();
list.Edit(1).RemoveChildContentTypesOf(contentType).Apply();

In this case, the previously added content type is removed in the next step. You might think that you could also delete the first line instead of adding a second one, so that it isn’t even added in the first place. But the point of this mechanism is to deal with already deployed data structures, so you can only append operations, never change the ones already deployed on production systems. The change set implementation could look like this:

using System;
using System.Collections.Generic;

// IChangeSet, VersionAccess and MissingChangeSetsException are not shown in this post.
public abstract class ChangeSet<T> : IChangeSet
{
  private readonly List<Action<T>> _changes = new List<Action<T>>();
  private readonly T _entity;
  private readonly int _fromVersion;
  private readonly string _contextId;

  protected ChangeSet(T entity, int fromVersion, string contextId)
  {
    _entity = entity;
    _fromVersion = fromVersion;
    _contextId = contextId;
  }

  protected ChangeSet(ChangeSet<T> changeSet)
  {
    _entity = changeSet._entity;
    _fromVersion = changeSet._fromVersion;
    _contextId = changeSet._contextId;
    _changes = new List<Action<T>>(changeSet._changes);
  }

  protected ChangeSet(ChangeSet<T> changeSet, Action<T> change) : this(changeSet)
  {
    _changes.Add(change);
  }

  protected abstract Uri WebUrl { get; }
  protected T Entity { get { return _entity; } }

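  public void Apply()
  {
    var versionAccess = new VersionAccess(WebUrl);
    var entityVersion = versionAccess.GetVersion(_entity, _contextId);

    // The entity is older than this change set expects: earlier sets are missing.
    if (entityVersion < _fromVersion)
      throw new MissingChangeSetsException(_entity, entityVersion, _fromVersion, _contextId);

    // The entity is already newer: this change set has been applied before.
    if (entityVersion > _fromVersion)
      return;

    // Versions match: run all operations, then bump the stored version by one.
    foreach (var change in _changes)
      change(_entity);
    OnPostChanges();
    versionAccess.SetVersion(_entity, _fromVersion + 1, _contextId);
  }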

  protected abstract void OnPostChanges();
}
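The class above cannot be run on its own, since it needs SharePoint and several types that are not shown. To play with the mental model, the following is a stripped-down sketch that runs without SharePoint. It is not the adesso implementation: FakeList, ListChangeSet and FakeListExtensions are hypothetical names, the version is kept directly on the fake list instead of a separate store, and the change set is mutable rather than copied on every call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPList: a stored version and content type names.
public sealed class FakeList
{
    public int Version;
    public readonly List<string> ContentTypes = new List<string>();
}

// Simplified change set: operations are collected and only run
// if the list is exactly at the expected version.
public sealed class ListChangeSet
{
    private readonly FakeList _list;
    private readonly int _fromVersion;
    private readonly List<Action<FakeList>> _changes = new List<Action<FakeList>>();

    public ListChangeSet(FakeList list, int fromVersion)
    {
        _list = list;
        _fromVersion = fromVersion;
    }

    public ListChangeSet AddContentType(string name)
    {
        _changes.Add(l => l.ContentTypes.Add(name));
        return this;
    }

    public ListChangeSet RemoveContentType(string name)
    {
        _changes.Add(l => l.ContentTypes.Remove(name));
        return this;
    }

    // Returns true if the change set was applied, false if it was skipped.
    public bool Apply()
    {
        if (_list.Version < _fromVersion)
            throw new InvalidOperationException("Earlier change sets are missing.");
        if (_list.Version > _fromVersion)
            return false;                       // already applied, ignore
        foreach (var change in _changes)
            change(_list);
        _list.Version = _fromVersion + 1;
        return true;
    }
}

public static class FakeListExtensions
{
    // Entry point of the fluent syntax: list.Edit(0)...Apply()
    public static ListChangeSet Edit(this FakeList list, int fromVersion)
    {
        return new ListChangeSet(list, fromVersion);
    }
}

public static class ChangeSetDemo
{
    public static void Main()
    {
        var list = new FakeList();
        // Running the same update code twice simulates a repeated deployment.
        for (var run = 0; run < 2; run++)
        {
            list.Edit(0).AddContentType("Invoice").Apply();
            list.Edit(1).AddContentType("Receipt").Apply();
        }
        Console.WriteLine(list.Version + ": " + string.Join(", ", list.ContentTypes));
    }
}
```

The second pass through the loop changes nothing, because both change sets find a list version higher than the one they start from. The list ends at version 2 with each content type added exactly once.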

You should store the version numbers in a hidden list. Unlike web properties, which are always persisted as a whole, two different items in a list can be updated concurrently without overwriting each other.

Since SharePoint projects usually have no central point of data access, they also have no central point for managing changes to the data structures. Multiple modules in your code might therefore want to apply change sets to the same object, for example the root web. You can deal with this by storing the version together with a context id:

list.Edit(0, contextId).AddContentType(contentType).Apply();

Given these ids, the order in which your features are activated and the updates are applied becomes irrelevant. By the way, feature event receivers are a good place to apply change sets. As you don’t have declarative data structures anymore, your data structures won’t be removed when you retract your features. This allows you to use the retract/deploy mechanism to install updates, including all the bells and whistles of a new deployment such as adding new features. The feature activated event is then used to perform the update.
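To illustrate the per-context versions and why the activation order stops mattering, here is a small sketch of a version store keyed by entity and context id. It is an assumption about how such a store could be shaped, not the actual implementation: VersionStore and its methods are hypothetical, and a dictionary stands in for the hidden list, where each key would correspond to one list item.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical version store; one entry per (entity, context) pair.
public sealed class VersionStore
{
    private readonly Dictionary<string, int> _versions = new Dictionary<string, int>();

    private static string Key(Guid entityId, string contextId)
    {
        return entityId.ToString("D") + "|" + contextId;
    }

    // Unknown pairs start at version 0.
    public int GetVersion(Guid entityId, string contextId)
    {
        int version;
        return _versions.TryGetValue(Key(entityId, contextId), out version) ? version : 0;
    }

    // Runs `change` only if this context is exactly at fromVersion.
    // Returns true if the change was applied, false if it was skipped.
    public bool Apply(Guid entityId, string contextId, int fromVersion, Action change)
    {
        var current = GetVersion(entityId, contextId);
        if (current < fromVersion)
            throw new InvalidOperationException("Earlier change sets are missing.");
        if (current > fromVersion)
            return false;
        change();
        _versions[Key(entityId, contextId)] = fromVersion + 1;
        return true;
    }
}

public static class ContextDemo
{
    public static void Main()
    {
        var store = new VersionStore();
        var rootWeb = Guid.NewGuid();          // stands in for the id of the root web
        var log = new List<string>();

        // Two independent modules both start at version 0 of the same object.
        store.Apply(rootWeb, "Archive", 0, () => log.Add("archive columns"));
        store.Apply(rootWeb, "Invoicing", 0, () => log.Add("invoice columns"));

        // A second activation of the first module is ignored.
        store.Apply(rootWeb, "Archive", 0, () => log.Add("archive columns"));

        Console.WriteLine(string.Join(", ", log));
    }
}
```

Each module only ever looks at its own counter, so swapping the first two calls leads to the same stored versions. Without the context id, whichever module ran second would find the object already at version 1 and silently skip its own change set.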

We developed and implemented the change set concept at adesso. We might publish the implementation, be it open or closed source, but this is still undecided. If you are interested in this, please leave a comment. I am not the one who decides, but we are actively seeking opinions on this, so you will actually influence the outcome.

3 thoughts on “SharePoint: Updating Data Structures”

  1. Hi Malte,
    My team and I have a similar requirement today for managing in-place structure updates for SP sites whose creation we’ve been automating in code (with very little declarative XML). We are looking for existing projects or implementations that address this need, but I’m afraid it is not a very common topic online. Did you decide what to do with yours?

  2. Hi Jean-Noel, I’m sorry, as far as I know the implementation is still for internal use only, and it’s not going to change. I left adesso about a year ago, so I cannot speak for them now, but that’s the latest position I heard.

    Nevertheless I would like to encourage you to follow this path, even if you have to implement it on your own. It’s not that hard once you get a grip on the mental model behind it. I hope my postings can help in this aspect.

    • Thanks for the update and no worries. Your blog gave us the confidence that we’re on the right track, we already rolled out our own for SQL updates and will do the same for SP objects then.
