Wednesday, June 28, 2017

[IndieDev] The Nitty Gritty on Save Files, Part 1

Persistence is possibly one of the largest drivers of repeated and extended interaction a game can have. RPGs persist campaign data between sessions; puzzle games persist how far you've gotten in the game and how well you beat each puzzle; even ye olde arcade games persisted high scores for all to see (until someone rebooted the arcade machine, anyhow). With that in mind, creating a robust save system is one of the most important tasks you could have when developing a video game. For us, developing Eon Altar and now SPARK: Resistance, this is no different.

However, even the task of gathering some data, throwing it on disk, and then loading it later comes with a bunch of potential issues, caveats, and work. I'll talk today about our initial attempts at a save system in Eon Altar, why we went that route, why it didn't work, and what the eventual solution came to be.

A Rough Start

When we first created our save system, its primary goal was to save character data and which session the players were on. A secondary goal was to reuse the same system for our controller reconnect technology, since character data was originally mirrored on the controllers exactly as it existed in the main game, down to the character model and everything. We weren't originally planning on having mid-session saves (those came later), so really we only had to worry about saving between levels.

The "easiest" way of doing this, without having to think of any special logic, is to copy/paste the state from the main game into the save file, and to the controllers over the network. In programmer terms: serialize the state, and deserialize it on the other end. Given our time/budget constraints, we thought this was a pretty good idea. In practice, it turned out to have some pretty gnarly problems:

  1. The first issue was simply time. The time it took to save out a file or load up a file was on the order of tens of seconds. Serializing character objects and transferring them across the network was measured in minutes, if it succeeded at all.
  2. The second was coupling to code. Since we were serializing objects directly, it meant that any changes to the code could break a save file. If we changed how the object hierarchy worked, or if some fields were deleted and others created, then existing save files would potentially be broken.
  3. The third issue was complexity. The resulting save file was an illegible, uneditable mess. Debugging a broken or corrupt save file was a near-impossible task, and editing one was not much easier. Because of this, we couldn't (easily) write save upgrade code to mitigate issue 2. We'd have been locked into some code structures forever.
  4. The fourth was just far too much extraneous data. Because we were performing raw serialization, we were also getting data about textures, character models, what were supposed to be ephemeral objects, hierarchy maintenance objects, and so on.

While we had a save system that did what it said on the tin, it was untenable. Shipping it would've condemned our small engineering department to spending an immense amount of time trying to fix or work around those issues. So while this approach was "simple" and "cheap" in terms of up-front engineering cost, it was the wrong solution. We went back to the drawing board.
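
To make issues 2 and 4 a bit more concrete, here is a toy sketch in Python (chosen purely for illustration; it is not the game's language or code). Everything in it is hypothetical: the `Character` and `Texture` classes, the `raw_dump` helper, and the field names. It walks every attribute of a live object the way raw serialization tends to, and compares the result against a save that only writes what was deliberately chosen.

```python
import json


class Texture:
    """Hypothetical runtime-only asset attached to a character."""

    def __init__(self, size_bytes):
        self.pixels = bytes(size_bytes)  # stand-in for image data


class Character:
    """Hypothetical live game object: gameplay data mixed with runtime baggage."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.texture = Texture(50_000)                   # should never be saved
        self.hierarchy_cache = ["root", "party", name]   # ephemeral bookkeeping


def raw_dump(obj):
    """Naively walk every attribute, roughly what raw serialization does."""
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        return list(obj)
    if isinstance(obj, (list, tuple)):
        return [raw_dump(item) for item in obj]
    if isinstance(obj, dict):
        return {key: raw_dump(value) for key, value in obj.items()}
    # Field names leak straight into the file, so renaming a field in code
    # silently changes the file format (the coupling problem).
    return {name: raw_dump(value) for name, value in vars(obj).items()}


hero = Character("Hero", 3)
raw_json = json.dumps(raw_dump(hero))
deliberate_json = json.dumps({"name": hero.name, "level": hero.level})
print(len(raw_json), "bytes raw vs", len(deliberate_json), "bytes deliberate")
```

The exact numbers mean nothing; the point is only that the raw dump's size and shape are dictated by whatever happens to hang off the object, not by what the save actually needs.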

The Reimagining

About a year after we started development, the team shrank pretty substantially. We'd lost a third of our engineering team, and my time became even more contested as I became the new Lead Programmer. I had to contend with the responsibilities that came with that title, as well as continuing to deliver features and fixes.

However, I had already been noodling on the save and reconnect systems, and had a new plan. The first step was to fix the controller reconnect, which you can read more about here.

Given reconnect was taking 8 minutes each time we had to reconnect a controller, it didn't take upper management much convincing that something needed to be done. And since reconnect and save were intimately connected at the time, making a convincing argument to fix save shortly after also wasn't a hard sell. So even though I had to disappear for 2 weeks to fix reconnect, and then another 2 weeks later to fix save files, I think everyone involved believes it was the correct decision.

To fix our 4 issues, it wasn't sufficient that we just be able to save and load character data in any which manner. It needed to be quick, it needed to be decoupled from the code, it needed to be easy to read/edit/maintain, and it needed to be deliberate about what it saved out.

The Reimplementing

For humans, text is easier to read than binary, and a semantic hierarchy is more legible than raw object data. So I knew pretty early on that my save file data was going to be in XML and in plaintext.

Plaintext was important. We often get asked why we do not encrypt our save files, and it comes down to maintenance. Human-legible files are easier to read and easier to fix. As an indie studio with extremely limited resources, this was a higher priority for us than preventing people from cheating their save files in a local multiplayer game. If your friend is going to give themselves infinite resources and you catch them, you can dump your Coke in their lap.

Plaintext has saved our bacon multiple times: if there is a bug that is blocking our playerbase, more enterprising players have been able to repair their own save files with careful instructions from us (and a lot of WARNING caveats) until we can get around to fixing it. It also lets us quickly pull information out of broken save files without having to decrypt them.

The benefit of using XML is that we could serialize to and deserialize from it programmatically without any extra work on our part: tools to do so already existed. In fact, we were already using those tools for the old save files. The difference was that instead of serializing the character object instances directly, I created an intermediate set of data that was decoupled from the objects that made up the character data instances in-game, and organized that data according to gameplay semantics rather than raw object hierarchy.

[Figure: An example of a simple data class, and the resultant XML.]

Having actual data classes meant we could lean on the compiler to ensure data types matched up, and that we could just use existing serialization tools to spit out the save data. It did mean a fair bit of manual work to determine what goes into the save file and where, but the benefits of that work more than made up for the upfront time. Adding new fields to save data is trivial, and populating new fields via upgrade code isn't terribly difficult. Editing existing save files became super easy because the save file format was now extremely legible. Legible enough that we've had users edit their own save files easily. And good news, because the data was decoupled we could actually write save upgrade code!
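
As a rough sketch of the idea (not our actual code), here is what a small data class, its XML, and a save upgrade step could look like. It is written in Python purely for illustration, using the standard `dataclasses` and `xml.etree.ElementTree` modules. The class names, field names, XML layout, and the version numbering are all made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

SAVE_VERSION = 2  # hypothetical: pretend version 2 introduced the Gold field


@dataclass
class AbilitySave:
    ability_id: str
    rank: int


@dataclass
class CharacterSave:
    """Plain data only: value types and lists of other data classes."""
    name: str
    level: int
    gold: int = 0
    abilities: List[AbilitySave] = field(default_factory=list)


def to_xml(character: CharacterSave) -> str:
    """Lay the data out by gameplay meaning, not by runtime object layout."""
    root = ET.Element("Character", version=str(SAVE_VERSION))
    ET.SubElement(root, "Name").text = character.name
    ET.SubElement(root, "Level").text = str(character.level)
    ET.SubElement(root, "Gold").text = str(character.gold)
    abilities = ET.SubElement(root, "Abilities")
    for ability in character.abilities:
        ET.SubElement(abilities, "Ability",
                      id=ability.ability_id, rank=str(ability.rank))
    return ET.tostring(root, encoding="unicode")


def upgrade(root: ET.Element) -> None:
    """Bring an older file up to the current layout before reading it."""
    version = int(root.get("version", "1"))
    if version < 2:
        # Hypothetical version 1 files had no Gold element: populate a default.
        ET.SubElement(root, "Gold").text = "0"
        root.set("version", "2")


def from_xml(text: str) -> CharacterSave:
    root = ET.fromstring(text)
    upgrade(root)
    return CharacterSave(
        name=root.findtext("Name"),
        level=int(root.findtext("Level")),
        gold=int(root.findtext("Gold")),
        abilities=[AbilitySave(node.get("id"), int(node.get("rank")))
                   for node in root.iterfind("Abilities/Ability")],
    )


hero = CharacterSave("Hero", 3, 120, [AbilitySave("fireball", 2)])
print(to_xml(hero))
```

Because the file is organized around the data class rather than the live objects, the upgrade step only has to reason about named elements, which is what makes it writable at all.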

Collating the data into the data classes at runtime is a super speedy process: less than 1 ms on even the slowest machines. We're only serializing the simplest of objects--data classes are generally only made up of value types, other data classes, or generic Lists of other data classes or value types. And since we weren't serializing a ton of extraneous objects that were only supposed to exist at runtime, the amount of data we'd save out was significantly reduced: 29 KB for a file with 2 characters, instead of multiple MBs. We put the actual writing of the save file to disk on a background thread; once we had the data collated, there was no reason to stall the main thread any longer, and disk writes are notoriously slow.
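
The split between "collate on the main thread, write on a background thread" can be sketched roughly like this. Again, this is illustrative Python rather than our engine code; `collate_save_data`, `write_save_async`, and the dictionary shape of the character data are assumptions made for the example.

```python
import threading
import xml.etree.ElementTree as ET


def collate_save_data(characters) -> str:
    """Main thread: copy what we need out of game state into a finished string.

    Once this returns, the payload no longer references any live game objects,
    so the game can keep mutating its state safely.
    """
    root = ET.Element("Save")
    for character in characters:
        ET.SubElement(root, "Character",
                      name=character["name"], level=str(character["level"]))
    return ET.tostring(root, encoding="unicode")


def write_save_async(path: str, payload: str) -> threading.Thread:
    """Background thread: do the slow disk write without stalling the game."""

    def worker():
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(payload)

    thread = threading.Thread(target=worker, name="save-writer")
    thread.start()
    return thread  # caller can join() before quitting or before the next save
```

The important design detail is that the worker only ever sees the already-collated string, so there is nothing for the two threads to fight over.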

The difficult part was going from the data classes to instanced data. Previously, the runtime objects would get hydrated automatically, because that's what deserializing does. In this case, however, deserializing only hydrated the data classes, so I had to write a bunch of code that recreated the instanced runtime character data from them. This required a lot of combing over how we normally generated these object instances, and basically trying to "edit" a base character by programmatically adding abilities, inventory, etc. based on the save data. It wasn't particularly hard, but it was time consuming, and potentially where most of our bugs were going to lie. But using the same methods we call when adding these things normally at runtime allowed me to reuse a lot of existing code.
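
In sketch form, that "edit a base character" approach might look something like the following. This is hypothetical Python for illustration only: `RuntimeCharacter`, `create_base_character`, `learn_ability`, and `add_item` are stand-ins for whatever the game's real runtime calls are.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharacterSave:
    """Data class as loaded from the save file (hypothetical fields)."""
    archetype: str
    abilities: List[str] = field(default_factory=list)
    inventory: Dict[str, int] = field(default_factory=dict)


class RuntimeCharacter:
    """Stand-in for the live in-game character object."""

    def __init__(self, archetype: str):
        self.archetype = archetype
        self.abilities: List[str] = []
        self.inventory: Dict[str, int] = {}

    # These are the same entry points normal gameplay would use.
    def learn_ability(self, ability_id: str) -> None:
        if ability_id not in self.abilities:
            self.abilities.append(ability_id)

    def add_item(self, item_id: str, count: int = 1) -> None:
        self.inventory[item_id] = self.inventory.get(item_id, 0) + count


def create_base_character(archetype: str) -> RuntimeCharacter:
    """However a fresh character is normally spawned at the start of a game."""
    return RuntimeCharacter(archetype)


def hydrate(save: CharacterSave) -> RuntimeCharacter:
    """Rebuild a live character by replaying the save onto a base character."""
    character = create_base_character(save.archetype)
    for ability_id in save.abilities:
        character.learn_ability(ability_id)
    for item_id, count in save.inventory.items():
        character.add_item(item_id, count)
    return character
```

Going through `learn_ability` and `add_item` rather than poking fields directly means any side effects those methods have during normal play also happen on load.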

Part 2: Checkpoint Saves

We had our new save system, and it was pretty awesome. The original save system was done in approximately a week, if my memory serves, maybe a little longer. The new system took a month to implement after research, programming, and testing. Basically, you get what you invest in. Skimping on engineering time on this feature was a bad decision in my 20/20 hindsight, but we fixed it, so all is well today!

Next blog post I'll discuss the next step we took for Eon Altar: checkpoint saves. Why checkpoints? What did we need to do to retrofit the game to handle checkpoint saves? What implementation did we go with? What pitfalls did we run into? And then, what can we reuse for SPARK: Resistance?

#IndieDev, #EonAltar