Bugs, Part 2

We’ve been using Manuscript (originally Fogbugz) to manage the project, starting during the first week of development in August 2014. Since then, the team and playtesters have created 3412 cases. This breaks down into Manuscript’s categories as:

Donut chart of categories

123 Inquiries. This includes playtester feedback and comments, and 75 completed games. I try to do some analysis of each game, to learn where players get stuck and to make sure things are tuned OK. Once in a while this uncovers bugs.

485 Features and 61 Schedule Items. These represent tasks like “Map Creation,” “iPhone X support” or “Sweep to be sure ChooseLeader is followed by a leader test.” The two categories are pretty similar, but a Feature would probably be passed to QA to check, and a Schedule Item could usually just be marked completed.

2714 Bugs. These are things that didn’t behave as expected. They’re typically fixed, then verified by QA as working correctly. Since we added playtesters over time, we tended not to get a lot of duplicate bugs (though it’s never a problem if we do, since a different report may give insight into reproducibility, and they’re easy to verify).

Overall, we closed 3313 of the 3412 cases. Of the 99 cases not closed, 26 are feedback that I was keeping handy (and probably should close to clean up the project). Most of the rest are issues that we deferred as part of the triage process (see part 1), or features that would be nice to have in an update.

Donut chart showing 99% Fixed, 1% Open bugs

2695 of 2714 bugs have been closed. A few are still being verified as fixed, or were deferred for an update. (That’s about 1.8 bugs per day over the entire project.)
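
As a quick sanity check on those figures, here is a minimal Python sketch of the arithmetic. The constants are the numbers quoted in this post; the helper name percent is just for illustration. The last line works backwards from the rounded 1.8 bugs per day, so the day count it implies is only approximate.

```python
# Figures quoted in the post.
TOTAL_CASES = 3412
CLOSED_CASES = 3313
TOTAL_BUGS = 2714
CLOSED_BUGS = 2695
BUGS_PER_DAY = 1.8  # the post's rounded figure


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


case_closure = percent(CLOSED_CASES, TOTAL_CASES)  # share of all cases closed
bug_closure = percent(CLOSED_BUGS, TOTAL_BUGS)     # share of bugs closed
implied_days = round(TOTAL_BUGS / BUGS_PER_DAY)    # project length implied by 1.8/day
```

That gives roughly 97.1% of cases and 99.3% of bugs closed, and about 1508 days, a bit over four years, which is consistent with a project that started in August 2014.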

I don’t really like managing purely by numbers (the way a really large project might have to), but it’s good to see that our gut feeling that the game is solid is also backed by data.

(part 1 of the series on bugs is here)

Bugs, Part 1

I don’t know if this is a dark secret of game developers, but games ship with bugs. Known bugs.

Status and Triage posts from Slack

Part of the late development process is reviewing the bug list to see what most needs fixing. We all hate bugs, but we also love shipping. Not everything has to be fixed before release. The closer we get to making the release build, the stricter the criteria become. We’ve recently deferred a bunch of bugs that we would have fixed when we started the triage process.

Why be so strict? Because many fixes can have unintended consequences. We just discovered a fairly rare bug yesterday, which had been introduced by a change we made last year (which was tested back then and found to work perfectly as designed). Since even a simple bug fix can destabilize the game, we want to make sure any last minute fixes are truly warranted.

At the end of the process, the only bugs to be fixed are those that affect all players, prevent play, or make us look stupid. Most of what we’ve been finding lately has been issues that come up if you have tons of feuds, or when you make certain decisions and then ally a particular clan. These aren’t going to be that common. And even the players who end up doing that won’t be blocked from the game, or find it illogical. My classic “look stupid” bug was a game I saw years ago, where the publisher spelled their own name wrong on the title screen. Luckily we haven’t found anything like that.

But of course, we do want to fix all the bugs. So I’ve been making the fixes in an alternate branch of the code. As soon as everything is locked down in the App Store, we’ll switch over internally and start testing that.

Part 2 of “Bugs”

A Week of Polishing

Right now we’re finding bugs and tuning the game. What exactly does that mean?

Liana stressed one of the game’s new features (using it as much as possible in a complete game), and wrote up a bunch of issues: it was inconsistent, wasn’t clear enough, didn’t feel like good value for the cost, and felt too generic. So I reviewed what was going on. Some of the inconsistency was actually just feedback and advice that was out of date. I added a new way to see all the effects. There are new side effects to help add value. And there is now the possibility of interactive results instead of a boilerplate response. These are new situations that need to be exhaustively tested, so that’s now on Kati’s plate. Some of these changes needed to be mentioned in the manual, too.

Liana had previously played a game where her main goal was to increase the size of the clan. I had fixed some of the specific bugs this revealed, and also started applying some new consequences and activated a new scene I had written the previous week.

I tracked down and fixed a bug which could make the game unplayable if you made peace with a clan right before it was scheduled to raid you (due to previous events).

Report button

Another timing bug (reported by a playtester) had to do with when the followup for an omen occurred. And another was a possible conflict with the seasonal calendar (you might be willing to ignore the harvest, but would your allies?).

One bug had to do with dead emissaries returning with gifts. This seemed like an issue that could happen in scenes other than the one it was reported for, so I swept the entire code and found another six places.

Raided By: None

One game had very few raids. I looked at the logs and data, and made some adjustments to what are probably the underlying causes. (Raids look at many factors, such as relative strength and different aspects of inter-clan relationships.)

While playing, I was surprised not to get a relevant bit of advice on one situation. The exact set of advisors makes a huge difference on what advice is given, as does the context in the story, but I ended up changing the priority of one advice type. In theory this is a big change, but I believe it is more in line with how Robin and I used that type. We’ll need to play for a while and make sure you still get good advice.

A conversation with Zack sparked an idea that I need to implement this week.

By the end of the week, there were bugs relating to the new interactive results, so I fixed those.

And there were a number of typos and text edits.

So that’s a fairly typical week of polishing a game which is at the alpha stage.

Find Once, Fix Everywhere

Just got a bug report from QA that a scene threw up an error. It was pretty easy to see that it couldn’t find a friendly clan to visit you.

That turned out to have happened because of using the debug tools to have everyone feuding (to make it easy to test resolving feuds). This is an unlikely situation, but it’s certainly possible to get a lot of clans to hate you, if you play the right (wrong?) way. So it’s a valid bug, and I fixed it.

Given how many scenes are in the game, I figure any bug is probably somewhere else as well. So I figured I should go back to our automated tests and run them in their nonstandard configurations. I don’t usually do this, partly because they’re mutually exclusive, but also because some of them can take a long time (in fact, as I type this I’m running one … that just finished in just under 26 minutes).

But sure enough, enabling TEST_ALL_FEUDING found several scenes that would have trouble. It also gave a lot of false positives, since the brute force testing ignores scene conditions. (Scenes almost always have conditions that prevent them from being randomly chosen when they’re not appropriate.)

So today I’ve been dealing with that, and also TEST_NO_CHIEF, TEST_NO_GOODS, TEST_NO_RING, TEST_NO_WARRIORS, TEST_ALL_HATE, and TEST_Q_BRANCHES.
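
To make the idea concrete, here is a rough Python sketch of brute-force scene testing under one of these configurations. It is not the game’s actual test code: GameState, Scene, friendly_clan and the three sample scenes are all made up for illustration, and only the TEST_ALL_FEUDING name comes from the real tests. Each scene is run whether or not its condition holds, and the condition is then used to separate real failures from false positives (that sorting step is my assumption about one way to do it).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GameState:
    # Hypothetical state: clan name -> True if that clan is feuding with us.
    feuds: Dict[str, bool] = field(
        default_factory=lambda: {"Greydog": False, "Blue Jay": False}
    )


@dataclass
class Scene:
    name: str
    condition: Callable[[GameState], bool]  # would the game ever choose this scene?
    run: Callable[[GameState], object]      # raises if the scene cannot proceed


def friendly_clan(state: GameState) -> str:
    """Return a clan we are not feuding with, or fail if there is none."""
    for clan, feuding in state.feuds.items():
        if not feuding:
            return clan
    raise LookupError("no friendly clan to visit")


def all_feuding(state: GameState) -> None:
    """Put every clan into a feud with us, like the debug tool does."""
    for clan in state.feuds:
        state.feuds[clan] = True


# One configuration per run, since they are mutually exclusive.
CONFIGS: Dict[str, Callable[[GameState], None]] = {"TEST_ALL_FEUDING": all_feuding}


def run_all_scenes(config: str, scenes: List[Scene]) -> Dict[str, List[str]]:
    """Run every scene regardless of its condition and sort the failures."""
    report: Dict[str, List[str]] = {"real": [], "false_positive": []}
    for scene in scenes:
        state = GameState()  # fresh state for each scene
        CONFIGS[config](state)
        reachable = scene.condition(state)
        try:
            scene.run(state)
        except LookupError:
            report["real" if reachable else "false_positive"].append(scene.name)
    return report


SCENES = [
    Scene("visitors", lambda s: True, friendly_clan),
    Scene("feast_with_friends", lambda s: not all(s.feuds.values()), friendly_clan),
    Scene("harvest", lambda s: True, lambda s: None),
]
```

Calling run_all_scenes("TEST_ALL_FEUDING", SCENES) reports "visitors" as a real problem, and "feast_with_friends" as a false positive, because its condition would have kept it from being chosen in that state.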

Leveraging QA

giving a pot

Right now, the game is undergoing internal testing while being finished. Most of the bugs are easy to deal with: fix a typo, clarify something that’s unclear, fix a logic error, or make sure advice and recommendation match.

One that came in today had to do with being unable to proceed after having decided to give gifts, when your clan had no goods to give. The specific script was

[ChooseYesNo("Do you take a gift?")]
yes: {
    w = ChooseGoods("What do you give to them?")
    TransferGoods(otherClan, w)
}

Normally players who have no wealth wouldn’t be choosing to gift, and it wouldn’t be a problem. But of course part of QA’s job is to test the boundaries.

It would be easy enough to fix this scene, but this has the potential to be a wider problem. ChooseGoods is used fairly often. Any of those situations had the potential of failing, too. One answer to that is to do a code sweep, searching for all occurrences, and making sure they are conditioned. But that’s a manual step, which means it can be prone to error (especially if there are lots of places).

Another is to have the computer do this. We already have a unit test that exhaustively runs scenes. So I set goods to 0 in this test, and found six other problem scenes.
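
Here is a small Python sketch of that approach, under some loud assumptions: choose_goods is only a stand-in for ChooseGoods, the clan is a plain dictionary, and the two gift scenes are invented to show the difference between an unconditioned call and a conditioned one. The real unit test runs actual OSL scenes.

```python
from typing import Callable, Dict, List


class NothingToChoose(Exception):
    """Raised when a goods picker is shown with nothing to pick from."""


def choose_goods(clan: Dict[str, int], prompt: str) -> int:
    """Stand-in for ChooseGoods: returns an amount, or fails if the clan has none."""
    if clan["goods"] <= 0:
        raise NothingToChoose(prompt)
    return 1  # a real game would ask the player; the test just takes the minimum


def unguarded_gift(clan: Dict[str, int]) -> None:
    # Mirrors the reported script: always asks what to give.
    clan["goods"] -= choose_goods(clan, "What do you give to them?")


def guarded_gift(clan: Dict[str, int]) -> None:
    # Conditioned version: only offers the choice when there is something to give.
    if clan["goods"] > 0:
        clan["goods"] -= choose_goods(clan, "What do you give to them?")


def scenes_failing_with_no_goods(
    scenes: Dict[str, Callable[[Dict[str, int]], None]]
) -> List[str]:
    """Run every scene with goods forced to 0 and list the ones that get stuck."""
    failing = []
    for name, scene in scenes.items():
        clan = {"goods": 0}  # the boundary condition QA hit by hand
        try:
            scene(clan)
        except NothingToChoose:
            failing.append(name)
    return failing
```

Running it over both sample scenes flags only the unguarded one, which is the kind of list that tells you where a condition is missing.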

So human QA gives the best results, but automation can give decent testing over the entire game, and it’s easier to set up certain conditions (usually a clan doesn’t stay at 0 goods long, since crafters and traders are continually making more).

Proofreading

Like King of Dragon Pass before it, Six Ages will have a lot of text. Which means a lot of opportunity for typos or other misspellings.

Most of the text is in OSL scripts, such as these excerpts:

saga: <expeditionLeader> was attacked, but returned home with <his/her> escort.
sagaText: The worshipers suffered the same fate.
text: We eventually pieced this together from stories told by wandering traders.

The scene compiler outputs all strings into a single text file. It looks something like this:

<expeditionLeader> was attacked, but returned home with <his/her> escort.
<expeditionLeader> was attacked and wounded, but returned home with <his/her> escort.
Unfortunately, <he/she> lost the livestock they were driving home.
<expeditionLeader> and <his/her> escort <disappeared mysteriously/were ambushed by trolls and completely devoured>.
The worshipers suffered the same fate.
We eventually pieced this together from stories told by wandering traders.

And that file can be spellchecked. I just use TextEdit. The biggest issue with proofreading is that the game uses a lot of proper names and jargon specific to Glorantha. Luckily it’s easy enough to add “Orlanth” to the dictionary (or the ignore list). More problematic is that variable names (like expeditionLeader) also show up here, though ignoring them usually works too.
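
If you wanted to automate part of this instead of using TextEdit, a sketch in Python might look like the following. It is not our tool. It assumes that a placeholder without a slash (like <expeditionLeader>) is a variable name to skip, and that one with a slash (like <his/her>) holds real text alternatives worth checking. The dictionary and ignore list are passed in as plain sets.

```python
import re
from typing import Iterable, List, Set

PLACEHOLDER = re.compile(r"<([^<>]*)>")
WORD = re.compile(r"[A-Za-z']+")


def checkable_text(line: str) -> str:
    """Drop variable names like <expeditionLeader>; keep alternatives like <his/her>."""
    def replace(match: "re.Match[str]") -> str:
        inner = match.group(1)
        if "/" in inner:
            return inner.replace("/", " ")  # alternatives are prose, so check them
        return " "                          # a bare name is a variable, so skip it
    return PLACEHOLDER.sub(replace, line)


def unknown_words(lines: Iterable[str], dictionary: Set[str], ignore: Set[str]) -> List[str]:
    """Return each word (lowercased, in first-seen order) found in neither set."""
    known = {word.lower() for word in dictionary | ignore}
    found: List[str] = []
    for line in lines:
        for word in WORD.findall(checkable_text(line)):
            lowered = word.lower()
            if lowered not in known and lowered not in found:
                found.append(lowered)
    return found
```

With "Orlanth" in the ignore list, a line such as "<expeditionLeader> returned home with <his/her> escrot." would report only "escrot". It still can’t catch a typo that happens to be a real word.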

Another issue is that it’s a big file. It may be generated by our tool but it still takes a human a while to review, so that doesn’t happen often (in fact the first complete review was today).

This is just a brute force pass. Many typos end up with words that are spelled correctly. And once in a while game-specific names get misspelled. So QA still needs to keep an eye open for problems.

Emailing Debug Info

Six Ages tries to make it easy for the player to send us debug info.

Previously, we saw that the game keeps a pair of debug logs on the player’s device. Back in the late 1990s when we did this for King of Dragon Pass, we asked players to navigate to those files (with Finder or Windows Explorer), and send them.

When I adapted King of Dragon Pass for iOS, I had to use a different approach (since the files aren’t accessible). Luckily, iOS had an easy way to send an email message, and we were using Fogbugz to do bug tracking. So the game let the player mail in a report (to the special Fogbugz address), automatically attaching the log, as well as the most recent saved game.

In Six Ages, I’ve improved that slightly, compressing the files into a ZIP archive, and including a few other debug files that help with tuning.
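
The game itself does this in its iOS code, which I’m not showing here. As a rough illustration of the bundling step only, here is a Python sketch using the standard zipfile module. The function name and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def build_report_archive(files: Iterable[Path], archive_path: Path) -> Path:
    """Compress whichever of the given files exist into a single ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            if path.is_file():  # a missing log should not block the report
                archive.write(path, arcname=path.name)
    return archive_path
```

Something like build_report_archive([log, saved_game, tuning_data], Path("report.zip")) would then produce the single attachment for the email. Skipping missing files is a design choice in this sketch, on the theory that a partial report beats none.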

I’m also detecting certain kinds of crashes, and offering to email the report. (In this case, I take a screen shot and include that in the ZIP file.)

Since some crashes prevent this, I also try to detect a crash on the next launch, and offer to send the report then.
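
One common way to detect a crash on the next launch is a marker file: write it at startup, remove it on a clean exit, and treat a leftover marker as evidence that the last session ended badly. I’m not claiming this is exactly what Six Ages does. The Python sketch below just shows the technique, and SessionMarker is a made-up name.

```python
from pathlib import Path


class SessionMarker:
    """Tracks whether the previous session shut down cleanly, via a marker file."""

    def __init__(self, marker_path: Path) -> None:
        self.marker_path = marker_path

    def begin(self) -> bool:
        """Call at launch. Returns True if the previous session never ended cleanly."""
        crashed_last_time = self.marker_path.exists()
        self.marker_path.write_text("running")
        return crashed_last_time

    def end_cleanly(self) -> None:
        """Call on normal shutdown so the next launch sees no marker."""
        self.marker_path.unlink(missing_ok=True)  # needs Python 3.8 or later
```

If begin() returns True, that is the moment to offer to send the report from the previous session.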

This is all fairly trivial code (I’m not using one of the third party libraries that no doubt do a much better job with edge cases), but it makes sending the debug info much simpler for the player. Which makes fixing the bug much easier for me.

Debugging: The Log

Like King of Dragon Pass, Six Ages logs all scripts, which helps show exactly what happened and the state of the game.

Right now a lot of my time is spent fixing bugs in the code. Given that the game is large and complex, it can be hard to know just what happened that led up to a bug. So we keep track of everything important.

This is actually something that we did in King of Dragon Pass. Shawn Steele was implementing our scripting language, OSL, and wanted a way to check that things like conditionals and calculations worked. So he came up with a way to log this to a file. You can see his focus from some of the options:

kAllBranches = 0x01,
kListSizeing = 0x02,
kTraceOSL = 0x04,
kTraceMath = 0x08, // Extra COSL::StartMath debugging information
kSetVariables = 0x10,
kTraceFixed = 0x20, // Extra CFixed debugging information
kLoadVariables = 0x40,
kGetStrings = 0x80,
kTraceTribes = 0x100,
kTraceOSLVariables = 0x200, // CFixed::PrintFileDebug should print OSL Variable Names the hard way
kTraceMathResult = 0x400,
kTraceList = 0x800, // Attempt to show list content
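
Each option is a separate bit, so any combination can be switched on at once. The original is C++, and I’m not reproducing how it tests the flags. The Python sketch below only illustrates the bit-flag idea, using four of the values from the list above. LogOptions and is_enabled are names I made up for the example.

```python
from enum import IntFlag


class LogOptions(IntFlag):
    # Values copied from the option list above; only a few are shown.
    kAllBranches = 0x01
    kTraceOSL = 0x04
    kSetVariables = 0x10
    kTraceTribes = 0x100


def is_enabled(options: LogOptions, flag: LogOptions) -> bool:
    """True if the given flag's bit is set in the combined options."""
    return bool(options & flag)


# Combine two options with bitwise OR: 0x04 | 0x10 == 0x14.
enabled = LogOptions.kTraceOSL | LogOptions.kSetVariables
```

Here is_enabled(enabled, LogOptions.kTraceOSL) is True, while is_enabled(enabled, LogOptions.kTraceTribes) is False.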

The output includes addresses and opcodes, reflecting its focus on debugging the language:

OSL 0xd8b694 Running from 0, fRunDepth 0:
-------------------------------------------
0000 : 0202 019E Picture "scene018" 
0002 : 0203 0001 Position 0001 
0005 : 4005 E
 Loading Variable 4005 (E) 0.000 (gValue)
0006 : 0800 = 
0007 : 068D RandomElement 
0008 : 0480 ( 
0009 : 070E ClanMembers 
000A : 0841 - (subtract) 
000B : 0718 RingMembers 7ffa0000 -- P00000000:0021a700 (gValue)
 Result: 7ffa0000 -- P00000000:005e18fe (gValue)
000C : 0481 ) -- gValue = 7ffa0000 -- P00000000:005e18fe (gValue)7ff20001.0000 : Brenna (gValue)
 Setting Variable 4005 (E)
 Result: 7ff20001.0002 : Brenna (gValue)
000E : 4021 otherClan
 Loading Variable 4021 (otherClan)7ff30017.0000 : Blue Jay (gValue)
000F : 0800 = 
0010 : 068D RandomElement 
0011 : 0480 ( 
0012 : 068A NeighboringClans 
 4 COSL::Neighbors:
Clans:
 9 Boskovi
 15 Blackrock
 18 Greydog
 23 Blue Jay

0013 : 0481 ) -- gValue = 7ffb0000 -- C00000000:00848200 (gValue)7ff30012.0000 : Greydog (gValue)
 Setting Variable 4021 (otherClan)
 Result: 7ff30012.0000 : Greydog (gValue)
0015 : 4012 R
 Loading Variable 4012 (R) 0.000 (gValue)
0016 : 0800 = 
0017 : 0600 FALSE 0.000 (gValue)
 Setting Variable 4012 (R)
 Result: 0.000 (gValue)
0018 : 0212 01A4 Saga "<4005> told us we should take in Orlkensor Bronzebones, a warrior outlawed by the <21>.plural." 
 Loading Variable 4005 (E)7ff20001.0002 : Brenna (gValue)
ReplacePlaceHolders...7ff20001.0002 : Brenna (gValue)
 Loading Variable c321 (otherClan)00000000.0000 : Greydogs (gValue)
ReplacePlaceHolders...00000000.0000 : Greydogs (gValue)
001A : 0201 01CF Music "IsItAdventure" 
001C : 0A00 NewChoices

Once OSL was reliable, it turned out that the log was useful to help debug the OSL scripts themselves. If something weird happened, you could see what code branch was taken, and what some of the variables were. For example, the output above shows all the neighboring clans that RandomElement could pick from.

And we output other information to the log, such as the scene queue, some of the clan decisions, and more.

The debug log can grow quite large (a quick search shows one at 7.7 MB), so the game makes a new one every time you launch. But if you had to relaunch because of a crash, that would mean that any evidence would be deleted. So instead, at launch we rename the existing log, and delete only the one from before that.
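
The rotation itself is only a couple of file operations. Here is a minimal Python sketch of the idea, with hypothetical file names. The game’s own code is not Python.

```python
from pathlib import Path


def rotate_log(current: Path, previous: Path) -> None:
    """At launch: keep the last session's log as `previous`, then start a fresh one."""
    if current.exists():
        # Path.replace overwrites `previous`, which drops the log from two launches ago.
        current.replace(previous)
    current.write_text("")  # start this session with an empty log
```

Calling rotate_log(Path("debug.log"), Path("debug-previous.log")) at every launch means there are never more than two logs on the device, and the one from a crashed session survives the relaunch.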

While I reworked OSL for Six Ages, I based it on Shawn’s work, and wasn’t so concerned about debugging the language itself. Instead, I wanted to focus on the scripts, since that seemed like where most of the bugs would be.

<OSL: 0x170194ec0> ® 2 Affiance her to …
-----------------
: Saga "We affianced her to <.an> <ourClan> groom." {a} {C_1 Arrowstone}
: (ourClan).commoners {277}
: += 
: # 1 {277} {278}
 ourClan.commoners ← 278
: ChooseYesNo
: ( 
: String "Do you accept the 10 cows?" 
: ) ↖︎ = Do you accept the 10 cows?
sendToCurrentScene: kNewChoice
… exploded
Restarting OSL (self.result=kNewChoice)

This is more compact (thus easier to read): more values are shown on the same line, and it doesn’t bother to show addresses or opcodes.

The next post will talk more about how to make use of this.