GoIO Post-Mortem – Part IV: Through the Storm


Releasing a game that we'd worked on for over two years should have been a cause for joy and celebration. For us, it turned out to be the biggest crisis we'd ever faced. Leading up to release, positive and supportive articles, previews, and videos by journalists and casters gave the game some momentum. Our Kickstarter backers and alpha players were also incredible at helping us test, fix issues, and get the word out. Later in beta (when people could pre-order) and when the game went live, it reached #2 in Indie and #7 overall on Steam. I took a screenshot and savored the moment briefly. Then disaster struck, literally. Survival was once again on the line.


We had known that Hurricane Sandy was building, but we had no idea how devastating it would end up being. On the eve of the game's launch, we found that out for ourselves first hand. As the game went live, power and internet across most of NYC, New Jersey, and Connecticut were being knocked out, and there was severe flooding and debris falling everywhere. New York was not (and still isn't) equipped to handle a hurricane, and the city was battered to a halt. There were no subway lines running, no buses or cars on the street. The wind shook buildings, and window panes were falling away and shattering on the pavement. Cranes on buildings under construction were torn loose and dangling precariously. Trees fell, and houses near the shore flooded or were uprooted.

The area around our building was flooded, and we would end up having no power or internet in our space for almost a month. A month later, after the storm had passed, I went out to the Rockaways to volunteer. For miles, there was nothing open, no electricity, and almost no sign of daily life. At a gas station, people lined up for over 10 blocks, what seemed like a mile, to get gas under army supervision. At the volunteer station, we went through huge piles of donated necessities, and it still wasn't enough. All this hit on the night we released the game.


Around release, we had a lot of players in game, and a lot of them had very rough experiences. Servers were teetering and going down as the storm slammed our hosting company's facilities. The Steam servers we used for authentication and voice were very unstable as well. Disconnections, match slowdowns, server overloading, and hangups abounded. We had slow to no access to Steam's dev portal as well as our own site, so we had a desperate time supporting players or even updating our Steam page info. We also had player-reported issues with disconnections, crashes, frame drops, and more that were new to us. The game had released, and it was at or near the top of Steam's charts. Yet we had no time to do anything but buckle down and get to work. I was stressed out and wanted to freak out, but players were waiting and everything that could go wrong was going wrong, so there was really no time for that.

For the first week, because most of us had no internet or power for long stretches, we had to divide up the support roles. A couple of us who had internet were on player support, updating players on the Steam forum, our forum, and in game for pretty much all our waking hours. The programmers were debugging and trying to divert traffic to server regions that were unaffected by the hurricane. One problem we had was that the machine we used to deploy builds was in our studio, and the building was closed off. We had quite a few fixes that we urgently needed to commit and deploy. We mulled over our options, trying to figure out whether we could deploy without the machine, who should go in to retrieve it if we couldn't, and where we could relocate and set it up. One of our programmers decided to go because he knew what to get and knew the deploy code best. He biked in, but our building was closed off. He stood by while we haggled with building security, trying to talk our way in. Finally, he made it in, grabbed what we needed, ditched his bike (which was actually still there a day later), and rode the cramped bus to my apartment, where I had power and internet working. For the rest of the day and night, he set up the build machine while I updated our status and supported players. The programmers made fixes and deployed while the rest of us jumped on support. And while the servers were being overloaded, we were manually starting matches and helping people in game pretty much non-stop.

We probably had one of the most harrowing, stressful, and sleepless launch weeks ever, but we stuck together and fought through it. With conditions as rough as they were, we certainly lost players. Some people had a rough time in the game and quit forever. But seeing how supportive many of the players were was also very uplifting. We got a lot of nice posts on Steam like this one:

First off I would like to congratulate you on releasing a great game; the gameplay, art, and music compliment each other incredibly well.  Second off, wishing everybody is all right in the aftermath of Hurricane Sandy and hopefully the team is back at full force soon.

And they definitely motivated us to work hard for the players and to keep the game alive. Over the next few weeks, we rolled out patch after patch. This is normally not advised, but desperate times called for desperate measures, and at least the players knew we were trying everything we could. People were more forgiving when they knew we were trying and willing to own up to our mistakes and circumstances. We identified many issues to fix and things to improve over the coming weeks and months, but through hard work as a team and openness with our players, we at least accomplished the most important thing in a game's early days – not dying.

Lessons and Aftermath

Here are some of the things we learned from this ordeal:

1.  In the event of a disaster, we learned to disperse build-critical knowledge and tools over time, so that anyone can jump in and act. We also created an on-call schedule, so that before a promo event, when we would expect a surge of new players, we would be more ready and could distribute our workload better.

2.  We needed to keep building up a support knowledge base and workflow, and to keep updating our players.

3.  There were a number of things we needed to improve, such as server performance, first-time player experience, and bug fixes, and as a small team we really had to learn to prioritize.

Iterate and Iterate Some More


After launch, there were a lot of things we needed to improve. In addition to adding new content, honing balance, and adding fun features, we took a hard look at the problem of retention in its entirety. The problem for us was that retention could mean a lot of things, and a lot of things could affect retention… So the next question was how and where to start. We started by breaking retention down into a few categories or buckets – performance, ease of play, and fun. Then we tried to prioritize tasks or projects that would help us improve in each of these buckets.

Performance

Performance for us covered both the server and the client side, and there was a lot of work to be done. With the large influx of players, the variety of hardware and software specs we had to support increased exponentially. Quite a few crash, connection, frame rate, or input related issues required detective work. For instance, players reported by email and on Steam that they were failing to connect to the game, and we had no clue what was happening. We reached out to one of the players who had this problem and got on Skype with him for an afternoon, going through all the different programs and configurations on his computer as well as his network setup. Finally, we isolated his connection issue to his router. Once he bypassed his router, he could connect to the game. It turned out that a few router models were disrupting Photon's UDP packet transmission. Bypassing the router or updating the router firmware solved it. Through a lot of detective work and the patience of players who were willing to help us, we gradually built up our support library and FAQ.
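We diagnosed that particular case by hand over Skype, but a crude UDP reachability check can help rule a router in or out quickly. The sketch below is illustrative only – the host, port, and payload are placeholders rather than Photon's actual handshake – and a missing reply only suggests, not proves, that UDP is being blocked or mangled along the path.

```python
import socket

def udp_probe(host, port, payload=b"ping", timeout=3.0):
    """Send one UDP datagram and wait for any reply.

    A reply shows UDP traffic makes the round trip; silence could mean
    the router is interfering, or simply that the server doesn't echo
    arbitrary packets, so treat the result as a hint, not a verdict."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(payload, (host, port))
        data, _ = sock.recvfrom(1024)
        return True, data
    except socket.timeout:
        return False, None
    finally:
        sock.close()
```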

We also had to take steps to improve memory usage, physics, and compatibility with different hardware and software. To measure and define success, we had to benchmark and then set targets and goals to drive towards. For the server, it was the same process. On the server side, drawing a lesson from the hurricane, we needed to expand coverage so that players from different parts of the world could have lower latency. We still have a long way to go, but the goal was to get to a point where we could better anticipate player surges and have our benchmark goals outpace the concurrent player threshold we needed to support.

Also, a quick note on hosting. We use Softlayer for hosting, and their servers are concentrated in the US (West and East Coast), the EU (Amsterdam), and Singapore. We started with their cloud service, but because of clock inaccuracies that kept us from maintaining a reliable fixed update rate, we switched to dedicated servers. Softlayer was great because its dedicated servers were very affordable, and we could bring an instance up or down in a few hours. This allowed us to manage server costs and capacity more effectively. Softlayer has also been great with support, solving their server hardware issues relatively fast. The two things we are still actively working on are expanding our server coverage and enabling player-hosted servers. We decided to focus on hosting our own servers to start. In hindsight, we probably should have approached these two things differently and planned for both player-hosted servers and more flexibility in server coverage from the beginning. Another lesson learned.
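For context on why clock accuracy mattered so much: a fixed update rate typically hinges on a steady, monotonic clock driving the simulation loop. Below is a minimal sketch of such a loop, with a hypothetical tick rate and function names rather than our actual server code; on a virtualized host whose clock drifts or jumps, the time budget in a loop like this misbehaves and the effective tick rate wanders.

```python
import time

TICK_RATE = 20              # hypothetical ticks per second
TICK_DT = 1.0 / TICK_RATE   # fixed simulation step in seconds

def run_fixed_timestep(simulate_tick, running=lambda: True):
    """Advance the simulation in fixed steps, regardless of wall-clock jitter."""
    accumulator = 0.0
    previous = time.monotonic()
    while running():
        now = time.monotonic()
        accumulator += now - previous
        previous = now
        # Consume whole fixed steps; a drifting clock corrupts this budget.
        while accumulator >= TICK_DT:
            simulate_tick(TICK_DT)
            accumulator -= TICK_DT
        time.sleep(0.001)  # yield briefly between checks
```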

Ease of Play

Launch would not be the last time our servers exploded. In fact, it would happen a couple more times… A few months after launch, we had our first featured sale on Steam. This time, our match servers held up fine, which was the good news: we had successfully fixed one of the major issues that plagued us during launch. But with the surge of new players, our lobby and master servers were straining. Players were seeing massive slowdowns getting into a match and performing lobby functions such as customizing, loading their friend lists, and so on.

Once again, as we rose to the top of Steam's charts, I took a moment to grab a couple of screenshots, and then it was onward to another crisis-and-survival moment. In between, I'm pretty sure my heart skipped a few beats, and I freaked out for a bit. While players waited, some got impatient and started spamming our in-game global chat just for fun, and the spamming took on a life of its own. The spammers quickly took over chat and brought down the one lifeline we had for helping people start matches and for support. Chat was one vital thread with which our in-game community was woven, and it had just been severed. We were already straining to help people start matches, and this completely crippled us. What was a performance issue worsened into an ease-of-play issue. We lacked mechanisms to combat spamming in game, and that had essentially shut the game down.

We had to implement a chat spam delay essentially on the fly and hot-patch it in. All the while, we were on support and starting matches for players through pretty much all our waking hours once again. This time, we had a small army of player volunteers helping us, which made things easier, and we would not have pulled through without a collective effort. However, we again learned a hard lesson: if ease-of-play issues were not addressed, they could bring down our game as well.
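For illustration, the delay we hot-patched in amounted to per-player rate limiting on chat. A minimal sketch of that idea follows; the class name, three-second delay, and data structure are assumptions for the example, not our actual implementation.

```python
import time

class ChatThrottle:
    """Reject chat messages that arrive faster than a per-player delay allows."""

    def __init__(self, delay=3.0):
        self.delay = delay      # minimum seconds between messages per player
        self.last_sent = {}     # player_id -> time of last accepted message

    def allow(self, player_id, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_sent.get(player_id)
        if last is not None and now - last < self.delay:
            return False        # too soon; drop or queue the message
        self.last_sent[player_id] = now
        return True
```

In a setup like this, the server would call allow() before broadcasting each message and quietly drop (or queue) anything that comes back False.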

Aside from moderation features, other big ease-of-play issues we faced were the first-time player experience and UI flow, to name a couple. When we started, we did not have a tutorial or a manual in game. Somehow, even though we were creating a new teamwork experience that hadn't really been tried before, we thought people would just figure things out, no problem. Well, we were wrong. We got a lot of feedback and frustration about the lack of ways for players to learn the game, and a lot of players couldn't figure it out and left forever. Over the last year, we created and iterated on a tutorial multiple times, incorporated a manual, built a practice mode, added a novice mode where we limit the game's complexity of choices, and granted veteran players Teacher status so they could teach new players. We are by no means done improving the new player experience, and we learned from our rough start that it would be a continuous process.

Our initial UI had ease-of-use concerns as well. Some things were unclear, too cumbersome, or too awkward in flow. And there would be other gameplay-related issues, big and small. What drove how we went about improving ease of play was in part collecting and analyzing game data. However, collecting and digesting player feedback played a bigger role. Overall, player feedback has been our biggest impetus and motivator for ease-of-play improvements.


In order to prioritize player feedback, we created a pipeline to integrate the feedback into our flow of ideas. We amalgamated all player and team ideas in one place. As a team, we then periodically reviewed and prioritized them based on criticality and importance. As we planned sprints, we devoted time to player ideas. We also collected all bugs, prioritized them, and fed them directly into sprint planning. Toward the release of each update (which we aimed to ship every 7-8 weeks), we blocked out time to focus on debugging. So a release cycle for us became two sprints plus a debug period. Since we're all in one room, we still pull our chairs over and have impromptu discussions, brainstorms, and meetings all the time. But with better process, we became better organized over time.


Since last year, we've migrated to and settled on Trello as our project management tool of choice. Based on Kanban, Trello helped us create repositories for the different stages of our progress toward updates as well as major milestones. Trello was easy to interact with, share, and attach documents or images to, and we integrated our support center into Trello as well. We created different "boards" to correspond to the major steps of our pipeline (e.g., idea collection, reviewed and approved projects, sprint, testing and release, and release archive). Roadmap (for major milestones) and player support were boards on Trello as well. Within each board, we used "lists" for finer categorization and steps. Within each list, "cards" were created for each task or project. For a small team, we weren't looking for something too complex. If a system wasn't easy or accessible enough, we likely wouldn't use it.
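To make the shape of that pipeline concrete, here is a rough data sketch of the board/list/card hierarchy; the board names follow the stages described above, while the list names are illustrative guesses rather than our exact Trello setup.

```python
# Boards map to pipeline stages; lists give finer categorization within a
# board; each card would be an individual task, bug, or player idea.
# List names below are illustrative, not our exact Trello layout.
trello_layout = {
    "Idea Collection":     ["Player Ideas", "Team Ideas"],
    "Reviewed & Approved": ["High Priority", "Backlog"],
    "Sprint":              ["To Do", "In Progress", "Done"],
    "Testing & Release":   ["In Testing", "Ready to Ship"],
    "Release Archive":     ["Shipped"],
    "Roadmap":             ["Major Milestones"],
    "Player Support":      ["New Reports", "In Progress", "Resolved"],
}
```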

Fun

The more obvious elements of fun for us were related to replay, progression, and new content. For us that meant iterating on our achievement-based progression system, making rematches easier, creating new ships, guns, skills, and cosmetic items, and so on. But for some of the things we decided to take on, the reasoning and benefits were less obvious beforehand. For instance, we created an end-of-match sequence for the winning team, showing the deck of each ship with the victorious crew in poses and animations.


We did this to give players a creative way to show off their costumes and ships. It also turned out to accentuate victory in a non-acrimonious way. Another end-of-match mechanic was a commendation system, where players could give each other a thumbs up for a good match (there was no negative reinforcement). It's hard to measure exactly how beneficial these types of features were, especially beforehand, as they were typically released along with other improvements in an update. However, they were really fun to do, and they turned out to be very positively received. For the fun of the game, we just had to take chances and do what we felt was fun.

Another big area of work under the umbrella of fun was balance. Balance would always be ongoing, and we would continually be chasing the dream of perfection. Balancing all the different ships, maps, guns, and skills was an intricate process, in part because, while we valued player feedback, players became accustomed to particular tactics and debates, so feedback tended to be anecdotal and subjective. Yet we knew that players who disliked a balance change could leave the game forever. So balancing the game itself was a tricky balancing act. For balance, we looked at underused and overused ships, skills, and guns in game and tried to carve out a specialization for each weapon. We also wanted to ensure that weapons were skill based, so that no one or two weapons would be so easy to use and so powerful that they dominated. Over time, after numerous adjustments, we've improved significantly. Now we have more ships and weapons in play than ever before.
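To ground those underused/overused calls in data rather than anecdote, one simple starting point is computing pick and win rates per weapon from match logs. The sketch below assumes a hypothetical record format – the field names and weapon labels are placeholders, not our actual telemetry schema.

```python
from collections import Counter

def weapon_usage_report(matches):
    """Summarize how often each weapon is picked and how often it wins.

    Each match record is assumed to look like (hypothetical format):
      {"winner": "team_a",
       "loadouts": {"team_a": ["gatling", "flak"], "team_b": ["mortar"]}}
    """
    picks = Counter()   # times a weapon appears in a team's loadout
    wins = Counter()    # times that team went on to win
    for match in matches:
        for team, weapons in match["loadouts"].items():
            for weapon in set(weapons):   # count each weapon once per team
                picks[weapon] += 1
                if team == match["winner"]:
                    wins[weapon] += 1
    total_picks = sum(picks.values())
    return {
        weapon: {
            "pick_share": picks[weapon] / total_picks,  # share of all picks
            "win_rate": wins[weapon] / picks[weapon],   # wins per pick
        }
        for weapon in picks
    }
```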

Fun wasn’t limited to in game features.  Fostering community events also went a long way towards fun and retention.  When we started, we didn’t have much in the way of community features, so player clans and player run events existed primarily outside the game and on our forums.  Players created a lot of contents, and while we tried our best to feature their work, community features were largely absent in game.  So we tried to create in-game systems such as clans and events and better integrated them with our social features such as friends, party, and chat.  So far, in the clan system’s first month of existence, we had close to 4,000 clans registered, and with the popularity of the system, the focus then became how we make it easier to use.  In this case, something we did for the fun of the game spilled over into the ease of use bucket.

Reality Check

Over the last year, we've implemented over 350 player-requested features big and small, and fixed close to 300 player-reported issues. The game project now consists of over 300,000 lines of code, with about 20,000 changes in our version control. We placed emphasis on player feedback and communication. Yet, with everything we've done, retaining players in the face of so many great games is exceedingly difficult, especially given that we're a small team with limited money. We've been lucky to have journalists covering the game, YouTube casters casting it, Steam featuring us in sale events, and fans helping us spread the word. But the reality is that our sales and concurrent players, like those of a lot of other indie games, are spiky and fluctuate a lot.

While our player base grew slowly over time, there were big spikes corresponding to larger player influxes as a result of promo features and events. A lot of players played for a bit and then moved on to the next game. Players on Steam tend to have tens if not hundreds of games; they have a lot of great choices. So the work we do retaining players, as well as our expectations, has to be calibrated against this reality. And really, all we can do is keep listening to players, stay keenly aware of what we lack, and continue to improve.