The CIG conference (IEEE’s Conference on Computational Intelligence in Games) hosts several competitions each year. This year, the list also included the Game Data Mining competition with an individual track for churn prediction. This not only highlights the importance of the problem; we also see several opportunities in participating in such challenges, e.g.:
- Benchmark our own solution and technology against the brightest minds in gaming research.
- Share practical solutions for hard real-world problems with other international AI researchers.
- Extend our expertise in digital gaming.
Additionally, the challenge provides very interesting data. NCSOFT, one of the world’s largest game studios for MMORPGs, provided datasets with telemetric user data from their highly successful Blade & Soul.
The data consists of three datasets: one for training (4k users) and two for testing (3k users each). The training data ranges over a time span of 40 days. The two datasets for testing each contain data for a period of 56 days. To give you a rough idea of the size, the training data contains around 175 million events that comprise almost 48 GB of unpacked data.
There are roughly 80 different types of events in total. In our experience, this is an excellent number: as a rule of thumb, we typically recommend between 50 and 100 different event types, where available. However, due to Blade & Soul’s immense game depth, each event additionally has up to 75 properties attached to it.
Although the events are well structured in tabular form, it is still challenging to transform such data into a form that is suitable for machine learning algorithms. In particular, with only 4k users in the training dataset, you quickly reach a point where the number of features exceeds the number of training examples, a situation you typically want to avoid.
Therefore, we analyzed the data in several iterations and engineered a comprehensive set of features. This set not only includes features that we have been using for years for various customers but also contains features newly added to our toolbox for this particular case. For example, while recency and frequency matter a lot in predicting churn in general, we found it particularly interesting to engineer social features for Blade & Soul. For games or apps that contain a social network, like guilds and parties in Blade & Soul, the features should reflect that users leaving the network typically affect their peers.
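To illustrate what such features look like in practice, here is a minimal Python sketch of frequency and recency features computed from an event log. The schema, user IDs, and event names are invented for the example and do not reflect the actual competition data or our production pipeline.

```python
from datetime import datetime

# Hypothetical event log: (user_id, event_type, timestamp).
events = [
    ("u1", "login",  datetime(2017, 7, 1)),
    ("u1", "battle", datetime(2017, 7, 3)),
    ("u1", "login",  datetime(2017, 7, 8)),
    ("u2", "login",  datetime(2017, 7, 2)),
]

def rfm_features(events, now):
    """Build simple per-user frequency and recency features."""
    feats = {}
    for user, _etype, ts in events:
        f = feats.setdefault(user, {"frequency": 0, "last_seen": ts})
        f["frequency"] += 1          # frequency: count of events
        if ts > f["last_seen"]:
            f["last_seen"] = ts
    for f in feats.values():
        # recency: days since the user's most recent event
        f["recency_days"] = (now - f.pop("last_seen")).days
    return feats

feats = rfm_features(events, now=datetime(2017, 7, 10))
# feats["u1"] -> {"frequency": 3, "recency_days": 2}
```

In a real pipeline one would compute such aggregates per event type and time window, which is exactly how the feature count can quickly outgrow a 4k-user training set.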
At goedle.io, we typically do not use one particular machine learning algorithm for all of our customers and their different use cases. Churn prediction might benefit from a different feature set than a conversion prediction. Consequently, each problem is solved best with a suitable algorithm for that purpose. For that reason, we not only test a variety of algorithms for each customer and problem, but we also optimize all parameters for each setting and make heavy use of ensemble methods.
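As a toy illustration of the ensemble idea, the sketch below averages the churn scores of several base models. The two threshold rules are invented for the example and are of course not our actual models.

```python
def ensemble_predict(models, x):
    """Average the churn scores of several base models (soft voting)."""
    scores = [m(x) for m in models]
    return sum(scores) / len(scores)

# Hypothetical base models: simple rules on frequency and recency features.
models = [
    lambda x: 1.0 if x["recency_days"] > 7 else 0.2,  # recency rule
    lambda x: 1.0 if x["frequency"] < 3 else 0.1,     # frequency rule
]

score = ensemble_predict(models, {"frequency": 1, "recency_days": 10})
# score -> 1.0 (both rules flag this inactive user as a likely churner)
```

Real ensembles combine trained models rather than hand-written rules, but the averaging step works the same way.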
As many of you have probably witnessed, AI and machine learning are currently all about Deep Learning and Artificial Neural Networks. For that reason, we used this opportunity to also benchmark our existing solution against a simple approach based on Tensorflow and feedforward neural networks.
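For readers curious what such a network computes, here is a stripped-down forward pass of a one-hidden-layer feedforward network in plain Python. Our actual experiments used Tensorflow; the layer sizes and weights below are random and purely illustrative.

```python
import math
import random

random.seed(0)

def feedforward(x, w1, b1, w2, b2):
    """One hidden layer with sigmoid activations; the single output
    is interpreted as a churn probability in (0, 1)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(wi * hi for wi, hi in zip(w2, hidden)) + b2)

# Tiny example: 3 input features (e.g. frequency, recency, amount),
# 2 hidden units, randomly initialized weights.
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

p_churn = feedforward([0.8, 0.1, 0.3], w1, b1, w2, b2)
# p_churn is a probability strictly between 0 and 1
```

Training (backpropagation, optimizers, regularization) is where frameworks like Tensorflow earn their keep; this sketch only shows inference.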
Since the results of the competition have not been announced yet, we present some high-level insights at this point and will give more details after the announcement of the results.
First of all, let’s have a brief look at different types of features and their importance in predicting churn. The figure above gives the relative importance of the features used in our solution, grouped by their type. Here, frequency represents features that count events in Blade & Soul. Recency features measure the time since an event or between two events. Amount summarizes features that depend on values attached to events, for example, the amount of money spent. Lastly, features in the tendency group indicate an increasing or decreasing level of interaction. To be technically more precise, we use simple curve fittings on histograms to build these features. In general, these results coincide with our observations from the past and show that counting events and calculating ratios is often a good indicator of churn.
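As a concrete example of a tendency feature, the sketch below fits a least-squares line to a user’s daily event counts; a negative slope indicates declining engagement. This is a simplified stand-in for the curve fittings described above, with made-up counts.

```python
def tendency(daily_counts):
    """Least-squares slope of daily event counts over time.
    A negative slope suggests a decreasing level of interaction."""
    n = len(daily_counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_counts) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, daily_counts))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# A week of steadily declining activity yields a negative tendency.
slope = tendency([9, 8, 6, 5, 4, 2, 1])
# slope ≈ -1.36
```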
One thing that was surprising to us, and in contrast to our previous experiences, was the fact that the algorithm did not assign high importance to features based on the social network within Blade & Soul. We attribute this observation to the fact that the data did not contain all players in the entire network. Typically, players tend to churn when all of their friends leave a game or if they do not find new friends within the game’s ecosystem. The data for the competition only depicts a subset of the entire social network. While we have data for 4k users in the training dataset, the number of identifiers in the friends graph is above 32k users. This means that we only observe the activity of roughly 10% of the entire network.
When comparing our existing solution to our newly implemented Deep Learning approach, we did not see Deep Learning to be superior. This holds for both the computational demand and the predictive score. However, we see several reasons for this result. First of all, it is certainly possible to do churn prediction with neural networks, as the data science team at Moz has shown. They describe in this blog post how they successfully used Recurrent Neural Networks for churn prediction. We only used simple feedforward neural networks, and hence did not make use of the full power of Deep Learning. This is also emphasized by the fact that a simple NVIDIA gaming graphics card was sufficient to run all experiments; running the same experiments on the Google Cloud Platform with a Tesla GPU did not substantially decrease the run times. Furthermore, we did not spend nearly as much time as necessary on the network engineering and parameter optimization. Both aspects can have a major influence on building a successful model. Lastly, we still believe that for a simple univariate problem (a single variable in the output layer), other algorithms may be the better choice. However, when turning the problem into a multivariate setting, i.e., when predicting churn for several players at the same time, Deep Learning can be a very interesting technology. Therefore, we will certainly continue our work on neural networks and further tweak our setup.
We have not reported any details on the accuracy of our predictions, as the results of the competition have not been announced yet. If you are interested in more details, make sure to come back to our blog in September. If you want to make sure not to miss the release of the next blog post, subscribe to our newsletter and follow us on Twitter. To learn more about what goedle.io offers for game studios, please visit gaming.goedle.io.