At, we help our customers better understand their user data. Mobile games and educational apps, for example, are known to generate plenty of data, and the data itself already poses several interesting questions: how do level difficulty and retention correlate? Answering such questions does not require machine learning in the first place, but it certainly requires a data-driven mindset. Once actionable insights have been gained from an analysis, machine learning helps to personalize the user experience beyond what manual tuning can achieve, for example by adapting the difficulty for each user individually.

Measuring Difficulty

Let’s start simple. Before we can analyze the relationship between retention and level difficulty, we need to measure the level difficulty as perceived by the player. In many cases, it is not obvious how to measure this metric purely by looking at the data. For that reason,’s infrastructure supports user feedback. After the completion of a level, quiz, or task, the app can simply ask the user to score the previous challenge. In a simple setting, one can just ask the user to rate the challenge as “easy”, “medium”, or “difficult”. This data can then be used to calculate a score for each challenge.
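As a minimal sketch, such a score could simply be the mean of numerically encoded ratings. The encoding below (easy = 1, medium = 2, difficult = 3) is an assumption for illustration, not a description of our product:

```python
# Assumed mapping from feedback label to a numeric value.
RATING_VALUES = {"easy": 1.0, "medium": 2.0, "difficult": 3.0}

def difficulty_score(ratings):
    """Average numeric rating for one challenge; `ratings` is a list of labels."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(RATING_VALUES[r] for r in ratings) / len(ratings)

# Example: mostly "medium" votes with one "easy" and one "difficult".
score = difficulty_score(["easy", "medium", "medium", "difficult"])
```

A score near 1 then indicates a level most players found easy, and a score near 3 a level most players found difficult.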

You don’t want to ask every single user to rate every single challenge, as this will annoy users and possibly lead to churn. For that reason, one is often interested in simple machine learning models that predict the perceived difficulty of each challenge. In some cases, a very simple model can be found, for example a single dimension that correlates highly with the perceived difficulty. Particular examples of such dimensions are the playtime for a level, the number of jokers used in a level, or the churn rate of a particular level.

User Feedback vs. Playtime
The figure shows how the user feedback for a level correlates with the actual playtime of that level.

The figure above shows an example from one of our customers. In this quiz game, the users were asked for their feedback after completing a level. When comparing this score with different metrics based on user behavior, we found that it correlates well with the time required to solve the level. The figure shows data from more than 2k different levels; in total, 750k level completions from roughly 60k players were taken into account. In many cases, a single dimension is not expressive enough, and more features have to be taken into account. In those cases, machine learning can be used to model the user score more accurately.
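Checking whether a single dimension is expressive enough boils down to computing a correlation. A self-contained sketch of a Pearson correlation between per-level playtime and the average feedback score (the data points below are made up for illustration):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical per-level data: median playtime in seconds vs. mean feedback score.
playtime = [30, 45, 60, 90, 120]
feedback = [1.1, 1.4, 1.9, 2.3, 2.8]
r = pearson(playtime, feedback)  # close to 1 for strongly correlated data
```

A value of r near 1 would justify using playtime alone as a proxy for perceived difficulty; a weak correlation suggests adding more features.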

The User Journey Visualized Based on Difficulty

Once we have a scoring function for all levels, we can get a better understanding of what the user journey looks like in terms of level difficulty. This is a first actionable insight, because we can now test different strategies and measure their impact on churn and monetization. Let’s have a look at what the initial user journey looked like before we tested different strategies for optimization.

Initial User Journey
The figure shows the initial user journey depicted based on the level difficulty.

In the figure above, the x-axis gives the position of a level in the game, i.e. the first level is at position 1. The y-axis gives the relative difficulty of a level, i.e. the easiest level has value 1 on the y-axis, the second easiest level has value 2, and so on.
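The relative difficulty on the y-axis is simply the rank of each level’s difficulty score. A minimal sketch, with hypothetical level names and scores:

```python
def relative_difficulty(scores):
    """Map each level id to its rank; the easiest level gets rank 1."""
    ordered = sorted(scores, key=scores.get)
    return {level: rank for rank, level in enumerate(ordered, start=1)}

# Hypothetical difficulty scores per level.
scores = {"level_1": 1.8, "level_2": 1.2, "level_3": 2.6}
ranks = relative_difficulty(scores)  # level_2 is the easiest, so it gets rank 1
```

Plotting these ranks against the level positions reproduces a chart like the one above.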

Now we can design strategies that give more structure to the order of the levels. We then test each strategy in an A/B test, or we use Multi-Armed Bandits to find the best strategy directly. The result of each test also yields new ideas for additional strategies. One example could be the following strategy:
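To make the bandit idea concrete, here is a minimal epsilon-greedy sketch in which each arm is one level-ordering strategy and the reward could be, e.g., a retention or revenue signal per user. All names and parameters are illustrative, not a description of our infrastructure:

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over a fixed set of candidate strategies."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # how often each strategy was served
        self.values = [0.0] * n_arms  # running mean reward per strategy

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore a random strategy
        return max(range(len(self.counts)), key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the mean reward for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Unlike a fixed A/B split, the bandit shifts traffic toward the better-performing strategy while the test is still running.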

Example strategy for A/B test
The figure shows one example strategy designed based on the level difficulty.

In the figure above, we designed a new strategy in which the level difficulty increases with every level for a certain number of levels and then decreases again for the same number of levels. Users who want to be challenged right away might find such a strategy more appealing than the initial one. We can test dozens, hundreds, or even thousands of such strategies, depending on the number of users or players available. If we have enough data, we can also use Genetic Algorithms to iteratively generate new strategies.
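A toy sketch of the genetic idea: treat each strategy as an ordering of level ids, and breed new candidates from well-performing parents via crossover and mutation. The operators below are standard permutation operators, chosen for illustration; the fitness evaluation (the measured KPI) is left out:

```python
import random

def crossover(parent_a, parent_b):
    """Order crossover: take a prefix from one parent, then fill in the
    remaining levels in the other parent's order, so the child is still
    a valid permutation of all levels."""
    cut = random.randrange(1, len(parent_a))
    head = parent_a[:cut]
    tail = [lvl for lvl in parent_b if lvl not in head]
    return head + tail

def mutate(strategy, rate=0.1):
    """With probability `rate`, swap two random positions in the ordering."""
    strategy = list(strategy)
    if random.random() < rate:
        i, j = random.sample(range(len(strategy)), 2)
        strategy[i], strategy[j] = strategy[j], strategy[i]
    return strategy
```

Each generation, the strategies with the best measured KPI survive and produce offspring, so the search gradually concentrates on promising orderings.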

Finding the Right KPI

While apps and games often have different objectives, their KPIs are similar in many cases. In (mobile) games, revenue is typically the most important KPI. In other settings, such as education and learning, a KPI based on the student’s performance or retention is more valuable. By reordering the levels or challenges, significant improvements in these KPIs can be achieved. By testing various strategies, we were able to improve ad revenue for the mobile game mentioned above by 50% after 7 days and 74% after 14 days compared to the initial baseline.

Global Optimization vs. Personalized User Experience

Once you have found a solution that maximizes your most relevant KPI, you can start to think about changing the strategy for certain segments of your users. For example, it might be globally optimal to increase the difficulty slowly. Most likely, however, there will be a segment of users that learns the game comparatively quickly. These users will churn early if the game is too easy at the beginning, because the app demands too little of them. You would want to treat such users differently.

Here, we can once again ask a subset of users for their feedback and specifically ask them about their satisfaction. Based on this information, we can now build a model that predicts user satisfaction. This model is then used to test different strategies for specific user segments based on their satisfaction and early behavior in the app. We will give more details on personalizing the user experience in one of our upcoming blog posts.
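As a sketch of such a satisfaction model, assuming hypothetical early-behavior features (levels played on day 1, retries per level) and binary satisfaction labels from the surveyed subset, a plain logistic regression trained by gradient descent could look like this. This is an illustration, not our production model:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Plain stochastic-gradient-descent logistic regression.
    X is a list of feature rows; y holds 0/1 satisfaction labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
            err = p - label
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return w, b

# Hypothetical survey data: [levels played on day 1, retries per level] -> satisfied?
X = [[2, 5.0], [3, 4.0], [10, 1.0], [12, 0.5]]
y = [0, 0, 1, 1]  # 1 = user reported being satisfied
w, b = train_logistic(X, y)

def predict(row):
    """Predicted probability that a user with this early behavior is satisfied."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
```

The predicted probability can then be used to route users who are likely dissatisfied with the global strategy into a segment with its own difficulty curve.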


Measuring and analyzing level difficulty has several benefits and applications. On the one hand, it helps you optimize retention or monetization, depending on your goals and KPIs. On the other hand, once a well-working strategy has been found via A/B testing, the difficulty can also be adapted for different segments of users. Such a framework is not only applicable to games but also in the context of learning apps or virtual labs. At, we offer an end-to-end solution that supports you in optimizing and personalizing the user experience. Click on the following links to learn more about our solutions for games or education. Also, don’t forget to follow us on Twitter!

Also published on Medium.