Using Deep Reinforcement Learning For Boss AI

Yorth Member
edited June 2022 in General Discussion
Hey everyone. First of all, I would like to apologize if this is not the right place for these kinds of requests. I am new to the forums and General Discussion seemed like the safest place for me to post.

With that preface out of the way, I would like to suggest using modern AI technology for Boss fights—specifically deep reinforcement learning. In this post I will be tackling three main points: What is deep reinforcement learning and how is it useful? How can it be used to create interesting Boss AI? And what are some of the problems that might appear and how to deal with them?

Deep reinforcement learning is a mix of two ideas: neural networks and reinforcement learning. Neural networks are a semi-abstraction of how the brain works. A network is composed of neurons (or nodes), and these neurons are connected much like the ones in our brains. We give the neural network an input (for a boss's AI it might be player actions, its HP, its environment, etc.) and some of the neurons get activated. Depending on which neurons get activated, we end up with a different output (for a boss's AI, these outputs are the actions that the boss will take: Which direction should it move? Should it use an ability? Which ability should it use?).
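
To make this concrete, here is a minimal sketch (in PyTorch) of what such a boss policy network could look like. Everything here is illustrative: the state features, action names, and layer sizes are my own assumptions, not anything from the actual game.

```python
# Minimal sketch of a boss "policy network" (PyTorch). The state features,
# action names, and layer sizes are illustrative assumptions only.
import torch
import torch.nn as nn

BOSS_ACTIONS = ["move_north", "move_south", "melee_swing", "aoe_slam", "summon_adds"]

class BossPolicy(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        # Layers of "neurons"; which ones activate determines the output.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Input: encoded boss HP, player positions, cooldowns, etc.
        # Output: one score (logit) per possible boss action.
        return self.net(state)

policy = BossPolicy(state_dim=16, num_actions=len(BOSS_ACTIONS))
state = torch.randn(16)                        # stand-in for a real encoded game state
action = BOSS_ACTIONS[policy(state).argmax()]  # pick the highest-scoring action
print(action)
```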

Now that we have a basic understanding of neural networks, let's move on to reinforcement learning. It is simply a training method. The way it works is that when an AI does something good, we give it a reward, and when it does something bad, we give it a punishment. The AI is therefore incentivized to do more of the things that earn it rewards and avoid the things that earn it punishments.

Deep reinforcement learning mixes these two ideas: we take a neural network and train it using reinforcement learning. When the AI does something good, we leave the neural network as it is; when it does something bad, we tweak it a little bit. This explanation is obviously a very big oversimplification, but you get the general gist. If you want to learn more, you can check out this article that explains it in a beginner-friendly manner: https://wiki.pathmind.com/deep-reinforcement-learning
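
For the programmers reading: here is a toy sketch of that training loop, using the simple REINFORCE policy-gradient rule (one of many ways to do the "tweaking"). It reuses the BossPolicy sketch above; `env` is a hypothetical stand-in for a combat simulation with a made-up reset()/step() interface.

```python
# Toy REINFORCE update: nudge the network toward actions taken in
# high-reward episodes and away from those in low-reward ones.
# `env` is a hypothetical combat simulation, not a real API.
import torch

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_episode(env, policy):
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                    # boss tries an action
        state, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
    episode_return = sum(rewards)                 # total reward minus punishment
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()                               # the "tweak it a little bit" step
    optimizer.step()
```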

So, why do we care about deep reinforcement learning? It is the architecture that OpenAI used to create the models that dominated Dota 2 pro players. It showed not only high-level mechanics (even when its APM and effective APM were reduced to human levels) but also the ability to bait and fake out its opponents. It showcased such levels of strategy that pro players started mimicking the AI.

Now for the second point: how can this algorithm be used to create interesting boss AI? One pet peeve I have always had with MMO AI in general is how formulaic it is. The boss starts with a couple of attack patterns and skills that it uses, and once it reaches a certain threshold it changes those patterns. You end up with bosses that have very little variety in their moves, and they can be beaten easily as long as you have the right equipment and learn the right strategy over the course of a few runs (or, more likely, google the clearing guide for the boss). I believe that a boss that challenges you not only strategically but also tactically and mechanically can be very interesting. It would require the players raiding the boss to show skill, adaptability, and coordination, rather than just going through the motions because they have already memorized all the patterns and phases of the boss.

Now, for the problems that might appear and how to solve them:

The raid boss might be so challenging that it is impossible for players to win against it: I have seen this argument thrown around in these types of threads multiple times, and I believe there are many ways to solve the issue. Let's start with the easiest one: nerf the boss's stats. Rather than having a boss with millions of hit points and multiple one-shot mechanics, we can tone that down to a more manageable level. That way, even a more dynamic AI can still be beaten by players who are skilled enough.

The other option can be used along with the first one. But to understand it, let's first go back to how we will train our boss AI in the first place. It is similar to how OpenAI trained their Dota 2 AI. We create a boss model (model/agent is just another name for AI, don't be too confused by it) and some player models, and have them fight against each other. Maybe in the first round the boss wins, in the second round the players win, and so on. As they continuously fight each other, they keep learning and getting better. What we can do is keep a copy of each of these models as they learn, so that we can access them later, as in the sketch below. Once we reach the point in training (if we ever reach it) where the boss is just stomping the player models without them being able to do a thing, we can go back in time and use an earlier copy that is less skilled (and can be killed by the player agents). We can check how this boss model fights, and if we like it, we can playtest it. The playtesters can then tell us whether the boss was fun to fight, and whether it was too hard or too easy. If it was too hard, we go back and look for even less skilled copies and test those; if it was too easy, we do the opposite.
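
Here is a rough sketch of that checkpointing idea in code. Everything in it (the training function, the "skill" number, the playtest check) is a hypothetical placeholder; a real boss model would be a neural network rather than a dict.

```python
# Sketch of "keep a copy of the boss at every stage of self-play training".
# All names here are hypothetical placeholders, not a real training API.
import copy
import random

def train_one_round(boss, players):
    # Placeholder for one round of boss-vs-players self-play training.
    boss["skill"] += random.random()   # the boss slowly gets stronger

def playtest_is_fun(boss) -> bool:
    # Placeholder for a human playtest verdict: challenging but killable.
    return 40 <= boss["skill"] <= 60

boss_model = {"skill": 0.0}   # stand-in for a real neural-network agent
player_models = []            # stand-ins for the player agents

checkpoints = []              # frozen copies of the boss as it learns
for round_num in range(1000):
    train_one_round(boss_model, player_models)
    if round_num % 100 == 0:
        checkpoints.append((round_num, copy.deepcopy(boss_model)))

# The final boss stomps everything? Walk back through history until we
# find an earlier, weaker copy that playtesters actually enjoy fighting.
for round_num, older_boss in reversed(checkpoints):
    if playtest_is_fun(older_boss):
        print(f"shipping the boss checkpoint from round {round_num}")
        break
```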

In my opinion, this is just a natural balancing problem, and the solution follows the same old steps. It's an iterative process that takes multiple factors into consideration until the design and AI teams reach as perfect an encounter as they can.

How to make sure that the boss is controllable and doesn't do stupid stuff: As with all probabilistic algorithms, you cannot enforce policies 100% of the time, but there are tools to make sure violations happen as rarely as possible. Going back to our definition of reinforcement learning, we can enforce our policies by manipulating our reward function. Say we want to program an aggro radius for the boss: a green zone where the boss can go wherever it pleases, a yellow zone it should only enter if it's fighting players and wants to kill them, and a red zone where it should never set foot. The way we code that is: if the boss is in the green zone, reward += 0; if the boss is in the yellow zone, reward -= 100; if the boss is in the red zone, reward -= 10000. This way, the boss can stay in the green zone as much as it likes, only enters the yellow zone if it thinks it can kill players (because, say, killing a player gives it reward += 500, which is more than the yellow-zone penalty), and never enters the red zone (because the reward for killing players is less than the red-zone penalty).
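
Here is a tiny sketch of that reward function in code. The zone penalties and the +500 kill reward are the numbers from above; the distance-based zone lookup is a made-up stand-in for real level geometry.

```python
# Sketch of the zone-based reward shaping described above. The penalty
# values and kill reward mirror the numbers in this post; the distance
# thresholds and spawn point are hypothetical.
import math

KILL_REWARD = 500
SPAWN = (0.0, 0.0)

def zone_of(position) -> str:
    # Hypothetical aggro radius: green near spawn, red far away.
    dist = math.dist(position, SPAWN)
    if dist < 50:
        return "green"
    if dist < 100:
        return "yellow"
    return "red"

def shaped_reward(boss_position, players_killed: int) -> float:
    reward = KILL_REWARD * players_killed
    zone = zone_of(boss_position)
    if zone == "yellow":
        reward -= 100       # only worth it if a kill (+500) is likely
    elif zone == "red":
        reward -= 10_000    # outweighs any kill; effectively forbidden
    return reward           # green zone: no penalty (reward += 0)

print(shaped_reward((60.0, 0.0), players_killed=1))   # yellow zone + kill -> 400
print(shaped_reward((120.0, 0.0), players_killed=1))  # red zone + kill -> -9500
```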

Well, to wrap it up, I just want to say that I'm really hyped for this game and can't wait to play it. It's been a while since I have been this hyped for an upcoming game, or trusted a game company to deliver on their vision, receive feedback from their players, and actually implement said feedback. I'm sorry for the long rant, but this idea has been in my head for quite a while and I really wanted to share it with the community and the team to hear their opinions.

Comments

  • Noaani Member, Intrepid Pack
    edited June 2022
    So, here's my question.

    Why?

    Also, I disagree with your first suggestion for how to prevent it from getting too difficult for players - no one has shown this to be possible at all yet. While it may be theoretically possible, proving it out on a hundred-million-dollar project is probably not the wisest of moves.
  • SongRune Member, Alpha One, Adventurer
    I found this concept super interesting. I hadn't heard of this particular Reinforcement Learning technique, so I went to learn about it via your linked source. My fundamental problem, as I went through it, was that I kept looking for: "Okay, but how do you feasibly apply long-term reward outcomes back to the long-prior decisions that led to them?" I did find my answer eventually, in footnote two, and though it wasn't what I had hoped, it was what I was expecting. Unfortunately, it has some strong implications for the use of this technique in an MMO boss combat AI environment.

    So let's go over requirements: To function in an MMO Boss AI context, such a system needs:

    - To be able to handle all or most group compositions without falling flat.
    - To be able to be applied to multiple enemy AIs in the same encounter, and manage to learn individual tactics, teamwork, the ability to handle losing various combinations of allies, or various disruptions to positioning.
    - To be able to be applied to/trained for many different bosses in many different situations. MMOs aren't small, and tend to need to have, or develop, a lot of content. Even if there are fewer at launch, "several thousand unique bosses" is not out of the question for a mature MMO.

    From the linked article:
    AI think tank OpenAI trained an algorithm to play the popular multi-player video game Dota 2 for 10 months, and every day the algorithm played the equivalent of 180 years worth of games. At the end of those 10 months, the algorithm (known as OpenAI Five) beat the world-champion human team. That victory was the result of parallelizing and accelerating time, so that the algorithm could leverage more experience than any single human could hope to collect, in order to win.

    The first two requirements I'm willing to concede. If you can play Dota, you can probably pull those off. Unfortunately...

    It's not economically feasible to provide a significant, high-performance compute cluster for every, or even most, major bosses in a game at Ashes' scale. It's even less feasible to provide this cluster for 10 months for each boss encounter, or even each type or style of boss encounter in the game. When you add that a boss has to be finished and balanced before the AI can properly train against it (as minor-to-moderate design changes can sometimes create or invalidate tactics), you simply can't use this technique effectively for the scale of content you would find in this type of game.

    Tuning the difficulty of the AI is another special consideration. Simple stat adjustments (even pure "uniform scaling") can rebalance which risks are advantageous to take against a given set of opponents, requiring the AI to relearn possibility-spaces it has already trained on. Performance does not simply scale linearly with statistics in all cases with a pre-trained AI. Add the fact that these effects would apply differently (sometimes negligibly, sometimes devastatingly) to different player team compositions and tactics, and it becomes even more difficult to use a simplistic scaling method without letting the boss be 'cheesed'. And if you find a flaw and need to rebalance or rethink some aspect of a boss? Back to the cluster for a few more months of training.

    I would find this sort of thing REALLY COOL, but I don't expect to get it.
  • Dygz Member, Braver of Worlds, Kickstarter, Alpha One
    The devs aren't going for that degree of innovative AI.
    Maybe we will get an MMORPG with Deep Reinforcement Learning in 2030.
  • Yorth Member
    SongRune wrote: »
    I found this concept super interesting. I hadn't heard of this particular Reinforcement Learning technique, so I went to learn about it via your linked source. […]

    Completely agree with you. After publishing this post I went back and looked at the cost of training the OpenAI Five model. I came across this figure:
    [Figure: OpenAI Five TrueSkill rating vs. training compute in pflops/s-days]
    Now, to explain this image, let's look at the axes. On the y-axis we have TrueSkill, which is just an in-house measure of how skilled a player or an agent is, much like Elo points. On the x-axis we have pflops/s-days. I know that unit of measure looks deranged, but a simple way of looking at it is that 1 pflops/s-day is the amount of compute that 8 Nvidia V100s perform at full efficiency over the span of a day. So, to get a model that can beat the world champions, you would need around 800 pflops/s-days, which is something like 3 million dollars, but to get a model that is around the level of an amateur team, you would need something like 20-30 pflops/s-days, which is closer to 80k to 100k (these figures use the running rate for Google Cloud services, link here). That is still an insane amount of money to spend, especially on just a side demo. Plus, when you consider the limited parameters that OpenAI Five had in place, these figures are probably lowballing it by a large margin. I wouldn't be surprised if it required something closer to 500 pflops/s-days for the models to actually generalize across all possible class combinations (64 archetypes are A LOT).
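
    If you want to play with the numbers yourself, here is the back-of-envelope conversion in code. Note that the dollars-per-V100-hour rate is back-solved from the "$3M for 800 pflops/s-days" estimate above, not an actual quoted cloud price, so treat the output as illustrative only.

    ```python
    # Back-of-envelope conversion between pflops/s-days and dollars,
    # using the figures from this post (not real, current cloud pricing).
    V100S_PER_PFS_DAY = 8   # 1 pflops/s-day ~ 8 Nvidia V100s running for a day

    def v100_hours(pfs_days: float) -> float:
        return pfs_days * V100S_PER_PFS_DAY * 24

    # Implied rate from the post's "$3M for 800 pflops/s-days" estimate.
    usd_per_v100_hour = 3_000_000 / v100_hours(800)

    for label, pfs_days in [("world-champion level", 800), ("amateur level", 25)]:
        cost = v100_hours(pfs_days) * usd_per_v100_hour
        print(f"{label}: {pfs_days} pflops/s-days ~ ${cost:,.0f}")
    ```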

    So yeah, I agree with you. This kind of project would probably be better suited to more specialized AI labs like DeepMind and OpenAI before trickling down to gaming companies. That, and the cost of compute will need to come down by a lot.

    PS: One little side note, though: it might not be necessary to retrain models every time classes get updated or a bug needs fixing. There is a technique that OpenAI used for this specific use case called neural surgery. You can find the link to the paper here (sorry for just linking the paper and not some article that explains it intuitively; I tried looking for one but couldn't find any).
  • Yorth Member
    Dygz wrote: »
    The devs aren't going for that degree of innovative AI.
    Maybe we will get an MMORPG with Deep Reinforcement Learning in 2030.

    Hopefully we do, I would love to see that
  • Noaani Member, Intrepid Pack
    Yorth wrote: »
    Completely agree with you. After publishing this post I went back and looked at the cost for training the OpenAI five model. I came across this figure:
    You have still not answered my very basic question; why?

    As in, why would any developer do this?

    Most raid encounters in games are designed to test very specific aspects of a raid. It may be their DPS, it may be their communication, it may be their ability to think on their feet, or it may be as an introduction to a new mechanic.

    If you give an encounter a bunch of abilities and just get the AI to work out how best to use them, you lose all of this. It will turn raid progression in games into an ungodly mess that makes no sense at all.

    While it is probable that this will one day be possible with AI, that day is literally decades away.

    So again I ask; why?
  • Tragnar Member
    edited June 2022
    I really don't like this, for a single reason.

    A boss fight is a crafted dance for many players at once that purposefully avoids "unfair" practices like overwhelming players and spontaneous one-shots (through layering multiple mechanics at the same time).

    If you design a neural network and try to restrict it from those "unfair practices", then you basically get rid of most of the things that neural networks are good for.

    The best approach we currently have is a semi-randomized set of behaviors to create the illusion of spontaneity and true intelligence, leaving the fight's pacing in the designer's hands.

    Nobody would like boss behavior that was learnt through neural networks, because you'd always end up with something that isn't killable - it would always find a way to fuck players over with unfair mechanics layering that has no counterplay.
    “Ignorance, the root and stem of all evil.”

    ― Plato
  • Yorth Member
    Noaani wrote: »
    You have still not answered my very basic question; why? As in, why would any developer do this? […]

    Okay, how do you know that letting an AI learn movesets on its own is going to create an ungodly mess? It might be more fun than you expect. OpenAI's event where they let players all over the world play against their AI model was very well received and a ton of fun for everyone involved.

    You talked about how raids usually test very specific aspects such as DPS or communication. Well, a more intelligent boss might test a raid party's adaptability, mechanics, and ability to make decisions on the fly, without relying on pre-planning (though you'll still have pre-planning, because the boss's abilities don't change from one encounter to the next).

    The bottom line is that you'll never know what works and what doesn't until you sit down and try it out. Thinking up arguments and counter-arguments in the abstract can go on ad infinitum without reaching any useful conclusion. Very intelligent AI bosses could be fun, or could be a complete disaster. The only way to find out which is the case is to put the idea into action and test it. There have been thousands of cases where developers thought a feature would be an absolute disaster and it turned out amazing, and thousands of others where they thought it would be amazing and it turned out to be complete shit.
  • Caww Member
    I would think the server resources for such an undertaking would be prohibitive, given all the iterations the AI needs to perform just to take one action, and then it has to iterate on those results. Most of the big boss fights are out in the open, with players either attacking the boss or each other to claim it, and each time a raid is attempted the entire quantity and mix of player objectives changes.

    Raising concerns is called thinking ahead, and it is a major aspect of decision making. Deciding not to spend limited game-development resources on unproven technology should not be considered limited foresight.
  • Noaani Member, Intrepid Pack
    Yorth wrote: »
    Okay, how do you know that letting an AI learn movesets on its own is going to create an ungodly mess?
    If you look at literally all the AIs that have been used to learn to play games, they have all had a singular goal - win.

    That isn't the idea behind raid design. As I said, the idea behind many (most) raid encounters is to do something specific - to test the raid in a very particular way. Many of them are indeed puzzles that the developers craft for players to solve; they aren't just supposed to be some big bad thing to kill - there is much more to them than that.

    This has literally never been done with an AI. It hasn't even been attempted as far as I can see.

    If a developer creates 20 encounters, gives them all some abilities, and then leaves it to the AI to determine their behavior, what then? Where is the specific test of the raid? Where is the puzzle?

    Sure, you may have 20 encounters that are all difficult, but what order do you put them in front of players? How do you ensure that the gear you have provided for players is adequate? The whole thing would be a mess, even if a singular encounter may be fine.

    Also, there are raid encounters that test a raid's adaptability and require everyone in the raid to make their own decisions. This has been a thing for at least a decade, so AI is definitely not needed for it.

    While it is true that you never know what will happen until you try it, as I also said in my first post, this kind of thing isn't the role of a company making a hundred-million-dollar product - it is the role of universities and other such institutes. Once they get this all worked out, commercial operations can start looking at it.

    Until that point though, it is straight up not viable.
  • Yorth Member
    Noaani wrote: »
    If you look at literally all the AIs that have been used to learn to play games, they have all had a singular goal - win. […]

    fair enough, I respect your opinion.
  • Tragnar Member
    edited June 2022
    Yorth wrote: »
    fair enough, I respect your opinion.

    What a way to say you disagree with everything he said

    The problem is that it is simply a fact that neural network learning is inapplicable to boss encounters, because it purposefully breaks all of the unspoken rules that encounters have, and its sole goal is to win against the players fighting it.

    You need to look at what your "need" actually is - to have bosses with dynamic, realistically behaving autonomy over their own behavior, so that every fight is different.

    The solution to that is NOT neural network learning, because you would get unfun encounters with toxic gameplay. You misunderstand what neural network learning is and what it is good for; all you take from it is that it "figures things out on its own", which in essence isn't even true, because it figures out ways to solve problems with a specific goal in mind - and that goal is usually to achieve the highest possible score under some scoring system.

    I know you want fancy stuff, but this is an engineering problem, and you are suggesting a tool that isn't at all suited to the problem at hand.

    In other words:
    Neural network learning is not magic - it doesn't understand the meaning behind
    how do you know that letting an AI learn movesets on its own
    It doesn't understand what you subjectively see as a fun encounter. What it does is find the most effective strategies to achieve an easily quantifiable goal that can be scored, which is perfect for balanced games like chess, Go, or even MOBAs (like Dota 2). And even then, the new strategies can be viewed by many as "cheese" or "stupid", because they don't follow the unspoken rules the players have created over time.
    “Ignorance, the root and stem of all evil.”

    ― Plato
  • Teams from companies like OpenAI have developed sophisticated AI that plays games like Dota 2 and competes against the best players in the world. Is it possible to build a learning AI for Ashes bosses? Sure. Do they have the team to pull it off? Doubtful.