OpenAI Five: Goals and Progress

shawn · on Aug 20, 2018

As someone who was once semi-pro in dota (4400 MMR, get rekt), it's freaky watching these bots play. It's uncanny. Little things... Like, when the bots are taking a tower, one of them will stand in front of the tower and tank the creep wave, so that their creeps do more damage on the tower. They had to learn this.

Insta-TPing right when an enemy wastes their stun and can't cancel their TP.

Grouping up as 5 at the beginning of the game and pushing into the enemy jungle. Pubs never do this.

The most interesting part is that OpenAI appears to be discovering new knowledge in the dota scene. For example, they always take the ranged barracks first, never the melee. This is exactly the opposite of what the pro scene does. Therefore, the smartest pro team should study what the bot is doing and trust that on average it's a better idea to always focus on the ranged barracks first. After all, if it was a bad idea, they probably wouldn't do that.

The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

Question for OpenAI: Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

EDIT: By the way, TI is going on right now! https://www.twitch.tv/dota2ti If you're new to the scene, take a peek. TI is always so high energy -- even if it's hard to follow what's going on, listening to Tobi (the shoutcaster) go nuts during the game is always a highlight.

And of course, /r/dota2 has the best memes anywhere, hands-down. https://www.reddit.com/r/DotA2/

zawerf · on Aug 20, 2018

> Question for OpenAI: Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

They released their architecture: https://www.reddit.com/r/MachineLearning/comments/9533g8/n_o...

In the above thread, a reddit user noted that 512 out of the 2048 unit input into the LSTM of each bot is shared (max pooled across players).

This means they are telepathically linked and never need to worry about communicating, disagreeing, etc. They know how the others are interpreting their local inputs because it's explicitly shared. So it's not really fair to call it 5 separate minds.

You can't really call it a single mind either because if you assume the LSTM is the "mind" (since it's the only place that has memory of previous state), that state isn't read by any other bots.
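To make the "shared slice" idea concrete, here is a minimal sketch in plain Python. Only the max pooling across players comes from the thread above; the function name, the position of the shared slice within the vector, and the toy sizes are assumptions for illustration, not OpenAI's actual code.

```python
def build_lstm_inputs(per_hero_features, shared_width):
    """Return one LSTM input vector per hero.

    per_hero_features: list of equal-length lists of floats, one per hero.
    The last `shared_width` entries of each vector are treated as the
    shareable slice (an assumption made for this sketch).
    """
    split = len(per_hero_features[0]) - shared_width
    private = [features[:split] for features in per_hero_features]
    shareable = [features[split:] for features in per_hero_features]
    # Elementwise max across heroes: every hero receives the same pooled slice.
    pooled = [max(column) for column in zip(*shareable)]
    # Each hero keeps its own private part and appends the common pooled part.
    return [own + pooled for own in private]


# Toy example: 2 heroes, 3 features each, the last 2 features are shared.
heroes = [[1.0, 0.2, 0.9], [2.0, 0.7, 0.1]]
inputs = build_lstm_inputs(heroes, shared_width=2)
```

In this sketch the pooled slice is identical for every hero while the private part and the recurrent state stay per hero, which matches the "neither five minds nor one mind" reading above.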

shawn · on Aug 21, 2018

This is fascinating. Thank you!

jdoliner · on Aug 20, 2018

I'm also very excited about what these bots can tell us about DOTA but I think a certain degree of skepticism is necessary due to the bots not actually having been trained truly tabula rasa. The training contained lots of little "nudges" to reward the bots for things that lead to winning. I.e. there are nudges to get them to go to lane to start, there are nudges to get them to buy the right items. I don't know for sure but I wouldn't be surprised if they kill the ranged barracks first because there's a built in reward for killing barracks that's the same for both melee and ranged barracks but the ranged ones have less health so they value that a little bit higher since it's a more consistent reward than trying to get the melee one. I hope that someday they get to a point where they can truly train from scratch with no human nudges, I'm not sure how much more impressive of an ML result that would be (I'd imagine meaningfully so but I'm not an AI researcher) but it would certainly be a much more interesting result in terms of our understanding of DOTA.
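The barracks guess above can be written as a small expected-value calculation. This is only a sketch of the hypothesis: the HP values, damage figures, and reward size are made-up placeholders, not real game or training numbers, and `expected_reward` is a hypothetical helper.

```python
def expected_reward(barracks_hp, reward, push_damage_samples):
    """Average shaped reward per push, if the reward is only paid on a kill."""
    kills = sum(1 for damage in push_damage_samples if damage >= barracks_hp)
    return reward * kills / len(push_damage_samples)


# Placeholder numbers: how much damage four imagined pushes manage to deal.
pushes = [800, 1200, 1500, 2500]
ranged = expected_reward(barracks_hp=1000, reward=1.0, push_damage_samples=pushes)
melee = expected_reward(barracks_hp=2000, reward=1.0, push_damage_samples=pushes)
```

With an identical reward for both buildings, the lower-HP target pays out on more pushes, so a reward-maximizing agent would lean toward it even if the stronger long-term play were the other barracks.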

Crespyl · on Aug 20, 2018

As an admittedly lower-mid level player, when it comes to the ranged barracks at least, I suspect there's actually some value in attacking them over the melee racks simply because the ranged ones do not regenerate their HP, unlike the melee racks.

Whenever my team is pushing high-ground and we're not certain we have the time and strength to completely 100-0 the melee racks, we always try to hit the ranged one first, otherwise we'll just get pushed back off the high ground and all the damage we did will just be regenerated, negating the value of all the resources we expended in the push.

Ideally, we prefer to take the melee if we know we can, since there's more melee creeps in a wave, the buff is worth more, but if we're not sure we'd rather do damage we know won't evaporate in a few minutes.

Ntrails · on Aug 21, 2018

> since there's more melee creeps in a wave, the buff is worth more

This is true - but not the whole story. Since ranged creeps do significantly more damage taking the ranged rax means that your lane still pushes a bit and importantly it accumulates ranged creeps as it goes along. So if the lane is left, when it hits their base it will do hella damage to buildings if they don't address it. (Bonus melee supers would last longer and do more base damage over their lifespan - but what you want is damage in the time it takes them to TP).

(also trash tier player, so just my thoughts)

delusional · on Aug 21, 2018

That's exactly why pros do it too. If you aren't sure you'll take it, you go for the ranged. The strange thing is that the AI always goes for the ranged, even if it's sure it can take it.

Benjammer · on Aug 20, 2018

Keep in mind that it's not just pure learning from zero, with these crazy emergent behaviors coming out. There is a fair amount of "coaching" involved. The OpenAI team explained this last year when they revealed the SF mid 1v1 version of their Dota bot.

People were astounded that the SF mid was creep blocking at an extremely high level, since wave positioning/management is a very complex behavior/process that doesn't pay off directly right away. The OpenAI team said that they basically taught the SF specifically how to creep block, and they've done this with other behaviors as well, like denying allied creeps, as well as a lot of the meta-level strategic play like warding/vision and item builds/usage.

shawn · on Aug 20, 2018

The thing is, OpenAI are terrible at the game. If they try to coach too much, they'll ruin their chances at beating the top Dota team. They have to let the bot discover its own winning strategies.

For example: There are an infinite number of places to place a ward. One way to train the bot is to preselect all possible ward locations, reducing it to 30 or so common ones. Another way is to make an optimization algorithm where the bot focuses on trying to maximize the "strategic vision" (if it's possible to come up with a measure for "strategic") and then let the bot place wards wherever it wants. After hundreds of years of self-play, it should figure out the best place and times.

As I write this out, I think you're probably right. There are too many aspects of the game for a purely-random algorithm to be effective... E.g. item builds. But I'm holding out hope that's just because they haven't figured out a good way to encode all dota items into a distance measure.

AI has proven over and over that humans aren't so special. And humans know how to adapt to the game.

That said, I wish OpenAI would be completely transparent as to what's emergent behavior and what's not. :)

Oh, one last interesting thing: Icefrog is going to roll out a big patch after this TI, just like he always does. I wonder how much of the bots' knowledge will transfer over? Or if they'd be better off training from a clean slate?

dx87 · on Aug 20, 2018

During the OpenAI stream where they first played against 5 other players, they said that the AI isn't picking items at all, it's just using pre-made recommended item lists. They also said that the match was being played on an older patch so they may just not update if the upcoming patch makes large changes.

Benjammer · on Aug 20, 2018

>I wish OpenAI would be completely transparent as to what's emergent behavior and what's not.

Totally agree with you here. This would also help actual DOTA players to analyze what is truly emergent behavior that might open new doors to high level play styles, and what is just weird coaching/niche optimization choices by the OpenAI team.

ufo · on Aug 20, 2018

For what it's worth, they have been very clear about that kind of stuff in their blogposts and interviews. It just happens that the information is a bit disjointed due to that format.

hoseja · on Aug 21, 2018

>Oh, one last interesting thing: Icefrog is going to roll out a big patch after this TI, just like he always does. I wonder how much of the bots' knowledge will transfer over? Or if they'd be better off training from a clean slate?

The bots aren't even playing real Dota, let alone current patch.

shawn · on Aug 21, 2018

That's not really true. There are restrictions, but it's real dota. The fact that people haven't had time to adapt to the rules is irrelevant; the bots' strategy is very similar to TI4, when deathball meta was in fashion and games would end in 13 minutes.

OpenAI are here to compete on equal footing. They aren't going to stick with the old patch, I would guess.

delusional · on Aug 21, 2018

Watching the showmatch as a dota player, it was clear it's not "real" dota. I played Turbo while it was new, and the game they are playing is much more similar to that. Having _almost_ instant regen whenever you need it is a gamechanger.

It's sort of a "Theseus's paradox" situation.

Ntrails · on Aug 21, 2018

Until the AI is thinking about how to deal with a possible 5th pick brood/huskar/meepo it's not real drafting. Until it's coping properly with courier snipes mid, courier use distribution etc. It's just playing a variant of dota.

It's still really fucking good. But it's not dota _yet_

andreyk · on Aug 21, 2018

re "That said, I wish OpenAI would be completely transparent as to what's emergent behavior and what's not. :)"

This is a pretty good summary - the main bit to know is reward shaping https://medium.com/@evanthebouncy/understanding-openai-five-...

delusional · on Aug 21, 2018

> AI has proven over and over that humans aren't so special.

I disagree. I think AI has shown that we are very special.

antidesitter · on Aug 20, 2018

Perhaps one day they’ll be able to create the analogue of what AlphaGo Zero (learning from scratch) was to the original AlphaGo (learning from human play). That would be very impressive.

Benjammer · on Aug 20, 2018

Imo, going from OpenAI Five to some kind of OpenAI FiveZero or whatever is like 1000x the amount of progress of AlphaGo to AlphaGo Zero, just based on the complexity of game mechanics and the fact that it's 5v5 and not 1v1.

kirillseva · on Aug 20, 2018

and perfect vs imperfect information - the policy network has to forecast enemy positions and deduce enemy goals. At least in Go you know full game state

cc81 · on Aug 20, 2018

I think you are putting too much trust in how good OpenAI is at bigger strategy right now. They have a very push focused strategy that fits with the limitations they have put on the game.

When OpenAI wins against human opponents it seems to be mostly because it is so much better at cooperating and can jump the enemy so quickly. That, combined with continued ferrying of regen that would not work in a normal game.

shawn · on Aug 20, 2018

Mm, everyone said the same kind of thing last year. It's worth being skeptical, but I'd rather be a true believer and be proven wrong. It seems like betting against technology to say that they can't beat the top dota team in a fair fight within a couple years.

Benjammer · on Aug 20, 2018

I think the argument people used last year is still the same argument though. The 1v1 SF mid was all about computer-precision mechanical skill with hitting creeps and HP management. People said the AI would never be able to match pro-level meta/team strategies. While OpenAI Five is insanely impressive, it's still very glaringly obvious that their range of play styles is very restricted. Many of the pros, even ones who have played against (read: got crushed by) OpenAI Five, still think it's very far away for the bot to out draft a pro team with the full ranked game rules and hero pool.

The push heavy death ball strategy is pretty much optimal for the high-precision, perfectly coordinated, mechanical fighting skill the bots have. They get a few kills using the mechanical skill and then they group up and press their advantage as hard as they can.

It seems like all the rules/mechanics they are still working on are the more abstract out-of-the-box stuff that evolved to deal with these types of 5-man "teamfight" hero lineups... (warding, game-contextual item builds, courier management, full hero pool with all the "rat" split pushing heroes, etc.)

roenxi · on Aug 20, 2018

> Many of the pros, even ones who have played against (read: got crushed by) OpenAI Five, still think it's very far away for the bot to out draft a pro team with the full ranked game rules and hero pool.

Neither AI experts nor DotA experts have the necessary background to predict how close anything is to any level of future performance. DotA players are probably the worst to ask, because the AI is already stronger than they are at a subset of the game and they have no insight into how quickly that subset could be generalised into other aspects.

OpenAI has a list of something like 7 restrictions. Nobody has any real idea of how quickly these restrictions can be lifted once the OpenAI team has an AI that has mastered the game with those restrictions.

Eg, 5 invulnerable couriers is obviously a huge restriction - but once OpenAI knows it gains a benefit by ordering items, how difficult is it to lift that restriction? Nobody knows. Might be easy, might be hard.

mellinoe · on Aug 20, 2018

> OpenAI has a list of something like 7 restrictions.

Some of those are very major, though, and I think the word "restriction" is a bit misleading. Having 5 invulnerable couriers is not really a "restriction" in that it limits or simplifies parts of the game -- it's just a fundamentally different mechanic that changes the way the game can be played.

> DotA players are probably the worst to ask, because the AI is already stronger than they are at a subset of the game and they have no insight into how quickly that subset could be generalised into other aspects.

I think that's a little unfair. Most folks have been pointing out that OpenAI's current momentum-based "deathball" strategy seems to fall apart without infinite regen and a limited hero pool, both facilitated by the current set of restrictions.

I'd agree that nobody really knows how well OpenAI will adapt to the full game, but I disagree that the criticisms I've seen are meritless. OpenAI's current level of play is definitely impressive, but I think there's still room for skepticism given the current restrictions. I (and I think a lot of others) would be pretty disappointed if the TI showmatch happened with the turbo mode couriers still enabled.

roenxi · on Aug 20, 2018

> I think that's a little unfair. Most folks have been pointing out that OpenAI's current momentum-based "deathball" strategy seems to fall apart without infinite regen and a limited hero pool, both facilitated by the current set of restrictions.

People said similar things about Go AIs and ko fights. And in the end it turned out that neural networks handled kos fine but ladders were a challenge.

On the deathball strategy in particular, consider that we expect a superhuman DotA AI to change the DotA metagame, so playing off-meta doesn't tell us anything. AlphaGo would invade 3-3 point a lot more enthusiastically than a human player. This was considered a classic beginner mistake for many years; now the theory has been readjusted to cope with the fact that AlphaGo stuck with it and just considered it a good move.

We can safely say that the courier change has made a deathball strategy more powerful and it seems quite likely it is not an optimum strategy. But we can't be sure until OpenAI tests it, and we absolutely can't be sure that OpenAI won't just learn a new style when the conditions change.

The criticisms have merit, but nobody has enough data to predict anything about the future. Particularly a professional DotA player.

Benjammer · on Aug 21, 2018

>I (and I think a lot of others) would be pretty disappointed if the TI showmatch happened with the turbo mode couriers still enabled.

Totally agree. This one change alone is _so central_ to both the bots' laning strategies and meta-game team strategies. They can't just leave heroes in lanes forever no matter what, and have all 5 heroes literally never go back to the well if there aren't 5 couriers. Not to mention their initial item builds, stats-only-4-man-the-lane-for-first-blood bullshit wouldn't work at all without constant ferrying of regen on the couriers.

eertami · on Aug 20, 2018

>semi-pro in dota

>4400 MMR

OP is being extremely satirical here by the way. He means he's not great but knows how to play (and definitely not semi-pro) but that context might be lost if you don't play Dota!

>Is it more accurate to think of the bots as 5 separate minds, or a single mind controlling 5 heroes?

They answered this on the last stream, iirc it's 5 identical clones with the same goals, but not sharing any knowledge, info, or decisions with each other.

jjcm · on Aug 20, 2018

RE: Semi-pro

He also mentions that this was in the past. 4 or so years ago 4400 MMR was in the top 1% of dota players. MMR creep has happened significantly since then.

shawn · on Aug 20, 2018

1v1 me

hokumguru · on Aug 20, 2018

I think 4400 MMR places OP in at least Ancient-1 ranking which is approximately the 95th percentile. I'd call that at least semi-pro.

a_humean · on Aug 20, 2018

Errrrr, at best a dedicated amateur.

I'm just 3.5k (I think that's 70th percentile), but I know lots of 4.5k players. To describe the average 4.5k player: Probably has regular groups of people they play with at different skill levels (anywhere from 2.5-5k+), regularly plays battle-cup on Saturdays, maybe played amateur JoinDota league, maybe had a laugh and played open qualifiers only to lose in the first couple rounds, and probably logs between 10-20 hours per week into the game.

4.5k players know how to play to a very good standard and beat the vast majority of other players, but are miles away from the weakest of the professional scene. 4.5k doesn't even appear on the leader boards.

bkovacev · on Aug 20, 2018

I definitely agree with your statement, however..

Solo - the guy that is the captain of Virtus Pro, was at 4k for the longest time. There's more to dota than just mmr.

There are players at 5.5-6k range that still do not understand the basics of team play, but are just extremely mechanically gifted and are in great gaming shape.

PeCaN · on Aug 20, 2018

4400 isn't even good enough for amateur tournaments.

apeace · on Aug 20, 2018

> The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

I asked about this in a previous thread[1] and received a response that a network blip caused all the players to drop from the game, in which case OpenAI Five was programmed to pause.

[1] https://news.ycombinator.com/item?id=17700001

EDIT: Fixed thread link.

currymj · on Aug 20, 2018

From reading their website, each bot gets its own neural network and its own reward function, so in that sense they are 5 separate agents.

When training starts out, the bots solely focus on their own reward functions. This is so they can learn very basic things like how to move around, how to attack, and so on.

Over time, the combined team reward function gradually gets weighted more and more heavily, so that teamwork is encouraged.

hcnews · on Aug 20, 2018

The pause was because human players disconnected. They covered that in the post match session. Looks like no one heard that.

PyroLagus · on Aug 22, 2018

> Therefore, the smartest pro team should study what the bot is doing and trust that on average it's a better idea to always focus on the ranged barracks first. After all, if it was a bad idea, they probably wouldn't do that.

This is exactly what is happening with Go right now. Many pros are emulating and learning from AlphaGo (Zero) and are starting to play moves that were always thought to be suboptimal until now.

shawn · on Aug 24, 2018

As someone who has no knowledge whatsoever about Go, is there some way I could learn more about what you mention? I’m really curious to see how people are trying to learn from the bot, but I’m not too sure how to find out.

jlebar · on Aug 20, 2018

> The most hilarious part was when OpenAI paused the game, then resumed it. This illustrates that there is still some unexplainable randomness.

This was done by the game coordinator -- the humans' machines DC'ed during the game. https://news.ycombinator.com/item?id=17700233

ufo · on Aug 20, 2018

I heard that the pause was a behavior scripted by the developers, which happened in response to a network connection issue.

Benjammer · on Aug 20, 2018

Yeah there's no way they're explicitly integrating pauses-in-time into the winrate calculation... There are so many other central/meta game mechanics that they haven't solved yet, it wouldn't make any sense to incorporate a behavior like strategic pausing.

somebodythere · on Aug 20, 2018

I wonder if AI self-play has any implications for e-sports balance. Could balancing be aided, or at least improved, by ultra-strong AI players, reducing the chance of broken mechanics making it to live?

jdoliner · on Aug 20, 2018

Sure seems like it would. A big part of DOTA balance these days is Icefrog (DOTA's BDFL) talking to pro players about the meta-game and observing pro play to figure out what should change. Having pro-level AIs that you could just set loose to play 100,000 games and then come back to you with statistics about the current patch would probably help a lot.

I think it would also raise the level of play that we see from pros. That's what happened with chess bots when they got good enough to beat the best human players.

shawn · on Aug 20, 2018

One specific thing that bots could help Icefrog solve: Many games are draft wins, i.e. a >90% chance of winning solely due to the heroes you've picked. There's almost no point in playing these games out for 20-40 minutes; it's boring for everyone involved.

This will always exist in dota, but being able to simulate hundreds of years of drafting + games could help reduce this.

prestonh · on Aug 21, 2018

I really doubt that at TI level the drafting advantage is significant enough to give one team a 90% advantage. My guess would be closer to 2/3, granted that's still just a complete guess. I think analysts will often use the draft as an excuse to justify the outcome of the game in retrospect, but that doesn't bear on the reality of the situation at all. Though, I would be really interested to see some data that shows the significance of drafting.

Benjammer · on Aug 20, 2018

They've already made great strides in expanding the pro-level meta hero pool over the last several major patches. As an observer it feels like the viable pro hero pool is bigger than ever this year at TI8.

trocadero · on Aug 20, 2018

>Many games are draft wins, i.e. a >90% chance of winning solely due to the heroes you've picked.

Can you explain?

danielvf · on Aug 20, 2018

Each team of five picks the heroes they will play the game with. This is like an extra complicated game of rock-paper-scissors. At the end of the picking, one team's composition may be so set up to exploit the weakness of the other that the actual 30-40 minute game is almost pointless.

trocadero · on Aug 21, 2018

You pick back and forth though, right? How does one screw up the draft so badly as to have only 10% chance of winning?

bigger_cheese · on Aug 21, 2018

Not mentioned was each team gets to ban out 5 heroes as well.

This is a bigger factor in my opinion. Each team alternates two bans, then two picks, then there is another round of two alternating bans followed by two picks and finally a single ban and pick round for the 5th hero.

Various in-meta heroes are usually "first pick/ban worthy", which means they tend to get picked or banned in the first phase and tend to shape the rest of the draft, as teams will build the core of their strategy around the first phase heroes or around countering the opposition's first phase heroes.

Another strategy is to avoid "showing your hand" during first phase by first picking strong but generic heroes that can fit into many potential lineups to keep opponent guessing. This leads to a lot of mind games where even commentators don't know what role the hero is going to be played in until the culminating 5th pick when the draft comes together.

Some teams are very good at specific strategies or have certain players exceptionally skilled at individual heroes which necessitate certain first phase bans against them lest they have an advantage.

For instance If a team is known for having a player good at the hero "Wisp" it will often force out a first phase Wisp ban from opponents because it is the kind of hero that when played well can be absolute nightmare to play against.

In some ways I find the draft mini game to be just as interesting as the main game especially in the longer tournaments where you can see new metagames emerging as captains adjust their pick strategies.

shawn · on Aug 21, 2018

There are so many drafting combinations that it's not quite obvious it's a bad idea until you hit the 3 minute mark.

Take OpenAI game 3 as an example. The first two games, OpenAI wiped the floor with the humans and taunted them that they had a >90% chance of winning. The third game, OpenAI was saying the bots had an >80% chance of losing by 5min. The sole difference was the draft.

minimaxir · on Aug 20, 2018

Training prices would have to drop significantly. It's much cheaper to take a PR hit and apply a hotfix instead of spending $100k+ on building a self-play AI.

adw · on Aug 20, 2018

If it were really just 100k it would be a huge win to have the AI. That's one developer for three to six months.

(I'd handicap the crossover somewhere between one and two orders of magnitude more expensive than that.)

IshKebab · on Aug 20, 2018

The AI cannot pause the game. That's not one of its available actions.

daveguy · on Aug 21, 2018

Yes. Yes it is.

IshKebab · on Aug 25, 2018

No. No it isn't.

Go and look how it works: https://blog.openai.com/openai-five/

daveguy · on Aug 26, 2018

I'm sorry, I must be missing the part where your link says that pause is not a valid action available to the bots just as it is available to the humans.

minimaxir · on Aug 20, 2018

From a presentation standpoint, I am impressed by and appreciate the effort in making the project process transparent and accessible, even to those without an AI background (in contrast to recent AI literature which tends to obfuscate the secret sauce).

furi · on Aug 21, 2018

There is nothing transparent about OpenAI. They have never released any of their models to the public, despite the fact that their models play completely different strategies to humans in an extremely heavily modified version of the game (multiple updates out of date, 80%+ of the heroes turned off, many core mechanics disabled or modified beyond recognition). Without them releasing the models for people to practice against there is absolutely no way to tell the difference between AI superiority and the humans being unfamiliar with the enemy tactics and even the very game they are playing. Compared to actual professional Dota, where pros have tens or hundreds of matches played by their opponents to study, an ecosystem of thousands of top level players hashing out new strategies for each patch and months to practice that particular version of the game, this is not a test I would call "open".

Leary · on Aug 20, 2018

The same 18 heroes? While impressive this is less of an improvement since the August 5th match even if they beat the pro team.

I thought they'd at least remove more of the rules (5 couriers, no illusions) or add some heroes.

nstart · on Aug 21, 2018

Not sure if they can remove the rules of no illusions. The bots would completely wreck the humans if they were allowed to use illusions. I don't even want to think what would happen if they learnt how to use phantom lancer or nature's prophet. Most people throw all their illusions/summons into one bucket and the hero into another. The AI being able to control each unit perfectly would be terrifying.

evozer · on Aug 21, 2018

But it would be fun to see NP bodyblock the entire enemy team with one set of treants.

foobaw · on Aug 20, 2018

I wonder if any updates have been made since the last match to remove more restrictions. The most common complaint from users was the courier changes.

exabrial · on Aug 21, 2018

I really want to see them play humans with no restrictions on the humans! I get it they're still in the learning phase but I want to see the gloves off

an_opabinia · on Aug 20, 2018

Is dealing with imperfect information a research goal?

Does the OpenAI team think there's a way to adapt the UX of DOTA 2 "Perfect Information Edition" to communicate the game better to human players?

modeless · on Aug 20, 2018

AFAIK the bot's "vision" is subject to fog of war, so it's not a perfect information game in the usual sense. Yes, it gets precise numerical values for hit points etc from the API, but only for visible units.

Honestly I think that it would not be much more difficult to train a bot that looks at screen pixels and outputs keyboard and mouse events instead of using the bot API. In fact it might be easier to code, but the problem is it would require several orders of magnitude more processing power to train, which is impractical. I am confident it would work if the processing power was available, given the success of these techniques on other problems.

a_humean · on Aug 20, 2018

Probably bigger than you think. The AI has perfect information about everything in vision with few exceptions.

If they have to use the standard UI they lose a significant amount of information as they only view a very small percentage of the map per frame + the minimap. Just think about the implications for team fights. If all the agents have different information about a fight, then you aren't necessarily going to see the uncanny stun stacking and perfect long range nukes. The AI cannot assume that all the other agents share the same information, and so the other agents might not be as predictable. I imagine fragmented information would dramatically change behavior, for the worse. They will probably act more like human players - more cautiously and with more mistakes due to "miscommunication".

itsaporter · on Aug 21, 2018

One thing to note is that the AI does not orchestrate all 5 players on the team at once like 1 superhuman player would. Instead it runs 5 instances of itself on each of the heroes and they play together based on how selfish they need to be in any given moment. This earlier blog post[1] has some additional information about how the bot works, including an interactive view into the bot's eyes.

>"OpenAI Five does not contain an explicit communication channel between the heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed “team spirit”. Team spirit ranges from 0 to 1, putting a weight on how much each of OpenAI Five’s heroes should care about its individual reward function versus the average of the team’s reward functions. We anneal its value from 0 to 1 over training."

[1] https://blog.openai.com/openai-five/
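One natural reading of the "team spirit" quote is a simple linear blend between a hero's own reward and the team average. The sketch below assumes exactly that; the blend formula, the linear annealing schedule, and both function names are assumptions for illustration, since the quote only says that the weight ranges from 0 to 1 and is annealed over training.

```python
def blended_rewards(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team average.

    team_spirit = 0.0 -> each hero only cares about its own reward.
    team_spirit = 1.0 -> every hero receives the team average.
    """
    team_average = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * own + team_spirit * team_average
        for own in individual_rewards
    ]


def annealed_team_spirit(step, total_steps):
    """Linear ramp from 0 to 1 over training (the schedule shape is assumed)."""
    return min(1.0, max(0.0, step / total_steps))


# One hero earned a reward of 1.0 this step, the other four earned nothing.
rewards = [1.0, 0.0, 0.0, 0.0, 0.0]
selfish = blended_rewards(rewards, annealed_team_spirit(0, 100))
cooperative = blended_rewards(rewards, annealed_team_spirit(100, 100))
```

Early in training the scoring hero keeps the whole reward, which fits the description of bots first learning basics on their own; late in training all five heroes see the same value, so there is no incentive to take a kill away from a teammate.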

drexlspivey · on Aug 20, 2018

This would require the bot to learn to point the screen to the right place

modeless · on Aug 20, 2018

It would require the bot to learn a lot of things. Perhaps a curriculum learning method would be appropriate. I don't see any reason why it wouldn't be possible though, given several orders of magnitude more compute power.

Benjammer · on Aug 20, 2018

Fog of war on the map doesn't depend on where a single player's viewport is pointing...

dbelchamber · on Aug 20, 2018

I'm very excited about this. When I watch this new breed of AI play, I find it really interesting what they value and greatly enjoy speculating as to why in human terms.

nstart · on Aug 21, 2018

I watched OpenAI Five play against the "team" of pros at the calibration match earlier this month. A couple of observations and takeaways.

The first is that the bot strategy currently revolves around the special rule of 5 invulnerable couriers. Bots find microing lots of units effortless, so the map constantly showed each bot's courier flying back and forth carrying regen. The bots never really had to go back to base or their shrines to heal. This is important because it changes the meta of the game entirely. The normal game allows only one (very vulnerable) courier per team. Usually this means that after a team fight, teams need to reset since they've expended significant resources on the fight. But that meta was nonexistent under the rules for matches against OpenAI Five. The humans had trouble coping with this as they weren't used to the idea of ferrying regen constantly.

Takeaways here - I could go on about the nuances of a single courier. But basically, the bots' gameplay will likely have to change once it comes down to 1 shared courier per team. Not sure how that will affect the architecture of a "no shared mind". Also, humans will likely need to take a page out of this gameplay and realise that couriers are a highly underutilized resource. Every second a courier sits idle for no reason is just as bad as a hero doing nothing.

The second observation comes from the last game of AI vs pro humans. This was an interesting game where the audience picked a losing set of heroes for the AI team. Despite a predicted chance of winning of less than 2% (iirc), the AI could probably have won on account of being mechanically better than the humans. But the bots' insistence on sticking to a strategy of "push hard" found them doing really strange things. The strangest of these was Slark running ahead to cut down creep waves in the lane on its own. The human players knew this would happen and they kept forcing the Slark to go hide in the trees, and at some point they were always able to corner it and get the kill. Over and over again. The Slark never changed.

Similar things happened around the map during this game.

What should have happened was that the AI should have adapted to its disadvantage, and poured its efforts into first defending and then snowballing later with its mechanical advantage. But that element of "intelligence" was never there.

The takeaway is this. The AI will eventually beat the humans on account of always being mechanically better. The bots need only very slight changes in their strategy to win 99.9% of the time. They can be aggressive beyond any human possibility because they can calculate everything to perfection: for example, how long it will take them to travel across the map vs how much longer it will take for an opposing hero to have its ultimate ready. There are a lot of mechanical components to Dota where the AI will always have an advantage. But the AI will likely always reveal quirks that can be turned into dumb winning strategies (aka cheese strats). Something like the whole team fighting from the trees, for example, might just confuse the AI terribly. We don't know, but every now and then someone will discover one and the teams working on the AI will have to "patch" the behaviour.

Final takeaway from all of that - I'm not sure if training the AI towards "objectives" is really the best route towards making an intelligent bot. It seems like what's happening instead is that we get software that has no intelligence for adapting in the moment to things it's never seen, even if they are brain dead. But it'll get better at hiding those gaps through mechanical perfection.

Upside - We get AIs capable of doing increasingly complex things in a seemingly perfect manner.

Downside - We get a scary future of AI filled with byzantine issues that need to be "patched".