Trendlist: [New post] Generally capable agents emerge from open-ended play

Monday, August 30, 2021

Generally capable agents emerge from open-ended play

by Matteo

Measuring progress

To measure how agents perform within this vast universe, we create a set of evaluation tasks using games and worlds that remain separate from the data used for training. These "held-out" tasks include specifically human-designed tasks like hide and seek and capture the flag.

Because of the size of XLand, understanding and characterising the performance of our agents can be a challenge. Each task involves different levels of complexity, different scales of achievable rewards, and different capabilities of the agent, so merely averaging the reward over held-out tasks would hide the actual differences in complexity and rewards. It would also effectively treat all tasks as equally interesting, which isn't necessarily true of procedurally generated environments.
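To see how a plain average can mislead, here is a toy Python illustration with made-up raw rewards for two hypothetical agents on four tasks. Task `t1` is assumed to have a much larger reward scale than the others, so it dominates the raw mean even though agent A earns more reward on the other three tasks.

```python
from statistics import mean

# Made-up raw rewards for illustration only; task "t1" is assumed to have
# a far larger reward scale than the other tasks.
agent_a = {"t1": 5.0, "t2": 0.8, "t3": 1.0, "t4": 4.0}
agent_b = {"t1": 10.0, "t2": 0.0, "t3": 0.5, "t4": 1.0}

# Plain average of raw reward over all tasks.
raw_mean_a = mean(list(agent_a.values()))
raw_mean_b = mean(list(agent_b.values()))

# Count the tasks on which each agent earns strictly more reward.
wins_a = sum(agent_a[t] > agent_b[t] for t in agent_a)
wins_b = sum(agent_b[t] > agent_a[t] for t in agent_a)

print(raw_mean_a < raw_mean_b)  # True: B has the higher average...
print(wins_a, wins_b)           # 3 1: ...yet A is ahead on three of four tasks
```

A single large-scale task is enough to flip the ranking, which is the kind of difference that averaging hides.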

To overcome these limitations, we take a different approach. Firstly, we normalise scores per task using the Nash equilibrium value computed from our current set of trained players. Secondly, we take into account the entire distribution of normalised scores: rather than looking at average normalised scores, we look at different percentiles of normalised scores, as well as participation, the percentage of tasks in which the agent scores at least one step of reward. This means an agent is considered better than another agent only if it exceeds the other's performance at every percentile. This approach to measurement gives us a meaningful way to assess our agents' performance and robustness.
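The scoring scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the per-task Nash equilibrium values are taken as precomputed positive inputs (computing them means solving each game over a reference population, which is out of scope here), "at least one step of reward" is approximated as a positive total reward on the task, and the reported percentiles (10th to 50th) and the helper names `normalise`, `profile` and `better_than` are hypothetical choices.

```python
from typing import Dict, Sequence

# Hypothetical choice of percentiles for the score profile.
PERCENTILES = (10, 20, 30, 40, 50)


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalise(raw: Dict[str, float], nash_value: Dict[str, float]) -> Dict[str, float]:
    """Divide each task's raw reward by that task's Nash equilibrium value.

    Assumes every nash_value[task] is positive and was computed beforehand
    from a reference population of trained players.
    """
    return {task: raw[task] / nash_value[task] for task in raw}


def profile(raw: Dict[str, float], nash_value: Dict[str, float]) -> Dict[str, float]:
    """Percentiles of normalised scores plus participation for one agent."""
    scores = list(normalise(raw, nash_value).values())
    result = {f"p{q}": percentile(scores, q) for q in PERCENTILES}
    # Participation: fraction of tasks with any reward at all, approximating
    # "scores at least one step of reward" as a positive total reward.
    result["participation"] = sum(r > 0 for r in raw.values()) / len(raw)
    return result


def better_than(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True only if profile `a` strictly exceeds `b` at every percentile."""
    return all(a[f"p{q}"] > b[f"p{q}"] for q in PERCENTILES)


# Usage with made-up numbers for four held-out tasks.
nash = {"t1": 10.0, "t2": 1.0, "t3": 2.0, "t4": 5.0}
agent_a = {"t1": 5.0, "t2": 0.8, "t3": 1.0, "t4": 4.0}
agent_b = {"t1": 10.0, "t2": 0.0, "t3": 0.5, "t4": 1.0}

prof_a = profile(agent_a, nash)
prof_b = profile(agent_b, nash)
print(better_than(prof_a, prof_b))  # True
print(better_than(prof_b, prof_a))  # False
print(prof_a["participation"], prof_b["participation"])  # 1.0 0.75
```

In this sketch, agent B matches the Nash value on task `t1` but still loses at every reported percentile, because the profile concentrates on the lower half of the normalised score distribution. Requiring a strict improvement at every percentile makes the comparison a partial order: two agents can each lead at different percentiles, in which case neither counts as better.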

More generally capable agents

After training our agents for five generations, we saw consistent improvements in learning and performance across our held-out evaluation space. Playing roughly 700,000 unique games in 4,000 unique worlds within XLand, each agent in the final generation experienced 200 billion training steps as a result of 3.4 million unique tasks. At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human. The results we're seeing clearly exhibit general, zero-shot behaviour across the task space, with the frontier of normalised score percentiles continually improving.

Matteo | August 30, 2021 at 11:26 am | Tags: agents, capable, emerge, Generally, openended, Play | Categories: Artificial Intelligence | URL: https://wp.me/pcAtNR-2VZ

New post on Matteo Sala
