Two robots of the same type#

The code for this example is implemented in same_robots. Let us import it.

[1]:
from enki_env.examples import same_robots

Environment#

The environment contains just two Thymio robots. The initial position of the robots is fixed while their orientation is sampled uniformly.
The robots’ goal is to rotate so to face each other.

To create the environment via script, run:

python -m enki_env.examples.same_robots.environment
[2]:
env = same_robots.make_env(render_mode="human")
env.reset()
env.snapshot()

The robots belong to the same "thymio" group and share the same configuration.

[3]:
env.group_map
[3]:
{'thymio': ['thymio_0', 'thymio_1']}

As in the single-robot example, the robots use only their proximity sensors and receive a similar reward that drives them to rotate until they face each other, at which point the episode terminates.

[4]:
env.action_spaces
[4]:
{'thymio_0': Box(-1.0, 1.0, (1,), float64),
 'thymio_1': Box(-1.0, 1.0, (1,), float64)}
[5]:
env.observation_spaces
[5]:
{'thymio_0': Dict('wheel_speeds': Box(-1.0, 1.0, (2,), float64), 'prox/value': Box(0.0, 1.0, (7,), float64)),
 'thymio_1': Dict('wheel_speeds': Box(-1.0, 1.0, (2,), float64), 'prox/value': Box(0.0, 1.0, (7,), float64))}
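As a sketch of what these spaces imply for a policy, the following uses only the shapes and bounds printed above (plain NumPy; no environment needed):

```python
import numpy as np

# Per-robot shapes, taken from the spaces printed above
ACTION_SHAPE = (1,)                                      # angular speed in [-1, 1]
OBS_SHAPES = {'wheel_speeds': (2,), 'prox/value': (7,)}  # both in each robot's Dict

rng = np.random.default_rng(0)

# A random joint action: one array per robot, within the Box bounds
actions = {f'thymio_{i}': rng.uniform(-1.0, 1.0, ACTION_SHAPE) for i in range(2)}

# A dummy joint observation with the right nesting and shapes
observations = {
    f'thymio_{i}': {key: np.zeros(shape) for key, shape in OBS_SHAPES.items()}
    for i in range(2)
}

print({name: a.shape for name, a in actions.items()})
```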

Baseline#

We have hand-coded a simple distributed policy to achieve the task.

To evaluate the baseline via script, run:

python -m enki_env.examples.same_robots.baseline
[6]:
import inspect

print(inspect.getsource(same_robots.Baseline.predict))
    def predict(self,
                observation: Observation,
                state: State | None = None,
                episode_start: EpisodeStart | None = None,
                deterministic: bool = False) -> tuple[Action, State | None]:
        prox = np.atleast_2d(np.array(observation['prox/value']))
        m = np.max(prox, axis=-1)
        prox[m > 0] /= m[:, np.newaxis][m > 0]
        ws = np.array([(0.5, 0.25, 0, -0.25, -0.5, 1, 1)])
        w = np.tensordot(prox, ws, axes=([1], [1]))
        w[m == 0] = 1
        return np.clip(w, -1, 1), None
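To see what this arithmetic does, here is a small standalone reimplementation of the same computation on dummy proximity readings (plain NumPy; `baseline_predict` is a name introduced here for illustration):

```python
import numpy as np

def baseline_predict(prox_value):
    # Same arithmetic as the baseline above, for a single (7,) prox reading
    prox = np.atleast_2d(np.array(prox_value, dtype=float))
    m = np.max(prox, axis=-1)
    prox[m > 0] /= m[:, np.newaxis][m > 0]          # normalize rows with a reading
    ws = np.array([(0.5, 0.25, 0, -0.25, -0.5, 1, 1)])
    w = np.tensordot(prox, ws, axes=([1], [1]))     # weighted sum over sensors
    w[m == 0] = 1                                   # nothing sensed: full speed
    return np.clip(w, -1, 1)

# No reading at all -> spin at full speed to search for the other robot
print(baseline_predict([0, 0, 0, 0, 0, 0, 0]))    # [[1.]]
# Strongest reading on the central front sensor (index 2) -> stop turning
print(baseline_predict([0, 0, 0.9, 0, 0, 0, 0]))  # [[0.]]
# Reading on the leftmost front sensor -> turn toward it at half speed
print(baseline_predict([0.9, 0, 0, 0, 0, 0, 0]))  # [[0.5]]
```

The weight vector assigns positive values to the left front sensors, negative to the right, and zero to the center, so the command vanishes exactly when the strongest reading is dead ahead.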

To perform a rollout, we need to assign the policy to the whole group.

[7]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': same_robots.Baseline()})

For multi-robot environments, the rollout returns a dictionary with the data collected from each group.

[8]:
rollout.keys()
[8]:
dict_keys(['thymio'])
[9]:
rollout['thymio'].episode_reward
[9]:
np.float64(-22.821592696832248)
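With several groups, the same dictionary simply gains one entry per group. A minimal sketch of iterating over it, using a stand-in object (hypothetical; the real per-group data comes from env.unwrapped.rollout and exposes episode_reward as shown above):

```python
from types import SimpleNamespace

# Stand-in for the rollout dictionary returned above (one entry per group)
rollout = {'thymio': SimpleNamespace(episode_reward=-22.82)}

# Report one episode reward per group
for group, data in rollout.items():
    print(f"{group}: {data.episode_reward:.2f}")
```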

Reinforcement Learning#

Let us now train and evaluate an RL policy for the same task.

To perform this via script, run:

python -m enki_env.examples.same_robots.rl

[10]:
policy = same_robots.get_policy()
[11]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': policy})
rollout['thymio'].episode_reward
[11]:
np.float64(-9.336332451263782)

Video#

To generate a video similar to the one in the single-robot example, run

python -m enki_env.examples.same_robots.video

or run

[12]:
video = same_robots.make_video()
video.display_in_notebook(fps=30, width=640, rd_kwargs=dict(logger=None))
[12]: