Introduction#
Enki_env lets you use pyenki with common ML-libraries by wrapping it in environments that are compatible with Gymnasium, PettingZoo, and PyTorchRl.
Users can setup environments for any task involving the ground robots implemented by pyenki, i.e., e-pucks, marxbots and thymios, by combining two parts:
a scenario that generates a world populated with robots and static objects,
a configuration for each group of robots that defines which observations to include, how to actuate actions, which rewards to assign, and so on.
In the simplest case, we control a single robot, for example an e-puck. For this, we generate a world that contains the robot and some object to interact with. We could define a task where the e-puck uses its 8 proximity sensors to turn towards a nearby object.
import math
import enki_env
import gymnasium
import pyenki
class Scenario:
def init(world: pyenki.World) -> None:
robot = pyenki.EPuck()
robot.angle = world.random_generator.uniform(0, 2 * math.pi)
world.add_object(robot)
obj = pyenki.PhysicalObject(radius=10, height=10, mass=-1)
obj.position = (10, 0)
world.add_object(obj)
def reward(robot: pyenki.DifferentialWheeled, success: bool | None
) -> float:
return -1.0 if math.cos(robot.angle) < 0.9 else 0.0
env = gymnasium.make("Enki",
scenario=Scenario(),
config=enki_env.EPuckConfig(reward=reward)
The environment is now ready for training or for evaluation. For example, we can compute the reward collected by a random policy during an episode:
>>> env.unwrapped.rollout(max_steps=10).episode_reward
-10.0
In the more general case, we control multiple robots, possibly of different types. Robots that share the same configuration are grouped together. For example, we could create an environment where two e-pucks use the camera while three other e-pucks use the proximity sensors.
import enki_env
import pyenki
class Scenario:
def init(world: pyenki.World) -> None:
rng = world.random_generator
for _ in range(2):
robot = pyenki.EPuck(camera=True)
robot.position = (
rng.uniform(-10, 10),
rng.uniform(-10, 10))
robot.name = 'e-puck-camera'
world.add_object(robot)
for _ in range(3):
robot = pyenki.EPuck(camera=False)
robot.position = (
rng.uniform(-10, 10),
rng.uniform(-10, 10))
world.add_object(robot)
config = enki_env.EPuckConfig()
config_camera = enki_env.EPuckConfig()
config_camera.observation.camera = True
config_camera.observation.proximity_value = False
configs = {'e-puck': config, 'e-puck-camera': config_camera}
env = enki_env.parallel_env(Scenario(), configs)
env.reset(seed=0)
In the environment, robots are identified by a string <group>_<index>.
>>> print(env.agents)
['e-puck_0', 'e-puck_1', 'e-puck_2', 'e-puck-camera_0', 'e-puck-camera_1']
Robots in the same group share the same action and observation spaces, reward function, and (when assigned) policy.
>>> print(env.group_map)
{'e-puck': ['e-puck_0', 'e-puck_1', 'e-puck_2'], 'e-puck-camera': ['e-puck-camera_0', 'e-puck-camera_1']}
Robots in different groups will instead apply different policies and receive rewards from possibly different reward functions.
See also
More comprehensive examples.