1. Table of contents

2. Introduction

Inspection, repair and maintenance is a multi-billion business that is projected to grow substaintially in the next decade, with integration of automation and AI being the driving force. At its core, the primary goal in an inspection mission is to capture images on the surface of some structures at the best possible quality. However, exploration is also a secondary objective that needs to be addressed to identify the structure and its surface. Oftentimes, bounding box(es) can be set around the target of interest to narrow the area of exploration.

Thanks to their unique mobility, aerial robots have become widely adopted for inspection tasks. We believe that the next breakthrough in this industry shall be delivered by Cooperative Aerial Robots Inspection (CARI) systems. Moreover, in the same spirit of economic specialization, heterogeneous CARI systems has the potential to acheive greater efficiency, quality and versatility compared to single-UAV or homogenous systems. Nevertheless, to accomplish this vision, novel cooperative strategies that can optimally coordinate the operation of multiple robots remain an open research problem.

To accelerate this development, we introduce the Cooperative Aerial Robots Inspection Challenge Benchmark, a software stack based on Gazebo, RotorS and other open-source packages. The objective of CARIC is twofold. First, it aims to faithfully simulate multi-UAV systems operating in typical real-world inspection missions. Second, based on this tool, different cooperative inspection schemes can be benchmarked based on a common metric. The software stack is made public to benefit the community and we would like to welcome all who are interested to participate in the challenge to be held at CDC 2023, Singapore.

CARIC software package can simulate UAV dynamics, physical collisions, camera-FOV-based instance detection, and line-of-sight-only communications

3. CARIC at CDC 2023

3.1. How to participate?

The challenge’s procedure is as follows:

3.2. Important dates

3.3. Prize

The winner will receive a material or monetary prize of SGD1000 and a certificate. If you are interested in being a sponsor of the challenge, please contact any of the organizers.

4. Installation

The system is principally developed and tested on the following system configuration:

4.1. Install the dependencies

4.1.1. Ubuntu 20.04 + ROS Noetic

Please install the neccessary dependencies by the following commands:

# Update the system
sudo apt-get update && sudo apt upgrade ;

# Install some tools and dependencies
sudo apt-get install python3-wstool python3-catkin-tools python3-empy \
                     protobuf-compiler libgoogle-glog-dev \
                     ros-$ROS_DISTRO-control-toolbox \
                     ros-$ROS_DISTRO-octomap-msgs \
                     ros-$ROS_DISTRO-octomap-ros \
                     ros-$ROS_DISTRO-mavros \
                     ros-$ROS_DISTRO-mavros-msgs \
                     ros-$ROS_DISTRO-rviz-visual-tools \
                     ros-$ROS_DISTRO-gazebo-plugins \

# Install gazebo 11 (default for Ubuntu 20.04)
sudo apt-get install ros-noetic-gazebo* ;

Please check out the notes below if you encounter any problem.

4.1.2. Ubuntu 18.04 + ROS Melodic

Please install the neccessary dependencies by the following commands:

# Update the system
sudo apt-get update && sudo apt upgrade ;

# Install some tools and dependencies
sudo apt-get install python-wstool python-catkin-tools python-empy \
                     protobuf-compiler libgoogle-glog-dev \
                     ros-$ROS_DISTRO-control-toolbox \
                     ros-$ROS_DISTRO-octomap-msgs \
                     ros-$ROS_DISTRO-octomap-ros \
                     ros-$ROS_DISTRO-mavros \
                     ros-$ROS_DISTRO-mavros-msgs \
                     ros-$ROS_DISTRO-rviz-visual-tools \

# Install gazebo 11 (default in melodic is gazebo 9)
sudo sh -c 'echo "deb http://packages.osrfoundation.org/gazebo/ubuntu-stable `lsb_release -cs` main" > /etc/apt/sources.list.d/gazebo-stable.list'
wget https://packages.osrfoundation.org/gazebo.key -O - | sudo apt-key add -
sudo apt update
sudo apt-get install ros-melodic-gazebo11-ros-control \
                     ros-melodic-gazebo11-ros \
                     ros-melodic-gazebo11-msgs \
                     ros-melodic-gazebo11-plugins \
                     ros-melodic-gazebo11-ros-pkgs \

Please check out the notes below if you encounter any problem.

4.1.3. Notes

4.2. Install the CARIC packages

Once the dependencis have been installed, please create a new workspace for CARIC, clone the necessary packages into it, and compile:

# Create the workspace
mkdir -p ~/ws_caric/src
cd ~/ws_caric/src

# Download the packages:

# Manager node for the mission
git clone https://github.com/ntu-aris/caric_mission

# Simulate UAV dynamics and other physical proccesses
git clone https://github.com/ntu-aris/rotors_simulator

# GPU-enabled lidar simulator, modified from: https://github.com/lmark1/velodyne_simulator
git clone https://github.com/ntu-aris/velodyne_simulator

# Converting the trajectory setpoint to rotor speeds
git clone https://github.com/ntu-aris/unicon

# To generate an trajectory based on fixed setpoints. Only used for demo, to be replaced by user's inspection algorithms
git clone https://github.com/ntu-aris/traj_gennav

# Build the workspace
cd ~/ws_caric/
catkin build

The compilation may report errors due to missing depencies or some packages in CARIC are not yet registered to the ros package list. This can be resolved by installing the missing dependencies (via sudo apt install <package> or sudo apt install ros-$ROS_DISTRO-<ros_package_name>)). Please try catkin build again a few times to let all the compiled packages be added to dependency list.

4.3. Run the flight test

To make sure the code compiles and runs smoothly, please launch the example flight test with some pre-defined fixed trajectories as follows:

source ~/ws_caric/devel/setup.bash
roscd caric_mission/scripts
bash launch_demo_paths.sh

You should see 5 UAVs take off, follow fixed trajectories, and fall down when time is out.

5. The benchmark design

5.1. The UAV fleet

A 5-UAV team is designed for the challenge, two of the explorer class (nicknamed jurong and raffles), and three of the photographer class (changi, sentosa, and nanyang), plus one GCS (Ground Control Station). Each unit has an intended role in the mission.

An illustration of one GCS, one explorer, and two photographers in Gazbo environment.

Note that the explorer is twice the size and weight of the photographer. Thanks to the bigger size, it can carry the lidar and quickly map the environment, at the cost of slower speed. In contrast, the photographers have higher speed, thus they can quickly cover the surfaces that have been mapped by the explorer to obtain images of higher score. The GCS’s role is to compare the images taken by the drones. For each interest point, the GCS can select the image with the best quality to assign the score to it.

5.2. Inspection scenarios

The following scenarios are included in the challenge:

Note that for different inspection scenarios, different number of drones and camera settings are used. The setting for each scenario is indicated in the description file caric_ppcom_network_NAME_OF_SCENARIO.txt in the folder.

5.3. The benchmark criteria

5.3.1. Inspection mission overview

The mission time starts from the moment any UAV takes off (when it’s velocity exceeds 0.1m/s and it’s altitude exceeds 0.1m). When the time elapses, all drones will shut down and no communication is possible. User can design any CARI strategy that follows the ground rules to acheive the highest inspection score within the finite mission time.

During the mission, the GCS will receive the information regarding the captured interest points from the UAVs when there is LOS. The captures are compared and the score will be tallied and published in real time under the /gcs/score topic. After each mission, a log file will be generated in the folder specified under the param log_dir in the launch file of caric_mission package.

In practice operators in the field can subjectively limit the area to be inspected within an area. Thus, in each mission a sequence of bounding boxes are given to help limit the exploration effort. The interest points will only be located inside these bounding boxes. Details on the bounding boxes can be found in the later section on onboard perception data.

5.3.2. The mission score

Let us denote the set of the interest point as \(I\), and the set of UAVs as \(N\). At each simulation update step \(k\), we denote \(q_{i,n,k}\) as the score of the interest point \(i\) captured by UAV \(n\). Specifically \(q_{i,n,k}\) is calculated as follows:

\[q_{i,n,k} = \begin{cases} \displaystyle 0, & \text{if } (k = 0) \text{ or } (q_\text{seen} \cdot q_\text{blur} \cdot q_\text{res} < 0.2),\\ q_\text{seen} \cdot q_\text{blur} \cdot q_\text{res}, & \text{otherwise.} \end{cases}\]

where \(q_\text{seen} \in \{0, 1\}\), \(q_\text{blur} \in [0, 1]\), \(q_\text{res} \in [0, 1]\) are the LOS-FOV, motion blur, and resolution metrics, which are elaborated in the subsequent sections. The above equation also implies that an interest point is only detected when its score exceeds a threshold, which is chosen as 0.2 in this case.

At the GCS, the following will be calculated:

\[q_{i, \text{gcs}, k} = \max \left[ \max_{n \in \mathcal{N}_\text{gcs}} (q_{i, n, k}),\ q_{i, \text{gcs}, k-1} \right],\\\]

where \(\mathcal{N_\text{gcs}}\) is the set of UAVs that have LOS to the GCS. This reflects the process that the GCS receives the images captured by the drones and selects the one with the highest score and keep that information in the memory. Moreover, the stored data will also be compared against the future captures.

The score of the mission up to time \(k\) is computed as follows:

\[Q_k = \sum_{i \in I} q_{i, \text{gcs}, k}.\]

Hence, the mission score will be \(Q_k\) at the end of the mission.

Below we will explain the processes used to determine the terms \(q_\text{seen}\), \(q_\text{blur}\), \(q_\text{res}\) in the calculation of \(q_{i,n,k}\).

5.3.3. LOS and FOV

The term \(q_\text{seen}\) is a binary-valued metric value that is 1.0 when the interest point falls in the field of view (FOV) of the camera, and the camera has direct line of sight (LOS) to the interest point (not obstructed by any other objects), and 0.0 otherwise. The camera horizontal FOV and vertical FOV are defined by the parameters HorzFOV and VertFOV in the description files for the respective inspection scenarios. Note that the camera orientation can be controlled as described in the section Camera gimbal control.

5.3.4. Motion blur

The motion blur metric \(q_\text{blur}\) is based on the motion of the interest point during the camera exposure duration \(\tau\). This value is declared in the parameter ExposureTime in the description files. It can be interpreted as the number of pixels that an interest point moves across during the exposure time, i.e.:

\[\displaystyle q_\text{blur} = \min\left(\dfrac{c}{\max\left(|u_1-u_0|, |v_1-v_0|\right)}, 1.0\right),\]

where \(c\) is the pixel width and \(\|u_1-u_0\|\), \(\|v_1-v_0\|\) are the horizontal and vertical movements on the image plane that are computed by:

\[u_0 = f\cdot\dfrac{x_0}{z_0},\ u_1 = f\cdot\dfrac{x_1}{z_1},\ v_0 = f\cdot\dfrac{y_0}{z_0},\ v_1 = f\cdot\dfrac{z_1}{z_1},\\\] \[[x_1,y_1,z_1]^\top = [x_0,y_0,z_0]^\top + \mathbf{v}\cdot\tau,\]

with \(f\) being the focal length, \([x_0,y_0,z_0]^\top\) the position of the interest point at the time of capture, and \([x_1,y_1,z_1]^\top\) the updated position considering the velocity of the interest point in the camera frame at the time of capture, denoted as \(\mathbf{v}\) (see our derivation for this velocity at the following link). The figure below illustrates the horizontal motion blur by showing the horizontal (X-Z) plane of the camera frame, where the vertical motion blur can be interpreted similarly.

Illustration of horizontal resolution computation

Note that for an sufficiently sharp capture, it only requires that the movement of the interest point is smaller than 1 pixel. Therefore we cap the value of \(q_\text{blur}\) at 1.0. Otherwise if the UAV stays static during the capture, \(q_\text{blur}\) could be \(\infty\).

5.3.5. Image resolution

The resolution of the image is expressed in milimeter-per-pixel (mmpp), representing the size of the real-world object captured in one image pixel. To achieve a satisfactory resolution, the computed horizontal and vertical resolutions should be smaller than a desired mmpp value. We will explain our derivation of the mmpp metric below.

In this part we will consider all coordinates in the camera frame. Given the coordinate of the interest point \(p_i\) and its surface normal \(n\), we can define the so-called interest plane \(\mathscr{P} \triangleq \{p \in \mathbb{R}^3 : n^\top (p - p_i) = 0\}\). We also define the so-called horizontal plane \(\mathscr{U} \triangleq \{p \in \mathbb{R}^3 : [0, 1, 0]^\top (p - p_i) = 0\}\) and vertical plane \(\mathscr{V} \triangleq \{p \in \mathbb{R}^3 : [1, 0, 0]^\top (p - p_i) = 0\}\). Hence, we displace the interest point by \(\pm 0.5\) mm along the intersection line \(\mathscr{P} \cap \mathscr{U}\) and find the corresponding length of the object in the image. The image below illustrates this process, where the length of the object in the image is expressed as \(\|u_1-u_2\|\).

Illustration of horizontal resolution computation

We follow the same procedure for the line \(\mathscr{P} \cap \mathscr{V}\) to obtain the object’s length variation in the image \(r_\text{vert}=\frac{c}{\|v_2 - v_1\|}\). The horizontal and vertical resolutions are then computed as \(r_\text{horz} = \frac{c}{\|u_2 - u_1\|}\) and \(r_\text{vert}=\frac{c}{\|v_2 - v_1\|}\). The resolution metric of the point is therefore calculated as:

\[\displaystyle q_\text{res} = \min\left(\dfrac{r_\text{des}}{\max\left(r_\text{horz}, r_\text{vert}\right)}, 1.0\right),\]

where \(r_\text{des}\) is the desired resolution in mmpp.

6. Developing your CARI scheme

6.1. Ground rules

The following rules should be adhered to in developing a meaningful CARI scheme:

6.2. Onboard perception data

CARIC is intended for investigating cooperative control schemes, hence perception proccesses such as sensor fusion, SLAM, map merging, etc… are assumed perfect (for now). To fullfill feedback control, mapping, and obstacle avoidance tasks… users can subscribe to the following topics:


6.3. UAV control interface

Whatever control strategy is developed, the control signal should be eventually converted to standard multi-rotor command. Specifically the UAVs are controlled using the standard ROS message trajectory_msgs/MultiDOFJointTrajectory. The controller subscribes to the command trajectory topic /<unit_id>/command/trajectory. Below are sample codes used to publish a trajectory command in traj_gennav_node.cpp, given 3d target states in the global(world) frame target_pos, target_vel, target_acc and a target yaw target_yaw:

trajectory_msgs::MultiDOFJointTrajectory trajset_msg;
trajectory_msgs::MultiDOFJointTrajectoryPoint trajpt_msg;

geometry_msgs::Transform transform_msg;
geometry_msgs::Twist accel_msg, vel_msg;

transform_msg.translation.x = target_pos(0);
transform_msg.translation.y = target_pos(1);
transform_msg.translation.z = target_pos(2);
transform_msg.rotation.x = 0;
transform_msg.rotation.y = 0;
transform_msg.rotation.z = sinf(target_yaw*0.5);
transform_msg.rotation.w = cosf(target_yaw*0.5);


vel_msg.linear.x = target_vel(0);
vel_msg.linear.y = target_vel(1);
vel_msg.linear.z = target_vel(2);

accel_msg.linear.x = target_acc(0);
accel_msg.linear.x = target_acc(1);
accel_msg.linear.x = target_acc(2);


trajset_msg.header.stamp = ros::Time::now();
trajectory_pub.publish(trajset_msg); //trajectory_pub has to be defined as a ros::Publisher

There are multiple ways you can control the robots:

Full-state control: consider you have computed the future trajectory of a robot with timestamped target position, velocity, acceleration and yaw. You may publish the target states at the desired timestamp using the above example code.

Position-based control: you may also send non-zero target positions and yaw with zero velocity and acceleration, the robot will reach the target and hover there. Note that if the target position is far from the robot’s target position, aggresive movement of the robot is expected.

Velocity/acceleration-based control: when setting target positions to zeros and setting non-zero velocities or accelerations, the robot will try to move with the desired velocity/acceleration. The actual velocity/acceleration may not follow the desired states exactly due to the realistic low level controller. Hence, the users are suggested to take into account the state feedback when generating the control inputs.

You may try controlling a drone and its camera gimbal manually using keyboard inputs, see instructions.

6.3.1. Camera gimbal control

Confusion can easily arise when handling rotation. It is recommended that users go through our technical note for an overview of the underlying technical conventions.

The camera is assummed to be installed on a camera stabilizer (gimbal) located at [CamPosX, CamPosY, CamPosZ] in the UAV’s body frame. The stabilizer will neutralize the pitch and roll motion of the body, and user can control two extra DOFs relative to the stabilizer frame. We shall call these DOFs gimbal yaw and gimbal pitch (different from the body frame’s yaw and pitch angle). The gimbal control interface is the topic /<unit_id>/command/gimbal of type geometry_msgs/Twist. An example is shown below:

  geometry_msgs::Twist gimbal_msg;
  gimbal_msg.linear.x  = -1.0;  //setting linear.x to -1.0 enables velocity control mode.
  gimbal_msg.linear.y  =  0.0;  //if linear.x set to 1.0, linear,y and linear.z are the 
  gimbal_msg.linear.z  =  0.0;  //target pitch and yaw angle, respectively.
  gimbal_msg.angular.x =  0.0; 
  gimbal_msg.angular.y = target_pitch_rate; //in velocity control mode, this is the target pitch velocity
  gimbal_msg.angular.z = target_yaw_rate;   //in velocity control mode, this is the target yaw velocity

As explained in the comments in the sample code, the interface is catered for both angle-based or rate-based control. When gimbal_msg.linear.x is set to 1.0, the fields gimbal_msg.linear.y and gimbal_msg.linear.z indicate the target pitch and yaw angle, respectively. The pitch and yaw angles are controlled independently: given a target pitch/yaw angle, the gimbal will move with the maximum pitch/yaw rate defined by the parameter GimbalRateMax (in degree/s) until reaching the target. In velocity control mode, the gimbal pitch/yaw rates can be set to any value in the range [-GimbalRateMax,+GimbalRateMax]. The gimbal pitch and yaw only operate in the ranges [-GimbalPitchMax,+GimbalPitchMax] and [-GimbalYawMax,+GimbalYawMax], respectively.

Note that the terms “yaw”, “pitch”, “roll” angles follow the (in Z-Y-X intrinsic rotation sequence) convention. Also, we define the camera frame with its X-axis perpendicular to the image plane, and Z-axis pointing upward in the image plane.

The camera states relative to the stabilizer coordinate frame are published through the topic /<unit_id>/gimbal of type geometry_msgs/TwistStamped. The fields twist.linear indicates the euler angle while the fields twist.angular indicates the angular rates (a bit of deviation from the original meaning of the message type).

6.3.2. Camera trigger

Two camera trigger modes are allowed. If the parameter manual_trigger is set to false, the robot will automatically trigger camera capture at a fixed time interval defined by the parameter TriggerInterval. If the parameter manual_trigger is set to true, the user may send camera trigger command by publishing to a topic /<unit_id>/command/camera_trigger of type rotors_comm/BoolStamped:

rotors_comm::BoolStamped msg;
msg.header.stamp = ros::Time::now();
msg.data = true;
trigger_pub.publish(msg);  //trigger_pub has to be defined as a ros::Publisher

Note that in manual trigger mode, the time stamps of two consecutive trigger commands should still be separated by an interval larger than the parameter TriggerInterval, otherwise, the second trigger command will be ignored. The benefit of using manual trigger is that the users may send the trigger command at the exact time that results in the best capture quality.

6.4. Communication network

Each robot is given a unique ID in a so-called ppcom network, for e.g. gcs, jurong, changi. These IDs are declared in the description file.

In real-world conditions, communications between the units can be interrupted by obstacles that block the LOS between them. To subject a topic to this effect, users can do the following:

Illustration of communication among ros nodes under different namespaces via the ppcom_router: In the first terminal we start the the simulation. Net the ppcom_router node is launch. Then ppcom_firefly1_talker, ppcom_firefly2_talker, ppcom_firefly3_talker are launched in different terminals. You can observe the messages sent and received by each node in the corresponding terminal. Notice how the messages are dropped when the corresponding entry in the LOS matrix turns to 0 (the firefly1 --> firefly3 LOS status is indicated by the entry at 2nd row, 4th column).

7. Baseline Method

A baseline method is provided for participants as an example. The technical report for the baseline method can be found here.

8. Organizers

Drawing Drawing Drawing Drawing Drawing
Thien-Minh Nguyen Muqing Cao Shenghai Yuan Lihua Xie Ben M. Chen
thienminh.npn@ieee.org mqcao@ntu.edu.sg shyuan@ntu.edu.sg elhxie@ntu.edu.sg bmchen@cuhk.edu.hk