Camereon Model Tracker

Real-time 6-DOF pose tracking of rigid objects based on CAD models and monocular images

( English / Chinese )

Introduction

Theory

This algorithm calculates the 6-DOF pose (translation and rotation) of a 3D object relative to a monocular camera in real time. It can be used in fields such as augmented reality, robotic grasping, and target tracking, and is similar to VisionLib and Vuforia Model Target. The algorithm extracts 3D edge features from the CAD model of the target object and 2D edge features from the RGB image acquired by the monocular camera, then establishes correspondences between the 3D and 2D features to solve for the object's pose.
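The link between 3D model edges and 2D image edges can be illustrated with a pinhole projection. The following is a minimal Python sketch, not the plugin's implementation: the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for illustration. It projects 3D edge points (already expressed in camera coordinates) into the image and measures how far they land from their matched 2D edge pixels. A pose solver would adjust the pose to reduce this error.

```python
import math


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (meters) to pixel coordinates.

    Uses the standard pinhole model; returns None for points that are not
    in front of the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def mean_reprojection_error(points_cam, edge_pixels, fx, fy, cx, cy):
    """Mean pixel distance between projected 3D points and matched 2D edge pixels.

    points_cam[i] is assumed to correspond to edge_pixels[i]. Points behind
    the camera are skipped; returns infinity if nothing could be projected.
    """
    total, count = 0.0, 0
    for point, (u_obs, v_obs) in zip(points_cam, edge_pixels):
        projected = project_point(point, fx, fy, cx, cy)
        if projected is None:
            continue
        total += math.hypot(projected[0] - u_obs, projected[1] - v_obs)
        count += 1
    return total / count if count else float("inf")
```

A point on the optical axis projects to the principal point `(cx, cy)`, and an error of zero means the projected model edges coincide with the detected image edges.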

Features

Only CAD models and monocular images are used, without AI pre-training.
Stable and fast 6-DOF pose tracking @60FPS.
Automatic feature detection and update during tracking.
Improved robustness against cluttered backgrounds, partial occlusion, and fast motion.
Some ability to re-initialize automatically after tracking failure.
Enhanced robustness through integration with SLAM.
Cross-platform support.

Demo

Video

Scenes

Based on the above principle, this algorithm is mainly applicable to (but not limited to) texture-less scenes where CAD models are natively available, such as the industrial field. Because it depends on edge features, the geometric structure of the object and the number of edges affect tracking quality, so model and parameter selection should follow the suggestions from VisionLib or the suggestions from Vuforia.

Implementation

The author implemented the algorithm kernel using C . To make it easier to use, the author has also developed a Unity plugin based on the kernel. There are two versions of this plugin: an ARFoundation version and an MRTK version. The former is suitable for iOS/Android devices and the latter for Microsoft HoloLens 2. Most features also work directly on the video stream from the native camera without relying on ARFoundation, but ARFoundation brings two benefits: 1. Fusion with SLAM improves pose stability and the ability to self-recover; 2. ARFoundation provides camera intrinsics in real time, so the user does not need to pre-calibrate the camera (especially cameras with auto focus).
The plugin is currently a lab prototype and has not been fully tested, so there may be adaptation bugs on some devices. During the trial period, the algorithm kernel runs for 5 minutes from program start and then stops automatically.

Preparation

Device Support

For iOS devices, most recent models with ARKit support are supported.
For Android devices, ARCore support is required; please see the official device support list. For devices that are not on the list, there are some tricks to enable ARCore.
For Microsoft HoloLens, only the second generation is supported.

Software

ARFoundation 5.1.0
MRTK 2.8.3
Unity 2022.3 LTS

Installation

(ARFoundation version only) Install ARFoundation through Unity Package Manager, and install ARKit XR or ARCore XR according to the target platform. Tutorial
(MRTK version only) Install MRTK through MixedRealityFeatureTool. Tutorial
Download the Camereon unitypackage from the Release page.

Model Preparation

There are no special requirements for the format of the CAD model, as long as it can be imported into Unity and rendered properly.
Textures are not needed, and the color does not matter.
The model's local coordinate system should be right-handed, and Y-up is recommended.
The model's unit should be meters.
The origin of the local coordinate system should preferably be located near the center of the model to facilitate the adjustment of the pose and also to improve the chances of successful self-recovery.

Usage

ARFoundation Version

(Go to MRTK version)

1. Import basic objects

Delete the default Main Camera from the scene, and import AR Session and XR Origin through "Toolbar - GameObject - XR". There will be a new Main Camera under XR Origin.

2. Import Camereon package

Import Camereon.ModelTracker.ARFoundation-vx.x.x.unitypackage. The package contains three folders (Plugins, Scripts, Shaders) and one prefab (CMRModelTracker).
NOTE: There is a known bug in Unity where the Plugins folder does not contain the iOS framework library when the unitypackage is exported. Therefore, iOS users need to download the iOS framework zip from the Release page, unzip it, and put the iOS folder into the Plugins folder. This bug is being fixed.

3. Import Camereon objects

Add the prefab CMRModelTracker to the scene. The prefab contains:

CMR Camera. This camera is not involved in the display and is only used by the algorithm for background processing of CAD models.
Canvas
- Edge Image. An image used to display edges, which is necessary for the initialization process.
- Resume Button. A button used to resume tracking when tracking fails or the wrong target is tracked. This button is not required; you can design your own trigger logic using the scripting API below.
Target Anchor. An empty object that maintains the pose. When the target object's pose relative to the camera is successfully solved, it is converted to the object's pose in the SLAM coordinate system (XR Origin) and applied to this Target Anchor object (which is automatically moved under XR Origin as a child node). In this way, SLAM can continue to maintain the object's pose if tracking fails in a static scene. In practice, the user only needs to build virtual content relative to this object.
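The conversion described for Target Anchor is a composition of rigid transforms: the camera's pose in the SLAM (XR Origin) frame multiplied by the object's pose in the camera frame. The following Python sketch illustrates the arithmetic with plain 4x4 matrices. All names are hypothetical and this is not the plugin's code. Axis and handedness conventions are ignored here, whereas Unity's own transforms would handle them in practice.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    """Homogeneous transform that only translates."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_y(angle_rad):
    """Homogeneous transform that rotates about the Y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def object_pose_in_world(world_from_camera, camera_from_object):
    """Compose the SLAM camera pose with the tracked object pose.

    world_from_camera: camera pose in the SLAM frame (from the AR session).
    camera_from_object: object pose in the camera frame (from the tracker).
    """
    return mat_mul(world_from_camera, camera_from_object)
```

For example, with the camera at (1, 0, 0) in the SLAM frame and rotated 90 degrees about Y, an object 2 m along the camera's Z axis ends up at roughly (3, 0, 0) in the SLAM frame. Once this pose is stored on the anchor, it remains valid while the camera moves, which is why SLAM can hold the object in place after tracking stops.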

4. Import CAD model

Add the target's model to the scene, place it under CMR Camera as a child node, and adjust its initial pose relative to the CMR Camera through the Transform component. This pose is used in the initialization process, so the user should choose an appropriate pose for the application scenario.
Add a new Layer at the end of the layer list and name it "CMR", then set the Layer of both the CMR Camera and the model to "CMR". This avoids interfering with the main scene when the model is processed in the background.

5. Object settings

The prefab CMRModelTracker should have the script CMRModelTrackerManager attached, which is under the Scripts folder. Connect the objects in the scene to the variables in the script as shown in the figure, where VLCar is an example CAD model.

The script CMRModelTrackerManager provides several setup options:

Auto Start. Start initializing and tracking automatically after program startup.
Display Edges. Display edges during the tracking (after initialization).
Use FPS60. Run at 60 fps. Higher frame rates increase power consumption, and not all devices support 60 fps.
Initialization Only. The tracking stops after successful initialization, and the subsequent poses are maintained by SLAM. Commonly used in static scenes.
Edge Magnitude Thresh. Threshold for detecting edge features in an image, reflecting the difference in grayscale between the two sides of the edge. Too large a threshold may lead to incomplete edge detection, and too small a threshold may introduce noise interference. It needs to be adjusted according to the application scenario, and the default value is 20.
Initialization Quality Thresh. Threshold for the initialization quality, reflecting how well the edge features in the image align with the target object. In some cases, the edge features in the image do not completely match the target object, for example when the object is partially occluded or the CAD model is inaccurate. A threshold that is too large may make initialization difficult, and a threshold that is too small may result in tracking the wrong object. It needs to be adjusted according to the application scenario, and the default value is 0.65.
Control Points Max Number. The maximum number of control points. The algorithm samples points, called control points, at a certain step along the edges. Increasing the number of control points can improve the robustness of the algorithm, but also increases the computational cost. Reducing the number of control points does the opposite. The default value is 2500.
Pose Smooth Factor. The factor to smooth the pose. To reduce jitter, the algorithm can smooth the output pose. A value of 0 means no smoothing is performed. The larger the value, the smoother the pose, but also the greater the delay. The default value is 0.5.
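To give a feel for how three of these options behave, here is a minimal Python sketch. It is an illustration under stated assumptions, not the plugin's internals. It assumes that the edge magnitude is a central-difference grayscale gradient, that control points are taken at a fixed step along an ordered list of edge points, and that smoothing is a simple exponential blend of positions. All function names are hypothetical.

```python
import math


def edge_mask(gray, magnitude_thresh=20):
    """Mark interior pixels whose gradient magnitude reaches the threshold.

    gray is a list of rows of grayscale values (0-255). Border pixels are
    left False because the central difference needs both neighbors.
    """
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            mask[y][x] = math.hypot(gx, gy) >= magnitude_thresh
    return mask


def sample_control_points(edge_points, step, max_points=2500):
    """Take every step-th edge point, then thin uniformly to respect the cap."""
    sampled = edge_points[::step]
    if len(sampled) > max_points:
        stride = math.ceil(len(sampled) / max_points)
        sampled = sampled[::stride]
    return sampled


def smooth_position(previous, current, factor=0.5):
    """Exponentially smooth a position, with factor assumed to be in [0, 1).

    factor 0 returns the new measurement unchanged; values closer to 1
    weight the previous output more, which is smoother but lags behind.
    """
    return tuple(factor * p + (1.0 - factor) * c
                 for p, c in zip(previous, current))
```

On a step edge between gray levels 0 and 100, the default threshold of 20 marks the transition pixels, while a threshold of 60 marks none, which mirrors the incomplete detection described above. Smoothing a rotation would need quaternion interpolation instead of this per-component blend.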

6. Project settings

Enable "Allow ‘unsafe’ Code"
Set "Scripting Backend" to "IL2CPP"
For Android, set "Graphics API" to "OpenGLES3", not "Vulkan"
Set "Target Architectures" to "ARM64"
For iOS, enable "Requires ARKit Support"
Confirm "Apple ARKit" or "Google ARCore" is checked in "XR Plug-in Management"

7. Program interaction

When the program starts running on the device, ARFoundation turns on the camera automatically and the video stream is displayed on the screen. When the Camereon tracker starts running (through the API, or because "Auto Start" is checked in the script), the edge features of the target at the initial pose set in "4. Import CAD model" are displayed on the screen. The user only needs to move the device so that the edges are roughly aligned with the target object in the image; initialization then completes and tracking begins. Successful initialization is indicated by the red edges disappearing or turning green, depending on whether "Display Edges" is checked in the script settings.
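One way to picture the alignment check behind this step is as a match ratio compared against the Initialization Quality Thresh option. The Python sketch below is an assumption about how such a score could be computed, not the plugin's actual metric. It counts the fraction of projected control points that have a detected image edge pixel within a small pixel radius. The function names and the `max_dist` parameter are hypothetical.

```python
import math


def initialization_quality(projected_points, edge_pixels, max_dist=3.0):
    """Fraction of projected control points with an image edge nearby.

    projected_points: (u, v) pixel positions of the model's control points
                      rendered at the initial pose.
    edge_pixels:      (u, v) positions of edge pixels detected in the image.
    """
    if not projected_points:
        return 0.0
    matched = 0
    for u, v in projected_points:
        if any(math.hypot(u - eu, v - ev) <= max_dist for eu, ev in edge_pixels):
            matched += 1
    return matched / len(projected_points)


def is_initialized(quality, quality_thresh=0.65):
    """Accept the initialization once the quality reaches the threshold."""
    return quality >= quality_thresh
```

With four control points of which three have a nearby image edge, the quality is 0.75 and the default threshold of 0.65 accepts the initialization. If only two are matched, for instance because of occlusion, the quality is 0.5 and the user has to keep aligning the device.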