The Genshin Impact Dataset (GID) for SLAM

The Genshin Impact dataset (GID) is collected in the Genshin Impact game[1] for visual SLAM. It currently consists of 60 individual sequences (over 3 hours in total) and covers a wide range of scenes that are rare, hard, or dangerous to collect in the field in the real world (such as dull deserts, dim caves, and lush jungles). It provides great opportunities for SLAM evaluation and benchmarking. Moreover, it includes a large number of visual challenges (such as low-illumination and low-texture scenes) to test the robustness of various SLAM algorithms. It is part of our work How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception.

Citation

If you use any resource from this dataset, please cite the paper as follows:

BibTeX

@article{Zhao2024CEMS,
  title={How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception},
  author={Xuhui Zhao and Zhi Gao and Hao Li and Hong Ji and Hong Yang and Chenyang Li and Hao Fang and Ben M. Chen},
  journal={Journal of Intelligent \& Robotic Systems},
  year={2024},
  volume={110},
  number={42},
  pages={1--19},
  doi={10.1007/s10846-024-02077-4}
}

APA

Zhao, X., Gao, Z., Li, H., Ji, H., Yang, H., Li, C., Fang, H., & Chen, B. M. (2024). How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception. Journal of Intelligent & Robotic Systems, 110, 42. https://doi.org/10.1007/s10846-024-02077-4

1. Dataset Organization

The dataset is generally composed of two parts: sequences (blue part) and support files (orange part), as the following figure shows.

In the sequences part, each sequence contains several files for convenience of usage. We take Seq-001 as an example and describe each file below.

  • Seq-001.mp4 The recorded video from the Genshin Impact game, which can be further processed according to different needs. It has a resolution of 1436 (width) × 996 (height) at 30 FPS.

  • Seq-001.png A content preview of the recorded video for a quick grasp of its content without playing it. It summarizes the resolution (width × height), duration (s), FPS, and total number of frames.

  • Frames-Sparse A folder storing frames split from the recorded video. For the convenience of end users, we split the whole video in advance with a frame interval of 10 (extracting 1 frame every 10 frames).

  • Groundtruth-EuRoC.txt For the convenience of users, we provide groundtruth poses of the split frames in both EuRoC and TUM format (a minimal parsing sketch is given after this list). This file records poses in the EuRoC[2] format:

    timestamp[ns], pos_x[m], pos_y[m], pos_z[m], quat_w, quat_x, quat_y, quat_z

  • Groundtruth-TUM.txt This file records poses in TUM[3] format:

    timestamp[s] pos_x[m] pos_y[m] pos_z[m] quat_x quat_y quat_z quat_w

  • Timestamps.txt This file stores the corresponding timestamps of the split frames in the Frames-Sparse folder. The time unit is nanoseconds (10⁻⁹ s).
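
The groundtruth files can be loaded with a few lines of Python. Below is a minimal, illustrative sketch (not part of the released tool scripts) for reading the TUM-format file; the EuRoC-format file is the same idea except that fields are comma-separated and the quaternion is ordered w, x, y, z.

import numpy as np

def load_tum(path):
    # Each line: timestamp[s] pos_x pos_y pos_z quat_x quat_y quat_z quat_w
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            rows.append([float(v) for v in line.split()])
    return np.array(rows)  # shape: (num_frames, 8)

poses = load_tum("Seq-001/Groundtruth-TUM.txt")
print(poses.shape)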

The support files part contains the camera intrinsics and tool scripts.

  • Intrinsics.yaml This file records the focal length (fx and fy) and principal point (cx and cy) for the pinhole camera model we use. It is organized in standard YAML format, which makes data input and output easy.

  • tool-splitVideo.py This Python script splits the original video into separate frames according to user settings (a rough sketch of the workflow is given after this list). The only launch parameter for this script is the path of the video you want to process. The other parameters can be set interactively. All interactive parameters are summarized below:

    • Clipping start time: start timestamp of clipping, unit: second, default: 0s
    • Clipping end time: end timestamp of clipping, unit: second, default: the end of the whole video
    • Sampling interval N: sample one frame every N frames, default: output every frame
    • Scale for output frame: scale factor for output frame images, default: 1 for the original size
    • Type for output frame: file type for output frame images, default: .jpg
    • Name format for frame: name format for output frame images, select from Timestamp format (12 digits to represent timestamp in nanoseconds) and Frame index format (4 digits to represent frame index in the original video). Default: Timestamp format.
  • tool-resizeFrames.py This Python script resizes existing frame images. It requires three launch parameters:

    • Search folder: the folder containing the frames to be processed
    • Image type: the type of images in the folder
    • Scale: the scale for resizing
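
The released scripts ship with the support files; for readers who want to adapt the splitting step to their own pipeline, a minimal OpenCV-based sketch of the same idea is shown below. Function and parameter names here are illustrative and do not necessarily match tool-splitVideo.py.

import os
import sys
import cv2

def split_video(video_path, out_dir="Frames-Sparse", interval=10, scale=1.0, ext=".jpg"):
    """Extract one frame every `interval` frames and name it by its timestamp (ns)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            if scale != 1.0:
                frame = cv2.resize(frame, None, fx=scale, fy=scale)
            ts_ns = int(round(idx / fps * 1e9))  # timestamp in nanoseconds
            cv2.imwrite(os.path.join(out_dir, f"{ts_ns:012d}{ext}"), frame)
        idx += 1
    cap.release()

if __name__ == "__main__":
    split_video(sys.argv[1])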

2. Dataset Coverage

2.1 Collection Distribution

We collect sequences at different places in the Genshin Impact game to cover as wide a range of scenes as possible. Generally, each country in the game (Mondstadt, Liyue, Inazuma, and Sumeru) has 15 sequences to reflect its unique features. More specifically, the sequences are distributed as follows (a small lookup helper is sketched after the list):

  • Sequences 1-15 are collected in Mondstadt
  • Sequences 16-30 are collected in Liyue
  • Sequences 31-45 are collected in Inazuma
  • Sequences 46-60 are collected in Sumeru
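
For scripting convenience, this mapping can be expressed as a tiny helper (illustrative only, not part of the dataset tools):

def region_of(seq_no: int) -> str:
    """Return the in-game region for a GID sequence number (1-60)."""
    if 1 <= seq_no <= 15:
        return "Mondstadt"
    if 16 <= seq_no <= 30:
        return "Liyue"
    if 31 <= seq_no <= 45:
        return "Inazuma"
    if 46 <= seq_no <= 60:
        return "Sumeru"
    raise ValueError("GID sequence numbers range from 1 to 60")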

The following figure shows the distribution of sequences in different regions. You may click the figure and zoom in to see the details since the world map is very large.

2.2 Sequence Diversity

Benefiting from the large and diverse game world, the sequences in GID also show great diversity, which we summarize in the following aspects.

Scene The dataset involves a wide range of scenes, including deserts, caves, jungles, and so on. The following figure shows some typical scene types. For example, users can test the robustness of their SLAM against low-light conditions in the dim cave scenes.

Time The sequences in GID generally cover a whole day, from morning to afternoon and night. This potentially enables experiments on SLAM under changing illumination conditions. The following figure shows the coverage of a whole day.

Weather The dataset includes various weather conditions, such as clear, cloudy, and rainy scenes. The following figure shows some examples of different weather conditions.

Visual Challenges for SLAM The dataset contains various visual challenges for SLAM algorithms, such as low-light and low-texture scenes. Sequences with these challenges may boost the development and benchmarking of visual SLAM in challenging environments. The following figure shows some representative challenges in the dataset.

Duration The sequences cover a wide range of durations, from 59 seconds (Seq-042) to 333 seconds (Seq-049 & Seq-058), which makes it possible to test the scalability of SLAM. The following figure shows the distribution of sequence durations.

3. Downloads

We upload all 60 sequences and provide two ways to download the dataset: Google Drive and Baidu Netdisk. You can click Google Drive or Baidu Netdisk to download the whole dataset (about 22 GB in total) according to your network environment. Alternatively, you can download individual sequences by clicking the corresponding links in the following table.

Seq. No Region Duration (sec) Preview Google Drive Baidu Netdisk
Seq-001 Mondstadt 102 Link Link
Seq-002 Mondstadt 280 Link Link
Seq-003 Mondstadt 170 Link Link
Seq-004 Mondstadt 120 Link Link
Seq-005 Mondstadt 177 Link Link
Seq-006 Mondstadt 142 Link Link
Seq-007 Mondstadt 140 Link Link
Seq-008 Mondstadt 130 Link Link
Seq-009 Mondstadt 129 Link Link
Seq-010 Mondstadt 182 Link Link
Seq-011 Mondstadt 209 Link Link
Seq-012 Mondstadt 231 Link Link
Seq-013 Mondstadt 123 Link Link
Seq-014 Mondstadt 150 Link Link
Seq-015 Mondstadt 293 Link Link
Seq-016 Liyue 294 Link Link
Seq-017 Liyue 191 Link Link
Seq-018 Liyue 288 Link Link
Seq-019 Liyue 175 Link Link
Seq-020 Liyue 177 Link Link
Seq-021 Liyue 322 Link Link
Seq-022 Liyue 238 Link Link
Seq-023 Liyue 158 Link Link
Seq-024 Liyue 163 Link Link
Seq-025 Liyue 241 Link Link
Seq-026 Liyue 326 Link Link
Seq-027 Liyue 257 Link Link
Seq-028 Liyue 104 Link Link
Seq-029 Liyue 286 Link Link
Seq-030 Liyue 269 Link Link
Seq-031 Inazuma 172 Link Link
Seq-032 Inazuma 110 Link Link
Seq-033 Inazuma 249 Link Link
Seq-034 Inazuma 77 Link Link
Seq-035 Inazuma 268 Link Link
Seq-036 Inazuma 235 Link Link
Seq-037 Inazuma 152 Link Link
Seq-038 Inazuma 252 Link Link
Seq-039 Inazuma 231 Link Link
Seq-040 Inazuma 98 Link Link
Seq-041 Inazuma 129 Link Link
Seq-042 Inazuma 59 Link Link
Seq-043 Inazuma 133 Link Link
Seq-044 Inazuma 155 Link Link
Seq-045 Inazuma 64 Link Link
Seq-046 Sumeru 72 Link Link
Seq-047 Sumeru 191 Link Link
Seq-048 Sumeru 208 Link Link
Seq-049 Sumeru 333 Link Link
Seq-050 Sumeru 219 Link Link
Seq-051 Sumeru 146 Link Link
Seq-052 Sumeru 237 Link Link
Seq-053 Sumeru 147 Link Link
Seq-054 Sumeru 213 Link Link
Seq-055 Sumeru 79 Link Link
Seq-056 Sumeru 186 Link Link
Seq-057 Sumeru 150 Link Link
Seq-058 Sumeru 333 Link Link
Seq-059 Sumeru 200 Link Link
Seq-060 Sumeru 190 Link Link

4. Technical Details

4.1 Data Collection & Pre-processing

All the sequences are collected with fixed and consistent camera settings. The computer used for data collection is equipped with an Intel Core i9-9900K CPU, 64 GB RAM, and an NVIDIA Titan RTX GPU. We first record videos from the Genshin Impact game and save them in .mkv format. The original resolution of the recorded video is 1920 (width) × 1200 (height) at 30 FPS, as the following figure shows.

Then, we use Python scripts to split the recorded videos into frames and save them in .jpg format, sampling 1 frame every 10 frames. Moreover, we simultaneously crop the frame images to 1436 × 996 to remove unrelated parts of the original videos. The following figure shows the cropped output frames of the Seq-046 sequence.
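
The cropping step can be illustrated with a few lines of OpenCV. Note that this sketch assumes a centered crop and an example file name; the actual offsets used when preparing the dataset may differ.

import cv2

# Assumed centered crop from 1920x1200 down to 1436x996; the real crop offsets may differ.
cap = cv2.VideoCapture("Seq-046.mkv")      # original recording (example name)
ok, frame = cap.read()                     # one decoded 1920x1200 frame
h, w = frame.shape[:2]
x0, y0 = (w - 1436) // 2, (h - 996) // 2
cropped = frame[y0:y0 + 996, x0:x0 + 1436]
cv2.imwrite("cropped_frame.jpg", cropped)
cap.release()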

4.2 Groundtruth Estimation & Reconstruction

To obtain precise camera poses, we use the COLMAP software[4] for groundtruth estimation and 3D reconstruction. We input all the frames of a sequence to COLMAP and obtain the camera poses and 3D points. We use the "automatic reconstruction" mode with the following parameters:

  • Data type: Video frames
  • Quality: Medium
  • Shared intrinsics: Yes
  • Sparse model: Yes
  • Dense model: Yes

For the other parameters, we keep COLMAP's defaults. The following figure shows the estimated camera poses and point cloud of the Seq-046 sequence in COLMAP.
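
For readers who prefer the command line, the settings above roughly correspond to an automatic_reconstructor run like the sketch below. This is an assumption about an equivalent invocation (we used the GUI), and the paths are placeholders; check your COLMAP version's options before running.

import subprocess

subprocess.run([
    "colmap", "automatic_reconstructor",
    "--workspace_path", "Seq-046/colmap",      # placeholder output folder
    "--image_path", "Seq-046/Frames-Sparse",   # split frames of one sequence
    "--data_type", "video",                    # Data type: Video frames
    "--quality", "medium",                     # Quality: Medium
    "--single_camera", "1",                    # Shared intrinsics: Yes
    "--sparse", "1",                           # Sparse model: Yes
    "--dense", "1",                            # Dense model: Yes
], check=True)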

We can also visualize reconstructed 3D meshes with MeshLab[5] software, as the following figure shows.

4.3 Post-processing

After reconstruction, we export the estimated poses and trajectory from COLMAP to an images.txt file, which contains the estimated camera poses. We then use Python scripts to convert the images.txt file to the aforementioned standard TUM and EuRoC formats. Moreover, we export the estimated camera intrinsics from COLMAP to a cameras.txt file.
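
The conversion idea is sketched below. COLMAP's images.txt stores each image as IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME (world-to-camera poses), followed by a second line of 2D keypoints, so the poses are inverted to camera-to-world before writing the TUM lines. This is an illustrative reimplementation, not our exact conversion script, and the timestamp-from-filename parsing is an assumption.

import numpy as np

def quat_to_rot(qw, qx, qy, qz):
    # Rotation matrix from a unit quaternion (w, x, y, z).
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def colmap_images_to_tum(images_txt, out_txt):
    with open(images_txt) as f:
        lines = [l for l in f if not l.startswith("#")]
    entries = []
    for line in lines[::2]:                    # every other line lists 2D points; skip it
        tok = line.split()
        qw, qx, qy, qz = map(float, tok[1:5])
        tvec = np.array(list(map(float, tok[5:8])))
        name = tok[9]
        R = quat_to_rot(qw, qx, qy, qz)
        center = -R.T @ tvec                   # camera position in the world frame
        q_cw = (qw, -qx, -qy, -qz)             # camera-to-world orientation (conjugate)
        ts = int(name.split(".")[0]) * 1e-9    # assumes timestamp-in-ns file names
        entries.append((ts, *center, q_cw[1], q_cw[2], q_cw[3], q_cw[0]))
    entries.sort()
    with open(out_txt, "w") as f:
        for e in entries:
            f.write(" ".join(f"{v:.9f}" for v in e) + "\n")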

5. SLAM Demos with GID

The following figure briefly demonstrates the performance of ORB-SLAM2[6] (monocular), a classic and mature visual SLAM system, on our dataset. For a better impression, you may click here to download and view the full test video (50 s).

Generally, ORB-SLAM2 performs well in various scenes, even in some challenging ones, demonstrating the feasibility of running SLAM algorithms on our dataset. For example, we compare the estimated trajectory of Seq-060 against the groundtruth poses with the EVO tool[7], as the following figure shows.

After scale and trajectory alignment, it can be seen that the estimated poses are generally consistent with the groundtruth. On the one hand, this demonstrates the feasibility of our dataset; on the other hand, it shows the high accuracy of the groundtruth estimated by COLMAP.

6. FAQs

Q1: What are the features and advantages of the proposed dataset?

Answer:

  • Compared with field-collected sequences, our dataset contains more diverse scenes for testing SLAM. Moreover, many scenes in the dataset would be difficult or dangerous to collect in the real world, such as deserts, caves, and snow mountains.

  • Compared with sequences collected in simulation environments, the proposed dataset has the following advantages.

    • The scenes in the Genshin Impact game are exquisite and beautiful. Generally, few simulation platforms (such as Gazebo[8] or XTDrone[9]) provide such visual quality. Some sophisticated simulation platforms (such as AirSim[10] or NVIDIA Omniverse[11]) may provide high quality, but they are usually hard to get started with, and designing your own world in them is difficult.
    • It is time-consuming and laborious to build a high-quality scene in simulation software from scratch, especially for large scenes. In contrast, we can directly use the ready-made scenes in the game and collect sequences there, which is more efficient.
    • It is difficult for existing simulation platforms to produce the photorealistic visual challenges we want for SLAM tests. For example, XTDrone typically cannot simulate different weather conditions. In the game, however, we can easily record sequences containing photorealistic weather changes, such as sunny, rainy, snowy, and foggy conditions.

Q2: How are the groundtruth poses estimated? What about their accuracy? How do you guarantee their reliability?

Answer:

  • As mentioned before, we use the COLMAP software for groundtruth pose estimation; it is a popular and mature tool for 3D reconstruction. We use the "automatic reconstruction" mode with medium quality to obtain the groundtruth poses. The estimated poses are generally accurate.

  • Since we do not have the true camera poses, we evaluate the accuracy of the estimated groundtruth with the reprojection error, which is automatically computed by COLMAP (see the formula after this list). The reprojection error measures the average distance between the reprojected 3D points and the corresponding 2D points in the image. The following figure shows the reprojection error of each sequence in the dataset. Overall, the average reprojection error across all sequences is 0.88 pixels (less than 1 pixel), which is very small.

  • We cannot obtain the true groundtruth poses, so we focus more on the consistency between the estimated trajectory and the reconstructed 3D points. We think that if the consistency is high, then the estimated trajectory is accurate. Of course, this is not guaranteed, and the estimated groundtruth may still contain errors. We will continue to explore and adopt more accurate methods to estimate the groundtruth.

  • Moreover, it should be noted that the scale of the estimated trajectory is not absolute due to scale ambiguity, so the groundtruth trajectory does not carry absolute scale information. Therefore, remember to perform scale alignment before evaluating trajectories estimated by your SLAM. The scales of different sequences are not comparable.
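
For reference, the mean reprojection error quoted above can be written as follows, where X_i is a reconstructed 3D point, x_i the corresponding observed 2D keypoint, (R, t) the pose of the observing camera, K the intrinsic matrix, and \pi([u, v, w]^\top) = (u/w, v/w) the perspective division:

e_{\text{reproj}} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - \pi\!\left( K \left( R X_i + t \right) \right) \right\|_2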

Q3: How can I use the dataset to evaluate my SLAM?

  • Step 1: Download the sequences and the tools you need from the dataset using the provided links.
  • Step 2: (optional) Resample the downloaded video with the provided Python script according to your needs.
  • Step 3: Run the visual odometry or SLAM algorithm of interest and save the estimated trajectory to a file.
  • Step 4: Evaluate the performance of your algorithm against the provided groundtruth poses with tools such as EVO (an example call is sketched below).
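
For instance, with the evo package a monocular trajectory in TUM format can be evaluated against the groundtruth as sketched below; -a aligns the trajectories and -s corrects the scale, which matters because of the scale ambiguity discussed in Q2. The trajectory file name is only an example.

import subprocess

subprocess.run([
    "evo_ape", "tum",
    "Seq-060/Groundtruth-TUM.txt",   # groundtruth poses in TUM format
    "KeyFrameTrajectory.txt",        # trajectory saved by your SLAM (example name)
    "-a", "-s",                      # alignment + scale correction
    "--plot",
], check=True)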

7. References
