TY - JOUR
T1 - Prediction of Delay-Free Scene for Quadruped Robot Teleoperation
T2 - Integrating Delayed Data With User Commands
AU - Ha, Seunghyeon
AU - Kim, Seongyong
AU - Lim, Soo Chul
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Teleoperation is used to control a wide range of systems, including vehicles, manipulators, and quadruped robots. However, communication delays during teleoperation can cause users to receive delayed feedback, which reduces controllability and increases the risk to the remote robot. To address this issue, we propose a delay-free video generation model, driven by user commands, that provides real-time visual feedback despite communication delays. Our model predicts delay-free video by integrating the delayed data from the robot (video, point cloud, and robot status) with the user's real-time commands. The LiDAR point cloud, part of the delayed data, is used to predict the contents of areas outside the camera frame during robot rotation. We built the proposed model by modifying the transformer-based video prediction model VPTR-NAR to integrate these data effectively. For our experiments, we collected a navigation dataset with a quadruped robot and used it to train and test the proposed model. We evaluated the model by comparing it with existing video prediction models and by conducting an ablation study that verifies the effectiveness of using command and point cloud data.
KW - Deep learning methods
KW - Telerobotics and teleoperation
KW - Visual learning
UR - http://www.scopus.com/inward/record.url?scp=85217013053&partnerID=8YFLogxK
U2 - 10.1109/LRA.2025.3536222
DO - 10.1109/LRA.2025.3536222
M3 - Article
AN - SCOPUS:85217013053
SN - 2377-3766
VL - 10
SP - 2846
EP - 2853
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 3
ER -