Pose Networks Unveiled: Bridging the Gap for Monocular Depth Perception

Y. Dayoub; Andrey V. Savchenko; I. Makarov

doi:10.1109/ISMAR-Adjunct64951.2024.00168

P. 584–587.

Depth estimation is essential in Augmented Reality applications, enabling realistic object placement, scene understanding, spatial mapping, interaction, and environment awareness. This paper proposes a method to enhance depth model performance without increasing inference costs by improving the pose network in a self-supervised learning setup. In particular, we enrich spatial information in the pose network by incorporating features from different scales and normalized coordinates. Experiments on the KITTI dataset show that our approach achieves a 2-7% improvement in the absolute relative error (Abs Rel) metric compared to baseline techniques.
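As a rough illustration of two terms used in the abstract, the sketch below computes the absolute relative error in its commonly used form (the mean of |predicted - true| / true over pixels with valid ground truth) and builds a pair of normalized coordinate channels in [-1, 1]. The function names `abs_rel` and `normalized_coord_grid` are hypothetical, and the coordinate grid is only one plausible reading of "normalized coordinates"; the abstract does not say how the authors construct or inject them.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth.

    `pred` and `gt` are flat sequences of depths; entries with gt <= 0
    are treated as missing ground truth and skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def normalized_coord_grid(height, width):
    """Return (x_channel, y_channel), each a height x width nested list.

    Values run linearly from -1 to 1 along the corresponding axis.
    Assumption: this mimics CoordConv-style coordinate channels, not
    necessarily the construction used in the paper.
    """
    def axis(n):
        # A single-element axis has no extent, so place it at the centre.
        if n == 1:
            return [0.0]
        return [2.0 * i / (n - 1) - 1.0 for i in range(n)]

    xs, ys = axis(width), axis(height)
    x_channel = [list(xs) for _ in range(height)]  # varies along columns
    y_channel = [[y] * width for y in ys]          # varies along rows
    return x_channel, y_channel


if __name__ == "__main__":
    print(abs_rel([2.2, 3.6, 5.0], [2.0, 4.0, 0.0]))
    x_ch, y_ch = normalized_coord_grid(2, 3)
    print(x_ch)
    print(y_ch)
```

In a pose network, channels like these would typically be concatenated to an image or feature map so that convolutions can see where each pixel lies in the frame; whether the paper does exactly this is an assumption here.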

Keywords: 3D vision, self-supervised learning, monocular depth estimation, pose network, ego-motion estimation

Publication based on the results of:

Network models, optimization and computational complexity (2024)

In book

2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)

IEEE, 2024.

Monocular Depth Estimation Based on Active Learning

Saleh H., Goncharov D., Shadi S. et al., in: Proceedings 2026 IEEE 11th International Conference on Smart Cloud SmartCloud 2026 8-10 May 2026. Los Alamitos: IEEE Computer Society, 2026. P. 78–85.

Estimating depth is a necessary task to understand and navigate the environment surrounding us. Over the years, many active sensors have been developed to measure depth, but they are expensive and require additional space for mounting. A cheaper alternative is to estimate depth from a single RGB image taken by an ordinary monocular camera, which can ...

Added: May 12, 2026

Object Localization Based on a Single RGB Camera for a 4-DOF Robotic Arm

Chebotareva E., Mukhamedshin A., Imamov N. et al., in: 2025 11th International Conference on Automation, Robotics, and Applications (ICARA), 12-14 Feb. 2025. IEEE, 2025. Ch. 2025 P. 252–256.

Added: March 17, 2026

Inpainting Semantic and Depth Features to Improve Visual Place Recognition in the Wild

Semenkov I., Karpov A., Savchenko A. et al., IEEE Access 2024 Vol. 12 P. 5163–5176

Visual place recognition is one of the core modern computer vision tasks concerned with identifying location based on the image taken there. Modern state-of-the-art approaches heavily rely on RGB images which are largely affected by changes in the same scene such as varying daytime, illumination, seasonal changes, and presence of dynamic objects (people, vehicles). This ...

Added: March 15, 2024

Efficient Monocular Depth Estimation for Edge Computing Platforms

Saleh S., Saleh H., Dmitry Goncharov et al., in: 2023 International Symposium ELMAR, 11-13 September 2023, Zadar, Croatia. IEEE, 2023. P. 23–27.

Estimating depth is necessary to understand and navigate the environment surrounding us. Over the years, many active sensors have been developed to measure depth, but they are expensive and require additional space for mounting. A cheaper alternative is estimating depth from a single RGB image taken by an ordinary monocular camera, which can be placed ...

Added: January 26, 2024

SensorSCAN: Self-Supervised Learning and Deep Clustering for Fault Diagnosis in Chemical Processes

Maksim Golyadkin, Vitaliy Pozdnyakov, Leonid Zhukov et al., Artificial Intelligence 2023 Vol. 324 Article 104012

Modern industrial facilities generate large volumes of raw sensor data during the production process. This data is used to monitor and control the processes and can be analyzed to detect and predict process abnormalities. Typically, the data has to be annotated by experts in order to be used in predictive modeling. However, manual annotation of ...

Added: September 20, 2023

Predicting Molecule Toxicity via Descriptor-based Graph Self-supervised Learning

Li X., Makarov I., Kiselev D., IEEE Access 2023 Vol. 11 P. 91842–91849

Predicting molecular properties with Graph Neural Networks (GNNs) has recently drawn a lot of attention, with compound toxicity prediction being one of the biggest challenges. In cases where there is insufficient labeled molecule data, an effective approach is to pre-train GNNs on large-scale unlabeled molecular data and then fine-tune them for downstream tasks. Among pre-training ...

Added: August 30, 2023

Differentiable Rendering with Reparameterized Volume Sampling

Morozov N., Rakitin D., Oleg Desheulin et al., in: Neural Fields across Fields: Methods and Applications of Implicit Neural Representations. ICLR 2023 Workshop. [s.n.], 2023. Ch. 8.

In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene pictures. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is ...

Added: July 18, 2023

Neural Fields across Fields: Methods and Applications of Implicit Neural Representations. ICLR 2023 Workshop

[s.n.], 2023.

Addressing problems in different science and engineering disciplines often requires solving optimization problems, including via machine learning from large training data. One class of methods has recently gained significant attention for problems in computer vision and visual computing: coordinate-based neural networks parameterizing a field, such as a neural network that maps a 3D spatial coordinate ...

Added: July 18, 2023

Exploration in Sequential Recommender Systems via Graph Representations

Kiselev D., Makarov I., IEEE Access 2022 Vol. 10 P. 123614–123621

Temporal graph networks are powerful tools for solving the cold-start problem in sequential recommender systems. However, graph models are susceptible to feedback loops and data distribution shifts. The paper proposes a simple yet efficient graph-based exploration method for the mitigation of the issues above. It adopts the counter-based state exploration from reinforcement learning to the ...

Added: September 5, 2022

Self-supervised recurrent depth estimation with attention mechanisms

Makarov I., Bakhanova M., Nikolenko S. et al., PeerJ Computer Science 2022 Vol. 8 Article e865

Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced ...

Added: February 1, 2022

On the Memorization Properties of Contrastive Learning

Sadrtdinov I., Chirkova N., Lobacheva E., in: ICML 2021 Workshop, Overparameterization: Pitfalls & Opportunities. [s.n.], 2021.

Memorization studies of deep neural networks (DNNs) help to understand what patterns DNNs learn and how they learn them, and motivate improvements to DNN training approaches. In this work, we investigate the memorization properties of SimCLR, a widely used contrastive self-supervised learning approach, and compare them to the memorization of supervised learning and random labels training. We ...

Added: January 25, 2022

JONNEE: Joint Network Nodes and Edges Embedding

Makarov I., Korovina K., Kiselev D., IEEE Access 2021 Vol. 9 P. 144646–144659

Recently, graph embedding models significantly improved the quality of graph machine learning tasks, such as node classification and link prediction. In this work, we propose a model called JONNEE (JOint Network Nodes and Edges Embedding), which learns node and edge embeddings under self-supervision via joint constraints in a given graph and its edge-to-vertex dual representation ...

Added: October 30, 2021

Influence of 3D Centro-Symmetry on a 2D Retinal Image

Sawada T., Symmetry 2020 Vol. 12 No. 11: 1863 P. 1–12

An object is 3D centro-symmetrical if the object can be segmented into two halves and the relationship between them can be represented by a combination of reflection about a plane and a rotation through 180° about an axis that is normal to the plane. A 2D orthographic image of the 3D centro-symmetrical object is always ...

Added: November 12, 2020

Fractal tomography and its application in 3D vision

Trubochkina N. K., Journal of Physics: Conference Series 2018 Vol. 955 P. 1–6

A three-dimensional artistic fractal tomography method is proposed that implements glasses-free 3D visualization of fractal worlds in layered media. It is designed for glasses-free 3D viewing of digital art objects and films containing fractal content. Prospects for the development of this method in art galleries and the film industry are considered. ...

Added: January 29, 2018

Artistic Fractal Tomography Method for 3D Vision

Trubochkina N. K., Kondratiev N. V., Mir Tekhniki Kino 2017 No. 2 (11) P. 26–35

A method of three-dimensional artistic fractal tomography is proposed that implements glasses-free 3D visualization of fractal worlds in layered media. It is designed for glasses-free 3D viewing of digital art objects and films containing fractal content. Prospects for the development of this method in art galleries and the film industry are considered. ...

Added: December 11, 2017