Object Localization with Deep Learning Techniques

Siyang Li

Ming Hsieh Department of Electrical Engineering at University of Southern California

Object localization aims to find the locations of objects in an image or a video. It is a crucial step when machines need to understand an image (or a video) more deeply; for example, knowing where objects are helps in understanding the relationships between them. Given sufficient labeled data, deep learning techniques have substantially improved the performance of object localizers. However, labeling a large amount of data is expensive. In the first part of this talk, I will explain our proposed method for object localization when the training images are only weakly labeled. The proposed multiple instance curriculum learning (MICL) method injects curriculum learning (CL) into the multiple instance learning (MIL) framework. It starts by automatically picking the easy training examples, where the extent of the segmentation mask agrees with the detection bounding boxes. The training set is then gradually expanded to include harder examples, so that the resulting detectors are strong enough to handle complex images. Over this iterative training process, the detector localizes objects with increasing accuracy. In the second part, I will focus on moving object localization in videos, where finer locations, i.e., object masks, are required. This problem is referred to as “video object segmentation” and faces the same challenge of expensive training data. To reduce the need for annotated training videos, we transfer the knowledge encapsulated in image-based instance embedding networks. Then, a motion-based bilateral network is trained to estimate the background region. The estimated background is later integrated with the instance embeddings into a graph, so that the embeddings corresponding to the moving objects can be extracted by classifying the graph nodes. Finally, the video frames are segmented based on the graph node labels. The proposed method achieves state-of-the-art performance on several benchmark datasets.
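As a rough illustration of the easy-example selection step in MICL, the sketch below keeps a training example when the tight extent of its segmentation mask agrees with its detection bounding box. The abstract does not say how agreement is measured; using intersection-over-union (IoU) with a fixed threshold is an assumption made here, and the names `mask_extent`, `box_iou`, `select_easy_examples`, and the 0.5 threshold are hypothetical, not taken from the talk.

```python
import numpy as np


def mask_extent(mask):
    """Tight box (x1, y1, x2, y2) around the nonzero pixels of a mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    # The +1 makes the box end-exclusive, so width = x2 - x1.
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_easy_examples(masks, boxes, threshold=0.5):
    """Indices of examples whose mask extent agrees with the detection box.

    Assumption: "agrees" means IoU >= threshold; the talk may use another criterion.
    """
    easy = []
    for i, (mask, box) in enumerate(zip(masks, boxes)):
        extent = mask_extent(mask)
        if extent is not None and box_iou(extent, box) >= threshold:
            easy.append(i)
    return easy
```

In a curriculum setting, one plausible use would be to call `select_easy_examples` once per training round, with the masks and boxes re-estimated by the current detector, so that the selected set grows as the detector improves. How the set is actually expanded in MICL is not specified in the abstract.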
