PointINS: Point-based instance segmentation


In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features. Differentiating multiple potential instances within a single PoI feature is challenging, because learning a high-dimensional mask feature for each instance with vanilla convolution imposes a heavy computational burden. To address this challenge, we propose an instance-aware convolution. It decomposes the mask representation learning task into two tractable modules: instance-aware weights and instance-agnostic features. The former parametrizes the convolution that produces mask features for different instances, improving mask learning efficiency by avoiding several independent convolutions. The latter serves as mask templates at a single point. Instance-aware mask features are then computed by convolving the templates with the dynamic weights and are used for mask prediction. Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach built upon dense one-stage detectors. Through extensive experiments, we evaluate the effectiveness of our framework built upon RetinaNet and FCOS. PointINS with a ResNet101 backbone achieves 38.3 mask mean average precision (mAP) on the COCO dataset, outperforming existing point-based methods by a large margin. It achieves performance comparable to the region-based Mask R-CNN with faster inference.
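
The decomposition described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `instance_aware_conv`, the choice of a 1x1 convolution, and all tensor shapes (8 template channels, a 14x14 template, 3 instances, 4 output channels) are assumptions made only for illustration. The sketch shows one shared, instance-agnostic template being convolved with a separate set of dynamic weights per instance.

```python
import numpy as np


def instance_aware_conv(template, dyn_weights, dyn_bias=None):
    """Apply one 1x1 convolution per instance to a shared template.

    template:    (C_in, H, W)     instance-agnostic features (assumed shape)
    dyn_weights: (K, C_out, C_in) one dynamic 1x1 kernel per instance (assumed)
    dyn_bias:    (K, C_out) or None
    returns:     (K, C_out, H, W) instance-aware mask features
    """
    # A 1x1 convolution is a channel-mixing matrix product at each pixel,
    # so a single einsum covers all K instances without K separate conv layers.
    out = np.einsum("koc,chw->kohw", dyn_weights, template)
    if dyn_bias is not None:
        out = out + dyn_bias[:, :, None, None]
    return out


# Hypothetical sizes; in a real model the weights would be predicted by a
# network head rather than sampled at random.
rng = np.random.default_rng(0)
template = rng.standard_normal((8, 14, 14))    # shared mask template
dyn_weights = rng.standard_normal((3, 4, 8))   # 3 instances at one point
mask_feats = instance_aware_conv(template, dyn_weights)
print(mask_feats.shape)
```

In this sketch the template is computed once and reused for every instance, and only the small weight tensor differs per instance, which is the efficiency argument the abstract makes against running several independent convolutions.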

In IEEE Transactions on Pattern Analysis and Machine Intelligence
Ying-Cong Chen
Assistant Professor

Ying-Cong Chen is an Assistant Professor at the AI Thrust, Information Hub of the Hong Kong University of Science and Technology (Guangzhou Campus). He obtained his Ph.D. degree from the Chinese University of Hong Kong. His research lies in the broad area of computer vision and machine learning, and aims to empower machines with the capacity to understand human appearance, physiology, and psychology. His work contributes to a wide range of applications, including contactless health monitoring, semantic photo synthesis, and intelligent video surveillance.