Exploring Spatial Constraints and Temporal Information Within the MONet Framework

We presented MONet, a compositional generative model for unsupervised scene decomposition and representation learning, and incorporated into the baseline MONet model an image segmentation method that integrates local spatial constraints. On a challenging Atari dataset, we showed that our method significantly outperforms both the conventional U-Net and the MONet baseline. For future work, we plan to extend this method to multi-class image segmentation and to cluster the extracted objects by visual similarity.
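As a rough illustration of the scene decomposition that MONet performs, the sketch below shows the recursive "scope" bookkeeping that turns a sequence of per-pixel attention outputs into masks that sum to one at every pixel. It is a minimal sketch, not this project's code: the function name decompose_scope is hypothetical, images are assumed to be flattened into plain lists of pixel values, the attention outputs are supplied by hand rather than produced by a network, and the local spatial constraints described above are not modeled.

```python
def decompose_scope(alphas):
    """Turn K-1 attention maps into K masks that sum to 1 at each pixel.

    alphas: list of K-1 lists, each holding per-pixel values in [0, 1]
            (hypothetical stand-ins for the attention network's outputs).
    """
    if not alphas:
        raise ValueError("need at least one attention map")

    n_pixels = len(alphas[0])
    # The scope is the share of each pixel that is still unexplained.
    scope = [1.0] * n_pixels
    masks = []

    for alpha in alphas:
        # This step claims a fraction alpha of the remaining scope.
        masks.append([s * a for s, a in zip(scope, alpha)])
        # Whatever it did not claim is passed on to later steps.
        scope = [s * (1.0 - a) for s, a in zip(scope, alpha)]

    # The final mask takes everything that is left over.
    masks.append(scope)
    return masks


# Two attention steps over a three-pixel "image" give three masks.
alphas = [[0.5, 1.0, 0.0], [0.5, 0.2, 1.0]]
masks = decompose_scope(alphas)
```

Because each step can only claim part of what earlier steps left behind, the masks form a valid per-pixel partition regardless of the attention values, which is what lets each mask be read as a soft segmentation of one scene component.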
