Recent Advances in Artificial Intelligence Research by Our Faculty


Published: 2022-09-30


Recently, the Intelligent Perception and Processing Laboratory (http://ouc.ai) of the School of Information Science and Engineering at Ocean University of China has produced a series of original achievements in artificial intelligence research. These results have been published in top-tier international journals such as IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and the International Journal of Computer Vision (IJCV), as well as at prestigious conferences including CVPR, ICCV, ECCV, and ACM MM, marking a significant increase in the school's influence in artificial intelligence research. TPAMI and IJCV are recognized as the top international journals in artificial intelligence, pattern recognition, image processing, and computer vision, and TPAMI's academic influence regularly ranks first among all related journals in computer science, electronic engineering, and artificial intelligence. CVPR, meanwhile, has for several years ranked first in the "Engineering and Computer Science" category of Google Scholar Metrics and fourth overall (the top three being Nature, The New England Journal of Medicine, and Science). All of this work was completed independently by the school's faculty and students and was supported by grants from the National Natural Science Foundation of China.


The problem of image-to-image transformation has been widely studied across many fields of artificial intelligence, such as computer vision, computer graphics, and multimedia, owing to its rich application scenarios, and it has become an influential and challenging research focus. To address the problem of high-quality synthesis in image-to-image transformation, the research team proposed an innovative block-based discriminative region candidate mechanism and built a generative adversarial network framework around it to improve the quality of synthesized images. The method produces high-quality images with higher resolution, more realistic details, and fewer artifacts. The team achieved the best performance at the time on both supervised and unsupervised general transformation tasks, and this work was published in the top international computer vision journal IJCV (2020).
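As a rough illustration of how an adversarial framework can concentrate training on the least realistic regions of a generated image, the PyTorch sketch below scores image patches with a small discriminator and penalizes only the lowest-scoring candidates. It is a toy in the spirit of the mechanism described above, not the published IJCV (2020) model; all module names and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: a patch-level discriminator plus a loss that focuses
# on the weakest (most discriminative) patch candidates. Not the authors' model.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Scores overlapping image patches; lower scores mean less realistic patches."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # one score per patch location
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, H', W') map of patch realism scores

def weakest_region_loss(score_map, k=16):
    """Generator loss computed only on the k lowest-scoring patch candidates,
    so training concentrates on the least realistic regions."""
    flat = score_map.flatten(1)               # (B, H'*W')
    weakest, _ = torch.topk(-flat, k, dim=1)  # negate to select the lowest scores
    return weakest.mean()                     # generator tries to raise these scores

# Usage (hypothetical generator): loss_G = weakest_region_loss(disc(generator(x)))
```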


For image-to-image transformation tasks such as image inpainting and outpainting (extrapolation), the research team drew inspiration from the human brain's completion process and proposed a spiral generative adversarial network model to improve the realism of the completed content. They then proposed a visual-attention-guided adversarial learning model that deeply mines information from the known regions and knowledge from large-scale data while taking the learning and completion mechanisms of the human brain into account. In addition, they combined two complementary architectures, the self-attention Transformer and the convolutional neural network, into a hybrid autoencoder to handle the unknown locations of missing regions in real-world settings. These studies achieved the best performance at the time and were accepted by prestigious international conferences in computer vision and multimedia, including ECCV (2020), ICCV (2021), and ACM MM (2022).
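The following is a minimal sketch, under our own assumptions, of how a convolutional autoencoder can be paired with a Transformer bottleneck so that local texture modeling and long-range context modeling complement each other when the locations of missing regions are unknown. It only illustrates the general idea of such a hybrid; it does not reproduce the published ACM MM (2022) architecture, and all layer sizes are arbitrary.

```python
# Illustrative hybrid CNN + Transformer autoencoder for inpainting (not the published model).
import torch
import torch.nn as nn

class HybridInpaintingAutoencoder(nn.Module):
    def __init__(self, dim=256, heads=8, layers=4):
        super().__init__()
        # CNN encoder: local texture features, 8x spatial downsampling
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 4, 2, 1), nn.ReLU(),   # RGB + binary mask = 4 channels
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(128, dim, 4, 2, 1), nn.ReLU(),
        )
        # Transformer bottleneck: long-range context across all spatial positions
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(block, num_layers=layers)
        # CNN decoder: reconstruct the full image, including the unknown regions
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, image, mask):
        x = self.encoder(torch.cat([image * mask, mask], dim=1))  # hide unknown pixels
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                     # (B, H*W, C) token sequence
        tokens = self.transformer(tokens)
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(x)

# Usage: out = HybridInpaintingAutoencoder()(torch.rand(1, 3, 64, 64), torch.ones(1, 1, 64, 64))
```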


In recent years, visually unrealistic synthesized images have been widely observed in many computer vision and graphics applications, such as image synthesis, image stitching, image editing, and scene completion. Visually realistic composite images are also in demand in everyday applications such as entertainment, advertising, and film production. Because the human eye is very sensitive to subtle appearance differences between composited regions and their surroundings, producing a high-quality composite typically requires experienced experts to spend considerable time on careful manual adjustment. How to intelligently adjust the appearance of composite images so that they look realistic and harmonious has therefore become an important and challenging research focus.


Starting from the essence of the problem, the research team analyzed why the direct, hard adjustments made by existing image processing and deep learning methods lead to insufficiently realistic harmonization results. They then proposed a new idea and method for intrinsic image harmonization, which adjusts the lighting information of a composite image while preserving its semantic and structural information. In this way they generated composites that are perceptually consistent as a whole and achieved the best performance on image harmonization tasks at the time. This work was published at the prestigious international computer vision conference CVPR (2021).
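To make the "adjust lighting, keep structure" intuition concrete, here is a minimal Retinex-style toy in PyTorch: it splits a composite into a smooth illumination map and a reflectance map, rescales only the foreground illumination toward the background statistics, and recomposes the image. This is purely an illustrative assumption about the intrinsic-decomposition idea, not the CVPR (2021) method, and all numbers in it are arbitrary.

```python
# Illustrative lighting-only harmonization toy (not the published method).
import torch
import torch.nn.functional as F

def harmonize_lighting(composite, mask, eps=1e-6):
    """composite: (B, 3, H, W) in (0, 1]; mask: (B, 1, H, W), 1 = pasted foreground."""
    # Crude intrinsic split: illumination as a smoothed luminance map, reflectance as the rest
    luminance = composite.mean(dim=1, keepdim=True)
    illumination = F.avg_pool2d(luminance, 15, stride=1, padding=7)
    reflectance = composite / (illumination + eps)   # structure/semantics preserved here

    # Match foreground illumination statistics to the background's (a lighting-only edit)
    fg_mean = (illumination * mask).sum(dim=(2, 3), keepdim=True) / \
              mask.sum(dim=(2, 3), keepdim=True).clamp(min=1)
    bg_mean = (illumination * (1 - mask)).sum(dim=(2, 3), keepdim=True) / \
              (1 - mask).sum(dim=(2, 3), keepdim=True).clamp(min=1)
    harmonized_illum = torch.where(mask > 0,
                                   illumination * (bg_mean / (fg_mean + eps)),
                                   illumination)

    # Recompose: same reflectance (structure), adjusted lighting
    return (reflectance * harmonized_illum).clamp(0, 1)
```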



Recently, the research team explored in depth how to efficiently exploit the long-range context modeling ability of self-attention networks (Transformers) to solve vision and graphics problems in image-to-image transformation, such as image harmonization. Specifically, they first designed and built two visual Transformer frameworks, an "encoder-reconstruction Transformer" and a "decoupled Transformer". Second, they studied and analyzed the two frameworks on image harmonization tasks, conducting in-depth explorations of image encoding and reconstruction methods, the number and placement of tokens, the number of attention heads and layers, and the Transformer encoder and decoder, providing important references for the design and application of visual Transformers.
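As one concrete, purely illustrative reading of an "encode tokens, then reconstruct pixels" pipeline, the PyTorch sketch below patchifies a composite image and its foreground mask, runs a Transformer encoder over the resulting tokens, and folds per-token patch predictions back into an image. The patch size, embedding width, and depth are assumptions for illustration; this is not the configuration reported in the TPAMI paper.

```python
# Illustrative encoder-reconstruction style Transformer for harmonization (assumed configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HarmonizationTransformer(nn.Module):
    def __init__(self, patch=8, dim=256, heads=8, depth=6, img=256):
        super().__init__()
        self.patch = patch
        n_tokens = (img // patch) ** 2
        self.embed = nn.Conv2d(4, dim, kernel_size=patch, stride=patch)  # patchify RGB + mask
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))           # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.reconstruct = nn.Linear(dim, 3 * patch * patch)             # tokens -> pixel patches

    def forward(self, composite, mask):
        # Expects img x img inputs so the token count matches the positional embedding
        tokens = self.embed(torch.cat([composite, mask], dim=1))         # (B, dim, H/p, W/p)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2) + self.pos            # (B, N, dim)
        tokens = self.encoder(tokens)                                    # global context
        patches = self.reconstruct(tokens)                               # (B, N, 3*p*p)
        out = patches.transpose(1, 2).reshape(b, 3 * self.patch ** 2, h * w)
        return F.fold(out, (h * self.patch, w * self.patch),
                      kernel_size=self.patch, stride=self.patch)         # (B, 3, H, W)

# Usage: out = HarmonizationTransformer()(torch.rand(1, 3, 256, 256), torch.ones(1, 1, 256, 256))
```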


Third, based on the two frameworks, they proposed non-decoupled and decoupled image harmonization methods and comprehensively compared and analyzed their performance through ablation experiments, showing that both significantly outperform all existing methods. Furthermore, they verified the effectiveness, flexibility, and generality of the two visual Transformer frameworks on four classic vision and graphics tasks: image enhancement, image completion, white-balance editing, and portrait relighting. This research is an important exploration of cutting-edge artificial intelligence technology toward a general solution to image-to-image transformation problems, and its results will help advance general artificial intelligence.


On September 15th, TPAMI, the top international journal in artificial intelligence, published the above work online under the title "Transformer for Image Harmonization and Beyond". The research was jointly completed by Guo Zonghui, a PhD student at the School of Information Science and Engineering (first author), Senior Experimentalist Gu Zhaorui, and Professors Zheng Bing, Dong Junyu, and Zheng Haiyong (corresponding author). This is the first time the school has published an academic paper in TPAMI as the first-author or corresponding-author institution.