Intelligent and Multimedia Science Laboratory

Adversarial Attacks and Defenses

	AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models Lifeng Huang, Tian Su, Chengying Gao, Ning Liu, Qiong Huang* Intro: Adversarial attacks pose a significant threat to the security of AI-based systems. To counteract these attacks, adversarial training (AT) and ensemble learning (EL) have emerged as widely adopted methods for enhancing model robustness. However, a counter-intuitive phenomenon arises: simply combining these approaches may compromise the adversarial robustness of ensemble models. In this paper, we propose a novel method called Alignment and Unlearning for Training Ensembles (AUTE), aiming to effectively integrate AT and EL to maximize their benefits. Specifically, AUTE incorporates two key components. Firstly, AUTE cyclically divides the ensemble into a big peer model and a single member, aligning their outputs to boost the robustness of each member. Secondly, AUTE introduces the concept of unlearning, actively forgetting specific over-confident data to preserve the model's capacity to learn more robust features. Extensive experiments across various datasets and networks illustrate that AUTE achieves superior performance compared to baselines. For instance, a 5-member AUTE with ResNet-20 networks outperforms the state-of-the-art method by 2.1% and 3.2% in classifying clean and adversarial data, respectively. Additionally, AUTE can easily extend to the non-adversarial training paradigm, surpassing current standard ensemble learning methods by a large margin. The source code is publicly available at https://github.com/mesunhlf/AUTE. AAAI, 2025 (CAS Zone 1/CCF-A) [Paper] [Code]
	LAFED: Towards Robust Ensemble Models Via Latent Feature Diversification Wenzi Zhuang, Lifeng Huang, Chengying Gao, Ning Liu* Intro: In this work, we revisit model diversity from the perspective of data and discover that high similarity between training batches decreases feature diversity and weakens ensemble robustness. To this end, we propose LAFED, which reconstructs training sets with diverse features during optimization, enhancing the overall robustness of an ensemble. For each sub-model, LAFED treats the vulnerability extracted from other sub-models as raw data, which is then stochastically combined in the latent space using weights that change every round. This results in the formation of new features, remarkably reducing the similarity of learned representations between the sub-models. Furthermore, LAFED enhances feature diversity within the ensemble model by utilizing hierarchical smoothed labels. Pattern Recognition (PR), 2023 (CAS Zone 1) [Paper] [Code]
	FASTEN: Fast Ensemble Learning For Improved Adversarial Robustness Lifeng Huang, Qiong Huang, Peichao Qiu, Shuxin Wei, Chengying Gao* Intro: Recent works show that adversarial attacks threaten the security of deep neural networks (DNNs). To tackle this issue, ensemble learning methods have been proposed to train multiple sub-models and improve adversarial resistance without compromising accuracy. However, these methods often come with high computational costs, including multi-step optimization to generate high-quality augmentation data and additional network passes to optimize complicated regularization. In this paper, we present the FAST ENsemble learning method (FASTEN) to significantly reduce training costs in terms of data and optimization. Firstly, FASTEN employs a single-step technique to initialize poor augmentation data and recycles optimization knowledge to enhance data quality, which considerably reduces the data generation budget. Secondly, FASTEN introduces a low-cost regularizer to increase intra-model similarity and inter-model diversity, with most of the regularization components computed without network passes, further decreasing training costs. IEEE Transactions on Information Forensics and Security (TIFS), 2023 (CAS Zone 1/CCF-A) [Paper] [Code]
	Erosion Attack: Harnessing Corruption To Improve Adversarial Examples Lifeng Huang, Chengying Gao*, Ning Liu Intro: Although adversarial examples pose a serious threat to deep neural networks, most transferable adversarial attacks are ineffective against black-box defense models. This may lead to the mistaken belief that adversarial examples are not truly threatening. In this paper, we propose a novel transferable attack that can defeat a wide range of black-box defenses and highlight their security limitations. We identify two intrinsic reasons why current attacks may fail, namely data-dependency and network-overfitting. They provide a different perspective on improving the transferability of attacks. To mitigate the data-dependency effect, we propose the Data Erosion method. It involves finding special augmentation data that behave similarly in both vanilla models and defenses, giving attackers a higher chance of fooling robustified models. In addition, we introduce the Network Erosion method to overcome the network-overfitting dilemma. The idea is conceptually simple: it extends a single surrogate model to an ensemble structure with high diversity, resulting in more transferable adversarial examples. The two proposed methods can be integrated to further enhance transferability, and the combination is referred to as Erosion Attack (EA). We evaluate the proposed EA under different defenses, and the empirical results demonstrate the superiority of EA over existing transferable attacks and reveal the underlying threat to current robust models. IEEE Transactions on Image Processing (TIP), 2023 (CAS Zone 1/CCF-A) [Paper] [Code]
	DEFEAT: Decoupled Feature Attack Across Deep Neural Networks Lifeng Huang, Chengying Gao, Ning Liu Intro: Adversarial attacks pose a security challenge for deep neural networks, motivating researchers to build various defense methods. Consequently, the performance of black-box attacks declines under defense scenarios. A significant observation is that some feature-level attacks achieve an excellent success rate in fooling undefended models, while their transferability is severely degraded when encountering defenses, which gives a false sense of security. In this paper, we explain that one possible cause of this phenomenon is the domain-overfitting effect, which degrades the capabilities of feature-perturbed images and makes them hardly able to fool adversarially trained defenses. To this end, we study a novel feature-level method, referred to as Decoupled Feature Attack (DEFEAT). Unlike current attacks that use a round-robin procedure to estimate gradients and update the perturbation, DEFEAT decouples adversarial example generation from the optimization process. In the first stage, DEFEAT learns a distribution full of perturbations with high adversarial effects. It then iteratively samples noise from the learned distribution to assemble adversarial examples. On top of that, we can apply the transformations of existing methods within the DEFEAT framework to produce more robust perturbations. We also provide insights into the relationship between transferability and latent features, which help the community understand the intrinsic mechanism of adversarial attacks. Neural Networks, 2022 (CAS Zone 1/CCF-B) [Paper] [Code]
	Cyclical Adversarial Attack Pierces Black-box Deep Neural Networks Lifeng Huang, Shuxin Wei, Chengying Gao, Ning Liu Intro: In this paper, we propose Cyclical Adversarial Attack (CA2), a general and straightforward method to boost transferability and break defenders. We first revisit the momentum-based methods from the perspective of optimization and find that they usually suffer from the transferability saturation dilemma. To address this, CA2 performs a cyclical optimization algorithm to produce adversarial examples. Unlike the standard momentum policy that accumulates the velocity to continuously update the solution, we divide the generation process into multiple phases and treat the velocity vectors from the previous phase as prior knowledge to guide a new adversarial attack with larger steps. Moreover, CA2 applies a novel and compatible augmentation algorithm at every optimization step in a loop to further enhance black-box transferability, referred to as cyclical augmentation. Pattern Recognition (PR), 2022 (CAS Zone 1/CCF-B) [Paper] [Code]
	Enhancing Adversarial Examples Via Self-Augmentation Lifeng Huang, Wenzi Zhuang, Chengying Gao, Ning Liu Intro: Recently, adversarial attacks have posed a challenge for the security of Deep Neural Networks, which motivates researchers to establish various defense methods. However, do current defenses achieve real security? To answer the question, we propose the self-augmentation (SA) method, which crafts transferable adversarial examples that circumvent defenders. Concretely, self-augmentation includes two strategies: (1) self-ensemble, which applies additional convolution layers to an existing model to build diverse virtual models that are fused to achieve an ensemble-model effect and prevent overfitting; and (2) deviation-augmentation, which is based on the observation that, in defense models, the input data is surrounded by highly curved loss surfaces, inspiring us to apply deviation vectors to the input data to escape from its vicinity. Notably, we can naturally combine self-augmentation with existing methods to establish more transferable adversarial attacks. Extensive experiments conducted on four vanilla models and ten defenses suggest the superiority of our method compared with the state-of-the-art transferable attacks. International Conference on Multimedia & Expo (ICME, 2021) (oral) (CCF-B) [Paper] [Code]
	Universal Physical Camouflage Attacks on Object Detectors Lifeng Huang, Chengying Gao, Yuyin Zhou, Changqing Zou, Cihang Xie, Alan Yuille, Ning Liu Intro: In this paper, we study physical adversarial attacks on object detectors in the wild. Previous works on this matter mostly craft instance-dependent perturbations only for rigid and planar objects. To this end, we propose to learn an adversarial pattern to effectively attack all instances belonging to the same object category (e.g., person, car), referred to as Universal Physical Camouflage Attack (UPC). Concretely, UPC crafts camouflage by jointly fooling the region proposal network, as well as misleading the classifier and the regressor to output errors. In order to make UPC effective for articulated non-rigid or non-planar objects, we introduce a set of transformations for the generated camouflage patterns to mimic their deformable properties. We additionally impose an optimization constraint to make the generated patterns look natural to human observers. To fairly evaluate the effectiveness of different physical-world attacks on object detectors, we present the first standardized virtual database, AttackScenes, which simulates the real 3D world in a controllable and reproducible environment. Extensive experiments suggest the superiority of our proposed UPC compared with existing physical adversarial attackers not only in virtual environments (AttackScenes), but also in real-world physical environments. Computer Vision and Pattern Recognition (CVPR, 2020) (CCF-A) [Paper] [Project Page]
	G-UAP: Generic Universal Adversarial Perturbation that Fools RPN-based Detectors Xing Wu, Lifeng Huang, Chengying Gao* Intro: Our paper proposes G-UAP, the first work to craft universal adversarial perturbations that fool RPN-based detectors. G-UAP focuses on misleading the RPN into predicting foreground as background, so that detectors detect nothing. Asian Conference on Machine Learning (ACML, 2019) (CCF-C) [Paper]
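
Several entries above build on iterative, momentum-based attacks; the CA2 entry, for example, contrasts itself with "the standard momentum policy that accumulates the velocity". As a rough illustration of that baseline policy only (not CA2 or any other method listed here), the following sketch runs a momentum sign-gradient attack against a toy logistic-regression model. The model, its weights, and every function name are hypothetical and chosen purely for illustration.

```python
import math


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a toy logistic model p = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def momentum_attack(w, b, x, y, eps=0.3, steps=10, mu=1.0):
    """Momentum iterative sign-gradient attack inside an L-infinity ball."""
    alpha = eps / steps                 # per-step size
    x_adv = list(x)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        l1 = sum(abs(gi) for gi in g) or 1.0   # avoid division by zero
        # Accumulate the L1-normalized gradient into the velocity.
        velocity = [mu * vi + gi / l1 for vi, gi in zip(velocity, g)]
        # Ascend the loss along the sign of the velocity.
        x_adv = [xa + alpha * sign(vi) for xa, vi in zip(x_adv, velocity)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical toy model and input, for illustration only.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = momentum_attack(w, b, x, y)
print(predict(w, b, x), predict(w, b, x_adv))
```

On this toy input the clean point is classified as 1 and the perturbed point as 0, while every coordinate stays within eps of the original. Real attacks on deep networks obtain the gradient by backpropagation rather than from a closed form.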


Crowd Counting

	Scale-aware Progressive Optimization Network Ying Chen, Lifeng Huang, Chengying Gao, Ning Liu Intro: Crowd counting has attracted increasing attention due to its wide application prospects. One of the most essential challenges in this domain is large scale variation, which impacts the accuracy of density estimation. To this end, we propose a scale-aware progressive optimization network (SPO-Net) for crowd counting, which trains a scale-adaptive network to achieve high-quality density map estimation and overcome the variable scale dilemma in highly congested scenes. Concretely, the first phase of SPO-Net, the band-pass stage, mainly concentrates on preprocessing the input image and fusing both high-level semantic information and low-level spatial information from separated multi-layer features. The second phase of SPO-Net, the rolling guidance stage, aims to learn a scale-adapted network from multi-scale features in a rolling training manner. To better learn the local correlation of multi-size regions and reduce redundant calculations, we introduce different supervisions with analogous objectives in each rolling step, referred to as the progressive optimization strategy. Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF) not only demonstrate the efficacy of each part of SPO-Net, but also suggest the superiority of our proposed method compared with the state-of-the-art approaches. ACM Multimedia (ACM MM, 2020) (CCF-A) [Paper]
	Self-Bootstrapping Pedestrian Detection in Downward-Viewing Fisheye Cameras Using Pseudo-Labeling Kaishi Gao, Qun Niu, Haoquan You, Chengying Gao Intro: Downward-viewing fisheye cameras have attracted much attention in surveillance systems due to their wide coverage and reduced occlusion. However, pedestrian detection in downward-viewing fisheye cameras remains an open problem due to the lack of large-scale labeled datasets, as existing datasets are usually based on oblique-viewing perspective cameras. Furthermore, it is time-consuming to label a downward-viewing fisheye dataset manually. To address this, we propose a self-bootstrapping pedestrian detection method, which automatically pseudo-labels downward-viewing fisheye images by making full use of the spatial and temporal consistency of pedestrians in the cameras to improve the accuracy of pedestrian detection. We segment the downward-viewing fisheye images into two regions and progressively propose pseudo-labeling methods for them: a cyclic fine-tuned detector for the oblique region and a visual tracking method for the vertical region. Combining the pseudo-labels from the two regions, we fine-tune the detection network for better accuracy. Experimental results show that the proposed approach reduces time consumption by about 95% compared with labor-intensive manual labeling while still reaching a competitive Average Precision (AP). International Conference on Multimedia & Expo (ICME, 2020) (CCF-B) [Paper]
	Scale-Aware Rolling Fusion Network for Crowd Counting Ying Chen, Chengying Gao, Zhuo Su, Xiangjian He, Ning Liu Intro: Due to its wide application prospects and various challenges such as large scale variation, mutual occlusion among people in the crowd, and background noise, crowd counting is receiving increasing attention. In this paper, we propose a scale-aware rolling fusion network (SRF-Net) for crowd counting, which focuses on dealing with scale variation in highly congested, noisy scenes. SRF-Net is a two-stage architecture that consists of a band-pass stage and a rolling guidance stage. Compared with existing methods, SRF-Net achieves better results in retaining appropriate multi-level features and capturing multi-scale features, thus improving the quality of density estimation maps in crowded scenarios with large scale variation. We evaluate our method on three popular crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), and extensive experiments show that it outperforms the state-of-the-art approaches. International Conference on Multimedia & Expo (ICME, 2020) (CCF-B) [Paper]
	ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding Ning Liu, Yongchao Long, Changqing Zou, Qun Niu, Li Pan, and Hefeng Wu Computer Vision and Pattern Recognition (CVPR, 2019) (CCF-A) [Paper]
	Weak-structure-aware visual object tracking with bottom-up and top-down context exploration Liu Ning, Liu Chang, Wu Hefeng, and Zhu Hengzheng Signal Processing: Image Communication (SPIC, 2018) (CCF-C) [Paper]
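
The SPO-Net and SRF-Net entries above cast crowd counting as density map estimation. As a minimal sketch of that general formulation (not the pipeline of either paper), the code below builds a ground-truth style density map by placing one normalized Gaussian on each annotated head position, so that summing the map recovers the head count. The function names, the fixed sigma, and the toy annotations are assumptions made for illustration.

```python
import math


def density_map(heads, height, width, sigma=2.0):
    """Build a density map from head annotations given as (row, col).

    Each head contributes a Gaussian that is normalized over the image,
    so every person adds exactly 1 to the total mass of the map.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for hy, hx in heads:
        kernel = [
            [math.exp(-((y - hy) ** 2 + (x - hx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


def estimated_count(dmap):
    """The predicted count is the sum (discrete integral) of the map."""
    return sum(map(sum, dmap))


# Hypothetical annotations on a 16 x 32 image, for illustration only.
heads = [(5, 5), (10, 20), (0, 0)]
dmap = density_map(heads, height=16, width=32)
print(round(estimated_count(dmap), 6))
```

A counting network is then typically trained to regress such a map from the image. A single fixed sigma is the simplest choice and is where scale variation becomes a problem, since heads near the camera and far from it cover very different numbers of pixels.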
