Crowd counting has attracted increasing attention due to its wide application prospects.
One of the most essential challenges in this domain is large scale variation,
which impacts the accuracy of density estimation.
To this end, we propose a scale-aware progressive optimization network (SPO-Net)
for crowd counting, which trains a scale-adaptive network to achieve high-quality density map estimation
and overcome the scale variation problem in highly congested scenes.
Concretely, the first phase of SPO-Net, the band-pass stage, concentrates on preprocessing the input image
and fusing high-level semantic information with low-level spatial information from separate multi-layer features.
The second phase of SPO-Net, the rolling guidance stage,
aims to learn a scale-adapted network from multi-scale features in a rolling training manner.
To better learn the local correlation of multi-size regions and reduce redundant computation,
we introduce different supervisions with analogous objectives in each rolling step,
which we refer to as a progressive optimization strategy.
Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF)
not only demonstrate the efficacy of each part of SPO-Net, but also suggest the superiority
of our proposed method over state-of-the-art approaches.
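
As a rough illustration of the density-map formulation and the multi-size region supervision mentioned above, the sketch below builds a density map from point annotations and sum-pools it into region-level counts. The abstract does not specify the kernel, the region sizes, or the loss, so everything here is an assumption: the function names (`density_map`, `region_counts`, `total_count`), the fixed Gaussian `sigma`, and the non-overlapping grid are hypothetical choices based on common crowd-counting practice, not the SPO-Net implementation.

```python
import math


def density_map(points, height, width, sigma=1.5, radius=4):
    """Place one truncated, normalized Gaussian per annotated head (x, y).

    Hypothetical helper: sigma and radius are illustrative defaults.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Collect the kernel weights that fall inside the image bounds.
        cells = []
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                w = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                cells.append((y, x, w))
        total = sum(w for _, _, w in cells)
        for y, x, w in cells:
            # Normalize so that each person contributes a total mass of 1.
            dmap[y][x] += w / total
    return dmap


def region_counts(dmap, size):
    """Sum-pool the density map over non-overlapping size x size regions."""
    h, w = len(dmap), len(dmap[0])
    return [
        [
            sum(
                dmap[y][x]
                for y in range(r, min(r + size, h))
                for x in range(c, min(c + size, w))
            )
            for c in range(0, w, size)
        ]
        for r in range(0, h, size)
    ]


def total_count(dmap):
    """The estimated crowd count is the sum (integral) of the density map."""
    return sum(sum(row) for row in dmap)


# Three annotated heads in a 16 x 16 image (made-up coordinates).
heads = [(3, 3), (10, 12), (11, 12)]
dmap = density_map(heads, height=16, width=16)

# Region-level targets at several sizes; each grid sums to the same total.
targets = {size: region_counts(dmap, size) for size in (4, 8, 16)}
```

Under these assumptions, every grid in `targets` sums to the same total count, so supervising at different region sizes changes the spatial granularity of the objective without changing the quantity being estimated.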
Downward-viewing fisheye cameras have attracted much attention in surveillance systems due to their wide coverage and reduced occlusion.
However, pedestrian detection in downward-viewing fisheye cameras remains an open problem
due to the lack of large-scale labeled datasets, as existing datasets are
usually based on oblique-viewing perspective cameras.
Furthermore, manually labeling a downward-viewing fisheye dataset is time-consuming.
To address this, we propose a self-bootstrapping pedestrian detection method,
which automatically pseudo-labels downward-viewing fisheye images by exploiting the spatial
and temporal consistency of pedestrians in the camera views to improve the accuracy of pedestrian detection.
We segment each downward-viewing fisheye image into two regions and propose a pseudo-labeling method
for each in turn: a cyclically fine-tuned detector for the oblique region and a visual tracking method
for the vertical region. Combining the pseudo-labels from the two regions,
we fine-tune the detection network for better accuracy.
Experimental results show that the proposed approach reduces time consumption by about 95% compared with labor-intensive manual labeling while still reaching a competitive Average Precision (AP).
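
The two-region pseudo-labeling idea can be sketched as follows. The abstract does not say how the regions are delimited, so this example assumes the vertical region is a disc around the image center (directly below the camera) and the oblique region is everything outside it. The names `Box`, `region_of`, `merge_pseudo_labels`, and the `vertical_radius` parameter are hypothetical, and the detector and tracker outputs are stand-in values rather than results from the paper.

```python
import math
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """A pseudo-label: box center, size, and which method produced it."""
    cx: float
    cy: float
    w: float
    h: float
    source: str  # "detector" or "tracker"


def region_of(box: Box, center: Tuple[float, float], vertical_radius: float) -> str:
    """Assign a box to a region by the distance of its center from the image center.

    Assumption: the vertical region is a disc of radius `vertical_radius`.
    """
    dist = math.hypot(box.cx - center[0], box.cy - center[1])
    return "vertical" if dist <= vertical_radius else "oblique"


def merge_pseudo_labels(
    detector_boxes: List[Box],
    tracker_boxes: List[Box],
    center: Tuple[float, float],
    vertical_radius: float,
) -> List[Box]:
    """Keep detector boxes in the oblique region and tracker boxes in the vertical region."""
    merged = [
        b for b in detector_boxes
        if region_of(b, center, vertical_radius) == "oblique"
    ]
    merged += [
        b for b in tracker_boxes
        if region_of(b, center, vertical_radius) == "vertical"
    ]
    return merged


# Stand-in outputs for one 1000 x 1000 frame (illustrative values only).
detector_out = [Box(900, 500, 60, 120, "detector"), Box(520, 510, 50, 50, "detector")]
tracker_out = [Box(480, 490, 55, 55, "tracker"), Box(100, 100, 40, 90, "tracker")]

labels = merge_pseudo_labels(detector_out, tracker_out, center=(500, 500), vertical_radius=200)
```

In this sketch each method contributes labels only in the region it is assumed to handle well, and the merged list is what would then be used to fine-tune the detection network.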
Due to its wide application prospects and challenges such as large scale variation,
mutual occlusion among people in the crowd, and background noise,
crowd counting is receiving increasing attention.
In this paper, we propose a scale-aware rolling fusion network (SRF-Net) for crowd counting,
which focuses on dealing with scale variation in highly congested noisy scenes.
SRF-Net is a two-stage architecture that consists of a band-pass stage and a rolling guidance stage.
Compared with the existing methods, SRF-Net achieves better results in retaining
appropriate multi-level features and capturing multi-scale features,
thus improving the quality of density estimation maps in crowded scenarios with large scale variation.
We evaluate our method on three popular crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF),
and extensive experiments show that it outperforms state-of-the-art approaches.