dc.description.abstract |
Surveillance is the process of continuously monitoring an area, a person, or a group
of people in order to gather meaningful information about ongoing activity. Video-based
surveillance (such as CCTV cameras) is the most prevalent form of surveillance, but CCTV
cameras are severely affected by occlusion and illumination. Moreover, video-based
surveillance applications require abundant storage, computing, and processing power. The
proliferation of microelectromechanical sensors has made computing operations
ubiquitous, mobile, and resilient. Consequently, sensors are an indispensable component
of security, surveillance, and reconnaissance applications.
Typically, surveillance is divided into two categories, viz., (i) active and (ii) passive
(device-free). In active surveillance, a device is attached to the target's body to collect
data. Intruders, however, cannot be expected to wear or carry devices that aid the
surveillance system, so active techniques are not suitable for security applications.
Device-free sensing techniques, on the other hand, are better suited for security
applications: they infer the changes caused by a target in the surrounding environment
using different intrinsic traits. A target's intrinsic traits are classified as either static
or dynamic. Static traits, such as weight, shape, scent, reflectivity, and attenuation, are
always present in a target regardless of its activities. Dynamic traits are generated when
a target engages in an activity; for example, footstep vibration and sound are generated
when a person walks or speaks.
Numerous device-free sensing techniques are available, but we use only seismic and audio
sensing in our work. Audio and seismic sensors are non-intrusive, inexpensive, and easy
to install. Moreover, they are immune to changes in temperature, wind, and lighting, and
applications based on the audio or seismic modality require less storage, computing, and
processing power. This thesis focuses on localization and activity recognition for a single
human target in an outdoor environment using audio and seismic sensors.
First, we localize a human target using only seismic sensors. The sensors used in this
work are identical. Moreover, seismic sensors are omnidirectional, and their sensing range
is assumed to be circular. The intersection point of three such circles can therefore be
treated as the estimated target location, with the required circle parameters computed
using either a regression model or an energy attenuation model. When we deployed this
approach in real-life settings, however, we found that the circles may or may not
intersect, so we proposed a heuristic for that case. The solutions offered by the
heuristic, though, showed high localization error.
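For illustration, the circle-intersection step can be posed as a small linear least-squares
problem: subtracting the circle equations pairwise eliminates the quadratic terms. The
sketch below is ours, not the thesis code, and the sensor layout and radii are
hypothetical; least squares also degrades gracefully when the circles do not intersect
exactly, which is the situation the heuristic had to handle.

    import numpy as np

    def trilaterate(centers, radii):
        """Estimate (x, y) from three sensor circles via linearized least squares.

        Subtracting circle i's equation from circle 0's removes the quadratic
        terms, leaving a linear system A @ [x, y] = b.
        """
        (x0, y0), r0 = centers[0], radii[0]
        A, b = [], []
        for (xi, yi), ri in zip(centers[1:], radii[1:]):
            A.append([2 * (xi - x0), 2 * (yi - y0)])
            b.append(r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2)
        # Least squares tolerates circles that do not intersect exactly.
        est, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return est

    # Hypothetical sensor positions (meters) and regression-estimated radii.
    centers = [(0.0, 0.0), (18.0, 0.0), (9.0, 18.0)]
    radii = [10.0, 11.5, 12.2]
    print(trilaterate(centers, radii))  # approximate target location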
We therefore replace the proposed heuristic with audio direction information. In this
approach, the target distance is computed using regression, and audio direction
information is fused with it to localize the target. Because audio directions may be
missing in our experiments, we propose a mathematical model for estimating missing audio
directions, based on the assumption that at least one audio sensor has captured the target
direction. The missing-direction estimator uses the estimated target distance, so an
erroneous distance estimate makes the angle estimates erroneous as well.
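One plausible geometric reading of this estimator, sketched below with hypothetical sensor
positions and values: the sensor that did capture the direction, together with the
regression-estimated distance, fixes a candidate target position, and the missing bearing
is the angle from the other sensor to that position. This is an illustrative
interpretation, not the exact model from the thesis.

    import numpy as np

    def estimate_missing_bearing(known_sensor, known_bearing_deg, known_dist,
                                 missing_sensor):
        """Estimate the bearing at a sensor whose audio direction was missed.

        The known bearing and distance fix a candidate target position; the
        missing bearing is the angle from the other sensor to that position.
        """
        theta = np.radians(known_bearing_deg)
        target = np.array(known_sensor) + known_dist * np.array(
            [np.cos(theta), np.sin(theta)])
        dx, dy = target - np.array(missing_sensor)
        return np.degrees(np.arctan2(dy, dx)) % 360.0

    # Hypothetical values: the sensor at the origin heard the target at 40
    # degrees, seismic regression put it 12 m away; the sensor at (20, 0)
    # missed it. An erroneous distance shifts the candidate position, which
    # is why distance errors propagate into the angle estimate.
    print(estimate_missing_bearing((0.0, 0.0), 40.0, 12.0, (20.0, 0.0)))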
As an alternative, we perform an early fusion of the audio and seismic modalities for
location estimation. This approach employs multiple audio-seismic features and
multi-output regression to localize a target. Extensive experiments show promising
localization results, with an error of 0.735 in an area of 324 m².
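As a rough illustration of the early-fusion formulation, the sketch below trains a
multi-output regressor to map a fused feature vector to the (x, y) position. The feature
matrix is synthetic, the 24-feature dimensionality and 18 m x 18 m field are assumptions,
and the choice of a random forest (which supports multi-output targets natively in
scikit-learn) is ours; the thesis does not prescribe this particular regressor here.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Stand-in for the fused audio-seismic feature matrix: each row holds
    # features from both modalities, each label is the 2D target position.
    X = rng.normal(size=(500, 24))          # 24 hypothetical fused features
    y = rng.uniform(0, 18, size=(500, 2))   # (x, y) inside an 18 m x 18 m field

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # A random forest predicts both coordinates from one model, matching the
    # multi-output regression formulation.
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    err = np.linalg.norm(pred - y_te, axis=1).mean()
    print(f"mean Euclidean localization error: {err:.3f}")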
After localization, we focus on non-overlapping human activity recognition for a single
human target using the seismic, audio, and audio-seismic modalities. We first employ
seismic sensors to recognize six human activities, including running, jogging, walking,
jumping jacks, and inactivity. The proposed approach uses an autoencoder network to learn
a reduced-dimensionality deep representation of 16 different time- and frequency-domain
features, and an artificial neural network classifier is applied to this deep
representation for activity recognition. Extensive experiments demonstrated precision and
recall values of 0.72 and 0.68, respectively. We found that recognition accuracy is
affected by background noise and inter-activity misclassification.
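A minimal sketch of this pipeline, assuming a TensorFlow/Keras implementation with an
8-dimensional code and synthetic data; the actual layer sizes, code dimensionality, and
training setup in the thesis may differ.

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16)).astype("float32")  # 16 handcrafted features
    y = rng.integers(0, 6, size=1000)                  # six activity labels

    # Autoencoder: compress the 16 features into a lower-dimensional code.
    inp = tf.keras.Input(shape=(16,))
    code = tf.keras.layers.Dense(8, activation="relu")(inp)
    out = tf.keras.layers.Dense(16)(code)
    autoencoder = tf.keras.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=5, verbose=0)

    # Classifier: a small ANN applied to the learned deep representation.
    encoder = tf.keras.Model(inp, code)
    clf = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(6, activation="softmax"),
    ])
    clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
    clf.fit(encoder.predict(X, verbose=0), y, epochs=5, verbose=0)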
To reduce this misclassification, we next employ the audio modality to recognize an
extended set of human activities. The proposed approach uses a 2D convolutional neural
network and reduces inter-activity misclassification; however, the effect of background
noise remains.
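A minimal 2D CNN sketch of this kind of audio classifier, assuming log-mel spectrogram
inputs of 64 bands by 128 frames and nine output classes; the input representation,
shapes, and network depth are assumptions, not the thesis architecture.

    import tensorflow as tf

    # A small 2D CNN over spectrogram patches (hypothetical shape:
    # 64 mel bands x 128 frames x 1 channel).
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(64, 128, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(9, activation="softmax"),  # assumed class count
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()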
Multi-modal fusion typically performs better than a single modality. With this motivation,
we fuse data from the audio and seismic modalities using a 1D convolutional neural
network.
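A minimal sketch of such an early fusion, assuming the two modalities are synchronized and
resampled to a common rate so that each window can be stacked as a two-channel 1D input;
the 1024-sample window length and layer configuration are assumptions.

    import numpy as np
    import tensorflow as tf

    # Early fusion: stack synchronized audio and seismic windows as two input
    # channels of a single 1D signal.
    audio = np.random.randn(8, 1024).astype("float32")
    seismic = np.random.randn(8, 1024).astype("float32")
    fused = np.stack([audio, seismic], axis=-1)  # shape: (batch, 1024, 2)

    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 7, activation="relu",
                               input_shape=(1024, 2)),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(64, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(9, activation="softmax"),  # nine activity classes
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    print(model.predict(fused, verbose=0).shape)  # (8, 9) class probabilities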
We found that the resulting multi-modal human activity recognition framework reduces both
inter-activity misclassifications and the effect of background noise, achieving an
F1-score of 92.00% on a nine-class classification problem. Thus, we conclude that audio
and seismic sensors can be used for both localization and activity recognition with high
accuracy. |
en_US |