Abstract

Action recognition can be applied wherever behavior analysis is required, and it is currently a highly topical area in security and surveillance. Behavior analysis benefits greatly from action recognition: by assigning human actions to categories, it is possible to extract statistics about a person’s behavior, characteristics, abilities and preferences, which specialized personnel can process later according to the selected domain. A model based on 3D convolutions is a middle ground between simple key-frame approaches based on 2D convolutions and more complex approaches based on Recurrent Neural Networks. The proposed model follows a simple 3D convolution architecture. Each hidden layer consists of a convolution operation, an activation function and, in some layers, a pooling layer. Leaky ReLU is used as the activation function to alleviate the vanishing gradient problem. Batch Normalization, a technique that scales and adjusts the output of an activation layer, is used to reduce over-fitting and decrease training time. The 3D convolution structure has the advantage of learning spatio-temporal features, because the convolution is applied over a sequence of frames. The proposed 3D convolution model achieves average results, with an accuracy of approximately 55% on the NTU RGB+D dataset.
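
As an illustration of why a convolution applied over a sequence of frames can capture temporal information, the following minimal sketch implements a naive single-channel 3D convolution followed by Leaky ReLU in plain Python. It is not the model evaluated in the paper: the function names, the toy clips and the hand-written temporal-difference kernel are all hypothetical, and learned kernels, multiple channels, Batch Normalization and pooling are omitted.

```python
# Illustrative sketch only; not the paper's model or code.
# All names, clips and kernel values below are assumptions made for the example.


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: keep positive values, scale the rest by a small slope."""
    return x if x > 0 else negative_slope * x


def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (cross-correlation, no padding).

    clip   : nested list indexed [t][y][x]   (frames, height, width)
    kernel : nested list indexed [dt][dy][dx]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so each output value
                # mixes information across time as well as space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(positions, size=4):
    """Build a clip with one bright pixel per frame, in row 1, column positions[t]."""
    clip = []
    for px in positions:
        frame = [[0.0] * size for _ in range(size)]
        frame[1][px] = 1.0
        clip.append(frame)
    return clip


def features(clip, kernel):
    """Apply the 3D convolution, then Leaky ReLU element-wise."""
    return [[[leaky_relu(v) for v in row] for row in plane]
            for plane in conv3d_valid(clip, kernel)]


# Hypothetical 2x1x1 kernel: next frame minus current frame.
TEMPORAL_DIFF = [[[-1.0]], [[1.0]]]

static_clip = make_clip([1, 1, 1])  # the pixel never moves
moving_clip = make_clip([0, 1, 2])  # the pixel shifts right each frame

static_features = features(static_clip, TEMPORAL_DIFF)
moving_features = features(moving_clip, TEMPORAL_DIFF)
```

In this toy setting the static clip produces an all-zero response, while the moving clip produces non-zero responses at the positions the pixel enters and leaves. A 2D convolution applied to a single key frame could not distinguish the two clips, since their middle frames are identical.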
