Action Recognition Based on Conceptors of Skeleton Joint Trajectories

Jiao Bao

Abstract


With the tremendous popularity of the Kinect, recognizing human actions or gestures from skeletal data has become increasingly feasible. Skeletal data is more precise than RGB video and is largely free of the occlusions caused by the actor's own limbs. Previous neural-network-based approaches recognize actions by learning spatio-temporal features, but it is difficult to explain what those features represent. In contrast, we propose a novel action recognition framework based on conceptors of skeleton joint trajectories. A conceptor is a mechanism of neurodynamical organization. We compute a conceptor for the trajectory of each coordinate dimension of each skeleton joint and use the vector of its singular values to represent the trajectory. We then encode the singular value vectors as binary vectors with a clustering method, and finally apply softmax regression to classify the resulting trajectory codes. The framework is novel in that it recognizes actions from conceptual-level information. Extensive experiments on benchmark datasets confirm its effectiveness.
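The first two steps of the pipeline can be illustrated with a minimal sketch, not the paper's implementation: drive a fixed, randomly generated echo state network with a one-dimensional joint trajectory, form the reservoir state correlation matrix R, compute the conceptor C = R(R + a^-2 I)^-1 following Jaeger's 2014 technical report, and keep its singular values as the trajectory descriptor. The function name, reservoir size, spectral radius, and aperture value below are illustrative assumptions.

    import numpy as np

    def conceptor_singular_values(signal, n_reservoir=100, aperture=10.0, seed=0):
        """Singular-value descriptor of the conceptor of a driven reservoir.

        Illustrative sketch; reservoir size, spectral radius, and aperture
        are assumed values, not the settings used in the paper."""
        rng = np.random.default_rng(seed)

        # Fixed random reservoir: recurrent weights rescaled to spectral
        # radius 0.8, plus random input weights and a small random bias.
        W = rng.normal(size=(n_reservoir, n_reservoir))
        W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))
        w_in = rng.normal(size=n_reservoir)
        b = 0.2 * rng.normal(size=n_reservoir)

        # Collect reservoir states while the trajectory drives the network.
        x = np.zeros(n_reservoir)
        states = []
        for u in signal:
            x = np.tanh(W @ x + w_in * u + b)
            states.append(x)
        X = np.array(states).T                        # N x T state matrix

        # Conceptor C = R (R + aperture^-2 I)^-1, with R = X X^T / T.
        R = X @ X.T / X.shape[1]
        C = R @ np.linalg.inv(R + aperture ** -2 * np.eye(n_reservoir))
        return np.linalg.svd(C, compute_uv=False)     # values lie in [0, 1)

    # Hypothetical usage: one descriptor per coordinate of one joint.
    trajectory = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))
    descriptor = conceptor_singular_values(trajectory)

The remaining steps follow the same spirit: assign each singular value vector to a cluster (for example with k-means) and represent it by a binary indicator code, then train a softmax (multinomial logistic regression) classifier on the resulting trajectory codes.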





