scenario, especially in single-feed videos recorded in tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features for detecting and tracking basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with deep-learning features alone, without the need to extract contextual ones, such as proximity or color similarity, or to apply camera-stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep-learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k instances.
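At its core, the two-step pipeline described above reduces to matching per-player appearance descriptors across consecutive frames. The sketch below is illustrative only, not the authors' implementation: it assumes each detected player has already been assigned a feature vector (e.g. pooled from a VGG convolutional layer) and greedily associates detections between frames by cosine similarity. The `associate` function, its `threshold` parameter, and the greedy matching strategy are all assumptions made for this example.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def associate(prev_tracks, detections, threshold=0.5):
    """Greedily match existing tracks to current-frame detections.

    prev_tracks: dict mapping track_id -> feature vector (previous frame)
    detections:  list of feature vectors (current frame)
    Returns a dict track_id -> detection index; in a full tracker,
    unmatched detections would spawn new tracks.
    """
    # Score every (track, detection) pair, then take the best pairs first.
    pairs = [
        (cosine_similarity(feat, det), tid, j)
        for tid, feat in prev_tracks.items()
        for j, det in enumerate(detections)
    ]
    pairs.sort(reverse=True)  # highest similarity first

    matches, used_tracks, used_dets = {}, set(), set()
    for sim, tid, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even weaker
        if tid not in used_tracks and j not in used_dets:
            matches[tid] = j
            used_tracks.add(tid)
            used_dets.add(j)
    return matches
```

The resulting assignments can then be scored with the CLEAR MOT metrics, where MOTA = 1 - (FN + FP + IDSW) / GT aggregates missed detections, false positives, and identity switches over all frames relative to the number of ground-truth objects.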
Publisher: World Academy of Science, Engineering and Technology. Open Science Index 151, 2019.