
IEEE Workshop on Machine Learning and Artificial Intelligence for Multimedia Creation

July 23-27, 2018 @ San Diego, CA, USA

In conjunction with the 2018 IEEE International Conference on Multimedia and Expo (ICME)

All accepted papers which are registered by the author deadline and presented at the conference will be included in IEEE Xplore.



Time: 8:30 - 12:30
Date: Friday, July 27, 2018
Room: Milos

Time | Title | Presenter/Author
8:30 – 8:40 | Opening remarks | Dr. Sijia Liu
8:40 – 9:20 | Keynote Talk: A Multi-task Learning Framework for Head Pose Estimation and Actor-Action Semantic Video Segmentation | Prof. Yan Yan
9:21 – 9:38 | Paper #46: Video Super Resolution Based on Deep Convolution Neural Network with Two-stage Motion Compensation | Haoyu Ren, Mostafa El Khamy, Jungwon Lee
9:39 – 9:56 | Paper #55: A Fast No-reference Screen Content Image Quality Prediction using Convolutional Neural Networks | Zhengxue Cheng, Masaru Takeuchi, Kenji Kanai, Jiro Katto
9:57 – 10:14 | Paper #57: An Enhanced Deep Convolutional Neural Network for Person Re-identification | Tiansheng Guo, Dongfei Wang, Zhuqing Jiang, Aidong Men, Yun Zhou
10:15 – 10:32 | Paper #71: Single Image Haze Removal via Joint Estimation of Detail and Transmission | Shengdong Zhang, Yao Jian, Wenqi Ren
10:33 – 10:45 | Coffee Break |
10:46 – 11:03 | Paper #82: Deep Global and Local Saliency Learning with New Re-ranking for Person Re-Identification | Wei Fei, Zhicheng Zhao, Fei Su
11:04 – 11:21 | Paper #95: Hierarchical Learning of Sparse Image Representations using Steered Mixture of Experts | Rolf Jongebloed, Ruben Verhack, Lieven Lange, Thomas Sikora
11:22 – 11:39 | Paper #123: HDR Image Reconstruction Using Locally Weighted Linear Regression | Xiaofen Li, Yongqing Huo
11:40 – 11:57 | Paper #124: Supporting Collaboration Among Cyber Security Analysts Through Visualizing their Analytical Reasoning Processes | Lindsey Thomas, Adam Vaughan, Zachary Courtney, Chen Zhong, Awny Alnusair
11:58 – 12:15 | Paper #146: Robust Weighted Regression for Ultrasound Image Super-Resolution | Walid Sharabati, Bowei Xi
12:16 – 12:33 | Paper #150: A Two-Layer Pairwise Framework to Approximate Superpixel-based Higher-order Conditional Random Field for Semantic Segmentation | Li Sulimowicz, Ishfaq Ahmad, Alexander Aved


This workshop focuses on the emerging field of multimedia creation using machine learning (ML) and artificial intelligence (AI) approaches. It aims to bring together researchers in ML and AI and practitioners from the multimedia industry to foster multimedia creation. Multimedia creation, including style transfer and image synthesis, has become a major focus of the machine learning and AI communities, owing to recent technological breakthroughs such as generative adversarial networks (GANs). This workshop seeks to reinforce their implications for multimedia creation. It publishes papers on all emerging areas of content understanding and multimedia creation, all traditional areas of computer vision and data mining, and selected areas of artificial intelligence, with a particular emphasis on machine learning for pattern recognition. Applied fields such as art content creation, medical image and signal analysis, massive video/image sequence analysis, facial emotion analysis, control systems for automation, content-based retrieval of video and images, and object recognition are also covered. The workshop is expected to provide an interactive platform for researchers, scientists, professors, and students to exchange innovative ideas and experiences in the area of multimedia, spanning underlying cutting-edge technologies through to applications.

We intend to have a half-day workshop with four to five regular talks.



Potential topics of interest include, but are not limited to, ML and AI for multimedia in the following areas:


Paper submission instructions


Important Dates


Review procedure

All submitted papers will be reviewed by three program committee members.


Workshop Organizers


Workshop Chairs


Technical Program Committee

Keynote Talk

A Multi-task Learning Framework for Head Pose Estimation and Actor-Action Semantic Video Segmentation

Prof. Yan Yan, Assistant Professor at Texas State University

Abstract: Multi-task learning, an important branch of machine learning, has developed rapidly over the past decade. Multi-task learning methods aim to simultaneously learn classification or regression models for a set of related tasks. This typically yields better models than a learner that does not account for task relationships. In this talk, we will investigate a multi-task learning framework for head pose estimation and actor-action segmentation. (1) Head pose estimation from low-resolution surveillance data has gained in importance. However, monocular and multi-view head pose estimation approaches still work poorly under target motion, as facial appearance is distorted by camera perspective and scale changes when a person moves around. We propose FEGA-MTL, a novel framework based on multi-task learning for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. (2) Fine-grained activity understanding in videos has attracted considerable recent attention, with a shift from action classification to detailed actor and action understanding that provides compelling results for the perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions. To address these issues, we propose a novel, robust multi-task ranking model for weakly-supervised actor-action segmentation where only video-level tags are given for training samples. Our model is able to share useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions respectively.
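The core idea the abstract relies on, i.e. several related tasks trained through one shared representation so that each task regularizes the others, can be sketched in a few lines of NumPy. This is an illustrative toy under hard parameter sharing, not the FEGA-MTL or ranking models from the talk; all names and dimensions here are hypothetical:

```python
# Toy hard-parameter-sharing multi-task learner (illustrative only;
# not the FEGA-MTL model described in the talk).
# Two related regression tasks share a linear encoder W; each task
# keeps its own head (h1, h2). Both task losses update the shared W.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # inputs shared by both tasks
W_true = rng.normal(size=(5, 3))         # hidden structure common to the tasks
y1 = X @ W_true @ rng.normal(size=3)     # task-1 targets
y2 = X @ W_true @ rng.normal(size=3)     # task-2 targets

W = 0.1 * rng.normal(size=(5, 3))        # shared encoder weights
h1 = 0.1 * rng.normal(size=3)            # task-1 head
h2 = 0.1 * rng.normal(size=3)            # task-2 head
lr, n = 0.01, len(X)

def joint_loss():
    Z = X @ W
    return np.mean((Z @ h1 - y1) ** 2) + np.mean((Z @ h2 - y2) ** 2)

start = joint_loss()
for _ in range(500):
    Z = X @ W
    r1, r2 = Z @ h1 - y1, Z @ h2 - y2
    # Gradients of the summed MSE: both tasks' residuals flow into the shared W.
    W -= lr * (2 / n) * (X.T @ (np.outer(r1, h1) + np.outer(r2, h2)))
    h1 -= lr * (2 / n) * (Z.T @ r1)
    h2 -= lr * (2 / n) * (Z.T @ r2)
end = joint_loss()
```

Because every gradient step on either task's loss moves the shared encoder, each task acts as a regularizer on the other, which is the benefit the abstract attributes to accounting for task relationships.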

Yan Yan is currently an Assistant Professor at Texas State University. He was a research fellow at the University of Michigan and at the University of Trento. He received his Ph.D. in computer science from the University of Trento, Italy, and his M.S. degree from the Georgia Institute of Technology. He was a visiting scholar at Carnegie Mellon University in 2013 and a visiting research fellow at the Advanced Digital Sciences Center (ADSC), UIUC, Singapore in 2015. His research interests include computer vision, machine learning, and multimedia. He received the Best Student Paper Award at ICPR 2014 and the Best Paper Award at ACM Multimedia 2015. He has published papers in CVPR, ICCV, ECCV, TPAMI, AAAI, IJCAI, and ACM Multimedia. He has been a PC member for several major conferences and a reviewer for refereed journals in computer vision and multimedia. He served as a guest editor for TPAMI, CVIU, and TOMM. He is a member of the IEEE and the ACM.