Contributors

Jun Yu

Jun Yu, Ph.D., is an associate professor and Ph.D. supervisor in the Department of Automation at the University of Science and Technology of China, a Huawei Most Valuable Instructor (MVI), and a Huawei/MindSpore dual-certified developer evangelist. His research focuses on multimedia computing and intelligent robotics. He has led 40 research projects, including 5 National Natural Science Foundation of China projects, 1 National Aviation Science Fund project, 3 Chinese Association for Artificial Intelligence-Huawei Academic Fund projects, and 3 Huawei flagship research programs. He has led the development of a series of model suites that have entered Huawei computing product lines. He has published more than 200 academic papers and books, including more than 100 first-author or corresponding-author papers in IEEE/ACM top journals, CCF-A international conferences, and SCI Q1 journals.

As the first-ranked contributor, he has received the Wu Wenjun Science and Technology Award, the highest Chinese award in intelligent technology; 6 best paper awards at top international conferences (CVPR_PBVS/ICCV_MFR/ICME/FG); more than 100 championships in top international AI challenges (CVPR/ICCV/IJCAI/AAAI/MM/ECCV, etc.); the Anhui Provincial First Prize for AI Scientific and Technological Progress; excellent completion awards from the Chinese Association for Artificial Intelligence-Huawei MindSpore Academic Reward Fund (2 projects); NetEase's "Excellent Teacher" Award; the Huawei MindSpore "Excellent Advisor" Award; the Ministry of Education-Huawei "Intelligent Base" Outstanding Teacher Award; the Chinese Academy of Sciences Wang Kuancheng Talent Cultivation Award; and the University of Science and Technology of China Xingye Securities Education Award. As the second-ranked contributor, he has received the Anhui Provincial Natural Science Second Prize, the Anhui Provincial Teaching Achievement First Prize (5 projects), the Anhui Provincial Teaching Achievement Second Prize (1 project), and the Chinese Association of Higher Education "School-Enterprise Cooperation Double Hundred Plan" nomination award. He has been granted more than 20 patents as first inventor.

He has long served as an SPC member for multiple top international conferences (IJCAI/AAAI/CVPR/ICCV/ICML/NeurIPS/MM/ICLR, etc.) and is a member of the Huawei MindSpore Technical Committee. As the sole supervising instructor, he has guided students to win the World Robot Contest championship (2 times), the national first prize in the "Challenge Cup" science and technology competition special challenge (2 times), the Huawei Ascend AI Innovation Competition silver medal, and the Huawei MindSpore Outstanding Developer award (2 students).

He teaches undergraduate foundational courses including Data Structures and Algorithms, Introduction to Pattern Recognition, Introduction to Artificial Intelligence, and Digital Logic Circuits, as well as the graduate foundational course Computer Vision, with an average of 350 teaching hours per year. Four core AI courses have been selected for the Huawei Intelligent Base program. He has led 9 Ministry of Education-Huawei industry-university collaborative education and provincial quality engineering projects, including Computer Vision and Pattern Recognition. He has edited 8 textbooks, including Computer Vision and Pattern Recognition, Embedded Efficient Visual Perception: From Theory to Practice, and Multi-modal Human Modeling, Analysis and Synthesis, one of which received the Huawei ICT Excellent Textbook Award. He led the development of the Huawei MindSpore Face Suite (MindFace), now available at https://github.com/mindspore-lab/mindface, and was a core contributor to the Huawei MindSpore Optical Character Recognition Suite (MindOCR), now available at https://github.com/mindspore-lab/mindocr.

Changwen Chen

Changwen Chen is Chair Professor of Visual Computing and Interim Dean of the Faculty of Computer and Mathematical Sciences at The Hong Kong Polytechnic University. He previously served as Dean of Science and Engineering at The Chinese University of Hong Kong, Shenzhen, and Deputy Director at Peng Cheng Laboratory. Professor Chen has held editorial leadership roles as Editor-in-Chief of IEEE Transactions on Multimedia and IEEE Transactions on Circuits and Systems for Video Technology. His distinguished career has been recognized with the Alexander von Humboldt Award, the SUNY Chancellor's Award for Excellence in Scholarship, and the UIUC ECE Distinguished Alumni Award. He is an IEEE Fellow, SPIE Fellow, and member of Academia Europaea. With decades of experience in visual computing and multimedia systems, Professor Chen brings authoritative insight into the data engineering challenges addressed in this book.

Fan Yu

Fan Yu is a senior architect of the MindSpore AI computing framework. He received the 2020 OSCAR Open Source Person award and was appointed a visiting professor at Harbin Institute of Technology. He has led or contributed to the design and implementation of AI system core algorithms, cloud computing resource scheduling, and SDN large-scale routing architectures and algorithms. He has published more than 30 papers and patents. He graduated from the University of Science and Technology of China with a degree in computer science.

Cong Wang

Cong Wang received his M.S. degree from the School of Computer Science and Technology at the University of Science and Technology of China. His research interests include multimodal large model training and inference, AI data engineering, and the research and engineering implementation of Agentic AI technologies.

Yang Luo

Yang Luo received his M.S. degree from the College of Information Science and Electronic Engineering, Zhejiang University. His research interests include deep learning frameworks, AI data processing, data synthesis, large language model post-training, and data-centric AI agents. He has contributed to MindSpore framework optimization, LLM post-training data preparation, and the research and development of intelligent data agents, with expertise in large-scale data processing systems and the practical deployment of data agent systems.

Ran Zhang

Ran Zhang is currently pursuing the M.S. degree in Control Science and Engineering at the University of Science and Technology of China (USTC). His research interests include multimedia computing and deep learning.

Wenzhuo Du

Wenzhuo Du is currently pursuing the M.S. degree in Control Science and Engineering at the University of Science and Technology of China (USTC). Her research interests include multimedia computing and deep learning.

Xin Xu

Xin Xu is currently pursuing the M.S. degree in Electronic Information at the University of Science and Technology of China (USTC). His research interests include multimedia computing and deep learning.

Ke Wang

Ke Wang is currently pursuing the M.S. degree at the Institute of Advanced Technology, University of Science and Technology of China (USTC), Hefei, China. His research interests include AI infrastructure, multimodal large language models, and AI agents.

Zhili Wang

Zhili Wang is currently working toward the M.S. degree in Computer Technology at the University of Science and Technology of China (USTC). His research interests include multimodal large models and agents.

Zhongyi Liu

Zhongyi Liu received his Ph.D. from the University of Illinois Urbana-Champaign (UIUC). His research interests include large language models, agent post-training, reinforcement learning, tool use, multimodal data synthesis, and data-centric AI agents. He has contributed to pre-training data preparation, multimodal data synthesis, and the training of intelligent data-querying agents, with experience across the full pipeline from data construction and training optimization to the practical deployment of AI agent systems.

Xuhong Cao

Xuhong Cao graduated from Northwestern University. His work and research interests include high-performance distributed communication frameworks, distributed computing engines, AP database kernels, large language models, and data-centric AI agents. He has contributed to high-performance communication for microservices, the kernel development of storage-compute decoupled AP database systems, heterogeneous computing engines, and data agent systems, with experience across the full pipeline from low-level system architecture design, core engine development, and performance optimization to the practical deployment of LLM-powered data agents.

Guanlin Mu

Guanlin Mu is currently working toward the M.S. degree in Computer Technology at the University of Science and Technology of China (USTC). His research interests include multimodal generation and system optimization.

Guanjun Liu

Guanjun Liu is currently a graduate student in the Department of Automation at the University of Science and Technology of China (USTC). His research interests include optical character recognition (OCR), document intelligence, multimodal large language models, and table understanding. He is currently working on OCR modeling and document parsing tasks.

Yuefeng Zou

Yuefeng Zou is currently pursuing the M.S. degree in Computer Technology at the University of Science and Technology of China (USTC). His research interests include vision-language models and deep learning.

Lin Xu

Lin Xu is currently pursuing the M.S. degree in the Department of Automation at the University of Science and Technology of China (USTC). He is affiliated with the USTC-Unisound Joint Laboratory of Multimedia Intelligence. His research interests include medical artificial intelligence, multimodal large language models, medical vision-language learning, and radiology image understanding. His current research focuses on enhancing the visual perception and cross-modal reasoning capabilities of medical multimodal models for clinical applications.

Xinyu Chen

Xinyu Chen received the B.E. degree in 2023 from the University of Science and Technology of China, Hefei, China, where he is currently working toward the Ph.D. degree with the School of Information Science and Technology. His research interests include neural architecture search, multimodal large models, and reinforcement learning.

Fengxin Chen

Fengxin Chen is pursuing an Eng.D. degree in the Department of Automation at the University of Science and Technology of China. He received a B.S. degree from Northeast Forestry University and an M.Sc. degree from Hefei University of Technology. His research interests include speech large language models, data engineering, and image enhancement.

Xuan Li

Xuan Li received the B.E. degree from Zhengzhou University and is working toward a doctorate in engineering at the University of Science and Technology of China. His research interests include perception and latent reasoning in multimodal models.

Gongpeng Zhao

Gongpeng Zhao, Alibaba Group, China.

Can Wang

Can Wang received his M.B.A. from Zhejiang University. He has worked at Alibaba Group for many years and has extensive experience in cross-border e-commerce. His current work and research interests focus on AI agents and their applications in enterprise and e-commerce scenarios.

Feng Zhao

Feng Zhao, Xi'an University of Posts and Telecommunications, Xi'an, China.

Ye Yu

Ye Yu, Hefei University of Technology, Hefei, China.

Fang Gao

Fang Gao, Guangxi University, Nanning, China.

Jiaen Liang

Jiaen Liang, Unisound AI Technology Co., Ltd., China.

Wei Huang

Wei Huang, Unisound AI Technology Co., Ltd., China.

Shengping Liu

Shengping Liu, Unisound AI Technology Co., Ltd., China.

Qingsong Liu

Qingsong Liu, Unisound AI Technology Co., Ltd., China.

Jianqing Sun

Jianqing Sun, Unisound AI Technology Co., Ltd., China.