😎 About Me (李云鑫)

I am a Tenure-Track Associate Professor at Harbin Institute of Technology, Shenzhen. I earned my PhD from Harbin Institute of Technology, Shenzhen, advised by Prof. Baotian Hu, Prof. Yuxin Ding, and Prof. Min Zhang. I also hold a Master of Engineering degree from Harbin Institute of Technology, Shenzhen and a Bachelor of Science degree from Harbin Institute of Technology. I maintain long-term collaborations with Dr. Lin Ma (Meituan), Prof. Wenhan Luo (HKUST), Dr. Longyue Wang (Alibaba International Group), and Dr. Yuxiang Wu (Weco AI).

The long-term goal of my research is to help humans with more capable artificial intelligence. I dream of building an intelligent metaverse. My research interests include:

  • Multimodal Reasoning and Planning
  • Omnimodal Large Model
  • Multimodal Agent
  • Embodied Intelligence

I am actively seeking collaborators (undergraduate, master's, and PhD students) who share my interest in developing large multimodal reasoning models to support scalable, agentic, and adaptive reasoning and planning in complex, real-world environments. 📧 Email: liyx@hit.edu.cn

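The Lychee large model team at the Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, recruits on an ongoing basis: master's and PhD students working on large models, as well as undergraduate, master's, and PhD interns. Students can be jointly trained with partner universities and laboratories, including Harbin Institute of Technology, Shenzhen; the Shenzhen Hetao College of Artificial Intelligence (深圳河套人工智能学院); Peng Cheng Laboratory; the Institute of Information Engineering, Chinese Academy of Sciences; and Soochow University. We aim to do research of real value, and applications are very welcome!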

🔥 News

📕 Selected Publications

  • [Technical Report, ArXiv 2025] Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
    Yunxin Li, Xinyu Chen, Shenyuan Jiang, Haoyuan Shi, Zhenyu Liu, Xuanyu Zhang, Nanhao Deng, Zhenran Xu, Yicheng Ma, Meishan Zhang, Baotian Hu, Min Zhang
    [pdf] [web] [code]

  • [Survey, ArXiv 2025] Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
    Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang
    [pdf] [web] [huggingface]

  • [IEEE TPAMI 2025] Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
    Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang
    [pdf] [web] [code]

  • [ACL 2025] VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
    Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang
    [pdf] [web] [code]

  • [SIGGRAPH Asia 2024] Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
    Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang
    [pdf] [code]

  • [ACL 2024] Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
    Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang
    [pdf] [code]

  • [ICML 2024] VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
    Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang
    [pdf] [code]

  • [IEEE TMM 2024] LMEye: An Interactive Perception Network for Large Language Models
    Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang
    [pdf] [code]

  • [LREC-COLING 2024] A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
    Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang
    [pdf] [code]

  • [ArXiv 2023] Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
    Yunxin Li, Baotian Hu, Wei Wang, Xiaochun Cao, Min Zhang
    [pdf] [code]

  • [ACM MM 2023] Training Multimedia Event Extraction With Generated Images and Captions
    Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li
    [pdf] [code]

  • [ACL 2023] A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
    Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang
    [pdf] [code]

  • [ACL 2023] A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
    Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang
    [pdf] [code]

  • [ACM MM 2022] Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
    Qian Yang#, Yunxin Li#, Baotian Hu, Lin Ma, Yuxin Ding, Min Zhang
    [pdf] [code]

  • [SIGKDD 2022] Medical Dialogue Response Generation with Pivotal Information Recalling
    Yu Zhao#, Yunxin Li#, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang
    [pdf] [code]

  • [IEEE TMM 2022] Fast and Robust Online Handwritten Chinese Character Recognition with Deep Spatial & Contextual Information Fusion Network
    Yunxin Li, Qian Yang, Qingcai Chen, Baotian Hu, Xiaolong Wang, Yuxin Ding, Lin Ma
    [pdf]

🏅 Awards

  • Provincial Outstanding Graduate, 2019
  • National Scholarship (2018, 2021, 2024)
  • Baidu Scholarship (Global Top 40), 2024
  • Huawei TopMinds (华为天才少年), 2025
  • JD TGT (京东顶尖技术人才计划), 2025
  • Tencent Qingyun (腾讯青云人才计划), 2025
  • Young Talent Support Project-Doctor (首届中国科协青托博士生计划), CAST, 2024
  • Outstanding Doctoral Dissertation of HIT (哈工大优博), 2025

💼 Research Experience

  • HKUST Research Assistant (2025.03 - 2025.08)
  • ByteDance Doubao (Seed) Team (2024.10 - 2025.02)
  • Tencent AILab (2024.04 - 2024.08)
  • Tencent PCG (2021.10 - 2022.06)

💁 Service

  • Conference Reviewer: ACL ARR (2023-), ICLR (2023-), NeurIPS (2024-), ICML (2024-), AAAI (2024-), ACM SIGGRAPH (2025-), CVPR (2025-), ACM MM (2023-), and IJCAI (2023-).
  • Journal Reviewer: ACM Computing Surveys, IEEE TPAMI, IEEE TIP, IEEE TMM, IEEE TNNLS, IEEE TCSVT, IEEE TAI, and Neural Networks.