👋 About me
I am a third-year undergraduate student in the School of Software Engineering at South China University of Technology (SCUT). My passion lies in developing intelligent systems that can perceive, understand, and interact with our complex, multi-modal world. My current research interests focus mainly on:
- Multi-Modal Large Models: Exploring the capabilities and applications of models that integrate multiple data modalities. I am particularly interested in improving model efficiency through token pruning, developing unified frameworks for video and image generation, and designing architectures that unify multi-modal understanding and generation.
- Diffusion Language Models: Researching novel architectures and applications of diffusion-based models for language. Looking further ahead, I envision a new paradigm: diffusion is all you need.
- Embodied AI: Focusing on foundation models for robotics that leverage force-feedback data to improve physical interaction.
- AI for Industry: Investigating practical applications of AI that can transform sectors such as education, recruitment, healthcare, and enterprise resource planning (ERP).
As an undergraduate researcher in the early stages of my academic journey, I keep an open mind and am eager to explore emerging technologies and innovative applications. Feel free to contact me if you would like to discuss research or explore a collaboration.
Research
|  | ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments. Accepted by ACM MM 2025 (CCF-A); co-first author. (i) We construct ExpInstruct, the first dataset tailored for experiment commentary generation, featuring over 7K step-level commentaries across 21 scientific subjects. Each sample includes procedural descriptions along with potential scientific principles and safety guidelines. (ii) We propose ExpStar, an automatic experiment commentary generation model that leverages a retrieval-augmented mechanism to adaptively access, evaluate, and utilize external knowledge. (iii) Extensive experiments demonstrate that ExpStar achieves state-of-the-art performance and outperforms 16 leading LMMs. (Paper page coming soon.) [Paper] | 
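|  | Vision-and-Language Navigation in Continuous Environments (VLNCE). VLNCE aims to enable a robot to navigate autonomously in real or simulated environments by following natural-language instructions. We fine-tune a large language model (LLM) that aligns graph structure with text, helping the agent select reasonable waypoints to complete the navigation task. My contribution is graph dataset construction: building on the panoramic dataset proposed by Scaling VLN and combining the graph construction methods of BEVBert and ETPNav, I constructed a large-scale graph dataset of 352,890 samples for LLM fine-tuning. [Code] | 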
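|  | Code Generation Based on the ChatGLM3-6B Large Model. Collected and preprocessed code datasets, fine-tuned the model with LLaMA-Factory, built a knowledge base with Chatchat, and combined the fine-tuned model with the knowledge base to obtain a code generation model. The model was evaluated on HumanEval and through manual case analysis. [Slides] [Code] | 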
Projects
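|  | Personalized Knowledge Learning Framework Based on Multi-Agent and RAG. A personalized learning framework built on multiple agents and retrieval-augmented generation (RAG). Four agents work collaboratively: a knowledge extraction agent, a teaching planning agent, a knowledge consolidation agent, and a testing and assessment agent, supported by a RAG module. The framework helps users learn in a personalized way, tailoring instruction to each learner. [Demo] [Slides] | 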
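|  | Intelligent Information Extraction and Analysis for Power Grid Technical Documents. Developed intelligent algorithms and multi-modal models for power grid equipment technical documents, producing structured, general-purpose technical templates. Using intelligent recognition and semantic matching, automated the comparison of technical documents for construction-start review and factory-acceptance review. [Slides] | 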
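|  | 优才知路: Industry-Education Integration LLM, an AI Job-Search Planner. Empowering the job-search journey by precisely guiding individuals toward career opportunities. Based on each job seeker's circumstances, it finds matching positions and provides a practical, actionable "job-search companion" plan. [Slides] | 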
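|  | BookDone (书掂): How to master a book. Using knowledge visualization, it transforms complex texts into dynamic structures and builds a personalized, efficient, and intuitive ecosystem for reading, learning, and practice. [Demo] [Slides] | 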
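|  | Parallel Computing Case Exchange Platform. Front end: users can post, edit, and delete requests; post, edit, and delete replies to requests; and filter requests and replies by conditions. Back end: implements registration and login, displays and edits user information, and lets users change their password and avatar. [Demo] [Code] [Slides] | 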
🏆 Awards
- 2024: National Scholarship of SCUT
- 2024: Hongping Evergreen Scholarship of SCUT
- 2023: Macau Alumni Association Scholarship of SCUT
🎯 Misc
- 👋 BIG FAN of AIGC!
- 🏀 Addicted to playing BASKETBALL! (Superfan of Kyrie & Harden)
- 🎹 Enjoy playing the electronic keyboard, 📚 reading (G.E.B.), and 🎬 watching anime (Arcane / Attack on Titan / Cyberpunk: Edgerunners)
