Research on the Development of Multimodal Large Model Technology Based on ArXiv and GitHub Data
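Authors: Zhao Dan, Zhang Xiaoying, Sai Qiuyue, Yang Liu
Affiliations:

1. Institute of Scientific and Technical Information of China, Beijing; 2. Renmin University of China; 3. Institute of Computing Technologies, China Academy of Railway Sciences Corporation Limited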

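Fund Project:

National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project "Research on Artificial Intelligence Governance and Security Capability Assessment" (2022ZD0116200)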



    Abstract:

    With breakthrough advances in artificial intelligence, multimodal large models have become a key direction driving AI toward cross-modal, scenario-based understanding and generation. To systematically trace the development of this field and reveal its technological hotspots and evolutionary trends, this paper studies the current status and future directions of multimodal large models using data from ArXiv academic papers and GitHub open-source projects. We crawled and formatted multimodal-related papers from ArXiv together with their core metadata (such as publication dates and keywords), and collected multimodal-related open-source projects from GitHub. Applying temporal trend analysis, time series forecasting, and topic modeling, we outline the research hotspots, technological trajectories, and application directions of multimodal research. In total, 2,065 ArXiv papers and several hundred GitHub projects were analyzed. The results reveal the development pathways and prospects of multimodal models, and statistical models are used to forecast future technological trends. The findings indicate that multimodal large models exhibit integrative and diverse development characteristics, with "modality fusion," "semantic generation," and "multimodal representation" emerging as core research directions. Their applications are gradually expanding from natural language processing and computer vision to education, healthcare, and other domains.
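    The abstract names temporal trend analysis and time series forecasting but does not state which forecasting model was used. As a minimal sketch, assuming each crawled ArXiv record has already been parsed into a publication date, the Python below aggregates papers per year and extrapolates an ordinary least-squares linear trend. The function names `yearly_counts` and `linear_forecast`, the linear model, and the sample dates are illustrative assumptions, not the authors' method or data.

```python
from collections import Counter
from datetime import date


def yearly_counts(pub_dates):
    """Count papers per publication year (hypothetical helper).

    Years with no papers inside the observed range are filled with 0
    so that the resulting series is evenly spaced.
    """
    if not pub_dates:
        raise ValueError("need at least one publication date")
    counts = Counter(d.year for d in pub_dates)
    years = list(range(min(counts), max(counts) + 1))
    return years, [counts.get(y, 0) for y in years]


def linear_forecast(years, counts, horizon):
    """Fit counts = intercept + slope * year by least squares, then extrapolate.

    Assumption: a simple linear trend; the paper's actual model is unspecified.
    Returns {future_year: predicted_count} for the next `horizon` years.
    """
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to fit a trend")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last = years[-1]
    return {last + h: intercept + slope * (last + h) for h in range(1, horizon + 1)}


if __name__ == "__main__":
    # Hypothetical publication dates, for illustration only.
    sample = [date(2021, 3, 1), date(2022, 5, 1), date(2022, 7, 1),
              date(2023, 1, 1), date(2023, 2, 1), date(2023, 9, 1)]
    yrs, cnts = yearly_counts(sample)
    print(yrs, cnts)
    print(linear_forecast(yrs, cnts, horizon=2))
```

    A linear trend is only the simplest baseline; a study of this kind could equally use ARIMA or exponential smoothing, and the same per-year series would feed any of them.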

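Cite this article

Zhao Dan, Zhang Xiaoying, Sai Qiuyue, Yang Liu. Research on the Development of Multimodal Large Model Technology Based on ArXiv and GitHub Data[J]., 2026, (1).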


History
  • Received: 2025-02-26
  • Last revised: 2026-01-05
  • Accepted: 2025-04-21
  • Published online: 2026-05-18


Science and Technology Management Research ® 2026. All rights reserved.