Research on a BM25-Based Method for Detecting Duplicate Research Projects in Survey and Design Enterprises
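Authors: Wang Yang, Cao Dewei, Wang Jiangang, Qian Feng, Qian Changyun
Affiliation:

Shanghai Investigation, Design & Research Institute Co., Ltd.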

CLC number:

TP311.5; F224; G301

Funding:

    Abstract:

    Redundant research investment is increasingly prominent in survey and design enterprises. It wastes funds, staff time, reputation, and even research morale, and it hinders the incubation of cutting-edge technologies. It is therefore imperative to identify duplicate research topics automatically and to maximize the reuse of existing research outcomes. Building on the basic theory of the BM25 algorithm and on the data attributes of survey and design enterprises, this paper introduces feature values such as domain, specialty, and project leader, and proposes a duplication detection method for research projects within an enterprise. The method has four steps: text preprocessing, building a matching library, computing the similarity between the input topic and each topic in the matching library with the TF-IDF and BM25 algorithms respectively, and analyzing the results. Applied to research topics in new energy, engineering digitalization, and informatization, the BM25-based method shows a clear advantage over TF-IDF in discriminative power, and its computation time is under 0.1 s, which is adequate for commercial use. The method supports duplication checks at project initiation and overlap assessment of research outcomes. Technical R&D staff re-checked the computed results and found the accuracy sufficient for business management needs, so the method has potential for wider adoption in the survey and design industry.
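
The similarity step described in the abstract can be illustrated with a minimal sketch of standard Okapi BM25 scoring. This is not the authors' implementation: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75` (common defaults), and the toy English token lists are all assumptions made for illustration. The sketch also omits the domain, specialty, and project-leader features mentioned above, and it assumes the text has already been tokenized in the preprocessing step.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document in `docs` against the tokenized `query`.

    Standard Okapi BM25 with a smoothed, always-positive IDF.
    `k1` and `b` are common default values, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical matching library: each entry is an already-tokenized project title.
library = [
    ["offshore", "wind", "turbine", "foundation", "design"],
    ["bim", "digital", "delivery", "platform"],
    ["offshore", "wind", "power", "cable", "routing"],
]
# Hypothetical new topic to check against the library.
query = ["offshore", "wind", "foundation", "monitoring"]

scores = bm25_scores(query, library)
# Indices of library entries, most similar first.
ranked = sorted(range(len(library)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` lists the library entries from most to least similar to the new topic. In practice a reviewer would inspect the top few entries, or entries above a chosen score threshold, rather than rely on the raw scores alone.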

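Cite this article

Wang Yang, Cao Dewei, Wang Jiangang, Qian Feng, Qian Changyun. Research on a BM25-based method for detecting duplicate research projects in survey and design enterprises [J].,2024,44(4).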

History
  • Received: 2023-09-18
  • Last revised: 2024-03-22
  • Accepted: 2023-12-01
  • Published online: 2025-03-19

科技管理研究 (Science and Technology Management Research) ® 2025. All rights reserved.