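[Keywords]
[Abstract]
Duplicate checking of projects is a necessary means of ensuring fairness in the approval of scientific and technological projects, and in recent years it has become one of the most closely watched issues in science and technology management. This paper presents a systematic review of existing methods for duplicate checking of scientific and technological projects, offering useful knowledge and leads that help other researchers quickly grasp the relevant background and methods. It first defines duplicate checking of scientific and technological projects and outlines the general process for carrying it out. It then analyzes and summarizes common methods along four dimensions (text pre-processing, feature extraction, model construction, and similarity discrimination) and discusses their strengths and weaknesses. Finally, it describes future development trends in duplicate-checking methods for scientific and technological projects.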
[Keywords]
[Abstract]
Identifying highly similar scientific projects is an essential means of ensuring fairness in project approval, and in recent years it has become one of the hot topics in science and technology management. This paper systematically reviews methods for identifying highly similar scientific projects, providing knowledge and clues that help other researchers quickly understand the relevant background and methods. First, the concept of identifying highly similar scientific projects and the general process for carrying it out are described. Then, common methods for text pre-processing, feature extraction, model construction, and similarity discrimination are summarized, together with their advantages and disadvantages. Finally, future development trends of these identification methods are discussed.
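The four stages named in the abstract can be illustrated with a minimal Python sketch. It assumes one common combination chosen only for illustration: regex tokenization with a toy stop-word list for pre-processing, TF-IDF weights (the smoothed variant also used by scikit-learn's `TfidfVectorizer`) for feature extraction, a vector space model, and cosine similarity against a fixed threshold for discrimination. These are swapped-in choices, not methods this review prescribes; the stop-word list, the threshold of 0.8, and all function names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical toy stop-word list; a real system would use a curated list.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def preprocess(text: str) -> list[str]:
    """Text pre-processing: lowercase, split into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Feature extraction + vector space model: each document becomes a sparse TF-IDF vector."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Smoothed IDF, so terms present in every document keep a small weight.
            term: (c / length) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, c in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar_pairs(texts: list[str], threshold: float = 0.8) -> list[tuple[int, int, float]]:
    """Similarity discrimination: report every pair whose cosine score reaches the threshold."""
    vectors = tfidf_vectors([preprocess(t) for t in texts])
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


if __name__ == "__main__":
    projects = [
        "Big data mining for duplicate checking of research projects",
        "Duplicate checking of research projects with big data mining",
        "Deep learning for protein structure prediction",
    ]
    print(find_similar_pairs(projects))  # prints [(0, 1, 1.0)]
```

The first two titles reduce to the same bag of words after stop-word removal, so they score 1.0, while the third shares no terms with either. Because the tokenizer only matches ASCII letters and digits, Chinese project texts would first need word segmentation, and the pairwise loop grows quadratically with the number of projects.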
[CLC number]
G311
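[Funding]
National Natural Science Foundation of China project "Research on the Application of Big Data Mining in Duplicate Checking of Scientific and Technological Projects" (Grant No. 71303223)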