Abstract: Existing methods for identifying patent technology topics suffer from a lack of semantic context, weak interpretability, and vaguely defined topics. To address these problems, this study proposes a technology topic identification and analysis method that integrates patent structural data with text semantics. The method helps researchers grasp the content of technical research and provides scientific support for R&D decision-making. Patent IPC (International Patent Classification) codes are used as structural data to improve topic modeling of the plain text, yielding topic word vectors guided by both the IPC and expert classification opinions. In parallel, word2vec is used to obtain semantic word vectors, and the two sets of vectors are concatenated (vector splicing) to produce accurate, easily interpretable technology topics that meet the requirements of fine-grained analysis. Finally, an empirical study in the field of non-small cell lung cancer treatment confirms the scientific validity and practicality of the method.
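The vector-splicing step can be pictured with the minimal sketch below. It assumes that each word already has a topic vector (for example, its weights across the IPC-guided topics) and a word2vec-style semantic vector, and it concatenates the two after L2-normalizing each part so that neither dominates. The normalization, the function names (`l2_normalize`, `splice_vectors`, `cosine`), and all toy values are hypothetical illustrations; they are not the study's implementation or data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed step, so neither part dominates)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def splice_vectors(topic_vecs, semantic_vecs):
    """Concatenate each word's topic vector with its semantic vector.

    Only words present in both dictionaries are kept.
    """
    spliced = {}
    for word, t_vec in topic_vecs.items():
        s_vec = semantic_vecs.get(word)
        if s_vec is None:
            continue
        spliced[word] = l2_normalize(t_vec) + l2_normalize(s_vec)
    return spliced


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Made-up toy vectors for illustration only (not data from the study).
    topic_vecs = {
        "egfr": [0.8, 0.1, 0.1],
        "erlotinib": [0.7, 0.2, 0.1],
        "radiotherapy": [0.1, 0.1, 0.8],
    }
    semantic_vecs = {
        "egfr": [0.9, 0.3],
        "erlotinib": [0.8, 0.4],
        "radiotherapy": [-0.2, 0.9],
    }
    spliced = splice_vectors(topic_vecs, semantic_vecs)
    print(len(spliced["egfr"]))  # 3 topic dims + 2 semantic dims -> 5
    print(cosine(spliced["egfr"], spliced["erlotinib"])
          > cosine(spliced["egfr"], spliced["radiotherapy"]))  # True
```

Words that are close both in topic space and in semantic space end up close in the spliced space, which is one plausible reason for combining the two views before grouping words into technology topics.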