Abstract: To improve the low accuracy of core patent identification, the indicator system was reconstructed. To address the poor performance of traditional core patent identification methods on imbalanced data, a model combining resampling techniques with an ensemble algorithm was proposed. First, indicators related to patent inventors were added to the traditional indicator system. Second, the Synthetic Minority Over-sampling Technique (SMOTE) was used to increase the number of minority-class samples and thereby mitigate the data imbalance. Then, the Local Outlier Factor (LOF) algorithm was used to remove noise from the newly generated samples, and the result was combined with the Adaptive Boosting (Adaboost) algorithm to form the SMOTE-LOF-Adaboost model. Finally, taking 22,077 photovoltaic patents from 2012 to 2016 in the Patsnap patent database as an example, SVM, Adaboost, SMOTE-Adaboost, and SMOTE-LOF-Adaboost were compared in an empirical analysis. The SMOTE-LOF-Adaboost model achieved a mean AUC of 0.9776, a mean recall of 0.9860, and a mean F1 score of 0.9607, outperforming the other three models, and its standard deviation on each metric was smaller. This indicates that the SMOTE-LOF-Adaboost model not only improves the accuracy of core patent prediction but is also more stable.
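
The three stages named in the abstract (oversample the minority class, filter the synthetic samples with LOF, then train a boosted classifier) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy data from `make_classification` stands in for the patent indicators, the helper names `smote`, `lof_filter`, and `smote_lof_adaboost` are hypothetical, and all hyperparameters are assumptions. The choice to fit LOF on the original minority samples and to discard synthetic points it flags as outliers is also an assumption, since the abstract does not specify how the denoising step is configured.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors


def smote(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is normally the query point itself, so drop it.
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def lof_filter(X_min, X_syn, k=20):
    """Assumed denoising rule: fit LOF on the original minority samples and
    keep only the synthetic points it labels as inliers (+1)."""
    lof = LocalOutlierFactor(n_neighbors=min(k, len(X_min) - 1), novelty=True)
    lof.fit(X_min)
    return X_syn[lof.predict(X_syn) == 1]


def smote_lof_adaboost(X, y, seed=0):
    """Balance the classes with SMOTE, denoise with LOF, then fit AdaBoost."""
    X_min = X[y == 1]
    n_needed = int(np.sum(y == 0)) - len(X_min)
    X_syn = lof_filter(X_min, smote(X_min, n_needed, seed=seed))
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(len(X_syn), dtype=y.dtype)])
    clf = AdaBoostClassifier(n_estimators=100, random_state=seed)
    return clf.fit(X_bal, y_bal)


# Toy imbalanced data (about 5% positives), NOT the patent data set.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

model = smote_lof_adaboost(X_tr, y_tr)
pred = model.predict(X_te)
proba = model.predict_proba(X_te)[:, 1]
metrics = {
    "auc": roc_auc_score(y_te, proba),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
}
print(metrics)
```

Resampling is applied only to the training split, so the test metrics are computed on untouched data. Reporting a mean and standard deviation for each metric, as the abstract does, would additionally require repeating this procedure over several splits or cross-validation folds.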