Abstract: This study aims to develop an efficient and accurate model for classifying individual scientific papers as basic or applied research. We constructed a BERT-TextCNN model combined with semi-automatic annotation, an approach that reduces manual annotation effort and improves classification efficiency. BERT generates contextual text vectors, and TextCNN extracts key local features from them. We further analyzed the classification results in the field of quantum information using bibliometric methods and the BERTopic model. The model achieves an F1 score of 0.896, which is 2.1 and 7.9 percentage points higher than standalone BERT and TextCNN models, respectively. It also outperforms the large language models Baichuan4-Turbo, Deepseek-v3, and GLM-4-plus by 12.2, 13.1, and 18.8 percentage points, respectively. These results demonstrate the effectiveness of combining semantic representations with local features, and they address the "high recall, low precision" problem that is common when large language models are applied to domain-specific classification. In quantum information, the model shows that basic research centers mainly on topics such as quantum states, entanglement, and ion spins, whereas applied research focuses mainly on key distribution, quantum sensing, and network components. This research introduces a new method for classifying scientific literature, with substantial implications for research evaluation and resource optimization.
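To make the division of labour between the two components concrete, the following is a minimal pure-Python sketch of a TextCNN head applied to a sequence of token vectors of the kind BERT produces. It is an illustration under stated assumptions, not the authors' implementation. The filter widths (2, 3, 4), the toy hidden size of 8 (BERT-base uses 768), the random weights, the two-class label order, and the helper names `conv1d_max_pool` and `textcnn_head` are all hypothetical; a real model would learn these weights during training.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # rows = tokens (or kernel positions), cols = hidden dims


def conv1d_max_pool(tokens: Matrix, kernel: Matrix, bias: float) -> float:
    """Slide one filter of width len(kernel) over the token sequence,
    apply ReLU, and keep the strongest response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe floor
    for start in range(len(tokens) - width + 1):
        s = bias
        for offset in range(width):
            # Dot product of one token vector with the matching kernel row.
            s += sum(t * w for t, w in zip(tokens[start + offset], kernel[offset]))
        best = max(best, max(0.0, s))
    return best


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_head(
    tokens: Matrix,
    filters: List[Tuple[Matrix, float]],
    out_weights: Matrix,
    out_bias: Vector,
) -> Tuple[Vector, Vector]:
    """Hypothetical TextCNN head: one pooled feature per filter,
    then a linear layer and softmax over the classes."""
    features = [conv1d_max_pool(tokens, k, b) for k, b in filters]
    logits = [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return features, softmax(logits)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = 8  # toy size; stands in for BERT's hidden dimension
    tokens = random_matrix(6, hidden, rng)  # stand-in for BERT token vectors
    # Two filters each of width 2, 3 and 4 (assumed widths), zero bias.
    filters = [(random_matrix(w, hidden, rng), 0.0) for w in (2, 3, 4) for _ in range(2)]
    out_w = random_matrix(2, len(filters), rng)  # 2 classes: [basic, applied] (assumed order)
    features, probs = textcnn_head(tokens, filters, out_w, [0.0, 0.0])
    print("pooled features:", [round(f, 3) for f in features])
    print("P(basic), P(applied):", [round(p, 3) for p in probs])
```

Each filter acts as an n-gram detector over adjacent token vectors, and max-over-time pooling keeps its strongest response anywhere in the text. This is how a TextCNN layer can add local cues on top of BERT's contextual representations before the final classification.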