With the development of artificial intelligence technology, data-driven fault diagnosis for nuclear power plants has made considerable progress. Although such methods are flexible and practical, most of them rest on the same assumption: that the test data follow the same distribution as the training data. In practice, nuclear power plants may run under variable operating conditions, which challenges the generalization of a diagnosis model trained on finite data. In this paper, four data-driven models widely used in nuclear power plant fault diagnosis, Random Forest (RF), K-Nearest Neighbor (KNN), Fully Connected Neural Network (FCNN), and Convolutional Neural Network (CNN), are taken as examples to study how the distribution discrepancy between training data (source domain) and test data (target domain) affects their generalization. The results show that the distribution discrepancy seriously degrades the diagnostic performance of the data-driven models. To improve generalization, a transfer learning method for nuclear power plant fault diagnosis based on a pre-trained model is also proposed. It uses the fault diagnosis knowledge from the source domain task to accelerate model training in the target domain task, so that the model achieves better diagnostic performance with limited labeled data in the target domain.
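The pre-train-then-fine-tune idea described above can be illustrated with a minimal sketch. It is not the method of this paper: a two-feature logistic regression stands in for the pre-trained network, the data are synthetic Gaussian clusters, and the "operating condition change" is modeled as a simple shift of both class centers. All names (`make_domain`, `train`, `run_demo`), the shift of -2.0, and the sample sizes are assumptions chosen only for illustration.

```python
import math
import random


def make_domain(n, shift, rng):
    """Draw n labeled 2-D samples. `shift` moves both class centers,
    a toy stand-in for a change in operating condition (assumption)."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced toy labels: 0 = normal, 1 = fault
        center = 2.0 * label + shift
        X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
        y.append(label)
    return X, y


def predict_proba(x, w, b):
    """Numerically safe sigmoid of the linear score."""
    z = w[0] * x[0] + w[1] * x[1] + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(X, y, w, b, epochs, lr=0.5):
    """Full-batch gradient descent on the logistic loss, starting from (w, b).
    Passing pre-trained (w, b) makes this a fine-tuning step."""
    w = list(w)  # copy so the caller's weights are not modified
    n = len(X)
    for _ in range(epochs):
        gw0 = gw1 = gb = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            gw0 += err * xi[0]
            gw1 += err * xi[1]
            gb += err
        w[0] -= lr * gw0 / n
        w[1] -= lr * gw1 / n
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    hits = sum((predict_proba(xi, w, b) >= 0.5) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(X)


def run_demo(seed=0):
    rng = random.Random(seed)
    Xs, ys = make_domain(200, 0.0, rng)          # source domain, training
    Xs_te, ys_te = make_domain(200, 0.0, rng)    # source domain, test
    Xt_few, yt_few = make_domain(10, -2.0, rng)  # few labeled target samples
    Xt_te, yt_te = make_domain(200, -2.0, rng)   # target domain, test

    # Pre-train on the source domain from zero weights.
    w, b = train(Xs, ys, [0.0, 0.0], 0.0, epochs=300)
    results = {
        "source": accuracy(Xs_te, ys_te, w, b),
        "target_before": accuracy(Xt_te, yt_te, w, b),
    }
    # Fine-tune on the small target set, starting from the pre-trained weights.
    w_ft, b_ft = train(Xt_few, yt_few, w, b, epochs=100)
    results["target_after"] = accuracy(Xt_te, yt_te, w_ft, b_ft)
    return results


if __name__ == "__main__":
    for name, acc in run_demo().items():
        print(f"{name}: {acc:.3f}")
```

In this toy setting the source-trained model is evaluated on shifted target data both before and after fine-tuning on ten labeled target samples, which mirrors the comparison the abstract describes. With a real network, the analogous step would be to reuse the pre-trained weights and continue training on the limited target data.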