x-variables in test data do not match original training data

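Running this line of code: testing_surv <- coxph(Surv(OS.time, OS) ~ fit.p$predicted, data = testing)
produces the error: Error in generic.predict.rfsrc(object, newdata, m.target = m.target, importance = importance, :
x-variables in test data do not match original training data. The variable names and types in my data set are exactly the same as the variable names and types in the model, so why does this error still occur?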

檀越@新空间 2024-03-26 19:11
Good evening, 蘑菇仙人 🌙🌙🌙
This answer draws on ChatGPT-3.5.

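The message is raised inside generic.predict.rfsrc, that is, while predicting from the random forest model on the new data. It can have the following causes:

Variable names do not match: check that the variable names in the test data set match those in the original training data set exactly, including case, spaces, and punctuation.

Variable types do not match: check that each variable in the test data set has the same type as in the original training data set. For example, numeric, character, and factor variables must keep the same type in both.

Missing values are handled inconsistently: check whether the test data set contains missing values and make sure they are handled the same way as in the training data set. For example, if missing values were imputed or dropped before training, the same step must be applied to the test data.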
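The three checks above can be sketched in base R as follows. The data frames `training` and `testing` here are small made-up stand-ins, not the asker's data, and the column names (`age`, `stage`, `marker`) are assumptions chosen only to trigger each kind of mismatch. Comparing factor levels is an extra check that goes slightly beyond the list above, since two factor columns can share a class and still differ in their levels.

```r
# Hypothetical toy data standing in for the real training/testing sets.
training <- data.frame(age = c(61, 54, 70),
                       stage = factor(c("I", "II", "III")),
                       marker = c(0.2, 1.4, 0.9),
                       stringsAsFactors = FALSE)
testing <- data.frame(age = c("58", "66"),            # character, not numeric
                      stage = factor(c("II", "IV")),  # level "IV" never seen in training
                      Marker = c(0.5, NA),            # name differs in case; contains NA
                      stringsAsFactors = FALSE)

# 1. Names present in one data set but not in the other.
missing_in_test <- setdiff(names(training), names(testing))
extra_in_test   <- setdiff(names(testing), names(training))

# 2. Class mismatches among the columns both data sets share.
shared      <- intersect(names(training), names(testing))
train_class <- vapply(training[shared], function(x) class(x)[1], character(1))
test_class  <- vapply(testing[shared],  function(x) class(x)[1], character(1))
type_mismatch <- shared[train_class != test_class]

# 2b. Factor levels that occur only in the test data.
factor_cols <- shared[vapply(training[shared], is.factor, logical(1))]
new_levels  <- lapply(factor_cols, function(v) {
  setdiff(levels(testing[[v]]), levels(training[[v]]))
})
names(new_levels) <- factor_cols

# 3. Number of missing values per test column.
na_counts <- colSums(is.na(testing))

print(list(missing_in_test = missing_in_test, extra_in_test = extra_in_test,
           type_mismatch = type_mismatch, new_levels = new_levels,
           na_counts = na_counts))
```

Any non-empty entry in the printed list points at a column worth fixing before calling predict() on the fitted model.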

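Solutions:

Make sure the variable names, types, and missing-value handling of the test data set are consistent with the original training data set.

If the data set is large, use a function such as setdiff() to compare the variable names of the test data set with those of the training data set and find the names that do not match.

If variable types do not match, use functions such as as.numeric() or as.factor() to convert the variables in the test data set to the correct type.

If the test data set contains missing values, use na.omit() or complete.cases() to drop the affected rows, or fill them in. Base R has no na.mean() or na.median(); impute instead with the column mean or median, computed with mean(x, na.rm = TRUE) or median(x, na.rm = TRUE).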
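The fixes listed above could be combined roughly as below. This is a minimal sketch under stated assumptions: `align_to_training()` is a hypothetical helper written for this example, the toy data frames are invented, and filling gaps with the training median is just one possible imputation choice that should mirror whatever was done before training.

```r
# Hypothetical toy data: test columns are out of order, mistyped, and include an extra column.
training <- data.frame(age = c(61, 54, 70, 66),
                       stage = factor(c("I", "II", "III", "II")),
                       stringsAsFactors = FALSE)
testing <- data.frame(stage = c("II", "I", "III"),
                      age = c("58", NA, "72"),
                      extra = 1:3,
                      stringsAsFactors = FALSE)

# Hypothetical helper: coerce `test` to the column set, order, and types of `train`.
align_to_training <- function(test, train) {
  # Keep only the training columns, in training order
  # (this errors if a training column is absent from `test`).
  test <- test[, names(train), drop = FALSE]
  for (v in names(train)) {
    if (is.factor(train[[v]])) {
      # Reuse the training levels; values unseen in training become NA.
      test[[v]] <- factor(as.character(test[[v]]), levels = levels(train[[v]]))
    } else if (is.numeric(train[[v]])) {
      test[[v]] <- as.numeric(test[[v]])
      # Fill missing values with the training median (an assumption, not a rule).
      test[[v]][is.na(test[[v]])] <- median(train[[v]], na.rm = TRUE)
    }
  }
  test
}

testing_aligned <- align_to_training(testing, training)
str(testing_aligned)
```

Using the training levels and training median, rather than recomputing them on the test data, keeps the test set on the same footing as the data the model was fitted on.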

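Revised code example:

library(survival)

# Check that variable names and variable types match
# (identical() also copes with data sets that differ in column count)
if (identical(colnames(testing), colnames(training)) &&
    identical(sapply(training, class), sapply(testing, class))) {
  # Fit a Cox model on the predicted values
  testing_surv <- coxph(Surv(OS.time, OS) ~ fit.p$predicted, data = testing)
} else {
  stop("x-variables in test data do not match original training data.")
}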
This answer was selected by the asker as the best answer.


Question events

Closed by the system on April 5
Answer accepted on March 28
Question created on March 26
