# Fit regression model to each cluster

I have a training data set (C1) and a test data set (C2). Each one has 129 variables. I ran a k-means cluster analysis on C1, split the data set by cluster membership, and created a list of clusters (`C1[[1]]`, `C1[[2]]`, ..., `C1[[k]]`). I also assigned a cluster membership to each case in C2 and created `C2[[1]]`, ..., `C2[[k]]`. Then I fit a linear regression to each cluster in C1. My dependent variable is "Death". My predictors differ from cluster to cluster, and `vars[[i]]` (i = 1, ..., k) holds the list of predictor names. I want to predict Death for each case in the test data set (`C2[[1]]`, ..., `C2[[k]]`). When I run my code, I get this warning for some of the clusters:

    In predict.lm(y[[i]], C2[[i]]) :
      prediction from a rank-deficient fit may be misleading

I have read a lot about this warning but I couldn't figure out what the issue is.


Tags: r, statistics, linear-regression, lm

Asked Oct 25 '14 at 1:56 by Mahsa; edited Oct 1 '19 at 21:03 by Karolis Koncevičius.

## Answer (58 votes)
You can inspect the predict function with `body(predict.lm)`. There you will see this line:

    if (p ...

This check tests whether the rank of your data matrix is at least equal to the number of parameters you want to fit. One way to trigger the warning is having some collinear covariates:

    data ...

Notice that `x3` and `x4` have the same direction in `data`: one is a multiple of the other. This can be checked with `length(fit$coefficients) > fit$rank`.
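The collinear case can be sketched as follows. The data here is made up for illustration (the names `train_demo`, `new_demo`, and `fit_demo` are hypothetical and not from the question), with `x4` built as exactly twice `x3`. Whether the warning fires, and its exact wording, may depend on the R version, so the sketch captures any warning message rather than assuming one.

```r
# Hypothetical toy data: x4 is exactly 2 * x3, so the two columns are collinear.
set.seed(1)
train_demo <- data.frame(
  y  = rnorm(10),
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10)
)
train_demo$x4 <- 2 * train_demo$x3

fit_demo <- lm(y ~ ., data = train_demo)

# The coefficient of the redundant column cannot be estimated and is NA.
print(coef(fit_demo))

# More coefficients than the rank of the model matrix: the fit is rank-deficient.
is_deficient <- length(fit_demo$coefficients) > fit_demo$rank

# New data in which x4 is no longer 2 * x3. Capture any warning that
# predict() raises instead of letting it print.
new_demo <- data.frame(x1 = 0, x2 = 0, x3 = 1, x4 = 5)
caught <- NULL
pred_demo <- withCallingHandlers(
  predict(fit_demo, new_demo),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(caught)
```

Dropping one of the two collinear columns from the formula removes the redundancy, after which `length(fit$coefficients)` and `fit$rank` agree.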

Another way is having more parameters than the data can support, for example more coefficients than observations:

    fit2 ...
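A minimal sketch of this second case, using made-up data with only four observations (the data frame `small_demo` and the model `fit2_demo` are hypothetical names chosen for this illustration): a full three-way interaction model asks for eight coefficients, which four rows cannot determine.

```r
# Hypothetical toy data: only 4 observations.
small_demo <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(1, 1, 2, 3),
  x2 = c(3, 4, 5, 2),
  x3 = c(4, 2, 6, 0)
)

# x1 * x2 * x3 expands to the intercept, 3 main effects, 3 two-way
# interactions and 1 three-way interaction: 8 coefficients in total.
fit2_demo <- lm(y ~ x1 * x2 * x3, data = small_demo)

n_coef2 <- length(coef(fit2_demo))  # 8
rank2   <- fit2_demo$rank           # cannot exceed the 4 rows of data

# Same check as before: more coefficients than rank means rank-deficient.
n_coef2 > rank2
```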
Answered Oct 25 '14 at 7:44 by Karolis Koncevičius; edited Dec 23 '14 at 1:53.
## Answer (16 votes)
This warning:

    In predict.lm(model, test) :
      prediction from a rank-deficient fit may be misleading

gets thrown from R's `predict.lm`. See: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html

Understanding rank deficiency: ask R to tell you the rank of a matrix:

    train ...

A matrix that does not have "full rank" is said to be "rank deficient". A matrix is said to have full rank if its rank is equal to its number of columns or to its number of rows (or to both).
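One way to ask R for the rank of a matrix is the base function `qr()`, whose result has a `rank` component. This is a sketch with two made-up matrices (`full_demo` and `deficient_demo` are hypothetical names), and `qr()` is a choice made for this example, not necessarily the function the answer originally used.

```r
# A 3 x 3 identity matrix: all three columns are independent.
full_demo <- matrix(c(1, 0, 0,
                      0, 1, 0,
                      0, 0, 1), nrow = 3, byrow = TRUE)

# Here the second row is twice the first row, so one row is redundant.
deficient_demo <- matrix(c(1, 2, 3,
                           2, 4, 6,
                           0, 1, 1), nrow = 3, byrow = TRUE)

rank_full      <- qr(full_demo)$rank       # 3: full rank
rank_deficient <- qr(deficient_demo)$rank  # 2: rank deficient

# Full rank means the rank equals min(number of rows, number of columns).
rank_full == min(dim(full_demo))
rank_deficient == min(dim(deficient_demo))
```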

The problem is that `predict.lm` will throw this warning even if your matrices are full rank (not rank deficient), because `predict.lm` pulls a fast one under the hood: it throws out what it considers useless features, which turns your full-rank input into a rank-deficient one. It then complains about it through a warning.

This warning also seems to be a catch-all for other situations, for example when you have too many input features and your data is too sparse, and it is offering its opinion that the predictions are brittle.
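The "too many input features" situation can be sketched with made-up data (the names `x_wide` and `fit_wide` are hypothetical). The predictor matrix below is full rank by the definition above, since its rank equals its number of rows, yet the fit is still rank-deficient because `lm` needs one estimable coefficient per column plus the intercept.

```r
# Hypothetical data: 3 observations, 4 predictors.
x_wide <- matrix(c(1, 0, 0, 2,
                   0, 1, 0, 3,
                   0, 0, 1, 5), nrow = 3, byrow = TRUE)
colnames(x_wide) <- paste0("x", 1:4)

# Rank 3 equals the number of rows, so the matrix itself is "full rank".
x_rank <- qr(x_wide)$rank

# data.frame() expands the matrix into columns x1..x4.
train_wide <- data.frame(y = c(1, 2, 4), x_wide)
fit_wide <- lm(y ~ ., data = train_wide)

n_coef_wide <- length(coef(fit_wide))  # intercept + 4 predictors = 5
fit_rank    <- fit_wide$rank           # limited by the 3 rows of data

# Coefficients that could not be estimated are reported as NA.
print(coef(fit_wide))
```

Seen this way, the check `length(fit$coefficients) > fit$rank` from the first answer flags this fit too, even though no column was removed from the input by hand.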


Example of passing full-rank matrices, yet `predict.lm` still complains of rank deficiency:

    train ...

Workaround:

Assuming `predict` is returning good predictions, you can ignore the warning. `predict.lm` offers its opinion given insufficient perspective, and here you are.

So disable warnings on the predict step like this:

    old <- options(warn = -1)  # turn off warnings, remembering the previous setting
    predict(model, test)
    options(old)               # restore the previous warning setting
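A narrower alternative, sketched here under the assumption that only the `predict` call should be silenced, is to wrap that call rather than change a global option. The data and the names `train_q`, `test_q`, and `model_q` are made up for illustration. The second variant matches on the text "rank-deficient" in the warning message, which is an assumption about the message wording and may not hold in every R version or locale.

```r
# Hypothetical rank-deficient setup: x2 is exactly 3 * x1 in the training data.
set.seed(42)
train_q <- data.frame(y = rnorm(8), x1 = rnorm(8))
train_q$x2 <- 3 * train_q$x1
test_q <- data.frame(x1 = c(0.5, 1), x2 = c(1, 2))
model_q <- lm(y ~ x1 + x2, data = train_q)

# Variant 1: silence every warning raised by this one call only.
pred_quiet <- suppressWarnings(predict(model_q, test_q))

# Variant 2: muffle only warnings that mention rank deficiency;
# any other warning still reaches the user.
pred_targeted <- withCallingHandlers(
  predict(model_q, test_q),
  warning = function(w) {
    if (grepl("rank-deficient", conditionMessage(w), fixed = TRUE)) {
      invokeRestart("muffleWarning")
    }
  }
)
```

Both variants leave the global `warn` option untouched, so warnings from the rest of the script behave as before.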