Training a RandomForest
Here’s the code:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.8,
                                                    random_state=241)
RFC = RandomForestClassifier(n_estimators=37, random_state=241)
RFC.fit(X_train, y_train)
scor_test = []
for predict in RFC.predict_proba(X_test):
    x_scor = log_loss(y_test, predict)
    scor_test.apend(x_scor)
Executing the last block raises an error:
ValueError                                Traceback (most recent call last)
<ipython-input-152-01347a72f1da> in <module>
      1 scor_test = []
      2 for predict in RFC.predict_proba(X_test):
----> 3     x_scor = log_loss(y_test, predict)
      4     scor_test.apend(x_scor)

~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
   1762     """
   1763     y_pred = check_array(y_pred, ensure_2d=False)
-> 1764     check_consistent_length(y_pred, y_true, sample_weight)
   1765
   1766     lb = LabelBinarizer()

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    233     if len(uniques) > 1:
    234         raise ValueError("Found input variables with inconsistent numbers of"
--> 235                          " samples: %r" % [int(l) for l in lengths])
    236
    237

ValueError: Found input variables with inconsistent numbers of samples: [2, 3001]
Where did I go wrong?
Additional information:
y_test.shape - (3001,)
RFC.predict_proba(X_test).shape - (3001, 2)
Is there a problem in the dimension of the matrices?
Answer 1, authority 100%
try this:
In [6]: X_train.shape
Out[6]: (750, 1776)

In [7]: RFC = RandomForestClassifier(n_estimators=37, random_state=241)
   ...: RFC.fit(X_train, y_train)
   ...:
Out[7]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=37, n_jobs=None,
                       oob_score=False, random_state=241, verbose=0, warm_start=False)

In [8]: predicted = RFC.predict(X_test)

In [9]: loss = log_loss(y_test, predicted)

In [10]: loss
Out[10]: 9.27641427545646
P.S. This answer only shows how to get rid of the error reported in the question; it is not clear from the question what the author originally wanted to compute. Why calculate the log loss at all, let alone one sample at a time in a loop…
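The root cause of the original error: iterating over `RFC.predict_proba(X_test)` hands `log_loss` a single length-2 probability row against the 3001-element `y_test`, hence the `[2, 3001]` mismatch. `log_loss` is normally given the entire `(n_samples, n_classes)` probability matrix in one call. A minimal sketch of that usage on synthetic data (the dataset and shapes here are illustrative, not the question's actual data):

```python
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (shapes are illustrative).
X, y = make_classification(n_samples=1000, n_features=20, random_state=241)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=241)

RFC = RandomForestClassifier(n_estimators=37, random_state=241)
RFC.fit(X_train, y_train)

# log_loss expects the whole (n_samples, n_classes) probability matrix,
# not one row at a time -- no loop needed.
proba = RFC.predict_proba(X_test)  # shape: (n_test, 2)
loss = log_loss(y_test, proba)     # a single scalar for the whole test set
print(loss)
```

Passing probabilities rather than hard `predict` labels also gives a much more meaningful log loss, since the metric penalizes confident wrong predictions on a continuous scale.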
To check the model's accuracy on the test sample:

In [11]: RFC.score(X_test, y_test)
Out[11]: 0.7314228590469843
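For reference, a classifier's `.score` is mean accuracy, i.e. the same number `accuracy_score` gives on the hard predictions. A small sketch demonstrating the equivalence (again on synthetic data, not the question's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=241)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=241)

RFC = RandomForestClassifier(n_estimators=37, random_state=241)
RFC.fit(X_train, y_train)

# For classifiers, .score computes mean accuracy on the given data,
# which matches accuracy_score on the hard predictions.
acc_score = RFC.score(X_test, y_test)
acc_manual = accuracy_score(y_test, RFC.predict(X_test))
print(acc_score, acc_manual)
```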