Annealing

Classify Steel Annealing

Source:

Donors: David Sterling and Wray Buntine


Papers That Cite This Data Set:

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.

Qingping Tao. Making Efficient Learning Algorithms with Exponentially Many Features. Ph.D. dissertation, The Graduate College, University of Nebraska. 2004.


Yuan Jiang and Zhi-Hua Zhou. Editing Training Data for kNN Classifiers with Neural Network Ensemble. ISNN (1). 2004. 

Data set     Categorical  Continuous  Size  Class
annealing    33           5           798   6
credit       9            6           690   2
glass        0            9           214   7
hayes-roth   4            0           132   3
iris         0            4           150   3
liver        0            6           345   2
pima         0            8           768   2
soybean      35           0           683   19
wine         0            13          178   3
zoo          16           …


Jihoon Yang and Rajesh Parekh and Vasant Honavar. DistAl: An inter-pattern distance-based constructive learning algorithm. Intell. Data Anal, 3. 1999. 

TABLE II: Comparison of generalization accuracy between various algorithms. DistAl is the result of our approach and NN is the best result in [40].

Dataset            DistAl  NN
Annealing          96.6    96.1
Audiology          66.0    77.5
Bridge             63.0    60.6
Cancer             97.8    95.6
Credit             87.7    81.5
Flag               65.8    58.8
Glass              70.5    72.4
Heart              86.7    83.1
Heart (Cleveland)  85.3    80.2
Heart (Hungary)    85.9    …


Pedro Domingos. Knowledge Discovery Via Multiple Models. Intell. Data Anal, 2. 1998. 

… between the learner's bias and that of the probability estimation procedure is important for good results. Disabling the generation of missing values had a large negative impact in the annealing dataset, where very large numbers of missing values are present, and a less discernible one in the datasets where fewer such values occur. C4.5RULES's pruning parameters during the meta-learning phase can …


Zhi-Hua Zhou and Xu-Ying Liu. Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem. IEEE Transactions on Knowledge and Data Engineering, 18(1). 2006.

… is apparently worse than that of sole BP. Table X and Fig. 4 also show that threshold-moving is always effective, and soft-ensemble only causes a negative effect on the most seriously imbalanced data set, annealing; SMOTE and hard-ensemble cause negative effects on soybean and annealing. It is noteworthy that the sampling methods cause negative effects on almost all data sets suffering from class …


James J. Liu and James Tin-Yau Kwok. An Extended Genetic Rule Induction Algorithm. Department of Computer Science, Wuhan University.

… Table 2: Average accuracies and standard deviations over the ten folds (numbers in bold indicate the highest accuracy obtained over the four methods).

Dataset     majority    RISE        SIA         ESIA
annealing   76.17±0.06  90.65±0.02  86.53±0.03  93.32±0.01
australian  55.51±0.04  85.36±0.02  72.46±0.19  80.58±0.10
breast      65.52±0.02  91.85±0.07  84.84±0.02  94.71±0.04
cleveland   …


http://archive.ics.uci.edu/ml/datasets/Annealing


Evaluation

Submissions are evaluated using the Area Under the ROC Curve (AUC).

The AUC measures the area under a ROC curve, in which the true positive rate is plotted against the false positive rate; an example is shown below. The closer a model's AUC is to 1, the better the model, so models with higher AUCs are preferred over those with lower ones.

Please note that there are methods other than ROC curves, e.g. precision-recall curves, the F1 score, or Lorenz curves, but they too are based on the true positive and false positive rates.

[Figure: example of a ROC curve]

AUC is most often used to mean AUROC; strictly speaking, AUC is ambiguous (it could be the area under any curve) while AUROC is not.


Interpreting the AUROC

The AUROC has several equivalent interpretations:

  • The probability that a uniformly drawn random positive is ranked before a uniformly drawn random negative (a numerical check of this interpretation follows the list).
  • The expected proportion of positives ranked before a uniformly drawn random negative.
  • The expected true positive rate if the ranking is split just before a uniformly drawn random negative.
  • The expected proportion of negatives ranked after a uniformly drawn random positive.
  • The expected false positive rate if the ranking is split just after a uniformly drawn random positive.
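
As a quick sanity check of the first interpretation, the sketch below (with made-up labels and scores; the variable names are ours) compares scikit-learn's AUROC with the fraction of positive/negative pairs in which the positive example receives the higher score:

```python
# Minimal check: AUROC equals the probability that a random positive
# is ranked above a random negative (ties counted as one half).
# The labels and scores below are synthetic, for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)          # synthetic 0/1 labels
y_score = 0.5 * y_true + rng.normal(size=200)  # noisy synthetic scores

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
# Fraction of (positive, negative) pairs ranked correctly.
pairwise = np.mean((pos[:, None] > neg[None, :])
                   + 0.5 * (pos[:, None] == neg[None, :]))

print(roc_auc_score(y_true, y_score))  # library AUROC
print(pairwise)                        # same value, up to floating point
```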

Computing the AUROC

Assume we have a probabilistic, binary classifier such as logistic regression.

Before presenting the ROC curve (Receiver Operating Characteristic curve), the concept of a confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:

  • We predict 0 while the class is actually 0: this is called a True Negative, i.e. we correctly predict that the class is negative (0). For example, an antivirus did not flag a harmless file as a virus.
  • We predict 0 while the class is actually 1: this is called a False Negative, i.e. we incorrectly predict that the class is negative (0). For example, an antivirus failed to detect a virus.
  • We predict 1 while the class is actually 0: this is called a False Positive, i.e. we incorrectly predict that the class is positive (1). For example, an antivirus considered a harmless file to be a virus.
  • We predict 1 while the class is actually 1: this is called a True Positive, i.e. we correctly predict that the class is positive (1). For example, an antivirus rightfully detected a virus.

To get the confusion matrix, we go over all the predictions made by the model and count how many times each of these 4 types of outcomes occurs:

[Image: example confusion matrix]

In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
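
As a sketch of that counting procedure (the predictions below are made up for illustration), the four counts can be tallied directly and cross-checked against scikit-learn's confusion_matrix:

```python
# Tally the four outcome types by hand and compare with scikit-learn.
# y_true / y_pred are made-up 0/1 arrays, for illustration only.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
print(tn, fp, fn, tp)

# scikit-learn returns the same counts as the 2x2 matrix [[tn, fp], [fn, tp]].
print(confusion_matrix(y_true, y_pred))
```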

Since it is often more convenient to compare two models using a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:

  • True positive rate (TPR), aka. sensitivity, hit rate, and recall, which is defined as TP / (TP + FN). Intuitively, this metric corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. In other words, the higher the TPR, the fewer positive data points we will miss.
  • False positive rate (FPR), aka. fall-out, which is defined as FP / (FP + TN). Intuitively, this metric corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. In other words, the higher the FPR, the more negative data points will be misclassified.

To combine the FPR and the TPR into one single metric, we first compute these two metrics at many different thresholds (for example 0.00, 0.01, 0.02, …, 1.00) for the logistic regression, then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the AUC of this curve, which we call AUROC.
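
A minimal sketch of this threshold sweep follows; the labels and predicted probabilities are synthetic, and the trapezoidal rule is one simple way to approximate the area under the resulting curve:

```python
# Sweep thresholds 0.00, 0.01, ..., 1.00, record (FPR, TPR) at each,
# then approximate the area under the resulting ROC curve.
# y_true / y_prob are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_prob = np.clip(0.5 * y_true + rng.normal(0.25, 0.25, size=500), 0, 1)

fprs, tprs = [], []
for t in np.arange(0.0, 1.01, 0.01):
    y_pred = (y_prob >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tprs.append(tp / (tp + fn))  # TPR at this threshold
    fprs.append(fp / (fp + tn))  # FPR at this threshold

# Sort the points by FPR and integrate TPR over FPR (trapezoidal rule).
order = np.argsort(fprs)
fpr = np.array(fprs)[order]
tpr = np.array(tprs)[order]
auroc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auroc)
```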

The following figure shows the AUROC graphically:

[Image: ROC curve with the AUROC shaded in blue]

In this figure, the blue area corresponds to the Area Under the Curve of the Receiver Operating Characteristic (AUROC). The dashed diagonal line represents the ROC curve of a random predictor: it has an AUROC of 0.5. The random predictor is commonly used as a baseline to check whether the model is useful.

If you want to get some first-hand experience, the sketch below is one way to start. It uses scikit-learn and matplotlib on a toy dataset from make_classification, which stands in for the competition data:
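
```python
# Fit a simple classifier on toy data, plot its ROC curve, and report
# the AUROC. make_classification stands in for the competition data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

fpr, tpr, _ = roc_curve(y_te, scores)
plt.plot(fpr, tpr, label=f"AUROC = {roc_auc_score(y_te, scores):.3f}")
plt.plot([0, 1], [0, 1], "--", label="random predictor (AUROC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```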

Source: http://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it

Rules

  • One account per participant

    You cannot sign up from multiple accounts and therefore you cannot submit from multiple accounts.

  • No private sharing outside teams

    Privately sharing code or data outside of teams is not permitted. It's okay to share code if made available to all participants on the forums.

  • Submission Limits

    You may submit a maximum of 5 entries per day.

    You may select up to 2 final submissions for judging.

Specific Understanding

  • Use of external data is not permitted. This includes use of pre-trained models.
  • Hand-labeling is allowed on the training dataset only. Hand-labeling is not permitted on test data and will be grounds for disqualification.