public class JCHAIDStar extends JCHAID implements weka.core.OptionHandler, weka.core.Drawable, weka.core.Matchable, weka.classifiers.Sourcable, weka.core.Summarizable, weka.core.AdditionalMeasureProducer, weka.core.TechnicalInformationHandler, weka.core.PartitionGenerator
@article{Ibarguren2016, author = "Igor Ibarguren and Aritz Lasarguren and Jes\'us M. P\'erez and Javier Muguerza and Ibai Gurrutxaga and Olatz Arbelaitz", title = "BFPART: Best-First PART", journal = "Information Sciences", volume = "367-368", pages = "927-952", year = "2016", doi = "10.1016/j.ins.2016.07.023", abstract = "In supervised classification, decision tree and rule induction algorithms possess the desired ability to build understandable models. The PART algorithm creates partially developed C4.5 decision trees and extracts a rule from each tree. Some of the criteria used by this algorithm can be modified to yield better results. In this work, we propose and compare 16 variants of the PART algorithm from the perspectives of discriminating capacity, complexity of the models, and the computational cost, for 36 real-world problems obtained from the UCI repository. The use of the Best-First optimization algorithm to find the next node to develop in a partial tree improves the results of the PART algorithm. The best-performing variant also ranks first when compared to the well-established C4.5 algorithm and a modified version of the CHAID decision tree induction algorithm that handles continuous features, which is also proposed in this paper. In order to study its performance in comparison to other rivals, this comparison of algorithms also includes the original PART algorithm. For all performance measures, we test the results for statistical significance using state-of-the-art methods." }
@article{Kass1980, author = "G. V. Kass", title = "An Exploratory Technique for Investigating Large Quantities of Categorical Data", journal = "Journal of the Royal Statistical Society. Series C (Applied Statistics)", year = "1980", volume = "29 (2)", pages = "119-127", abstract = "The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable. Some important modifications which are relevant to standard AID include: built-in significance testing with the consequence of using the most significant predictor (rather than the most explanatory), multi-way splits (in contrast to binary) and a new type of predictor which is especially useful in handling missing information.", url = "http://www.jstor.org/stable/2986296" }
Valid options are:
J48 options
-U Use unpruned tree.
-C <pruning confidence> Set confidence threshold for pruning. (default 0.25)
-M <minimum number of instances> Set minimum number of instances per leaf. (default 2)
-S Don't perform subtree raising.
-L Do not clean up after the tree has been built.
-A Laplace smoothing for predicted probabilities.
-Q <seed> Seed for random data shuffling (default 1).
CHAID options
-CH-A <attribute significance level> Set the significance level for the selection of the attribute to split a node. (default 0.05)
-CH-M <merge-split significance level> Set the significance level used when searching for the best combination of attribute values. (default 0.05)
-CH-S <minimum number of instances to split a node> Set the minimum number of instances a node must contain to be considered for splitting. (default 3)
-CH-O <att1,att2-att4,...> Specifies the list of attribute indexes to treat as ordinal. "first" and "last" are valid indexes. Warning: the list of attributes includes the class! (default none)
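The list format accepted by -CH-O (single indexes, ranges such as att2-att4, and the keywords first and last) can be illustrated with a small standalone sketch. This is not the class's own parser: the class name OrdinalIndexSketch and its methods are hypothetical, and the sketch assumes that indexes are 1-based on the command line, which the option text above does not state. Because the attribute list includes the class, numAttributes here counts the class attribute as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: expands a list such as "first,3-4,last" into 0-based indexes. */
final class OrdinalIndexSketch {

    /** Resolves one token ("first", "last", or a number) to a 0-based index. */
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        int index;
        if (t.equalsIgnoreCase("first")) {
            index = 0;
        } else if (t.equalsIgnoreCase("last")) {
            index = numAttributes - 1;
        } else {
            index = Integer.parseInt(t) - 1; // assumption: 1-based on the command line
        }
        if (index < 0 || index >= numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return index;
    }

    /** Expands the whole list; numAttributes counts every attribute, the class included. */
    static List<Integer> expand(String spec, int numAttributes) {
        TreeSet<Integer> selected = new TreeSet<>(); // sorted, no duplicates
        if (spec.trim().isEmpty()) {
            return new ArrayList<>(selected); // empty list: no ordinal attributes
        }
        for (String part : spec.split(",")) {
            String[] bounds = part.split("-", 2);
            int from = resolve(bounds[0], numAttributes);
            int to = bounds.length == 2 ? resolve(bounds[1], numAttributes) : from;
            if (from > to) {
                throw new IllegalArgumentException("Descending range: " + part);
            }
            for (int i = from; i <= to; i++) {
                selected.add(i);
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        // Five attributes in total, the fifth being the class.
        System.out.println(expand("1,3-4", 5));      // prints [0, 2, 3]
        System.out.println(expand("first,last", 5)); // prints [0, 4]
    }
}
```

In the second call, last resolves to the class attribute itself, which is the situation the warning in the option description is about.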
Modifier and Type | Field and Description |
---|---|
private static long | serialVersionUID: for serialization |
Fields inherited from class JCHAID
m_CHminNumObjSplit, m_CHordinalAtts, m_CHsearchBestSplit, m_CHsigLevelAtt, m_CHsigLevelMergeSplit, m_XRFFUsed
Fields inherited from class weka.classifiers.trees.J48
m_binarySplits, m_CF, m_collapseTree, m_doNotMakeSplitPointActualValue, m_minNumObj, m_noCleanup, m_numFolds, m_reducedErrorPruning, m_root, m_Seed, m_subtreeRaising, m_unpruned, m_useLaplace, m_useMDLcorrection
Constructor and Description |
---|
JCHAIDStar(): Constructor for JCHAIDStar in order to change the default value of the Collapse Tree option of the J48 class. |
Modifier and Type | Method and Description |
---|---|
void | buildClassifier(weka.core.Instances instances): Generates the classifier. |
weka.core.Capabilities | getCapabilities(): Returns default capabilities of the classifier. |
weka.core.TechnicalInformation | getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
java.lang.String | globalInfo(): Returns a string describing the classifier. |
static void | main(java.lang.String[] argv): Main method for testing this class. |
java.lang.String | toString(): Returns a description of the classifier. |
Methods inherited from class JCHAID
binarySplitsTipText, CHminNumObjSplitTipText, CHordinalAttributeIndicesTipText, CHsearchBestSplitTipText, CHsigLevelAttTipText, CHsigLevelMergeSplitTipText, getCHminNumObjSplit, getCHordinalAttributeIndices, getCHsearchBestSplit, getCHsigLevelAtt, getCHsigLevelMergeSplit, getOptions, listOptions, numFoldsTipText, prepareOrdinalAtts, reducedErrorPruningTipText, setBinarySplits, setCHminNumObjSplit, setCHordinalAttributeIndices, setCHsearchBestSplit, setCHsigLevelAtt, setCHsigLevelMergeSplit, setNumFolds, setOptions, setReducedErrorPruning, setUseMDLcorrection, toStringOrdinalAttributesList, useMDLcorrectionTipText
Methods inherited from class weka.classifiers.trees.J48
classifyInstance, collapseTreeTipText, confidenceFactorTipText, distributionForInstance, doNotMakeSplitPointActualValueTipText, enumerateMeasures, generatePartition, getBinarySplits, getCollapseTree, getConfidenceFactor, getDoNotMakeSplitPointActualValue, getMeasure, getMembershipValues, getMinNumObj, getNumFolds, getReducedErrorPruning, getRevision, getSaveInstanceData, getSeed, getSubtreeRaising, getUnpruned, getUseLaplace, getUseMDLcorrection, graph, graphType, measureNumLeaves, measureNumRules, measureTreeSize, minNumObjTipText, numElements, prefix, saveInstanceDataTipText, seedTipText, setCollapseTree, setConfidenceFactor, setDoNotMakeSplitPointActualValue, setMinNumObj, setSaveInstanceData, setSeed, setSubtreeRaising, setUnpruned, setUseLaplace, subtreeRaisingTipText, toSource, toSummaryString, unprunedTipText, useLaplaceTipText
Methods inherited from class weka.classifiers.AbstractClassifier
batchSizeTipText, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, postExecution, preExecution, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
private static final long serialVersionUID
public JCHAIDStar()
public weka.core.Capabilities getCapabilities()
Specified by: getCapabilities in interface weka.classifiers.Classifier
Specified by: getCapabilities in interface weka.core.CapabilitiesHandler
Overrides: getCapabilities in class JCHAID
See Also: Capabilities
public java.lang.String globalInfo()
Overrides: globalInfo in class JCHAID
public weka.core.TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface weka.core.TechnicalInformationHandler
Overrides: getTechnicalInformation in class JCHAID
public void buildClassifier(weka.core.Instances instances) throws java.lang.Exception
Specified by: buildClassifier in interface weka.classifiers.Classifier
Overrides: buildClassifier in class JCHAID
Parameters: instances - the data to train the classifier with
Throws: java.lang.Exception - if classifier can't be built successfully
public java.lang.String toString()
public static void main(java.lang.String[] argv)
Parameters: argv - the command line options
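The CHAID options documented for this class are written as a flag followed by a value, so a command line array such as argv can be scanned for them in pairs. The following standalone sketch only illustrates that idea with the documented defaults (0.05, 0.05 and 3); it is not how this class processes its options, and the class name ChaidOptionSketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads the CHAID flag/value pairs from a command line array. */
final class ChaidOptionSketch {

    /** Returns the value that follows flag in argv, or fallback when the flag is absent. */
    static String valueOf(String flag, String[] argv, String fallback) {
        for (int i = 0; i + 1 < argv.length; i++) {
            if (argv[i].equals(flag)) {
                return argv[i + 1];
            }
        }
        return fallback;
    }

    /** Collects the three numeric CHAID options, using the documented defaults. */
    static Map<String, String> chaidSettings(String[] argv) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("-CH-A", valueOf("-CH-A", argv, "0.05")); // attribute significance level
        settings.put("-CH-M", valueOf("-CH-M", argv, "0.05")); // merge-split significance level
        settings.put("-CH-S", valueOf("-CH-S", argv, "3"));    // minimum instances to split a node
        return settings;
    }

    public static void main(String[] argv) {
        // Only -CH-A is overridden here; -M is a J48 option and is ignored by this sketch.
        String[] demo = {"-CH-A", "0.01", "-M", "5"};
        System.out.println(chaidSettings(demo)); // prints {-CH-A=0.01, -CH-M=0.05, -CH-S=3}
    }
}
```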