public class CHAIDDistribution
extends weka.classifiers.trees.j48.Distribution
Modifier and Type | Field and Description |
---|---|
protected static ChiSquareSplitCrit |
chiSquareCrit
Static reference to splitting criterion.
|
protected double |
m_chiSquaredProb
ChiSquared probability of split.
|
private weka.classifiers.trees.j48.Distribution |
m_distri_orig
Saves a copy of the original distribution
|
protected boolean |
m_hasMissingValues
Indicates if there are missing values or not
|
private int[] |
m_indicators
The indicators used to map the old values.
|
protected int |
m_minNoObj
Minimum number of objects in a split.
|
protected int |
m_missingOriginalIdx
Index of the missing-values category in its original position.
|
protected int |
m_numBagsNotEmpty
Number of non-empty bags in the initial distribution.
|
private boolean |
m_ordered
Indicates whether the categories are ordered: if so, values may only be
merged with contiguous categories (ordinal attributes); otherwise
any grouping of categories is permissible (nominal attributes).
|
protected boolean |
m_searchBestSplit
Indicates whether the best binary split is searched for after merging three or more categories.
This search can add considerable latency, which is why it is optional.
|
protected double |
m_sigLevelAtt
Significance level for selecting the attribute used to split a node.
|
protected double |
m_sigLevelMergeSplit
Significance level used when searching for the best combination of an attribute's categories.
|
private static long |
serialVersionUID
for serialization
|
Constructor and Description |
---|
CHAIDDistribution(CHAIDDistribution toMerge)
Creates distribution with only one bag by merging all bags of given
distribution.
|
CHAIDDistribution(weka.core.Instances source,
CHAIDSplit modelToUse)
Creates a distribution according to given instances and split model.
|
CHAIDDistribution(int numBags,
int numClasses,
int minNoObj,
double sigLevelAtt,
double sigLevelMergeSplit,
boolean searchBestSplit,
boolean ordered)
Creates and initializes a new distribution.
|
Modifier and Type | Method and Description |
---|---|
double |
getChiSquaredProb()
Get the ChiSquared probability of split.
|
private int |
getIndexLastKnownBag()
Returns the index of the last known bag
|
int |
getIndicator(int originalCategoryIndex)
Returns the indicator where the given category is merged
|
protected int |
getIndicatorsLength()
Returns m_indicators' length, that is,
the original number of values of the attribute plus one more for missing values
|
private double[] |
getMatrixRow(int row)
Returns the given row of m_perClassPerBag matrix
|
int |
getMissingCurrentIndex()
Get the current index of the missing values
|
java.lang.String |
getRevision()
Returns the revision string.
|
void |
handleMissingValues()
Merges the missing-values group with the most similar other group if the difference between them is not sufficiently significant.
|
private void |
initializeAndEliminateEmptyBags()
Eliminates empty rows from Distribution's structures
and initializes m_indicators adequately.
|
void |
initializeIndicators()
Initializes m_indicators adequately.
|
void |
mergeAll()
Merges all bags into a single bag.
|
private void |
mergeCategories(int toMergeOne,
int toMergeTwo)
Merges the rows of two categories; always the second one over the first one
|
private void |
mergeIndicators(int toMergeOne,
int toMergeTwo)
Updates membership indicators after merging two categories; always the second one over the first one
|
private void |
mergeSmallGroups()
Merges any category having fewer observations than the specification for the minimum
subgroup size with the most similar other category, as measured by the smallest pairwise
chi-square
|
void |
mergeValues()
Merges values following the CHAID algorithm; the resulting subset indicator for each value can be read with getIndicator(int).
|
private int |
numKnownBags()
Returns number of known bags.
|
void |
setHasMissingValues()
Sets m_hasMissingValues true
|
private void |
splitCategories(int mergedBagIndex,
int[] categoriesMerged,
int categoriesCount,
int i_bestSplit)
Splits the bag at mergedBagIndex according to the best split found.
|
private void |
splitIndicators(int mergedBagIndex,
int[] categoriesMerged,
int categoriesCount,
int i_bestSplit)
Updates membership indicators after splitting the bag at mergedBagIndex
according to the best split found.
|
private void |
splitLargeGroups(int mergedBagIndex)
Finds the most significant binary split of the just-merged bag (Step 3)
if it is a compound category consisting of three or more original categories.
In some implementations this step is optional:
http://www-01.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/alg_tree-chaid_algorithm_merging.htm
|
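The binary-split search described for splitLargeGroups can be pictured as enumerating every way to divide a compound category into two non-empty parts. The sketch below is illustrative only and is not taken from this class: the class name `BinarySplitSketch`, the method `candidateSplits`, and the bitmask encoding (bit i set means original category i goes to the left part) are all assumptions, and the actual meaning of `i_bestSplit` may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of CHAIDDistribution. */
class BinarySplitSketch {

    /**
     * Lists candidate binary splits of k merged categories as bitmasks.
     * Ordered (ordinal) categories allow only the k-1 cut points;
     * nominal categories allow all 2^(k-1) - 1 distinct two-way partitions.
     */
    static List<Integer> candidateSplits(int k, boolean ordered) {
        List<Integer> masks = new ArrayList<>();
        if (k < 2) {
            return masks; // nothing to split
        }
        if (ordered) {
            // Left part is the first `cut` categories, right part is the rest.
            for (int cut = 1; cut < k; cut++) {
                masks.add((1 << cut) - 1);
            }
        } else {
            // The last category always stays on the right, so each
            // partition is produced exactly once.
            int limit = 1 << (k - 1);
            for (int mask = 1; mask < limit; mask++) {
                masks.add(mask);
            }
        }
        return masks;
    }

    public static void main(String[] args) {
        System.out.println(candidateSplits(4, false));
        System.out.println(candidateSplits(4, true));
    }
}
```

The number of nominal candidates grows exponentially with the number of merged categories, which is consistent with the note on m_searchBestSplit that this search can add considerable latency.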
Methods inherited from class weka.classifiers.trees.j48.Distribution
actualNumBags, actualNumClasses, actualNumClasses, add, add, addInstWithUnknown, addRange, addWeights, check, clone, del, delRange, dumpDistribution, initialize, laplaceProb, laplaceProb, matrix, maxBag, maxClass, maxClass, numBags, numClasses, numCorrect, numCorrect, numIncorrect, numIncorrect, perBag, perClass, perClassPerBag, prob, prob, shift, shiftRange, sub, subtract, total
private static final long serialVersionUID
protected int m_missingOriginalIdx
protected boolean m_hasMissingValues
protected int m_numBagsNotEmpty
private int[] m_indicators
private weka.classifiers.trees.j48.Distribution m_distri_orig
protected final int m_minNoObj
protected double m_sigLevelAtt
protected double m_sigLevelMergeSplit
protected boolean m_searchBestSplit
protected double m_chiSquaredProb
private boolean m_ordered
protected static ChiSquareSplitCrit chiSquareCrit
public CHAIDDistribution(int numBags, int numClasses, int minNoObj, double sigLevelAtt, double sigLevelMergeSplit, boolean searchBestSplit, boolean ordered)
public CHAIDDistribution(CHAIDDistribution toMerge)
public CHAIDDistribution(weka.core.Instances source, CHAIDSplit modelToUse) throws java.lang.Exception
java.lang.Exception - if something goes wrong
public void initializeIndicators()
private void initializeAndEliminateEmptyBags()
public void mergeValues()
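As a rough illustration of the merge step in CHAID, the sketch below finds the pair of bags whose class distributions are most similar, measured by the smallest pairwise Pearson chi-square statistic. It is a minimal sketch under stated assumptions, not the code of this class: `ChaidMergeSketch`, `pairChiSquare`, and `mostSimilarPair` are hypothetical names, the count matrix is a plain `double[][]` indexed by bag and then class, and the comparison uses the raw statistic rather than a p-value tested against m_sigLevelMergeSplit, so differences in degrees of freedom between pairs are ignored.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of CHAIDDistribution. */
class ChaidMergeSketch {

    /** Pearson chi-square statistic of the 2 x C table formed by two bags. */
    static double pairChiSquare(double[] a, double[] b) {
        double totalA = Arrays.stream(a).sum();
        double totalB = Arrays.stream(b).sum();
        double total = totalA + totalB;
        if (totalA == 0.0 || totalB == 0.0) {
            return 0.0; // an empty bag cannot differ from anything
        }
        double chi = 0.0;
        for (int c = 0; c < a.length; c++) {
            double colTotal = a[c] + b[c];
            if (colTotal == 0.0) {
                continue; // class absent from both bags contributes nothing
            }
            double expA = totalA * colTotal / total;
            double expB = totalB * colTotal / total;
            chi += (a[c] - expA) * (a[c] - expA) / expA;
            chi += (b[c] - expB) * (b[c] - expB) / expB;
        }
        return chi;
    }

    /**
     * Returns {i, j} with i < j for the most similar pair of bags, or null
     * if there are fewer than two bags. With ordered == true only
     * neighbouring bags are compared (ordinal attribute).
     */
    static int[] mostSimilarPair(double[][] perClassPerBag, boolean ordered) {
        int n = perClassPerBag.length;
        int[] best = null;
        double bestChi = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n - 1; i++) {
            int lastJ = ordered ? i + 1 : n - 1;
            for (int j = i + 1; j <= lastJ; j++) {
                double chi = pairChiSquare(perClassPerBag[i], perClassPerBag[j]);
                if (chi < bestChi) {
                    bestChi = chi;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] counts = {{10, 0}, {0, 10}, {10, 0}};
        System.out.println(Arrays.toString(mostSimilarPair(counts, false)));
        System.out.println(Arrays.toString(mostSimilarPair(counts, true)));
    }
}
```

A full merge loop would repeat this search, merge the returned pair if it is not significantly different, and stop once every remaining pair is significant.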
private void mergeCategories(int toMergeOne, int toMergeTwo)
toMergeOne - index of the first category to be merged
toMergeTwo - index of the second category to be merged
private void mergeIndicators(int toMergeOne, int toMergeTwo)
toMergeOne - index of the first category to be merged
toMergeTwo - index of the second category to be merged
private void splitLargeGroups(int mergedBagIndex)
mergedBagIndex - index of the merged bag
private void splitCategories(int mergedBagIndex, int[] categoriesMerged, int categoriesCount, int i_bestSplit)
mergedBagIndex - index of the bag to be split
categoriesMerged - original indexes of the categories merged in the bag
categoriesCount - number of categories merged
i_bestSplit - indicates the best combination found to split the bag
private void splitIndicators(int mergedBagIndex, int[] categoriesMerged, int categoriesCount, int i_bestSplit)
mergedBagIndex - index of the bag to be split
categoriesMerged - original indexes of the categories merged in the bag
categoriesCount - number of categories merged
i_bestSplit - indicates the best combination found to split the bag
private void mergeSmallGroups()
public void handleMissingValues()
public void mergeAll()
public int getIndicator(int originalCategoryIndex)
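One way to picture the indicator mapping is as an array that sends each original category index to the bag it currently belongs to. The sketch below is an assumption about how such a mapping could be maintained, not the implementation behind m_indicators: `IndicatorSketch`, `initial`, and `merge` are hypothetical names, and the renumbering rule (bags above the removed one shift down by one) is a guess.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of CHAIDDistribution. */
class IndicatorSketch {

    /** Identity mapping: original category i starts in bag i. */
    static int[] initial(int numCategories) {
        int[] indicators = new int[numCategories];
        for (int i = 0; i < numCategories; i++) {
            indicators[i] = i;
        }
        return indicators;
    }

    /**
     * Merges bag `two` into bag `one` (the second over the first).
     * Categories in bag `two` now point at `one`, and bags numbered
     * above `two` move down by one to close the gap.
     */
    static void merge(int[] indicators, int one, int two) {
        if (one >= two) {
            throw new IllegalArgumentException("expected one < two");
        }
        for (int i = 0; i < indicators.length; i++) {
            if (indicators[i] == two) {
                indicators[i] = one;
            } else if (indicators[i] > two) {
                indicators[i]--;
            }
        }
    }

    public static void main(String[] args) {
        int[] indicators = initial(4);
        merge(indicators, 1, 2);
        System.out.println(Arrays.toString(indicators));
    }
}
```

Under this scheme, a lookup like getIndicator(originalCategoryIndex) would correspond to reading `indicators[originalCategoryIndex]`.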
protected int getIndicatorsLength()
private int numKnownBags()
private int getIndexLastKnownBag()
private double[] getMatrixRow(int row)
public int getMissingCurrentIndex()
public double getChiSquaredProb()
public void setHasMissingValues()
public java.lang.String getRevision()
Specified by: getRevision in interface weka.core.RevisionHandler
Overrides: getRevision in class weka.classifiers.trees.j48.Distribution