Alternating Decision Tree: Fundamentals and Applications
About this ebook

What Is an Alternating Decision Tree?


An alternating decision tree (ADTree) is a classification method that can be learned by machine learning. It generalizes decision trees and is connected to boosting.


How You Will Benefit


(I) Insights and validations about the following topics:


Chapter 1: Alternating Decision Tree


Chapter 2: Decision Tree Learning


Chapter 3: AdaBoost


Chapter 4: Random Forest


Chapter 5: Gradient Boosting


Chapter 6: Propositional Calculus


Chapter 7: Support Vector Machine


Chapter 8: Method of Analytic Tableaux


Chapter 9: Boolean Satisfiability Algorithm Heuristics


Chapter 10: Multiplicative Weight Update Method


(II) Answers to the public's top questions about alternating decision trees.


(III) Real-world examples of the use of alternating decision trees in many fields.


(IV) 17 appendices that briefly explain 266 emerging technologies in each industry, providing a 360-degree understanding of alternating decision tree technologies.


Who This Book Is For


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of alternating decision trees.

Language: English
Release date: Jun 23, 2023

    Book preview

    Alternating Decision Tree - Fouad Sabry

    Chapter 1: Alternating decision tree

    An alternating decision tree (ADTree) is a machine learning technique for classification. It generalizes decision trees and is connected to boosting.

    An ADTree consists of decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An ADTree classifies an instance by following all paths for which all decision nodes are true and summing the prediction nodes that are traversed.

    Yoav Freund and Llew Mason presented ADTrees. There are implementations in Weka and JBoost.

    Decision trees or decision stumps were often utilized as weak hypotheses in the original boosting algorithms.

    For example, boosting decision stumps creates a set of T weighted decision stumps (where T is the number of boosting iterations), which then vote on the final classification according to their weights.

    Individual stumps are weighted according to how well they classify the data.

    Boosting a simple learner results in an unstructured set of T hypotheses, preventing the inference of connections between attributes.

    Alternating decision trees give the set of hypotheses structure by requiring each hypothesis to build on a hypothesis produced in an earlier iteration.

    Based on the connection between a hypothesis and its parent, the resulting collection of hypotheses can be represented as a tree.

    Another crucial aspect of boosting algorithms is that the data is given a different distribution at each iteration: instances that are misclassified are given more weight, while correctly classified instances are given less weight.

    Decision nodes and prediction nodes make up an alternating decision tree. A decision node specifies a predicate condition; a prediction node contains a single number. The root and the leaves of an ADTree are always prediction nodes. An ADTree classifies an instance by following all paths for which all decision nodes are true and summing the prediction nodes that are traversed. This differs from binary classification trees such as CART (Classification and Regression Tree) or C4.5, in which an instance follows only one path through the tree.
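    As a concrete, purely illustrative rendering of this structure, the following Python sketch models prediction and decision nodes and classifies an instance by summing every prediction value it reaches. The class names, the example predicate, and the dictionary-based instances are assumptions of this sketch, not the API of any particular ADTree implementation.

    # Minimal ADTree sketch: a prediction node carries a value and any number of
    # decision-node children; a decision node tests a predicate and routes the
    # instance to one of its two prediction-node children. Classification sums
    # the values of all prediction nodes traversed and takes the sign of the total.

    class PredictionNode:
        def __init__(self, value, children=None):
            self.value = value              # the single number carried by this node
            self.children = children or []  # zero or more DecisionNode splitters

    class DecisionNode:
        def __init__(self, predicate, if_true, if_false):
            self.predicate = predicate      # e.g. lambda x: x["word_freq_hp"] < 0.4  (illustrative)
            self.if_true = if_true          # PredictionNode followed when the predicate holds
            self.if_false = if_false        # PredictionNode followed otherwise

    def adtree_score(node, instance):
        """Sum this prediction node's value and the scores of every branch taken below it."""
        total = node.value
        for decision in node.children:
            branch = decision.if_true if decision.predicate(instance) else decision.if_false
            total += adtree_score(branch, instance)
        return total

    def adtree_classify(root, instance):
        # The sign of the accumulated score gives the class; its magnitude, the confidence.
        return 1 if adtree_score(root, instance) > 0 else -1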

    The following tree was built with JBoost on the Spambase dataset. In this example, spam is coded as 1 and regular email is coded as −1.

    An ADTree for 6 iterations on the Spambase dataset.

    An instance is given a score by summing the prediction values of all prediction nodes it traverses. For the example instance, the resulting score is 0.657.

    The instance is classified as spam because the final score of 0.657 is positive. The magnitude of the value serves as a measure of confidence in the prediction. The original authors list three levels of interpretation for the set of attributes identified by an ADTree:

    It is possible to assess each node's capacity for prediction separately.

    It is possible to perceive groups of nodes on the same path as having a joint effect.

    One can understand the tree as a whole.

    Individual nodes should be interpreted with caution because the scores represent a reweighting of the data for each iteration.

    The alternating decision tree algorithm's inputs are:

    A set of inputs (x_1,y_1),\ldots,(x_m,y_m) where x_{i} is a vector of attributes and y_{i} is either -1 or 1.

    The inputs are also known as instances.

    A set of weights w_{i} corresponding to each instance.

    The rule is the core component of the ADTree algorithm. A single rule consists of a precondition, a condition, and two scores. A condition is a predicate of the form "attribute <comparison> value". A precondition is simply a logical conjunction of conditions. Evaluating a rule involves a pair of nested if statements:

    if (precondition)
        if (condition)
            return score_one
        else
            return score_two
        end if
    else
        return 0
    end if
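    A minimal Python rendering of this rule evaluation might look as follows; the Rule class, the example predicate, and the scores shown are illustrative assumptions, not the API of Weka or JBoost.

    class Rule:
        def __init__(self, precondition, condition, score_one, score_two):
            self.precondition = precondition  # predicate on an instance (a conjunction of conditions)
            self.condition = condition        # predicate of the form "attribute <comparison> value"
            self.score_one = score_one
            self.score_two = score_two

        def evaluate(self, instance):
            # Mirrors the nested if statements above: 0 when the precondition fails.
            if self.precondition(instance):
                return self.score_one if self.condition(instance) else self.score_two
            return 0.0

    # Illustrative usage with made-up scores and a Spambase-style attribute name:
    rule = Rule(lambda x: True, lambda x: x["word_freq_remove"] < 0.05, 0.5, -0.3)
    print(rule.evaluate({"word_freq_remove": 0.0}))   # -> 0.5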

    The algorithm also needs a number of auxiliary functions:

    W_+(c) returns the sum of the weights of all positively labeled examples that satisfy predicate c

    W_-(c) returns the sum of the weights of all negatively labeled examples that satisfy predicate c

    W(c) = W_+(c) + W_-(c) returns the sum of the weights of all examples that satisfy predicate c
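    These weight sums could be realized, under the assumption that instances are (x, y) pairs with y in {−1, +1} and that a parallel list of weights is kept, roughly as follows:

    def W_plus(c, instances, weights):
        """Total weight of the positively labeled examples that satisfy predicate c."""
        return sum(w for (x, y), w in zip(instances, weights) if y == 1 and c(x))

    def W_minus(c, instances, weights):
        """Total weight of the negatively labeled examples that satisfy predicate c."""
        return sum(w for (x, y), w in zip(instances, weights) if y == -1 and c(x))

    def W(c, instances, weights):
        """Total weight of all examples that satisfy predicate c."""
        return W_plus(c, instances, weights) + W_minus(c, instances, weights)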

    The algorithm looks like this:

    function ad_tree
    input: a set of m training instances

        w_i = 1/m for all i
        a = \frac{1}{2} \ln \frac{W_+(true)}{W_-(true)}
        R_0 = a rule with scores a and 0, precondition "true" and condition "true"
        \mathcal{P} = \{true\}
        \mathcal{C} = the set of all possible conditions
        for j = 1 \dots T
            choose p \in \mathcal{P} and c \in \mathcal{C} that minimize
                z = 2 \left( \sqrt{W_+(p \wedge c) W_-(p \wedge c)} + \sqrt{W_+(p \wedge \neg c) W_-(p \wedge \neg c)} \right) + W(\neg p)
            \mathcal{P} = \mathcal{P} \cup \{ p \wedge c, p \wedge \neg c \}
            a_1 = \frac{1}{2} \ln \frac{W_+(p \wedge c) + 1}{W_-(p \wedge c) + 1}
            a_2 = \frac{1}{2} \ln \frac{W_+(p \wedge \neg c) + 1}{W_-(p \wedge \neg c) + 1}
            R_j = a new rule with precondition p, condition c, and scores a_1 and a_2
            w_i = w_i e^{-y_i R_j(x_i)}
        end for
        return the set of rules R_j

    The set \mathcal{P} grows by two preconditions in each iteration, and the tree structure of the rule set can be recovered by noting which precondition is used in each successive rule.
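    The training loop above can be sketched in Python as follows, reusing the Rule class and the W_plus, W_minus and W helpers from the earlier snippets. The representation of instances and the explicit list of candidate conditions are assumptions of this sketch; it omits the guards (for example against empty weight sums) and the efficiency measures a practical implementation would need.

    import math

    def ad_tree(instances, conditions, T):
        m = len(instances)
        weights = [1.0 / m] * m
        true = lambda x: True
        a = 0.5 * math.log(W_plus(true, instances, weights) / W_minus(true, instances, weights))
        rules = [Rule(true, true, a, 0.0)]
        preconditions = [true]

        for _ in range(T):
            def z(p, c):
                pc = lambda x: p(x) and c(x)
                pnc = lambda x: p(x) and not c(x)
                return (2 * (math.sqrt(W_plus(pc, instances, weights) * W_minus(pc, instances, weights))
                             + math.sqrt(W_plus(pnc, instances, weights) * W_minus(pnc, instances, weights)))
                        + W(lambda x: not p(x), instances, weights))

            # choose the (precondition, condition) pair that minimizes z
            p, c = min(((p, c) for p in preconditions for c in conditions), key=lambda pair: z(*pair))

            pc = lambda x, p=p, c=c: p(x) and c(x)
            pnc = lambda x, p=p, c=c: p(x) and not c(x)
            a1 = 0.5 * math.log((W_plus(pc, instances, weights) + 1) / (W_minus(pc, instances, weights) + 1))
            a2 = 0.5 * math.log((W_plus(pnc, instances, weights) + 1) / (W_minus(pnc, instances, weights) + 1))
            rules.append(Rule(p, c, a1, a2))
            preconditions += [pc, pnc]

            # reweight: misclassified instances gain weight, correctly classified ones lose weight
            weights = [w * math.exp(-y * rules[-1].evaluate(x))
                       for (x, y), w in zip(instances, weights)]
        return rules

    def classify(rules, x):
        # The sign of the summed rule scores gives the predicted label in {-1, +1}.
        return 1 if sum(r.evaluate(x) for r in rules) > 0 else -1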

    ADTrees are often as reliable as boosted decision trees and boosted decision stumps, as shown in Figure 6 of the original paper. Typically, equivalent accuracy can be achieved with a much simpler tree structure than that produced by recursive partitioning algorithms.

    {End Chapter 1}

    Chapter 2: Decision tree learning

    Decision tree learning is a supervised learning approach used in statistics, data mining, and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

    Classification trees are tree models in which the target variable takes a finite set of values. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Regression trees are decision trees in which the target variable takes continuous values, typically real numbers. More generally, the concept of a regression tree can be extended to any kind of object equipped with pairwise dissimilarities, such as categorical sequences.

    A decision tree is a useful tool for decision analysis because it can represent decisions and the decision-making process visually and explicitly. In data mining, a decision tree describes data (but the resulting classification tree can be an input for decision making).

    Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables.

    A decision tree is a simple representation for classifying examples. For this section, assume that all of the input features have finite discrete domains and that there is a single target feature called the classification. Each element of the domain of the classification is called a class. A decision tree or classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of that feature, or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or with a probability distribution over the classes, signifying that the tree has classified the data set into either a specific class or a particular probability distribution (which, if the decision tree is well constructed, is skewed towards certain subsets of classes).

    A tree is built by splitting the source set, which constitutes the root node, into subsets, which constitute the successor children. The splitting is based on a set of splitting rules determined by the classification features. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when the subset at a node has all the same values of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees (TDIDT) is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data.
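    To make the recursive-partitioning idea concrete, here is a small, purely illustrative example using scikit-learn (assuming it is installed); the dataset, depth limit, and printed text representation are choices of this sketch rather than part of the TDIDT description above.

    # Fit a small classification tree and print the recursive splits it learned.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

    # Each internal node tests one input feature; each leaf carries a predicted class.
    print(export_text(clf, feature_names=list(data.feature_names)))
    print(clf.predict(data.data[:1]))   # class predicted for the first instance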

    In data mining, decision trees can also be described as the combination of mathematical and computational techniques used to aid the description, categorization, and generalization of a given data set.

    The information is stored in records of the form:

    (\mathbf{x}, Y) = (x_1, x_2, x_3, \ldots, x_k, Y)

    The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalize.

    The vector \mathbf{x} is composed of the features x_1, x_2, x_3, etc., that are used for that task.

    There are two primary varieties of decision trees used in data mining:

    Classification tree analysis is used when the predicted outcome is the (discrete) class to which the data belongs.

    Regression tree analysis is used when the predicted outcome can be considered a real number (for example, the price of a house or a patient's length of stay in a hospital).

    Classification and regression tree (CART) analysis is an umbrella term used to refer to either of the above procedures; the term was first introduced by Breiman et al. in 1984.
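    As a brief illustration of the regression case (again assuming scikit-learn is available), a regression tree can be fitted to a continuous target such as a synthetic, house-price-like variable; the data below is generated purely for the example.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(50, 250, size=(200, 1))           # e.g. floor area in square metres
    y = 1000 * X[:, 0] + rng.normal(0, 5000, 200)     # noisy, price-like continuous target

    reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
    print(reg.predict([[120.0]]))                     # predicted value for a 120 m^2 input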

    Some techniques, often called ensemble methods, construct more than one decision tree (a brief scikit-learn illustration follows this list):

    Boosted trees: incrementally building an ensemble by training each new instance to emphasize the training instances previously mis-modeled. A typical example is AdaBoost. These can be used for regression-type and classification-type problems.

    Bootstrap aggregated (or bagged) decision trees, an early ensemble method, build multiple decision trees by repeatedly resampling training data with replacement, and have the trees vote for a consensus prediction.

    A random forest classifier is a specific type of bootstrap aggregating.

    Rotation forest: every decision tree in the ensemble is trained by first applying principal component analysis (PCA) to a random subset of the input features.
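    The sketch below illustrates three of the ensemble methods above with scikit-learn (assumed to be installed): AdaBoost over decision stumps for boosted trees, BaggingClassifier for bagged trees, and RandomForestClassifier for random forests. Rotation forests are not part of scikit-learn and are omitted here.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    for name, model in [
        ("boosted stumps (AdaBoost)", AdaBoostClassifier(n_estimators=50)),
        ("bagged trees", BaggingClassifier(n_estimators=50)),
        ("random forest", RandomForestClassifier(n_estimators=50)),
    ]:
        # 5-fold cross-validated accuracy for each ensemble of decision trees
        print(name, cross_val_score(model, X, y, cv=5).mean())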

    Among the most notable decision tree algorithms are:

    ID3 (Iterative Dichotomiser 3)

    C4.5 (successor of ID3)

    CART (Classification And Regression Tree)

    Chi-square automatic interaction detection (CHAID). Performs multi-level splits when computing classification trees.

    MARS (multivariate adaptive regression splines): extends decision trees to handle numerical data better.

    Conditional inference trees. A statistics-based approach that uses non-parametric tests as splitting criteria, corrected for multiple testing to avoid overfitting. This approach results in unbiased predictor selection and does not require pruning.

    ID3 and CART were invented independently at around the same time (between 1970 and 1980), yet they follow a similar approach for learning a decision tree from training tuples.

    It has also been proposed to use concepts from fuzzy set theory to define a special version of decision trees, known as a Fuzzy Decision Tree (FDT).

    In this type of fuzzy classification, an input vector \mathbf{x} is generally associated with multiple classes, each with a different confidence value.

    In recent years, boosted ensembles
