28 DEZEMBRO / SÁBADO / 11:06
FCUP PT 
 EN
 
 
APRESENTAÇÃO
PESSOAS
ENSINO
INVESTIGAÇÃO
BIBLIOTECA
NOTÍCIAS
CONTACTOS

Parallel Algorithms for Decision Trees Induction

Nuno Reis Amado

Mestrado em Inteligência Artificial e Computação
Universidade do Porto

January 2002


Abstract

In the field of Data Mining and Machine Learning the amount of data available for building classification procedures is growing very fast. For the construction of classification procedures, the use of larger amounts of data increases the probability of achieving procedures with higher classification accuracy. However, it also implies higher costs in time and complexity for building those procedures. Therefore, there is a great need for algorithms capable of building classifiers from larger datasets and, simultaneously, being computationally efficient and scalable. One possible solution for reducing the amount of time used in the construction of classification procedures from very large datasets, and, by this, keeping their classification accuracy, is the use of parallelism.

In this work we only considered those classification procedures based upon decision trees since they have been extensively used in many fields and considered as the most suitable models for data mining and machine learning. We present a few strategies used in the implementation of parallel decision tree induction algorithms based on techniques such as task parallelism, data parallelism and hybrid parallelism.

Based upon the C4.5 decision tree induction algorithm and on some of the introduced strategies and techniques, we developed a new parallel decision tree induction algorithm, implemented it using MPI and analysed its performance on a distributed memory cluster.


FCUP 2024