Publication:
A scalable evolutionary learning classifier system for knowledge discovery in stream data mining

dc.contributor.advisor Abbass, Hussein en_US
dc.contributor.advisor Lokan, Chris en_US
dc.contributor.author Dam, Hai Huong en_US
dc.date.accessioned 2022-03-22T09:12:04Z
dc.date.available 2022-03-22T09:12:04Z
dc.date.issued 2008 en_US
dc.description.abstract Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out on different sites in the organisation simultaneously making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are an incremental learner, and being evolutionary based they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of them are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose a right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. UCS is chosen as it has been tested extensively on classification tasks and it is the supervised version of XCS, a state of the art LCS. In this thesis, the architectural design for a distributed stream data mining system will be first introduced. The problems that UCS should face in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a Neural Network to represent the action in UCS is proposed. This new system - called NLCS { was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks. en_US
dc.identifier.uri http://hdl.handle.net/1959.4/38865
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/au/ en_US
dc.subject.other Data mining en_US
dc.subject.other Action map en_US
dc.subject.other Classification en_US
dc.subject.other Data stream en_US
dc.subject.other Neural network en_US
dc.subject.other Noisy data en_US
dc.subject.other Non-stationary environment en_US
dc.subject.other Reinforcement learning en_US
dc.subject.other Rule-based system en_US
dc.subject.other Static environment en_US
dc.subject.other Stream data mining en_US
dc.subject.other Supervised learning en_US
dc.subject.other Distributed data mining en_US
dc.subject.other Dynamic environment en_US
dc.subject.other Ensemble learning en_US
dc.subject.other Evolutionary computation en_US
dc.subject.other Genetic algorithm en_US
dc.subject.other Knowledge discovery en_US
dc.subject.other Learning classifier system en_US
dc.subject.other Negative correlation learning en_US
dc.title A scalable evolutionary learning classifier system for knowledge discovery in stream data mining en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Dam, Hai Huong
dspace.entity.type Publication en_US
unsw.accessRights.uri https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi https://doi.org/10.26190/unsworks/18102
unsw.relation.faculty UNSW Canberra
unsw.relation.originalPublicationAffiliation Dam, Hai Huong, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW en_US
unsw.relation.originalPublicationAffiliation Abbass, Hussein, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW en_US
unsw.relation.originalPublicationAffiliation Lokan, Chris, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW en_US
unsw.relation.school School of Engineering and Information Technology *
unsw.thesis.degreetype PhD Doctorate en_US
Files
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
whole.pdf
Size:
4.28 MB
Format:
application/pdf
Description:
Resource type