A scalable evolutionary learning classifier system for knowledge discovery in stream data mining

Dam, Hai Huong

doi:10.26190/unsworks/18102

dc.contributor.advisor	Abbass, Hussein	en_US
dc.contributor.advisor	Lokan, Chris	en_US
dc.contributor.author	Dam, Hai Huong	en_US
dc.date.accessioned	2022-03-22T09:12:04Z
dc.date.available	2022-03-22T09:12:04Z
dc.date.issued	2008	en_US
dc.description.abstract	Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out on different sites in the organisation simultaneously, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary-based, they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. UCS is chosen as it has been tested extensively on classification tasks and it is the supervised version of XCS, a state-of-the-art LCS. In this thesis, the architectural design for a distributed stream data mining system is first introduced.
The problems that UCS should face in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a Neural Network to represent the action in UCS is proposed. This new system - called NLCS - was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.	en_US
dc.identifier.uri	http://hdl.handle.net/1959.4/38865
dc.language	English
dc.language.iso	EN	en_US
dc.publisher	UNSW, Sydney	en_US
dc.rights	CC BY-NC-ND 3.0	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/au/	en_US
dc.subject.other	Data mining	en_US
dc.subject.other	Action map	en_US
dc.subject.other	Classification	en_US
dc.subject.other	Data stream	en_US
dc.subject.other	Neural network	en_US
dc.subject.other	Noisy data	en_US
dc.subject.other	Non-stationary environment	en_US
dc.subject.other	Reinforcement learning	en_US
dc.subject.other	Rule-based system	en_US
dc.subject.other	Static environment	en_US
dc.subject.other	Stream data mining	en_US
dc.subject.other	Supervised learning	en_US
dc.subject.other	Distributed data mining	en_US
dc.subject.other	Dynamic environment	en_US
dc.subject.other	Ensemble learning	en_US
dc.subject.other	Evolutionary computation	en_US
dc.subject.other	Genetic algorithm	en_US
dc.subject.other	Knowledge discovery	en_US
dc.subject.other	Learning classifier system	en_US
dc.subject.other	Negative correlation learning	en_US
dc.title	A scalable evolutionary learning classifier system for knowledge discovery in stream data mining	en_US
dc.type	Thesis	en_US
dcterms.accessRights	open access
dcterms.rightsHolder	Dam, Hai Huong
dspace.entity.type	Publication	en_US
unsw.accessRights.uri	https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi	https://doi.org/10.26190/unsworks/18102
unsw.relation.faculty	UNSW Canberra
unsw.relation.originalPublicationAffiliation	Dam, Hai Huong, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Abbass, Hussein, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Lokan, Chris, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW	en_US
unsw.relation.school	School of Engineering and Information Technology	*
unsw.thesis.degreetype	PhD Doctorate	en_US

Files

Original bundle

Name: whole.pdf
Size: 4.28 MB
Format: application/pdf

Resource type

Thesis
