Publication:
Detection of Translator Stylometry using Pair-wise Comparative Classification and Network Motif Mining

dc.contributor.advisor Abbass, Hussein en_US
dc.contributor.advisor Petraki, Eleni en_US
dc.contributor.author El-Fiqi, Heba en_US
dc.date.accessioned 2022-03-21T13:08:53Z
dc.date.available 2022-03-21T13:08:53Z
dc.date.issued 2013 en_US
dc.description.abstract Stylometry is the study of the unique linguistic styles and writing behaviours of individuals. The identification of translator stylometry has many applications in fields such as intellectual property, education, and forensic linguistics. Despite the proliferation of research in the wider field of authorship attribution using computational linguistics techniques, the translator stylometry problem is more challenging, and the machine learning literature on the topic is insufficient. Some authors have even claimed that detecting who translated a piece of text is a problem with no solution, a claim we challenge in this thesis. We first evaluated the use of existing lexical measures for the translator stylometry problem and found that vocabulary richness could not identify translator stylometry. This encouraged us to look for non-traditional representations in which to discover new features that reveal translator stylometry. Network motifs are small sub-graphs that capture the local structure of a real network. We designed an approach that transforms the text into a network and then identifies the distinctive patterns of a translator by employing network motif mining. During our investigations, we redefined the problem of translator stylometry identification as a new type of classification problem that we call the Comparative Classification Problem (CCP). In the pair-wise CCP (PWCCP), data are collected on two subjects. The classification problem is to decide, given a piece of evidence, which of the two subjects is responsible for it. The key difference between PWCCP and traditional binary problems is that hidden patterns can only be unmasked by comparing the instances as pairs. A modified C4.5 decision tree classifier, which we call PWC4.5, is then proposed for PWCCP.
A comparison between detecting the translator using traditional classification and using PWCCP demonstrated a remarkable ability for PWCCP to discriminate between translators. The contributions of the thesis are: (1) providing an empirical study to evaluate the use of stylistic features for the problem of translator stylometry identification; (2) introducing network motif mining as an effective approach to detect translator stylometry; (3) proposing a modified C4.5 methodology for pair-wise comparative classification. en_US
dc.identifier.uri http://hdl.handle.net/1959.4/53020
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/au/ en_US
dc.subject.other Parallel Translations en_US
dc.subject.other Stylometry Analysis en_US
dc.subject.other Translator Stylometry Identification en_US
dc.subject.other Computational Linguistics en_US
dc.subject.other C4.5 en_US
dc.subject.other PWC4.5 en_US
dc.subject.other Social Network Analysis en_US
dc.subject.other Network Motifs en_US
dc.subject.other Machine learning en_US
dc.subject.other Pattern Recognition en_US
dc.subject.other Classification Algorithms en_US
dc.subject.other Paired Classification en_US
dc.subject.other Comparative Classification Problems en_US
dc.subject.other Decision Trees en_US
dc.title Detection of Translator Stylometry using Pair-wise Comparative Classification and Network Motif Mining en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder El-Fiqi, Heba
dspace.entity.type Publication en_US
unsw.accessRights.uri https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi https://doi.org/10.26190/unsworks/16460
unsw.relation.faculty UNSW Canberra
unsw.relation.originalPublicationAffiliation El-Fiqi, Heba, Engineering & Information Technology, UNSW Canberra, UNSW en_US
unsw.relation.originalPublicationAffiliation Abbass, Hussein, Engineering & Information Technology, UNSW Canberra, UNSW en_US
unsw.relation.originalPublicationAffiliation Petraki, Eleni, Faculty of Arts and Design, University of Canberra en_US
unsw.relation.school School of Engineering and Information Technology *
unsw.thesis.degreetype PhD Doctorate en_US
Files
Original bundle
Name:
whole.pdf
Size:
4.41 MB
Format:
application/pdf