Developing a complex detector and analyzer for combating Internet scams

Scam emails are an increasingly common threat. These unsolicited messages are typically sent to convince victims to willingly give up personal data and money. However, our results from the fall semester of 2020 showed that a modern spam detector does not filter these emails effectively. In response, QUADRON Cybersecurity Services proposed KaraK, a system that can filter scam emails more effectively and can also educate users about scams to raise awareness. The tasks of the system include detecting and parsing scam emails and forwarding them to a central component for analysis. The central component could provide intelligence to Internet Service Providers and law enforcement agencies so that they can respond to scam campaigns more effectively. Our task in the PARIPA program is to design and develop parts of this system. This semester, we developed the detection method of KaraK. The method uses semantic analysis, combining word embedding and machine learning. Specifically, we calculate a vector for each word in an incoming email using the Doc2Vec word embedding method. We then compute the distances between these word vectors and the vectors of keywords that indicate malicious intent, which are also derived using word embedding. We use the five smallest distances as the features of an incoming email and use a Random Forest Classifier to detect scam emails. The accuracy of the Random Forest Classifier on individual emails was around 96%. We also tested our trained model on a few longer streams of scam emails obtained from the 419eater forum. Unfortunately, these tests revealed that rerunning the feature extraction and classification phases can yield vastly different results. Our current theory is that a non-deterministic part of our feature extraction method is causing the problem, but we need to investigate the issue further. In the long run, our task remains the development of KaraK. First, we need to review and fix our feature extraction method. Our next step is then to implement the detection method as a plugin. Later on, we can work on developing other parts of KaraK.
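
The feature extraction step described above can be illustrated with a minimal sketch. It is not the KaraK implementation: the toy three-dimensional vectors stand in for embeddings that would come from a trained model, cosine distance is an assumed metric (the abstract does not name one), and the names `cosine_distance`, `smallest_distance_features` and `TOY_EMBEDDING` are hypothetical. Padding short emails with a distance of 1.0 is likewise an assumption. The classification step with the Random Forest Classifier is left out.

```python
import heapq
import math

# Hypothetical toy embedding; in practice these vectors would come
# from a trained word embedding model, not a hand-written table.
TOY_EMBEDDING = {
    "transfer": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "inheritance": [0.7, 0.3, 0.2],
    "meeting": [0.0, 0.9, 0.4],
    "tomorrow": [0.1, 0.8, 0.5],
    "urgent": [0.6, 0.1, 0.7],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity (assumed metric for this sketch)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_u * norm_v)


def smallest_distance_features(word_vectors, keyword_vectors, k=5):
    """Return the k smallest word-to-keyword distances, in ascending order."""
    distances = [
        cosine_distance(word, keyword)
        for word in word_vectors
        for keyword in keyword_vectors
    ]
    features = heapq.nsmallest(k, distances)
    # Assumption: pad with 1.0 when fewer than k distances are available.
    features += [1.0] * (k - len(features))
    return features


# Words of a hypothetical incoming email and hypothetical intent keywords.
email_words = ["urgent", "transfer", "inheritance", "tomorrow"]
keywords = ["money", "transfer"]

email_vectors = [TOY_EMBEDDING[w] for w in email_words]
keyword_vectors = [TOY_EMBEDDING[w] for w in keywords]

features = smallest_distance_features(email_vectors, keyword_vectors)
print(features)
```

The resulting five-element list is what would be passed to a classifier as the feature vector of one email. Because this sketch is fully deterministic, rerunning it always gives the same features; any randomness in the embedding inference step of a real pipeline would show up directly in these values.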

Futóné Papp Dorottya

2021-07-12

Sponsor: QUADRON Cybersecurity Services