Machine Learning Based Network Traffic Anomaly Detection

March 13, 2018

Modern computer threats and intrusion attacks are far more complicated than those seen in the past. In IoT devices, these threats and attacks become even more prominent, because the heterogeneous and distributed nature of the devices makes conventional intrusion detection methodologies hard to deploy. [1] discusses the principal cyber threats to IoT devices, such as denial of service, malware-based attacks, data breaches and weakening parameters; lists the security issues identified in IoT by the Open Web Application Security Project (OWASP); and highlights a few past attacks on IoT. Detecting these threats requires new tools that capture the essence of their behavior rather than looking for fixed signatures in the attacks. Anomaly detection algorithms, which learn the normal behavior of a system and raise alerts on abnormalities, with or without prior knowledge of the system model and without knowledge of the characteristics of the attack, can be key to handling such complexity.

Anomaly detection is important because anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains.

This paper discusses the use of machine learning based network traffic anomaly detection to address the challenges of securing devices and detecting network intrusions.

Anomalies and Anomaly Detection

Anomaly detection [2] [3] is the “identification of items, events or observations which do not conform to an expected pattern or other items in a dataset”. Some examples of where anomaly detection is useful are provided below:

  • In IT Operations, to detect system outages before they actually occur and proactively keep dependent services up and running to meet business needs. E.g. - fault detection in safety critical systems
  • In Security, to detect anomalous behavior of entities that may indicate a breach before it occurs. E.g. - intrusion detection for cyber-security, military surveillance for enemy activities
  • In Business Analytics, to spot customer churn or find patterns that indicate severe business impacts. E.g. - fraud detection for credit cards or insurance
  • In IoT, to find devices that suddenly turn into an unhealthy state or detect anomalies in sensor data that indicate potentially bad product usage.

Generally, there are two approaches to the detection problem [4]: signature matching and anomaly detection. Most commercial intrusion detection systems employ some kind of signature-matching algorithm to detect malicious activities. Such systems have very low false alarm rates and work very well when the corresponding attack signatures are present, but they also have a drawback: missing signatures inevitably lead to undetected attacks. One common counter-measure taken by Trojans, for example, was to dynamically change their code signatures at run time or during replication, making signature-based tracking ineffective. Here the second approach, anomaly detection, becomes more useful. Anomaly detection maps normal behavior to a baseline profile and tries to detect deviations from it. One way to create a baseline profile is supervised learning, which trains a model on past data instances in which the intrusions have been pre-labeled. [5] discusses the various machine learning techniques and the related classifiers and algorithms that can be used in intrusion detection systems.
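The idea of a baseline profile can be illustrated with a minimal sketch. It assumes a purely statistical baseline (per-feature mean and standard deviation with a z-score threshold) rather than the learned models discussed in [5]. The feature names, sample values and the threshold of 3.0 are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def build_baseline(normal_samples):
    """Learn a per-feature (mean, standard deviation) profile from normal traffic."""
    columns = list(zip(*normal_samples))
    return [(mean(col), stdev(col)) for col in columns]


def is_anomalous(sample, baseline, threshold=3.0):
    """Flag a sample if any feature lies more than `threshold` standard
    deviations away from the baseline mean (threshold is an assumed value)."""
    for value, (mu, sigma) in zip(sample, baseline):
        if sigma == 0:
            # A constant feature in the baseline: any change is a deviation.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > threshold:
            return True
    return False


# Hypothetical features per time window: (packets per second, mean packet size in bytes)
normal_traffic = [(100, 512), (110, 500), (95, 520), (105, 508), (98, 515)]
baseline = build_baseline(normal_traffic)

print(is_anomalous((102, 510), baseline))  # False: close to the baseline
print(is_anomalous((5000, 64), baseline))  # True: a flood of small packets
```

Unlike a signature match, this check needs no description of the attack itself. It only needs enough normal traffic to estimate the profile.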

The Role of Machine Learning and Artificial Intelligence

Anomaly based detection systems rely on artificial intelligence (AI) and machine learning (ML) to detect anomalies. The idea behind AI and ML is to make a machine capable of learning by itself to distinguish between normal and abnormal behavior on the system. The process of teaching a machine takes different forms: supervised, unsupervised and reinforcement learning.

  • Supervised learning (classification based) uses data instances that are labeled in the training phase. Several algorithms, such as k-nearest neighbor, decision trees (C4.5, ID3), the Naïve Bayes classifier, support vector machines and artificial neural networks, can be used to create supervised learning models.
  • Unsupervised learning uses data instances that are unlabeled. A prominent technique for this kind of learning is clustering. Some unsupervised learning algorithms are K-means clustering, fuzzy clustering and self-organizing maps.
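As a small illustration of the supervised case, the following sketch implements a k-nearest neighbor classifier on labeled traffic records. The two features, the sample values and the choice of k = 3 are assumptions made for the example and do not come from the cited studies.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbors."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled flows: (connections per second, fraction of failed connections)
labeled_flows = [
    ((5, 0.01), "normal"), ((7, 0.02), "normal"), ((6, 0.00), "normal"),
    ((90, 0.60), "attack"), ((120, 0.75), "attack"), ((100, 0.55), "attack"),
]

print(knn_predict(labeled_flows, (8, 0.03)))    # normal
print(knn_predict(labeled_flows, (110, 0.70)))  # attack
```

In practice the features would usually be scaled to comparable ranges first. Otherwise the feature with the largest numeric range dominates the distance.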

Regardless of the means of teaching, a machine needs to be trained in order to be able to predict. Several ML algorithms have been used in intrusion detection. Most intrusion detection systems use a combination of algorithms to cluster sample data into groups, label the groups, and then train a classifier to distinguish between them. Many studies have applied machine learning techniques to intrusion detection systems. They range from single classifier solutions, where one algorithm is used to create the machine learning model, to hybrid classifiers, where more than one machine learning algorithm is combined to improve detection performance. [5] describes the various machine learning techniques, classifiers and algorithms, surveys existing work on machine learning based intrusion detection systems built on single, hybrid and ensemble classifier designs, and provides a statistical comparison of the classifier algorithms used in that research. Tables IV and V in [5] statistically compare the intrusion detection research based on single and hybrid classifiers, the algorithms used, and the accuracy results.
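The cluster, label, then classify sequence described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the design of any system surveyed in [5]: it uses plain k-means with two caller-supplied starting centroids, labels the larger cluster as normal (an assumption that only holds when attacks are rare), and uses a nearest-centroid rule as the classifier. The features and values are hypothetical.

```python
from math import dist


def nearest_index(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids


# Step 1: cluster unlabeled flows (connections per second, fraction of failed connections).
unlabeled = [(5, 0.01), (7, 0.02), (6, 0.00), (8, 0.03), (90, 0.60), (120, 0.75)]
centroids = kmeans(unlabeled, centroids=[unlabeled[0], unlabeled[-1]])

# Step 2: label the clusters. Assumption: the largest cluster is normal traffic.
sizes = [0] * len(centroids)
for p in unlabeled:
    sizes[nearest_index(p, centroids)] += 1
normal_cluster = sizes.index(max(sizes))
labels = ["normal" if i == normal_cluster else "suspicious" for i in range(len(centroids))]


# Step 3: classify new flows with a nearest-centroid rule.
def classify(flow):
    return labels[nearest_index(flow, centroids)]


print(classify((6, 0.01)))    # normal
print(classify((100, 0.70)))  # suspicious
```

A real system would typically replace the nearest-centroid rule in step 3 with one of the supervised classifiers listed earlier, trained on the cluster-derived labels.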

Anomaly Detection Approach for IoT devices and challenges

Anomaly based intrusion detection systems are computing-intensive. They require a lot of processing power and memory to work fast, especially when the system is a real-time intrusion detection system. Using anomaly based detection in IoT is more challenging than using it in non-IoT networks for several reasons.

  • Given the large number of IoT devices, attackers have more targets than ever before.
  • IoT devices are easier to attack than traditional computers, since their processing and memory capabilities are so limited that host-based intrusion detection systems are difficult to run on them.
  • IoT devices produce data in different structures and formats and communicate it over various types of networks, including the Internet, wireless sensor networks (WSN), radio frequency identification (RFID), Bluetooth and many more.

Research has been conducted around these challenges and has demonstrated the use of anomaly based detection in IoT. The following are a few of the IoT based anomaly detection solutions:

  • An artificial neural network (ANN) based approach [6] that detects anomalies in IoT and achieved 99.4% overall accuracy. The approach trains a supervised ANN on internet packet traces and, in a simulated IoT network, successfully distinguishes DDoS/DoS attacks launched by an external user from legitimate IoT traffic.
  • A two-tier classifier based approach [7] that detects anomalies in IoT backbone networks and achieved 84.82%. The approach first applies two-layer dimension reduction, using principal component analysis (PCA) and linear discriminant analysis (LDA) to reduce a high-dimensional dataset to one with fewer features, and then applies a two-tier classification module (Naïve Bayes and a certainty factor version of k-nearest neighbor) to identify suspicious behavior. It is designed to detect malicious activities such as user-to-root and remote-to-local attacks.

HSC’s role in ML based Anomaly Detection

HSC’s key areas of focus are:

  • Apply machine learning approaches and algorithms to develop and train anomaly detection models
  • Deploy the resulting models to detect network intrusions, service attacks and rogue devices in the network
  • Provide network intelligence and security insights to IoT service providers
  • Apply the above ML based approach in verticals like connected cars, home automation and industrial IoT


Works Cited

[1] "Internet of Things: How Much are We Exposed to Cyber Threats?," [Online]. Available:

[2] Wikipedia, "Anomaly Detection," [Online]. Available:

[3] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection : A Survey," [Online]. Available:

[4] T. Sherasiya and H. Upadhyay, "Intrusion Detection System for Internet of Things," [Online]. Available:

[5] N. Haq, A. Onik, A. Hridoy, M. Rafni, F. Shah and D. Farid, "Application of Machine Learning Approaches in Intrusion Detection System: A Survey," 2015. [Online]. Available:

[6] E. Hodo, X. Bellekens, A. Hamilton, P. Dubouilh, E. Iorkyase, C. Tachtatzis and R. Atkinson, "Threat analysis of IoT networks Using Artificial Neural Network Intrusion Detection System," [Online]. Available:

[7] H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha and K. Choo, "A Two-layer Dimension Reduction and Two-tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks," [Online]. Available: DOI: 10.1109/TETC.2016.2633228.
