DEEP OC Massive Online Data Streams

By DEEP-Hybrid-DataCloud Consortium | Created:

services, docker

License: Apache License 2.0

This use case analyzes online data streams in order to generate alerts in real time under time-bounded constraints. The main study focuses on building an additional intelligent module, based on neural network (NN) and deep learning (DL) techniques, that works alongside the underlying Intrusion Detection Systems (IDS) supervising the network traffic of compute centers. Because old data is preserved for historical purposes, security analysts will be able to review the generated alerts and enhance the cyber security [1, 2] of such centers, where large IT infrastructures and devices produce huge amounts of data that stream continuously and dynamically.

The solution is based on proactive time-series prediction [5]: NN and DL techniques are used to build models capable of predicting the next step(s) in the near future from the current and past steps. The discrepancy between the prediction and the observed reality then serves as an indicator of an anomaly (i.e. anomaly detection).
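The prediction-versus-reality idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a moving average over the previous steps stands in for the trained NN/DL model, and the names `predict_next`, `detect_anomalies`, `window`, and `threshold` are hypothetical and not part of the module's API.

```python
from statistics import mean


def predict_next(history, window=3):
    """Stand-in predictor: mean of the last `window` observations.

    Assumption: in the real module this role is played by a trained
    NN/DL model; a moving average only keeps the sketch self-contained.
    """
    return mean(history[-window:])


def detect_anomalies(series, window=3, threshold=15.0):
    """Return the indices whose prediction error exceeds `threshold`."""
    anomalies = []
    for t in range(window, len(series)):
        predicted = predict_next(series[:t], window)
        error = abs(series[t] - predicted)  # discrepancy vs. reality
        if error > threshold:
            anomalies.append(t)
    return anomalies


series = [10, 11, 10, 12, 11, 40, 11, 10]
print(detect_anomalies(series))  # prints [5]
```

Note that the spike at index 5 also distorts the predictions for the following steps, so a lower threshold would flag those steps as well. How the threshold is chosen in practice is outside the scope of this sketch.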

The challenge is that the solution targets scalable edge technologies [4] to support extensive data analysis and modelling, and aims to improve cyber-resilience by adopting a heuristic approach that combines real-time misuse detection with the intelligent module built using NN and DL.

The current modelling approach uses the following DL techniques [3]: LSTM (vanilla, stacked, bidirectional, seq2seq encoder/decoder), GRU, CNN, and MLP.
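Whatever the architecture, sequence models of this kind are typically trained on fixed-length windows of past steps paired with the step(s) to predict. As a minimal sketch, assuming a univariate series and hypothetical names (`make_windows`, `n_past`, `n_future`) that are not taken from the module's code, the transformation could look like this:

```python
def make_windows(series, n_past, n_future=1):
    """Split a series into (past, future) training pairs.

    Each pair holds `n_past` consecutive observations and the
    `n_future` observations that immediately follow them.
    """
    pairs = []
    last_start = len(series) - n_past - n_future
    for start in range(last_start + 1):
        past = series[start:start + n_past]
        future = series[start + n_past:start + n_past + n_future]
        pairs.append((past, future))
    return pairs


for past, future in make_windows([1, 2, 3, 4, 5, 6], n_past=3, n_future=2):
    print(past, "->", future)
```

A series that is shorter than `n_past + n_future` simply yields no pairs.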


[1]: Bhattacharyya, D.K. and Kalita, J.K., 2013. Network anomaly detection: A machine learning perspective. Chapman and Hall/CRC.

[2]: Dua, S. and Du, X., 2016. Data mining and machine learning in cybersecurity. Auerbach Publications.

[3]: LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp.436-444.

[4]: Nguyen, G., Nguyen, B.M., Tran, D. and Hluchy, L., 2018. A heuristics approach to mine behavioural data logs in mobile malware detection system. Data & Knowledge Engineering, 115, pp.129-151.

[5]: Tran, N., Nguyen, T., Nguyen, B.M. and Nguyen, G., 2018. A Multivariate Fuzzy Time Series Resource Forecast Model for Clouds using LSTM and Data Correlation Analysis. Procedia Computer Science, 126, pp.636-645.

Run locally on your computer

Using Docker

You can run this module directly on your computer, assuming that you have Docker installed, by following these steps:

$ docker pull deephdc/deep-oc-mods
$ docker run -ti -p 5000:5000 deephdc/deep-oc-mods

Using udocker

If you do not have Docker available or you do not want to install it, you can use udocker within a Python virtualenv:

$ virtualenv udocker
$ source udocker/bin/activate
$ git clone
$ cd udocker
$ pip install .
$ udocker pull deephdc/deep-oc-mods
$ udocker create deephdc/deep-oc-mods
$ udocker run -p 5000:5000 deephdc/deep-oc-mods

Once the container is running, open the service in your browser (the commands above publish port 5000) to see the API documentation, where you can test the module's functionality and perform other actions such as training.
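If you start the container from a script, it can be useful to wait until the service accepts connections before opening the API documentation. The following sketch assumes that the service is reachable on the local machine at the port published by the commands above (5000). The helper name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5000)` returns True as soon as something is listening on the published port. It only checks that the port is open, not that the API itself is ready to answer requests.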

For more information, refer to the user documentation.

Run on our pilot e-Infrastructure

In order to execute this module in our pilot e-Infrastructure, you need to be registered in the DEEP IAM.
The following instructions use the orchent CLI, which you need to install and configure as shown in this tutorial.

Mesos (CPU)

$ curl -o deep-oc-massive-online-data-streams.yml \
$ orchent depcreate deep-oc-massive-online-data-streams.yml '{"rclone_conf": "...", "rclone_url": "...", "rclone_vendor": "...", "rclone_user": "...", "rclone_pass": "..."}'

Check the status of your job

$ orchent depshow <Deployment UUID>

Once its state is CREATE_COMPLETE, the output includes the endpoint to access the service, e.g.:

"endpoint": ""
Point your browser to the provided URL.
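Checking the status by hand can be automated with a small polling loop. The sketch below is only an illustration under stated assumptions: it assumes that orchent is already installed and configured, and that the text printed by `orchent depshow` contains the state name CREATE_COMPLETE verbatim. Nothing else about the output format is relied upon. The helper names `run_depshow` and `wait_for_deployment` are hypothetical.

```python
import subprocess
import time


def run_depshow(deployment_uuid):
    """Run `orchent depshow <uuid>` and return its standard output."""
    result = subprocess.run(
        ["orchent", "depshow", deployment_uuid],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_deployment(deployment_uuid, show=run_depshow,
                        attempts=20, delay=30.0):
    """Poll until the command output mentions CREATE_COMPLETE.

    Assumption: the state name appears verbatim in the output.
    Returns the last output on success, or None if the state was
    not reached within `attempts` polls.
    """
    for attempt in range(attempts):
        output = show(deployment_uuid)
        if "CREATE_COMPLETE" in output:
            return output
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return None
```

The `show` parameter makes it possible to swap in a different way of querying the status, which also keeps the polling logic testable without a real deployment.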

For more information, refer to the user documentation.