Start scrolling
to learn more about me.
Graduate of the School of Electrical and Computer Engineering of the Technical University of Crete, where I received my diploma (5-year curriculum) under the supervision of Antonios Deligiannakis.
I have a strong interest in distributed data and stream processing systems, as well as the design and implementation of end-to-end streaming and batch data pipelines, with a particular focus on cloud environments, though not limited to them. I am actively learning and working on Database Management Systems (DBMS), distributed data processing systems, analytics over big data and, last but not least, distributed machine learning algorithms. Currently, I am working as a Data Engineer in the Databricks ecosystem on the Azure cloud platform.
Regarding machine learning algorithms, the whole idea boils down to integrating machine learning algorithms into distributed streaming systems, aiming at online/real-time versions of these algorithms while providing high-performance, scalable systems capable of handling high-speed, distributed data streams. My latest work relied on Bayesian Networks and a special case of them, the well-known Naive Bayes Classifier. In particular, we focused on learning the parameters of Bayesian Networks with the least communication cost under the continuous distributed monitoring model. We proposed an alternative approach, based on the Functional Geometric Monitoring (FGM) method, for the online maintenance of the network parameters in distributed streaming systems.
Nikolaos Tzimos
Greece, 56123
nikostzim12@gmail.com
Below you can see the frameworks, programming languages and tools I am familiar with.
My interests lie in the broad area of Big Data Management Systems, including distributed data stream processing, analytics over data streams, data synopses, approximate query processing, and distributed machine learning algorithms. Additionally, I have worked on the development and deployment of web-based applications. In particular, my interests focus on five domains: Big Data Systems, Database Management Systems, Database Architectures, Machine Learning, and Algorithms and Data Structures. The following image summarizes all the concepts I’m dealing with.
Apache Flink,Apache Kafka,Apache Hadoop,Apache Maven,Java,Python
We implement a general, extensible and scalable system for the online maintenance of the well-known graphical model, the Bayesian Network (BN), and a special case of it, the Naïve Bayes Classifier, on the Apache Flink platform. We focus on learning the parameters of the Bayesian Network using Maximum Likelihood Estimation (MLE). The first objective is to accurately estimate the joint probability distribution of the Bayesian Network while providing user-defined error guarantees. The second objective is to use the minimum communication cost while implementing a system capable of scaling to high-dimensional, distributed, high-throughput, and rapid data streams. We follow two approaches to this problem. The first uses approximate distributed counters; we implement two types, Randomized Counters (RC) and Deterministic Counters (DC). The second is based on the Functional Geometric Monitoring (FGM) method. The FGM approach yielded a 100-1000x improvement in communication cost over the maintenance of exact MLEs and a 10-100x improvement over the counter-based approach, while providing estimates of the joint probability distribution with nearly the same accuracy as exact MLEs.
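To illustrate the MLE step in isolation: for a discrete Bayesian Network, the maximum-likelihood estimate of each conditional probability is simply a ratio of counts over the observed stream. The sketch below shows this for a single Naive Bayes feature; the function name and the toy data are my own, not taken from the project, and the distributed/streaming machinery is deliberately omitted.

```python
from collections import Counter

def mle_cpt(samples):
    """Estimate the conditional probability table P(x | c) of one
    Naive Bayes feature via Maximum Likelihood:
    P(x | c) = count(x, c) / count(c)."""
    joint = Counter(samples)                    # counts of (x, c) pairs
    cls = Counter(c for _, c in samples)        # counts of each class c
    return {(x, c): joint[(x, c)] / cls[c] for (x, c) in joint}

# toy stream of (feature value, class label) pairs
stream = [("rain", "late"), ("rain", "late"), ("sun", "late"),
          ("sun", "on_time"), ("sun", "on_time")]
cpt = mle_cpt(stream)
print(cpt[("rain", "late")])  # 2/3
```

In the distributed setting, the whole difficulty lies in maintaining these counts across nodes without shipping every update to a coordinator, which is exactly what the counters and FGM address.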
Apache Flink,Apache Kafka,Apache Hadoop,Apache Maven,Java,Python
We integrate the Functional Geometric Monitoring (FGM) method into the Apache Flink platform. FGM is a technique that can be applied to any monitoring problem in order to perform distributed, scalable monitoring with minimal communication cost. The method is independent of the monitoring problem; to achieve this, it uses a problem-specific family of functions termed safe functions. Finally, FGM adapts naturally to adverse conditions of the monitoring problem, such as very tight monitoring bounds and skew in the distribution of data among the distributed nodes.
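A minimal single-round sketch of the idea, for the toy problem of monitoring whether the global average of k local counters stays below a threshold T. Here the safe function psi(x) = x - T is linear (hence convex), so sum_i psi(X_i) <= 0 guarantees the global average is below T. The class and function names, the fixed slack quantum, and the single-round simplification are all mine; the real FGM protocol is considerably more involved.

```python
def psi(x, T):
    # Linear safe function for the "average below threshold T" condition.
    return x - T

def start_round(local_values, T, k):
    """Coordinator side: compute the slack quantum each of the k sites
    may silently consume before it has to report in this round."""
    total = sum(psi(x, T) for x in local_values)
    assert total <= 0, "a round can only start from a safe state"
    return -total / (2 * k)

class ToyFGMSite:
    """One distributed site within a single monitoring round (sketch)."""
    def __init__(self, x0, quantum, T):
        self.T = T
        self.quantum = quantum
        self.psi_at_sync = psi(x0, T)
        self.x = x0

    def update(self, increment):
        """Absorb local stream items; return True once the local drift
        in psi exceeds the slack quantum, i.e. the site must report."""
        self.x += increment
        return psi(self.x, self.T) - self.psi_at_sync > self.quantum
```

The communication saving comes from sites staying silent while their psi-drift fits within the quantum; only when enough local drift accumulates does the coordinator need to hear from anyone.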
Apache Spark,Apache Kafka,Apache Hadoop,Java,Scala,Sbt
This project implements a real-time fraud-detection system (FDS) for credit card transactions using an Adaptive Random Forest on the Apache Spark platform. The increasing use of credit cards in online transactions reflects the rise of a new, fast and easy way of exchange in the modern world. Based on extensive data collected for online card payments in the United Kingdom, the number of card payments per day is expected to grow to 60 million by 2026. This form of transaction carries the risk of fraud and phishing attempts, creating an urgent need for fast, online FDSs. This project proposes a new scalable system for detecting and monitoring online transactions using the latest Apache Spark processing engine, Structured Streaming, and implementing an adaptive ensemble classification method, the Random Forest. Our goal is to design a learner for extremely large datasets that do not fit in main memory, adapting to concept drift and evolving data. We show the effectiveness of our method on both synthetic and real-world datasets, reaching 92% accuracy on average.
Apache Flink,Apache Kafka,Apache Maven,Java
This project implements Random Sampling for Group-By Queries on the Flink platform. The goal is to sample streaming tuples in order to effectively answer a single aggregate combined with a single group-by clause. The implementation is a two-pass algorithm divided into two phases: the first phase is carried out by the first job (pre-processing) and the second phase by the second job (reservoir sampling).
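The reservoir-sampling phase can be sketched as follows: one classic Algorithm R reservoir per group key, so each group's sample is uniform over that group's tuples. This is a plain single-machine sketch, not the Flink job; the helper name and the per-group dictionary layout are illustrative.

```python
import random

def groupby_reservoirs(stream, k, rng=None):
    """One-pass reservoir sampling per group: keep a uniform random
    sample of up to k values for every group key in the stream."""
    rng = rng or random.Random(0)
    reservoirs, counts = {}, {}
    for key, value in stream:
        n = counts.get(key, 0)
        res = reservoirs.setdefault(key, [])
        if n < k:
            res.append(value)            # reservoir not yet full
        else:
            j = rng.randrange(n + 1)     # uniform position in [0, n]
            if j < k:
                res[j] = value           # replace with prob. k/(n+1)
        counts[key] = n + 1
    return reservoirs

# e.g. a stream of (group, measure) tuples
stream = [("A", i) for i in range(100)] + [("B", i) for i in range(3)]
samples = groupby_reservoirs(stream, k=5)
```

The group-by aggregate is then estimated from each group's reservoir instead of its full set of tuples.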
Docker,HTML,CSS,JavaScript,jQuery,Express,nodejs,MySQL,mongodb,Google Cloud Platform
Web-based application using Docker containers. Development of user and cloud interfaces. The application was deployed on the Google Cloud Platform. Development of a user authentication mechanism using the OAuth protocol with the KEYROCK IDM service, and of a proxy mechanism using the PEP-PROXY WILMA service to protect the backend containers from unauthorized users. Finally, development of a publish-subscribe (Pub-Sub) mechanism using the Orion Context Broker service. Design of REST APIs from scratch for the communication of services with the backend containers.