I'm Nikolaos Tzimos

I'm Nikolaos Tzimos, and I am a Software Engineer.
I am excited about distributed streaming systems and machine learning algorithms. I am enthusiastic about designing scalable distributed systems from scratch, and I am also a supporter of open-source technologies and frameworks.
Let's start scrolling and learn more about me.


About Me

I am a graduate of the School of Electrical and Computer Engineering at the Technical University of Crete. I received my diploma (5-year curriculum) under the supervision of Antonios Deligiannakis.

I am excited about distributed streaming systems and machine learning algorithms. For the last two years, I have been actively working on Database Management Systems (DBMS), Distributed Streaming Engines, Distributed Machine Learning, and Analytics over Big Data. The core idea is to integrate machine learning algorithms into distributed streaming systems in order to obtain online (real-time) versions of those algorithms, while providing high-performance, scalable systems capable of handling high-speed, distributed data streams. My latest work relied on Bayesian Networks and a special case of them, the well-known Naive Bayes Classifier. In particular, we focus on learning the parameters of Bayesian Networks with the least communication cost under the continuous distributed model. We proposed an alternative approach that uses the Functional Geometric Monitoring (FGM) method for the online maintenance of network parameters in distributed streaming systems.


Contact Details

Nikolaos Tzimos
Greece, 56123
nikostzim12@gmail.com

Career

External Researcher

Technical University of Crete – Chania, Greece Feb 2023 - Jul 2023

  • Research on online structure learning of Bayesian Networks
  • Online maintenance of Bayesian Networks using Graphical Model Sketches
  • Apache Flink and Sketches

Mandatory Military Service

Office of Research and Informatics January 2022 - November 2022

  • IT support
  • Server and Networking infrastructure maintenance
  • Computer maintenance and repair

Freelance Full Stack Engineer

Color Spare Parts THE UNION Ltd – Kozani, Greece Jun 2022 - Aug 2022

  • Database Management Systems (DBMS)
  • Implementation of database system
  • Implementation of REST API

Education

School of ECE

B.A. Degree in Electrical and Computer Engineering

  • Technical University of Crete, Greece
  • Grade: 8.4/10
  • Grade: 9/10 (Computer Science)
  • Class Rank: 5.7%

Technologies

Below are the frameworks, programming languages, and tools I am familiar with.

Frameworks

flink spark storm hadoop kafka mysql postgresql mongodb cassandra

Programming Languages

java python scala Bash scripting c c++

Tools

git jupyter google cloud redis apache server apache tomcat nginx nodejs docker apache airflow apache nifi databricks sbt apache maven R Spring

Interests

My interests are in the broad area of Big Data Management Systems, including distributed data stream processing, analytics over data streams, data synopses, approximate query processing, distributed machine learning algorithms, and Artificial Intelligence (AI). In addition, I have worked on the development and deployment of web-based applications. In particular, my interests focus on six domains: Big Data Systems, Database Management Systems, Machine Learning, Artificial Intelligence, Web-Based Application Development and Deployment, and Algorithms. The following image summarizes all the concepts I'm dealing with.

A Few Of My Latest Projects

Distributed and Online Maintenance of Graphical Models

We implement a general, extensible, and scalable system on the Apache Flink platform for the online maintenance of a well-known graphical model, the Bayesian Network (BN), and a special case of it, the Naïve Bayes Classifier. We focus on learning the parameters of the Bayesian Network using Maximum Likelihood Estimation (MLE). The first objective is to accurately estimate the joint probability distribution of the Bayesian Network while providing user-defined error guarantees. The second objective is to minimize communication cost while building a system capable of scaling and handling high-dimensional, distributed, high-throughput, and rapid data streams. We consider two approaches to this problem. The first approach uses approximate distributed counters, of which we implement two types: Randomized Counters (RC) and Deterministic Counters (DC). The second approach is based on the Functional Geometric Monitoring (FGM) method. The second approach improved communication cost by 100-1000x over maintaining exact MLEs and by 10-100x over the first approach, while providing estimates of the joint probability distribution with nearly the same accuracy as exact MLEs.
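As a rough illustration of the parameter-learning step, the sketch below maintains exact counts over a stream and derives MLE estimates as count ratios, P(X = x | parents = u) = count(x, u) / count(u). It is a minimal single-process sketch, not the Flink implementation: the class name `StreamingMLE`, its methods, and the dictionary-based record format are hypothetical, and it uses exact counters rather than the approximate distributed counters or FGM described above.

```python
from collections import Counter


class StreamingMLE:
    """Exact-count MLE of Bayesian Network parameters (hypothetical helper)."""

    def __init__(self, parents):
        # parents maps each variable name to a tuple of its parent names.
        self.parents = parents
        self.joint_counts = Counter()   # (var, value, parent_values) -> count
        self.parent_counts = Counter()  # (var, parent_values) -> count

    def update(self, record):
        """Consume one stream record (a dict: variable name -> observed value)."""
        for var, pars in self.parents.items():
            pv = tuple(record[p] for p in pars)
            self.joint_counts[(var, record[var], pv)] += 1
            self.parent_counts[(var, pv)] += 1

    def conditional(self, var, value, parent_values):
        """MLE of P(var = value | parents = parent_values); 0.0 if unseen."""
        pv = tuple(parent_values)
        denom = self.parent_counts[(var, pv)]
        if denom == 0:
            return 0.0
        return self.joint_counts[(var, value, pv)] / denom

    def joint(self, record):
        """Joint probability as the product of the per-variable conditionals."""
        prob = 1.0
        for var, pars in self.parents.items():
            pv = tuple(record[p] for p in pars)
            prob *= self.conditional(var, record[var], pv)
        return prob
```

For a Naïve Bayes Classifier, `parents` would map every feature to a one-element tuple containing the class variable, and the class variable to an empty tuple.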

Implementation of a General Method for Monitoring Arbitrary Queries

We integrate the Functional Geometric Monitoring (FGM) method into the Apache Flink platform. Functional Geometric Monitoring is a technique that can be applied to any monitoring problem in order to perform distributed, scalable monitoring with minimal communication cost. The method itself is independent of the monitoring problem; to achieve this, it relies on a problem-specific family of functions termed safe functions. Finally, the FGM method adapts naturally to adverse conditions of the monitoring problem, such as very tight monitoring bounds and skew in the distribution of data among the distributed nodes.
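The role of a safe function can be illustrated with a small, simplified example. Assume the monitored condition is that the norm of the global state (the last known estimate plus the average of the sites' local drift vectors) stays below a threshold. The function phi(x) = ||estimate + x|| - threshold is convex, so whenever the sum of phi over all local drifts is at most zero, the average drift is guaranteed to satisfy the condition. This sketch shows only that sufficient check, not the full FGM protocol with its rounds and coordinator messages; the function names and the choice of a norm threshold are assumptions made for illustration.

```python
import math


def make_safe_function(estimate, threshold):
    """Return phi(x) = ||estimate + x|| - threshold (convex in the drift x)."""
    def phi(drift):
        norm = math.sqrt(sum((e + d) ** 2 for e, d in zip(estimate, drift)))
        return norm - threshold
    return phi


def guaranteed_safe(phi, local_drifts):
    """Sufficient (not necessary) test that the global condition holds.

    By convexity, sum(phi(x_i)) <= 0 implies phi(mean of x_i) <= 0,
    i.e. ||estimate + mean drift|| <= threshold.
    """
    return sum(phi(x) for x in local_drifts) <= 0


phi = make_safe_function(estimate=(3.0, 4.0), threshold=10.0)

# Small drifts: the sum of phi is negative, so the condition is guaranteed.
small = guaranteed_safe(phi, [(0.0, 0.0), (1.0, 0.0)])

# Large opposing drifts cancel out globally, but the test is conservative
# and cannot certify safety; a real system would then communicate.
large = guaranteed_safe(phi, [(10.0, 0.0), (-10.0, 0.0)])
```

The second case shows why the check is only sufficient: the drifts average to zero, yet the sites cannot prove this locally without exchanging information.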

[CCFD-RF] Real-Time Credit Card Fraudulent Detection

This project implements a real-time fraud-detection system (FDS) for credit card transactions using an Adaptive Random Forest on the Apache Spark platform. The increasing use of credit cards in online transactions reflects the rise of a new, fast, and easy means of exchange in the modern world. Based on extensive data collected on online card payments in the United Kingdom, it is expected that by 2026 the number of card payments per day will grow to 60 million. This new form of transaction carries the risk of fraudulent attacks and phishing attempts, and therefore creates an urgent need for fast, online FDSs. This project proposes a new scalable system for detecting and monitoring online transactions using the latest Apache Spark processing engine, Structured Streaming, and an adaptive ensemble classification method, Random Forest. Our goal is to design a learner for extremely large datasets that do not fit in main memory, one that adapts to concept drift and evolving data. We show the effectiveness of our method on both synthetic and real-world datasets, reaching 92% accuracy on average.

Random Sampling for Group-By Queries

This project implements Random Sampling for Group-By Queries on the Flink platform. The goal is to sample streaming tuples in order to effectively answer queries with a single aggregate and a single group-by clause. The implementation is a two-pass algorithm divided into two phases. The first phase is carried out by the first job (pre-processing) and the second phase by the second job (reservoir sampling).
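The reservoir sampling phase can be sketched as follows. This is a minimal single-process illustration that keeps one fixed-size reservoir per group using the classic reservoir sampling scheme; it is not the Flink job itself. The class name `GroupReservoir`, the fixed per-group size `k`, and the mean estimator are assumptions, since the way the pre-processing phase sizes each group's sample is not described here.

```python
import random
from collections import defaultdict


class GroupReservoir:
    """Uniform fixed-size sample per group key (hypothetical helper)."""

    def __init__(self, k, seed=None):
        self.k = k                          # reservoir size per group (assumed fixed)
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)    # group -> sampled values
        self.seen = defaultdict(int)        # group -> number of tuples seen

    def add(self, group, value):
        """Process one streaming tuple (group key, value)."""
        self.seen[group] += 1
        n = self.seen[group]
        reservoir = self.samples[group]
        if n <= self.k:
            # Fill phase: keep the first k tuples of each group.
            reservoir.append(value)
        else:
            # Replace a random slot with probability k / n.
            j = self.rng.randrange(n)       # uniform integer in [0, n)
            if j < self.k:
                reservoir[j] = value

    def estimate_mean(self, group):
        """Approximate AVG(value) for a group from its sample, or None if unseen."""
        reservoir = self.samples.get(group, [])
        if not reservoir:
            return None
        return sum(reservoir) / len(reservoir)
```

Keeping a separate reservoir per group ensures that small groups are still represented in the sample, which a single reservoir over the whole stream would not guarantee.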

Web application deployment using Docker in the Google Cloud Platform

This project is a web-based application built with Docker containers, including the development of user and cloud interfaces. The application was developed on the Google Cloud platform. It includes a user authentication mechanism based on the OAuth protocol with the KEYROCK IDM service, and a proxy mechanism based on the PEP-PROXY WILMA service that protects the backend containers from unauthorized users. It also includes a publish-subscribe (Pub-Sub) mechanism built on the Orion Context Broker service. The REST APIs through which the services communicate with the backend containers were designed from scratch.