Start scrolling
to learn more about me.
Graduate of the School of Electrical and Computer Engineering of the Technical University of Crete, where I received my diploma (5-year curriculum) under the supervision of Antonios Deligiannakis.
I have a strong interest in distributed data and stream processing systems, as well as the design and implementation of end-to-end streaming and batch data pipelines, with a particular focus on cloud environments, though not limited to them. I am actively learning and working on Database Management Systems (DBMS), distributed data processing systems, analytics over big data and, last but not least, distributed machine learning algorithms. Currently, I am working as a Data Engineer in the Databricks ecosystem on the Azure cloud platform.
Regarding machine learning algorithms, the whole idea boils down to integrating machine learning algorithms into distributed streaming systems, aiming at online/real-time versions of these algorithms while providing high-performance, scalable systems capable of handling high-speed, distributed data streams. My latest work relied on Bayesian Networks and a special case of them, the well-known Naive Bayes Classifier. In particular, we focused on learning the parameters of Bayesian Networks with the least communication cost under the continuous distributed monitoring model. We proposed an alternative approach, based on the Functional Geometric Monitoring (FGM) method, for the online maintenance of the network parameters in distributed streaming systems.
Nikolaos Tzimos
Greece, 56123
nikostzim12@gmail.com
Below you can see the frameworks, programming languages and tools I am familiar with.
My interests lie in the broad area of Big Data Management Systems, including distributed data stream processing, analytics over data streams, data synopses, approximate query processing, and distributed machine learning algorithms. Additionally, I have worked on the development and deployment of web-based applications. In particular, my interests focus on five domains: Big Data Systems, Database Management Systems, Database Architectures, Machine Learning, and Algorithms and Data Structures. The following image summarizes all the concepts I’m dealing with.
Apache Flink,Apache Kafka,Apache Hadoop,Apache Maven,Java,Python
We implement a general, extensible and scalable system for the online maintenance of the well-known graphical model, the Bayesian Network (BN), and a special case of it, the Naïve Bayes Classifier, on the Apache Flink platform. We focus on learning the parameters of the Bayesian Network using Maximum Likelihood Estimation (MLE). The first objective is to accurately estimate the joint probability distribution of the Bayesian Network while providing user-defined error guarantees. The second objective is to use the minimum communication cost while implementing a system capable of scaling to high-dimensional, distributed, high-throughput, and rapid data streams. We follow two approaches to this problem. The first uses approximate distributed counters; we implement two types, Randomized Counters (RC) and Deterministic Counters (DC). The second is based on the Functional Geometric Monitoring (FGM) method. The FGM approach yielded a 100-1000x improvement in communication cost over the maintenance of exact MLEs and a 10-100x improvement over the counter-based approach, while providing estimates of the joint probability distribution with nearly the same accuracy as exact MLEs.
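To illustrate the MLE step in isolation: for a discrete Bayesian Network, the maximum-likelihood estimate of each conditional probability is simply a ratio of counts over the observed stream. The sketch below shows this for a single Naive Bayes feature; the function name and the toy data are my own, not taken from the project, and the distributed/streaming machinery is deliberately omitted.

```python
from collections import Counter

def mle_cpt(samples):
    """Estimate the conditional probability table P(x | c) of one
    Naive Bayes feature via Maximum Likelihood:
    P(x | c) = count(x, c) / count(c)."""
    joint = Counter(samples)                    # counts of (x, c) pairs
    cls = Counter(c for _, c in samples)        # counts of each class c
    return {(x, c): joint[(x, c)] / cls[c] for (x, c) in joint}

# toy stream of (feature value, class label) pairs
stream = [("rain", "late"), ("rain", "late"), ("sun", "late"),
          ("sun", "on_time"), ("sun", "on_time")]
cpt = mle_cpt(stream)
print(cpt[("rain", "late")])  # 2/3
```

In the distributed setting, the whole difficulty lies in maintaining these counts across nodes without shipping every update to a coordinator, which is exactly what the counters and FGM address.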
Apache Flink,Apache Kafka,Apache Hadoop,Apache Maven,Java,Python
We integrate the Functional Geometric Monitoring (FGM) method into the Apache Flink platform. FGM is a technique that can be applied to any monitoring problem in order to perform distributed, scalable monitoring with minimal communication cost. The method is independent of the monitoring problem; to achieve this, it uses a problem-specific family of functions termed safe functions. Finally, FGM adapts naturally to adverse conditions of the monitoring problem, such as very tight monitoring bounds and skew in the distribution of data among the distributed nodes.
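A minimal single-round sketch of the idea, for the toy problem of monitoring whether the global average of k local counters stays below a threshold T. Here the safe function psi(x) = x - T is linear (hence convex), so sum_i psi(X_i) <= 0 guarantees the global average is below T. The class and function names, the fixed slack quantum, and the single-round simplification are all mine; the real FGM protocol is considerably more involved.

```python
def psi(x, T):
    # Linear safe function for the "average below threshold T" condition.
    return x - T

def start_round(local_values, T, k):
    """Coordinator side: compute the slack quantum each of the k sites
    may silently consume before it has to report in this round."""
    total = sum(psi(x, T) for x in local_values)
    assert total <= 0, "a round can only start from a safe state"
    return -total / (2 * k)

class ToyFGMSite:
    """One distributed site within a single monitoring round (sketch)."""
    def __init__(self, x0, quantum, T):
        self.T = T
        self.quantum = quantum
        self.psi_at_sync = psi(x0, T)
        self.x = x0

    def update(self, increment):
        """Absorb local stream items; return True once the local drift
        in psi exceeds the slack quantum, i.e. the site must report."""
        self.x += increment
        return psi(self.x, self.T) - self.psi_at_sync > self.quantum
```

The communication saving comes from sites staying silent while their psi-drift fits within the quantum; only when enough local drift accumulates does the coordinator need to hear from anyone.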
Apache Spark,Apache Kafka,Apache Hadoop,Java,Scala,Sbt
This project implements a real-time fraud-detection system (FDS) for credit card transactions using an Adaptive Random Forest on the Apache Spark platform. The increasing use of credit cards in online transactions reflects the rise of a new, fast and easy way of exchange in the modern world. Based on extensive data collected for online card payments in the United Kingdom, the number of card payments per day is expected to grow to 60 million by 2026. This form of transaction carries the risk of fraud and phishing attempts, creating an urgent need for fast, online FDSs. This project proposes a new scalable system for detecting and monitoring online transactions using the latest Apache Spark processing engine, Structured Streaming, and implementing an adaptive ensemble classification method, the Random Forest. Our goal is to design a learner for extremely large datasets that do not fit in main memory, adapting to concept drift and evolving data. We show the effectiveness of our method on both synthetic and real-world datasets, reaching 92% accuracy on average.
Apache Flink,Apache Kafka,Apache Maven,Java
This project implements Random Sampling for Group-By Queries on the Flink platform. The goal is to sample streaming tuples in order to effectively answer a single aggregate combined with a single group-by clause. The implementation is a two-pass algorithm divided into two phases: the first phase is carried out by the first job (pre-processing) and the second phase by the second job (reservoir sampling).
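The reservoir-sampling phase can be sketched as follows: one classic Algorithm R reservoir per group key, so each group's sample is uniform over that group's tuples. This is a plain single-machine sketch, not the Flink job; the helper name and the per-group dictionary layout are illustrative.

```python
import random

def groupby_reservoirs(stream, k, rng=None):
    """One-pass reservoir sampling per group: keep a uniform random
    sample of up to k values for every group key in the stream."""
    rng = rng or random.Random(0)
    reservoirs, counts = {}, {}
    for key, value in stream:
        n = counts.get(key, 0)
        res = reservoirs.setdefault(key, [])
        if n < k:
            res.append(value)            # reservoir not yet full
        else:
            j = rng.randrange(n + 1)     # uniform position in [0, n]
            if j < k:
                res[j] = value           # replace with prob. k/(n+1)
        counts[key] = n + 1
    return reservoirs

# e.g. a stream of (group, measure) tuples
stream = [("A", i) for i in range(100)] + [("B", i) for i in range(3)]
samples = groupby_reservoirs(stream, k=5)
```

The group-by aggregate is then estimated from each group's reservoir instead of its full set of tuples.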
Docker,HTML,CSS,JavaScript,jQuery,Express,nodejs,MySQL,mongodb,Google Cloud Platform
Web-based application using Docker containers. Development of user and cloud interfaces. The application was deployed on the Google Cloud Platform. Development of a user authentication mechanism using the OAuth protocol with the KEYROCK IDM service, and of a proxy mechanism using the PEP-PROXY WILMA service to protect the backend containers from unauthorized users. Finally, development of a publish-subscribe (Pub-Sub) mechanism using the Orion Context Broker service. Design of REST APIs from scratch for the communication of services with the backend containers.