I'm Nikolaos Tzimos

I'm Nikolaos Tzimos, a Software Engineer.
I am excited about Distributed Data Processing Systems and Database Systems.
I am passionate about designing and implementing scalable, distributed data pipelines and systems.
Last but not least, I am also interested in open source technologies and frameworks.


Let's start scrolling and learn more about me.


About Me

I am a graduate of the School of Electrical and Computer Engineering at the Technical University of Crete. I received my diploma (5-year curriculum) under the supervision of Antonios Deligiannakis.

I have a strong interest in distributed data and stream processing systems, as well as the design and implementation of end-to-end streaming and batch data pipelines, with a particular focus on cloud environments, though not limited to them. I am actively learning and working on Database Management Systems (DBMS), Distributed Data Processing Systems, Analytics over Big Data and, last but not least, Distributed Machine Learning Algorithms. Currently, I am working as a Data Engineer within the Databricks ecosystem on the Azure cloud platform.

Regarding machine learning, the core idea is to integrate machine learning algorithms into distributed streaming systems in order to obtain their online, real-time versions, while keeping the systems high-performance, scalable, and capable of handling high-speed, distributed data streams. My latest work relied on Bayesian Networks and a special case of them, the well-known Naive Bayes Classifier. In particular, we focused on learning the parameters of Bayesian Networks with the least communication cost under the continuous distributed model. We proposed an alternative approach that uses the Functional Geometric Monitoring (FGM) method for the online maintenance of the network parameters in distributed streaming systems.


Contact Details

Nikolaos Tzimos
Greece, 56123
nikostzim12@gmail.com

Career

Data Engineer

European Dynamics S.A – Greece Oct 2023 - Present

  • Implementation and optimization of end-to-end batch and streaming solutions on cloud infrastructure
  • Configuration and optimization of cloud resources, targeting cost efficiency and optimal performance
  • Monitoring and troubleshooting of cloud resources
  • Testing and quality assurance
  • Microsoft Cloud Computing, Apache Spark, Databricks, Delta Lake, ReadyAPI

External Researcher

Technical University of Crete – Chania, Greece Feb 2023 - Jul 2023

  • Research on online structure learning of Bayesian Networks
  • Online maintenance of Bayesian Networks using Graphical Model Sketches
  • Apache Flink and Sketches

Mandatory Military Service

Office of Research and Informatics January 2022 - November 2022

  • IT support
  • Server and Networking infrastructure maintenance
  • Computer maintenance and repair

Freelance Full Stack Engineer

Color Spare Parts THE UNION Ltd – Kozani, Greece Jun 2022 - Aug 2022

  • Database Management Systems (DBMS)
  • Design and implementation of a database system (ER modeling)
  • ETL - Cleansing and transformation of data
  • Connectivity to database using REST API

Education

School of ECE

B.A. Degree in Electrical and Computer Engineering

  • Technical University of Crete, Greece
  • Grade: 8.4/10
  • Grade: 9/10 (Computer Science)
  • Class Rank: 5.7%

Certifications

Technologies

Below, we can see the frameworks, programming languages and tools I am familiar with.

Frameworks

Spark, Flink, Storm, Hadoop, Kafka, MySQL, PostgreSQL, MongoDB, Cassandra, SQLite, DuckDB

Programming Languages

Java, Python, Scala, Bash scripting, C, Groovy

Tools

Git, Jupyter, Google Cloud, Databricks, Delta Lake, Unity Catalog, Apache Arrow, Apache Airflow, Docker, Polars, Spring, sbt, Apache Maven, ReadyAPI

Interests

My interests are in the broad area of Big Data Management Systems, including distributed data stream processing, analytics over data streams, data synopses, approximate query processing, and distributed machine learning algorithms. Additionally, I have worked on the development and deployment of web-based applications. In particular, my interests focus on five domains: Big Data Systems, Database Management Systems, Database Architectures, Machine Learning, and Algorithms and Data Structures. The following image summarizes all the concepts I work with.

A Few Of My Latest Projects

Distributed and Online Maintenance of Graphical Models

We implement a general, extensible, and scalable system on the Apache Flink platform for the online maintenance of a well-known graphical model, the Bayesian Network (BN), and a special case of it, the Naïve Bayes Classifier. We focus on learning the parameters of the Bayesian Network using Maximum Likelihood Estimation (MLE). The first objective is to accurately estimate the joint probability distribution of the Bayesian Network while providing user-defined error guarantees. The second objective is to minimize the communication cost while building a system capable of scaling and handling high-dimensional, distributed, high-throughput, and rapid data streams. We consider two approaches to this problem. The first approach uses approximate distributed counters, of which we implement two types: Randomized Counters (RC) and Deterministic Counters (DC). The second approach is based on the Functional Geometric Monitoring (FGM) method. The second approach resulted in a 100-1000x improvement in communication cost over the maintenance of exact MLEs and a 10-100x improvement in communication cost over the first approach, while providing estimates of the joint probability distribution with nearly the same accuracy as that obtained by exact MLEs.
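To make the parameter-learning step concrete, here is a minimal single-machine sketch of exact MLE for a Naïve Bayes Classifier over a stream of records. It is only an illustration of the counting that the estimates rest on: it is written in Python rather than Flink, it keeps exact local counters instead of the approximate distributed counters or FGM described above, and the class and method names (`NaiveBayesCounts`, `update`, `joint`) are hypothetical.

```python
from collections import Counter


class NaiveBayesCounts:
    """Exact MLE for a Naive Bayes model, maintained one record at a time."""

    def __init__(self):
        self.total = 0
        self.class_counts = Counter()    # label -> N(label)
        self.feature_counts = Counter()  # (index, value, label) -> N(value, label)

    def update(self, features, label):
        """Fold one observed record into the counters."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(i, value, label)] += 1

    def prior(self, label):
        """MLE of P(label) = N(label) / N."""
        return self.class_counts[label] / self.total if self.total else 0.0

    def conditional(self, i, value, label):
        """MLE of P(feature_i = value | label) = N(value, label) / N(label)."""
        n_label = self.class_counts[label]
        return self.feature_counts[(i, value, label)] / n_label if n_label else 0.0

    def joint(self, features, label):
        """Joint probability under the Naive Bayes factorization."""
        p = self.prior(label)
        for i, value in enumerate(features):
            p *= self.conditional(i, value, label)
        return p


model = NaiveBayesCounts()
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "no"),
]
for features, label in stream:
    model.update(features, label)

print(model.joint(("sunny", "mild"), "yes"))
```

Every parameter is a ratio of two counters, which is why the communication cost of a distributed version is dominated by how those counters are kept in sync across nodes.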

Implementation of a General Method for Monitoring Arbitrary Queries

We integrate the Functional Geometric Monitoring (FGM) method into the Apache Flink platform. FGM is a technique that can be applied to any monitoring problem in order to perform distributed and scalable monitoring with minimal communication cost. The method itself is independent of the monitoring problem; to achieve this, it relies on a problem-specific family of functions termed safe functions. Finally, FGM adapts naturally to adverse conditions of the monitoring problem, such as very tight monitoring bounds and skew in the distribution of data among the distributed nodes.

[CCFD-RF] Real-Time Credit Card Fraudulent Detection

This project implements a real-time fraud detection system (FDS) for credit card transactions using an Adaptive Random Forest on the Apache Spark platform. The increasing use of credit cards in online transactions reflects the rise of a new, fast, and easy means of exchange in the modern world. Based on extensive data collected for online card payments in the United Kingdom, it is expected that by 2026 the number of card payments per day will grow to 60 million. This form of transaction carries the risk of fraudulent attacks and phishing attempts, and therefore creates an urgent need for fast, online FDS. This project proposes a new scalable system for detecting and monitoring online transactions using Structured Streaming, the latest Apache Spark processing engine, and an adaptive ensemble classification method, Random Forest. Our goal is to design a learner for extremely large datasets that do not fit in main memory, one that adapts to concept drift and evolving data. We show the effectiveness of our method on both synthetic and real-world datasets, reaching 92% accuracy on average.

Random Sampling for Group-By Queries

This project implements Random Sampling for Group-By Queries on the Flink platform. The goal is to sample streaming tuples in order to effectively answer queries with a single aggregate and a single group-by clause. The implementation is a two-pass algorithm divided into two phases: the first phase is carried out by the first job (pre-processing) and the second phase by the second job (reservoir sampling).
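The two-pass idea can be sketched in a few lines of Python. This is a single-machine illustration under stated assumptions, not the Flink implementation: the first pass is assumed to only count tuples per group, the second pass keeps a fixed-size uniform reservoir per group (classic Algorithm R), and the per-group SUM is estimated by scaling the sample mean by the group count. The function names and the fixed `capacity` parameter are hypothetical.

```python
import random
from collections import defaultdict


def count_groups(rows):
    """First pass (pre-processing): number of tuples per group key."""
    counts = defaultdict(int)
    for key, _ in rows:
        counts[key] += 1
    return dict(counts)


def reservoir_per_group(rows, capacity, seed=0):
    """Second pass: uniform sample of at most `capacity` values per group."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)
    seen = defaultdict(int)
    for key, value in rows:
        seen[key] += 1
        sample = reservoirs[key]
        if len(sample) < capacity:
            sample.append(value)          # reservoir not full yet
        else:
            j = rng.randrange(seen[key])  # uniform index in [0, seen)
            if j < capacity:
                sample[j] = value         # replace with probability capacity / seen
    return dict(reservoirs)


def estimate_group_sums(rows, capacity, seed=0):
    """Estimate SUM(value) GROUP BY key as group count times sample mean."""
    counts = count_groups(rows)
    samples = reservoir_per_group(rows, capacity, seed)
    return {k: counts[k] * sum(s) / len(s) for k, s in samples.items()}


rows = (
    [("a", float(i)) for i in range(100)]
    + [("b", 5.0), ("b", 7.0)]
    + [("c", 2.0)] * 50
)
print(estimate_group_sums(rows, capacity=10))
```

Keeping one reservoir per group means that small groups are retained in full while large groups are capped at `capacity` values, so rare groups are not lost the way they could be under a single global sample.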

Web application deployment using Docker in the Google Cloud Platform

A web-based application deployed in Docker containers on the Google Cloud platform, including the development of user and cloud interfaces. User authentication is implemented with the OAuth protocol through the KEYROCK IDM service, and a proxy mechanism based on the PEP-PROXY WILMA service protects the backend containers from unauthorized users. A publish-subscribe (Pub-Sub) mechanism is built on the Orion Context Broker service. The REST APIs for communication between the services and the backend containers were designed from scratch.