Measurement data pipeline
March 2022
About the project
- Build a pipeline to visualize data from manufacturing systems
  - Manufacturing systems produce data for individual parts at each process step
- Use Kafka as the source to provide near-real-time streaming
- Use Flink to implement the streaming application
  - Parse, validate, and transform incoming data
  - Stateful transformations
  - Real-time joins using event-time semantics, watermarks, and idleness handling
  - Checkpointing to prevent data loss in case of failures
  - Graceful error handling so the pipeline keeps running
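The event-time wiring behind the joins can be sketched roughly as follows; `Measurement`, `partId`, and the watermark/window parameters are illustrative placeholders, not the actual project values:

```java
// Watermarks tolerate 10 s of out-of-order data; idle partitions
// are marked idle so they do not stall watermark progress.
WatermarkStrategy<Measurement> watermarks = WatermarkStrategy
        .<Measurement>forBoundedOutOfOrderness(Duration.ofSeconds(10))
        .withTimestampAssigner((m, ts) -> m.eventTimeMillis)
        .withIdleness(Duration.ofMinutes(1));

DataStream<Measurement> left =
        env.fromSource(kafkaSource, watermarks, "measurements");

// Event-time join of two process steps for the same part
left.coGroup(right)
    .where(m -> m.partId)
    .equalTo(m -> m.partId)
    .window(TumblingEventTimeWindows.of(Time.seconds(30)))
    .apply(new JoinSteps()); // CoGroupFunction emitting the combined record
```

A `coGroup` (rather than a plain `join`) also receives elements that found no partner, which is useful when one process step is missing for a part.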
- Use a Medallion architecture to store data
  - Bronze tables to store data “as-is”
  - Silver tables to provide an enterprise view and for self-service analytics
  - Gold tables fully prepared for reports
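The bronze-to-silver step can be pictured as a view over the raw table; table and column names here are hypothetical examples, not the project schema:

```sql
-- Bronze: raw payloads exactly as received from Kafka
-- Silver: parsed, validated, conformed records for self-service analytics
CREATE VIEW silver_measurements AS
SELECT part_id,
       process_step,
       CAST(measured_value AS DOUBLE PRECISION) AS measured_value,
       measured_at
FROM bronze_measurements
WHERE measured_value IS NOT NULL;
```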
- Use Flink's broadcast state
  - To change the configuration of the transformations at run time
  - Without downtime or service interruption
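The broadcast-state pattern can be sketched as below; `Measurement`, `ConfigUpdate`, and the state keys are illustrative, assuming configuration updates arrive on their own Kafka topic:

```java
MapStateDescriptor<String, String> configDescriptor =
        new MapStateDescriptor<>("config", Types.STRING, Types.STRING);

BroadcastStream<ConfigUpdate> configStream = configUpdates.broadcast(configDescriptor);

DataStream<Result> results = measurements
        .connect(configStream)
        .process(new BroadcastProcessFunction<Measurement, ConfigUpdate, Result>() {
            @Override
            public void processBroadcastElement(ConfigUpdate update, Context ctx,
                                                Collector<Result> out) throws Exception {
                // Every parallel task receives and stores the same update
                ctx.getBroadcastState(configDescriptor).put(update.key, update.value);
            }

            @Override
            public void processElement(Measurement m, ReadOnlyContext ctx,
                                       Collector<Result> out) throws Exception {
                // Read the currently active setting on every element
                String limit = ctx.getBroadcastState(configDescriptor).get("limit");
                out.collect(transform(m, limit));
            }
        });
```

Because the configuration flows through the job as a regular stream, changes take effect without restarting or redeploying the pipeline.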
- Implement a custom sink for InfluxDB
  - To perform bulk writes for high performance and throughput
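The bulk-write idea behind the sink can be sketched independently of the InfluxDB client: records are buffered and handed to a bulk writer once a batch is full. `BatchBuffer` and its parameters are illustrative, not the project's actual class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects records and hands them to a bulk writer once the batch is full. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> bulkWriter; // e.g. one InfluxDB write call per batch
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> bulkWriter) {
        this.batchSize = batchSize;
        this.bulkWriter = bulkWriter;
    }

    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Also called on checkpoint/close so no buffered records are lost. */
    void flush() {
        if (!buffer.isEmpty()) {
            bulkWriter.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

In the real sink the flush would additionally be triggered on a timer and on checkpoints, so that slow periods do not delay data indefinitely.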
- Implement a custom sink for Oracle
  - Flink's JDBC sink does not handle BLOBs correctly
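The core of the custom sink is plain JDBC with explicit BLOB binding; table and column names below are examples:

```java
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OracleBlobWriter {

    /** Writes one record whose payload column is a BLOB. */
    void write(Connection conn, String partId, byte[] payload) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO measurements (part_id, payload) VALUES (?, ?)")) {
            ps.setString(1, partId);
            // Bind the BLOB explicitly as a binary stream with a known length
            ps.setBinaryStream(2, new ByteArrayInputStream(payload), payload.length);
            ps.executeUpdate();
        }
    }
}
```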
- Write the output stream to different systems for inspection and analysis
  - InfluxDB
  - Oracle
  - PostgreSQL
  - Kafka
- Visualize data
  - Using the InfluxDB web UI
  - Using Grafana
- Write the error stream to Kafka
  - To provide feedback to operators in case of failures
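A common way to split off such an error stream in Flink is a side output; the tag name, `parse` helper, and sink variable below are illustrative:

```java
// Records that fail parsing or validation go to a side output
// instead of failing the whole job
final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

SingleOutputStreamOperator<Measurement> parsed = raw.process(
        new ProcessFunction<String, Measurement>() {
            @Override
            public void processElement(String value, Context ctx,
                                       Collector<Measurement> out) {
                try {
                    out.collect(parse(value));
                } catch (Exception e) {
                    ctx.output(errorTag, value + " | " + e.getMessage());
                }
            }
        });

// The error stream feeds a Kafka topic that operators monitor
parsed.getSideOutput(errorTag).sinkTo(errorKafkaSink);
```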
- Use Flink/Flink SQL to implement batch jobs
  - To remove old data from Oracle
  - To import data regularly, using Kubernetes CronJobs
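The retention job boils down to a statement of this shape against Oracle; the table, column, and 90-day window are hypothetical examples:

```sql
-- Retention: drop measurement rows older than 90 days
DELETE FROM measurements
WHERE measured_at < SYSDATE - 90;
```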
- Implement tests for quality assurance and to prevent regressions
  - Unit tests
  - Integration tests using H2 and Testcontainers
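An integration test with Testcontainers follows this shape (JUnit 5; the class name, container image, and assertions are examples):

```java
// The container is started before the tests and torn down afterwards
@Testcontainers
class JdbcSinkIT {

    @Container
    static final PostgreSQLContainer<?> db = new PostgreSQLContainer<>("postgres:14");

    @Test
    void writesOneRowPerMeasurement() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                db.getJdbcUrl(), db.getUsername(), db.getPassword())) {
            // run the sink under test against conn,
            // then assert on the table contents
        }
    }
}
```

Running against a real database in Docker catches driver- and dialect-specific issues (such as the BLOB handling above) that an in-memory H2 database can miss.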
- Provide a docker-compose setup for a full development environment with all services
  - ZooKeeper, Kafka
  - PostgreSQL
  - Oracle
  - InfluxDB
  - Grafana
  - Flink JobManager and TaskManager
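Trimmed sketch of such a compose file; image tags are examples, and the environment variables, ports, and Oracle service required for a working setup are omitted:

```yaml
services:
  zookeeper:
    image: zookeeper:3.7
  kafka:
    image: confluentinc/cp-kafka:7.0.1
    depends_on: [zookeeper]
  postgres:
    image: postgres:14
  influxdb:
    image: influxdb:1.8
  grafana:
    image: grafana/grafana:8.4.0
  jobmanager:
    image: flink:1.14
    command: jobmanager
  taskmanager:
    image: flink:1.14
    command: taskmanager
    depends_on: [jobmanager]
```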
- Deployment to OpenShift 4 (Kubernetes)
  - Test deployments to Minikube
  - Test deployments to CodeReady Containers (CRC, OpenShift 4)
  - Rancher test installation and evaluation
- Use Kubernetes CronJobs to trigger Flink batch jobs
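A CronJob of roughly this shape submits the batch job on a schedule; the name, schedule, image tag, entry class, and jar path are all hypothetical:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-old-data
spec:
  schedule: "0 2 * * *"     # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: flink-batch
              image: flink:1.14
              # Submits the batch job to the running Flink cluster
              args: ["flink", "run", "-c", "com.example.CleanupJob",
                     "/opt/jobs/pipeline.jar"]
```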
Roles
- System Architect, Data Architect, Data Engineer
- Design, development, integration, test, performance tuning
Industry, industrial sector
- Manufacturing systems engineering
Stack Overflow
- https://stackoverflow.com/questions/72219901/how-to-drain-the-window-after-a-flink-join-using-cogroup
- https://stackoverflow.com/questions/74027742/how-can-i-update-a-configuration-in-a-flink-transformation