Measurement data pipeline
March 2022
About the project
- Build a pipeline to visualize data from manufacturing systems
  - The manufacturing systems produce data for each individual part at each process step
- Use Kafka as the source to provide near-real-time streaming
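
A minimal sketch of the Kafka source wiring, assuming the Flink 1.14+ KafkaSource API; broker address, topic, and consumer group are illustrative assumptions, not the project's actual values:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MeasurementSourceSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Consume raw measurement records from Kafka; parsing happens downstream.
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")               // assumed broker address
                    .setTopics("measurements")                       // assumed topic name
                    .setGroupId("measurement-pipeline")
                    .setStartingOffsets(OffsetsInitializer.latest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            DataStream<String> raw =
                    env.fromSource(source, WatermarkStrategy.noWatermarks(), "measurement-source");

            raw.print();
            env.execute("measurement-pipeline");
        }
    }
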
- Use Flink to implement the streaming application
  - Parse, validate, and transform incoming data
  - Stateful transformations
  - Real-time joins using event-time semantics, watermarks, and idleness handling (sketched after this list)
  - Checkpointing to prevent data loss in case of failures
  - Handle errors gracefully so that the pipeline continues to run
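
A sketch of the event-time join with watermarks, idleness handling, and checkpointing (not the production job); the Measurement type, field names, join window, and intervals are assumptions:

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    public class MeasurementJoinSketch {

        /** Hypothetical record: one measurement of one part at one process step. */
        public static class Measurement {
            public String partId;
            public double value;
            public long timestampMillis;
            public Measurement() {}
        }

        static DataStream<String> buildJoin(StreamExecutionEnvironment env,
                                            DataStream<Measurement> stepA,
                                            DataStream<Measurement> stepB) {
            // Checkpointing so operator state survives failures without data loss.
            env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

            // Event-time semantics: bounded out-of-orderness plus idleness handling,
            // so a quiet Kafka partition does not stall the watermark for the join.
            WatermarkStrategy<Measurement> wm = WatermarkStrategy
                    .<Measurement>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withTimestampAssigner((m, ts) -> m.timestampMillis)
                    .withIdleness(Duration.ofMinutes(1));

            return stepA.assignTimestampsAndWatermarks(wm).keyBy(m -> m.partId)
                    .intervalJoin(stepB.assignTimestampsAndWatermarks(wm).keyBy(m -> m.partId))
                    .between(Time.minutes(-10), Time.minutes(10))    // assumed per-part join window
                    .process(new ProcessJoinFunction<Measurement, Measurement, String>() {
                        @Override
                        public void processElement(Measurement a, Measurement b, Context ctx,
                                                   Collector<String> out) {
                            out.collect(a.partId + ": " + a.value + " / " + b.value);
                        }
                    });
        }
    }
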
- Implement a custom sink for InfluxDB
  - Bulk writes for high performance and throughput
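
A sketch of a bulk-writing sink, assuming the influxdb-java client (InfluxDB 1.x API); endpoint, credentials, database name, and batch size are illustrative:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.influxdb.InfluxDB;
    import org.influxdb.InfluxDBFactory;
    import org.influxdb.dto.BatchPoints;
    import org.influxdb.dto.Point;

    /** Buffers measurements and writes them to InfluxDB in batches instead of one HTTP call per record. */
    public class InfluxBulkSink extends RichSinkFunction<InfluxBulkSink.Measurement> {

        /** Hypothetical element type: one measurement of one part at one process step. */
        public static class Measurement {
            public String partId;
            public double value;
            public long timestampMillis;
            public Measurement() {}
        }

        private static final int BATCH_SIZE = 500;                 // assumed flush threshold
        private transient InfluxDB influx;
        private transient List<Point> buffer;

        @Override
        public void open(Configuration parameters) {
            influx = InfluxDBFactory.connect("http://influxdb:8086", "user", "secret"); // assumed endpoint
            buffer = new ArrayList<>();
        }

        @Override
        public void invoke(Measurement m, Context context) {
            buffer.add(Point.measurement("measurements")
                    .time(m.timestampMillis, TimeUnit.MILLISECONDS)
                    .tag("partId", m.partId)
                    .addField("value", m.value)
                    .build());
            if (buffer.size() >= BATCH_SIZE) {
                flush();
            }
        }

        // One bulk HTTP write per batch keeps throughput high under load.
        private void flush() {
            BatchPoints batch = BatchPoints.database("manufacturing").build();      // assumed database
            buffer.forEach(batch::point);
            influx.write(batch);
            buffer.clear();
        }

        @Override
        public void close() {
            if (buffer != null && !buffer.isEmpty()) {
                flush();
            }
            if (influx != null) {
                influx.close();
            }
        }
    }

A production sink would also flush pending points on checkpoint; that part is omitted here for brevity.
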
- Write the output stream to different systems for inspection and analysis (relational sink sketched after this list)
  - InfluxDB
  - Oracle
  - PostgreSQL
  - Kafka
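
For the relational targets, the Flink JDBC connector with batched execution is one way to write the stream; SQL, table name, and connection details below are assumptions, and the Oracle variant would differ only in JDBC URL and driver class:

    import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
    import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
    import org.apache.flink.connector.jdbc.JdbcSink;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    public class JdbcOutputSketch {

        /** Hypothetical row type matching the joined output. */
        public static class Row {
            public String partId;
            public double value;
            public long timestampMillis;
        }

        static SinkFunction<Row> postgresSink() {
            return JdbcSink.sink(
                    "INSERT INTO measurements (part_id, value, event_time) VALUES (?, ?, ?)",
                    (stmt, row) -> {
                        stmt.setString(1, row.partId);
                        stmt.setDouble(2, row.value);
                        stmt.setTimestamp(3, new java.sql.Timestamp(row.timestampMillis));
                    },
                    JdbcExecutionOptions.builder()
                            .withBatchSize(1000)                 // batched inserts for throughput
                            .withBatchIntervalMs(2000)
                            .withMaxRetries(3)
                            .build(),
                    new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                            .withUrl("jdbc:postgresql://postgres:5432/manufacturing") // assumed URL
                            .withDriverName("org.postgresql.Driver")
                            .withUsername("flink")
                            .withPassword("secret")
                            .build());
        }
    }

The sink is attached with rows.addSink(postgresSink()).
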
- Visualize data
  - Using the Influx web UI
  - Using Grafana
- Write the error stream to Kafka
  - To provide feedback to the operators in case of failures
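
A sketch of the error path, assuming a side output that feeds a dedicated Kafka topic; the parsing logic, topic name, and broker address are illustrative:

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class ErrorStreamSketch {

        // Records that fail parsing or validation go to a side output instead of failing the job.
        static final OutputTag<String> ERRORS = new OutputTag<String>("errors") {};

        static void wireErrorStream(DataStream<String> raw) {
            SingleOutputStreamOperator<Double> parsed = raw.process(new ProcessFunction<String, Double>() {
                @Override
                public void processElement(String value, Context ctx, Collector<Double> out) {
                    try {
                        out.collect(Double.parseDouble(value));  // stand-in for real parsing/validation
                    } catch (NumberFormatException e) {
                        ctx.output(ERRORS, "unparsable record: " + value);
                    }
                }
            });

            // Error messages are published to a dedicated topic for operator feedback.
            KafkaSink<String> errorSink = KafkaSink.<String>builder()
                    .setBootstrapServers("kafka:9092")                   // assumed broker address
                    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                            .setTopic("measurement-errors")              // assumed topic name
                            .setValueSerializationSchema(new SimpleStringSchema())
                            .build())
                    .build();

            parsed.getSideOutput(ERRORS).sinkTo(errorSink);
        }
    }
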
- Implement tests for quality assurance and to prevent regressions
  - Unit tests
  - Integration tests using H2 and Testcontainers
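
A sketch of an integration test against a disposable PostgreSQL started by Testcontainers; table, test data, and class names are illustrative, and the real tests exercise the pipeline's sinks rather than plain JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.junit.jupiter.api.Test;
    import org.testcontainers.containers.PostgreSQLContainer;
    import org.testcontainers.junit.jupiter.Container;
    import org.testcontainers.junit.jupiter.Testcontainers;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    /** Spins up a throwaway PostgreSQL for the test instead of mocking the database. */
    @Testcontainers
    class PostgresSinkIT {

        @Container
        static final PostgreSQLContainer<?> POSTGRES = new PostgreSQLContainer<>("postgres:14-alpine");

        @Test
        void writesRowsIntoRealPostgres() throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    POSTGRES.getJdbcUrl(), POSTGRES.getUsername(), POSTGRES.getPassword());
                 Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE measurements (part_id TEXT, value DOUBLE PRECISION)");
                stmt.execute("INSERT INTO measurements VALUES ('part-1', 42.0)"); // stand-in for running the sink
                try (ResultSet rs = stmt.executeQuery("SELECT count(*) FROM measurements")) {
                    assertTrue(rs.next() && rs.getInt(1) == 1);
                }
            }
        }
    }
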
- Provide a docker-compose setup for a full development environment with all services
  - ZooKeeper, Kafka
  - PostgreSQL
  - Oracle
  - InfluxDB
  - Grafana
  - Flink JobManager and TaskManager
- Deployment to OpenShift 4 (Kubernetes)
  - Development deployments to CodeReady Containers (CRC, OpenShift 4)
  - Rancher test installation and evaluation
Roles
- System Architect, Data Architect, Data Engineer
- Design, development, integration, testing, performance tuning
Industry, industrial sector
- Manufacturing systems engineering