Upload interface and ETL pipeline
Design of an upload interface and development of an ETL pipeline, Tourism, November 2018 – March 2019
About the project
Design of the data transfer objects used for uploading data, implementation of a REST service, and development of an ETL pipeline from Kafka into several back-end stores for caching, analysis, and reporting.
- Design of the data transfer objects (JSON)
- Implementation of a REST service using Akka that publishes uploaded data to Kafka (first sketch below)
- Set-up of Spark streaming jobs from Kafka to Elasticsearch, Cassandra, PostgreSQL, and Parquet files (second sketch below)
- Deployment of the Spark streaming applications as YARN jobs
- Design of complex types (hierarchical nested types) suitable for each back end
- Transformation of incoming data into records suited to reporting in each back end (third sketch below)
- Performance optimization (Spark partitioning), also shown in the third sketch
- Implementation of a distributed fare query engine using Akka (fourth sketch below)
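First, a minimal sketch of the upload path: an Akka HTTP endpoint that forwards JSON request bodies to Kafka. The topic name, broker address, and port are assumptions, and the sketch skips the DTO validation the real service would perform.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import java.util.Properties

object UploadService extends App {
  implicit val system: ActorSystem = ActorSystem("upload-service")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)

  // POST /upload publishes the raw JSON body to Kafka and acknowledges;
  // validation against the JSON DTO schema would happen here.
  val route =
    path("upload") {
      post {
        entity(as[String]) { json =>
          producer.send(new ProducerRecord[String, String]("uploads", json))
          complete("accepted")
        }
      }
    }

  Http().bindAndHandle(route, "0.0.0.0", 8080)
}
```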
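Second, a sketch of one of the streaming jobs, shown here with the Parquet sink only; the topic name, the two-field schema, and the paths are illustrative. The Elasticsearch, Cassandra, and PostgreSQL sinks would typically go through their own connectors or foreachBatch writes in the same pattern.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaToParquet extends App {
  // Built for spark-submit on YARN (matching the deployment above), so no
  // master is set in code; needs the spark-sql-kafka connector on the classpath.
  val spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

  // Illustrative schema for the uploaded JSON payload.
  val schema = new StructType()
    .add("fareId", StringType)
    .add("price", DoubleType)

  val uploads = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "uploads")                      // assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).as("rec"))
    .select("rec.*")

  // Parquet sink; the checkpoint directory gives the job restartable,
  // exactly-once bookkeeping.
  uploads.writeStream
    .format("parquet")
    .option("path", "/data/uploads")           // assumed output path
    .option("checkpointLocation", "/data/chk") // assumed checkpoint path
    .start()
    .awaitTermination()
}
```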
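Third, a sketch of the nested-type flattening and the partitioning tuning, built around a hypothetical booking/segment DTO; column names, the partition count, and the output path are placeholders, not the project's actual data model.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical nested DTO: a booking carrying an array of fare segments.
case class Segment(origin: String, destination: String, price: Double)
case class Booking(bookingId: String, segments: Seq[Segment])

object FlattenAndPartition extends App {
  val spark = SparkSession.builder.appName("flatten").master("local[*]").getOrCreate()
  import spark.implicits._

  val bookings = Seq(
    Booking("b1", Seq(Segment("FRA", "JFK", 420.0), Segment("JFK", "FRA", 380.0)))
  ).toDS()

  // explode() turns the nested array into one flat row per segment, the shape
  // a relational reporting table expects.
  val flat = bookings
    .select($"bookingId", explode($"segments").as("seg"))
    .select($"bookingId", $"seg.origin", $"seg.destination", $"seg.price")

  // Repartitioning on a frequently filtered column before the write keeps the
  // file layout aligned with how reports read the data; count and path are
  // placeholders.
  flat
    .repartition(8, $"origin")
    .write
    .mode("overwrite")
    .partitionBy("origin")
    .parquet("/tmp/reports")
}
```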
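Fourth, a scatter-gather sketch of a distributed fare query in classic Akka actors, assuming a hypothetical FareQuery/FareQuote protocol and canned provider quotes; the real engine's routing, providers, and pricing are project-specific.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

final case class FareQuery(origin: String, destination: String)
final case class FareQuote(provider: String, price: Double)

// One worker per fare provider; here it answers with a placeholder quote.
class FareWorker(provider: String) extends Actor {
  def receive: Receive = {
    case FareQuery(o, d) =>
      sender() ! FareQuote(provider, 100.0 + o.length + d.length) // dummy pricing
  }
}

object FareQueryEngine extends App {
  implicit val system: ActorSystem = ActorSystem("fares")
  implicit val timeout: Timeout = Timeout(3.seconds)
  import system.dispatcher

  val workers = Seq("providerA", "providerB").map { p =>
    system.actorOf(Props(new FareWorker(p)), p)
  }

  // Scatter the query to all workers, gather the quotes, pick the cheapest.
  val query = FareQuery("FRA", "JFK")
  val quotes: Future[Seq[FareQuote]] =
    Future.sequence(workers.map(w => (w ? query).mapTo[FareQuote]))

  quotes.map(_.minBy(_.price)).foreach { best =>
    println(s"best fare: $best")
    system.terminate()
  }
}
```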
Roles
- System architect, development, integration, test
Industry / sector
- Tourism, travel industry