About the project
- Migration of an ETL pipeline to AWS
- Batch pipelines to fill a data lake
- Using different zones: landing, structured, processed
- Delta updates (see the Spark sketch after this list)
- Using AWS Step Functions to orchestrate the import and export steps
- Metadata configuration written in Dhall
- Generate case classes to represent records (a sketch of the generated code follows this list)
- To get type safety
- To provide test data as code
- To simplify comparison in unit tests
- To provide quick access to attribute metadata (type, nullability, comment, precision/scale, …)
- To get constants for attribute names instead of hard-coded values
- Generate SQL scripts (sketched after this list)
- To create tables in PostgreSQL
- Infrastructure-as-code using Terraform
- Using AWS Athena for interactive queries
- Create charts using Matplotlib and Leaflet/Folium to visualize reporting data
- Create diagrams using Mermaid to visualize data flows
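
As an illustration of the generated record types, here is a minimal sketch of what the Dhall-driven generator's output could look like; the `Station` record, its attributes, and the `Meta` helper are hypothetical names invented for this example.

```scala
// Hypothetical generator output: a record type, constants for attribute
// names, and per-attribute metadata (type, nullability, comment,
// precision/scale), as described in the list above.
final case class Station(
  stationId: Long,
  name: String,
  capacityKw: BigDecimal,
  comment: Option[String] // nullable attributes become Option[...]
)

object Station {
  // Constants instead of hard-coded attribute names.
  val StationId  = "station_id"
  val Name       = "name"
  val CapacityKw = "capacity_kw"
  val Comment    = "comment"

  // Quick access to attribute metadata.
  final case class Meta(
    name: String,
    sqlType: String,
    nullable: Boolean,
    comment: String,
    precision: Option[Int] = None,
    scale: Option[Int] = None
  )

  val metadata: Seq[Meta] = Seq(
    Meta(StationId,  "BIGINT",  nullable = false, comment = "Surrogate key"),
    Meta(Name,       "TEXT",    nullable = false, comment = "Station name"),
    Meta(CapacityKw, "NUMERIC", nullable = false, comment = "Rated capacity in kW",
         precision = Some(10), scale = Some(2)),
    Meta(Comment,    "TEXT",    nullable = true,  comment = "Free-text remark")
  )
}
```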
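
Because generated case classes are plain value types, test data can be written as code and records compared structurally; a small ScalaTest sketch under that assumption, reusing the hypothetical `Station` type from above:

```scala
import org.scalatest.funsuite.AnyFunSuite

// Sketch: expected and actual records expressed as code and compared
// with plain structural equality, which case classes provide for free.
class StationSpec extends AnyFunSuite {
  test("a transformed record equals the expected record") {
    val expected = Station(1L, "Central", BigDecimal("250.00"), None)
    val actual   = Station(1L, "Central", BigDecimal("250.00"), None)
    assert(actual == expected)
  }
}
```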
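
The SQL-script generation for PostgreSQL is likewise only named above; a rough sketch of how the hypothetical `Station.metadata` could be rendered into a CREATE TABLE statement (the table name and type mapping are assumptions):

```scala
// Minimal sketch: render a PostgreSQL CREATE TABLE statement from the
// hypothetical metadata introduced in the first sketch.
object DdlGenerator {
  def columnDdl(m: Station.Meta): String = {
    val sqlType = (m.precision, m.scale) match {
      case (Some(p), Some(s)) => s"${m.sqlType}($p,$s)" // e.g. NUMERIC(10,2)
      case _                  => m.sqlType
    }
    val nullability = if (m.nullable) "" else " NOT NULL"
    s"  ${m.name} $sqlType$nullability"
  }

  def createTable(table: String, metas: Seq[Station.Meta]): String =
    metas.map(columnDdl).mkString(s"CREATE TABLE $table (\n", ",\n", "\n);")
}

// Usage: println(DdlGenerator.createTable("station", Station.metadata))
```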
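
The delta-update step is also only named in the list; one common Spark pattern is to union the daily delta with the existing snapshot and keep the latest record per key. A minimal sketch, assuming hypothetical S3 paths, a business key `station_id`, and a load-timestamp column `load_ts`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Sketch of a delta update: per business key, keep the record with the
// latest load timestamp after unioning the delta with the existing snapshot.
object DeltaUpdate {
  def mergeDelta(existing: DataFrame, delta: DataFrame): DataFrame = {
    val latestFirst = Window
      .partitionBy(col("station_id"))
      .orderBy(col("load_ts").desc)

    existing
      .unionByName(delta)
      .withColumn("rn", row_number().over(latestFirst))
      .where(col("rn") === 1)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-update").getOrCreate()
    val existing = spark.read.parquet("s3://bucket/processed/station")
    val delta    = spark.read.parquet("s3://bucket/structured/station/delta")
    // Write to a fresh location; overwriting the path being read would fail.
    mergeDelta(existing, delta)
      .write.mode("overwrite").parquet("s3://bucket/processed/station_new")
    spark.stop()
  }
}
```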
Roles
- System Architect, Data Architect, Data Engineer
- Design, development, integration, testing, performance tuning
Industry / sector
- Transportation, energy management
Tags
etl
apache-spark
scala
sql
sbt
dhall
python
pyspark
cloud
amazon-web-services
amazon-athena
amazon-emr
amazon-rds
amazon-s3
aws-cli
aws-lambda
aws-step-functions
postgresql
terraform
data-visualization
matplotlib
folium
leaflet
pandas
tdd
bdd
scalatest
mermaid