Analytics
AWS Glue
Serverless data integration & ETL.
Overview
Glue crawls data sources, populates the Data Catalog with table metadata, and runs Spark-based ETL jobs that transform data in S3, RDS, and Redshift.
When to use it
- Building data lakes
- Schema discovery via crawlers
- Batch ETL
Setup
- Create a database in the Data Catalog, then run a crawler against the S3 data.
- Author the job in Glue Studio or as a PySpark script.
- Schedule the job with Glue triggers or Step Functions.
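The setup steps above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch under stated assumptions, not a definitive setup: the database name `qa_lake`, the crawler name `qa-crawler`, the trigger name `qa-etl-nightly`, the IAM role ARN, the S3 path, and the 02:00 UTC schedule are all hypothetical placeholders, and a job named `qa-etl` is assumed to exist already. The two builder functions only assemble request parameters; `apply_setup` is the only part that calls AWS and needs credentials.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for glue.create_crawler with one S3 target."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def nightly_trigger_request(name, job_name, hour_utc=2):
    """Build keyword arguments for glue.create_trigger on a daily schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        # AWS cron fields: minute hour day-of-month month day-of-week year
        "Schedule": f"cron(0 {hour_utc} * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def apply_setup(region="us-east-1"):
    """Create the database, crawler, and trigger. Requires AWS credentials."""
    import boto3  # imported here so the builders above work without the SDK

    glue = boto3.client("glue", region_name=region)
    glue.create_database(DatabaseInput={"Name": "qa_lake"})
    glue.create_crawler(**crawler_request(
        "qa-crawler",
        "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
        "qa_lake",
        "s3://example-bucket/raw/",  # hypothetical path
    ))
    glue.start_crawler(Name="qa-crawler")
    glue.create_trigger(**nightly_trigger_request("qa-etl-nightly", "qa-etl"))
```

Keeping the parameter builders separate from the AWS calls makes the configuration easy to review and unit-test before anything is created in the account.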
How to use
Run job
aws glue start-job-run --job-name qa-etl
QA use cases
- Generate masked datasets in S3 nightly to populate QA databases.
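As an illustration of what the masking step might look like, the sketch below pseudonymizes email addresses with a salted hash. It is written in plain Python so it runs anywhere; in a Glue job the same logic would presumably be applied per row inside the PySpark script. The function names, the `email` field name, and the salt handling are assumptions for illustration, not part of Glue.

```python
import hashlib


def mask_email(email, salt):
    """Replace the local part of an email with a salted SHA-256 digest prefix."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@{domain}"


def mask_record(record, salt, email_fields=("email",)):
    """Return a copy of the record with the listed email fields masked."""
    masked = dict(record)
    for field in email_fields:
        if masked.get(field):  # skip missing or empty values
            masked[field] = mask_email(masked[field], salt)
    return masked
```

Because the hash is deterministic for a given salt, the same source address always maps to the same masked value, so joins across masked tables still line up in QA.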
