Cloud Training Labs · Case study

AWS Data Lake — student sandbox for 30 simultaneous learners

Per-student IAM scoping in a single AWS account, with Glue ETL and Athena queries that hold up under cohort-scale concurrency.

S3 → Glue Crawler → Glue Catalog → Glue ETL Job → Parquet → Athena
The brief

A corporate L&D team needed an AWS Data Lake hands-on lab for incoming engineers — one lab, 30 students, a single AWS account, no risk of one student wrecking another's environment. Off-the-shelf platforms didn't cover the Glue + Athena pipeline they wanted to teach.

What we built
  • Customer-managed IAM policy scoped to quicklabs-{username}-* resources, region-locked to us-west-2
  • Glue service role with explicit Deny on writes outside the student's own catalog namespace
  • Two Terraform modules — admin (IAM, workgroup) and student (S3, Glue, ETL job)
  • Athena workgroup per student, results bucket isolated, query history scoped
  • PySpark Glue ETL converting raw CSV → partitioned Parquet (snappy, by year)
  • Admin walkthrough for end-to-end verification + student handout for delivery
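To illustrate the scoping in the first two bullets, the sketch below builds the policy documents as plain Python dicts. It is a minimal sketch under stated assumptions, not the lab's actual policy. The helper names (`prefix`, `glue_arns`, `build_student_policy`, `glue_role_deny`), the action lists, and the account ID are all hypothetical. Only the `quicklabs-{username}-*` prefix, the us-west-2 lock, and the explicit Deny come from the lab. The region lock uses the global `aws:RequestedRegion` condition key. The Glue role's Deny uses `NotResource`, so any catalog write that targets something outside the student's namespace is refused.

```python
import json

REGION = "us-west-2"  # region lock from the lab description
VERSION = "2012-10-17"


def prefix(username: str) -> str:
    # Hypothetical helper: every student-owned resource name starts with this.
    return f"quicklabs-{username}-"


def glue_arns(account_id: str, username: str) -> list[str]:
    """Glue ARNs inside one student's catalog namespace (illustrative)."""
    p = prefix(username)
    base = f"arn:aws:glue:{REGION}:{account_id}"
    return [
        f"{base}:catalog",  # database/table calls are also authorized against the catalog ARN
        f"{base}:database/{p}*",
        f"{base}:table/{p}*/*",
        f"{base}:crawler/{p}*",
        f"{base}:job/{p}*",
    ]


def build_student_policy(account_id: str, username: str) -> dict:
    """Hypothetical per-student policy: allow only prefixed resources, only in REGION.

    The wildcard actions are for brevity; a real policy would list the
    specific calls Terraform and the console actually make.
    """
    p = prefix(username)
    region_lock = {"StringEquals": {"aws:RequestedRegion": REGION}}
    statements = [
        ("OwnBuckets", ["s3:*"], [f"arn:aws:s3:::{p}*", f"arn:aws:s3:::{p}*/*"]),
        ("OwnGlue", ["glue:*"], glue_arns(account_id, username)),
        ("OwnWorkgroup", ["athena:*"],
         [f"arn:aws:athena:{REGION}:{account_id}:workgroup/{p}*"]),
    ]
    return {
        "Version": VERSION,
        "Statement": [
            {"Sid": sid, "Effect": "Allow", "Action": actions,
             "Resource": resources, "Condition": region_lock}
            for sid, actions, resources in statements
        ],
    }


def glue_role_deny(account_id: str, username: str) -> dict:
    """Explicit Deny for the Glue service role: catalog writes outside the namespace."""
    return {
        "Sid": "DenyForeignCatalogWrites",
        "Effect": "Deny",
        "Action": [
            "glue:CreateDatabase", "glue:DeleteDatabase",
            "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
            "glue:CreatePartition", "glue:BatchCreatePartition", "glue:UpdatePartition",
        ],
        # NotResource: the Deny applies to every resource NOT in this list.
        "NotResource": glue_arns(account_id, username),
    }


if __name__ == "__main__":
    print(json.dumps(build_student_policy("123456789012", "alice"), indent=2))
```

In IAM evaluation an explicit Deny always overrides an Allow. Because of that, the `NotResource` statement keeps a student's Glue job inside its own namespace even if a broader Allow is attached to the role later.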
How it works
  1. Admin runs the IAM Terraform once per student, provisioning the user, role, policies, and workgroup
  2. Student receives credentials and signs into a region-locked console
  3. Student runs the lab Terraform under their own credentials, which confirms the policy grants everything the lab needs
  4. Student exercises the pipeline: crawl raw, run ETL, crawl curated, query in Athena
  5. Cleanup is one Terraform destroy, tearing down IAM and infrastructure in one motion
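Step 4's Athena query benefits from the year partitioning in the ETL output. The job writes Hive-style `year=YYYY/` prefixes, and the curated crawler registers `year` as a partition column. A predicate such as `WHERE year = '2023'` then lets Athena skip every other prefix. The pure-Python sketch below only mimics that pruning over a list of object keys. The table name, key layout, and file names are hypothetical, and real Glue output file names are longer.

```python
def partition_values(key: str) -> dict[str, str]:
    """Extract Hive-style key=value segments from an object key."""
    values = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values


def prune(keys: list[str], **predicate: str) -> list[str]:
    """Keep only keys whose partition values match every predicate (like WHERE year = '2023')."""
    return [
        k for k in keys
        if all(partition_values(k).get(col) == val for col, val in predicate.items())
    ]


def curated_key(table: str, year: int, part: int) -> str:
    # Hypothetical layout: <table>/year=<YYYY>/part-<n>.snappy.parquet
    return f"{table}/year={year}/part-{part:05d}.snappy.parquet"


if __name__ == "__main__":
    keys = [curated_key("sales", y, n) for y in (2021, 2022, 2023) for n in range(2)]
    scanned = prune(keys, year="2023")
    print(f"scanned {len(scanned)} of {len(keys)} files")  # scanned 2 of 6 files
```

Partition values are compared as strings here. That reflects how crawlers usually type Hive-style partition columns, which is why the Athena predicate quotes the year.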
What this proves
  • 30 students in one AWS account with zero cross-tenant access
  • Fast iteration loop: every IAM gap surfaced as a 403, was fixed in the policy, and was re-applied in under five minutes per cycle
  • Total cost per cohort under $50 in AWS spend
  • Whole lab re-runs from a clean account in under 15 minutes
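A per-cohort budget like this can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimator, not the lab's billing data. Every usage figure in the example is an assumption. The default unit prices ($0.44 per DPU-hour for Glue jobs and crawlers, $5 per TB scanned in Athena) are public list prices that change over time and vary by region, so check current AWS pricing before relying on them. The estimator also ignores S3 costs and minimum billing durations.

```python
from dataclasses import dataclass

# Assumed list prices in USD; verify against current AWS pricing for us-west-2.
GLUE_PER_DPU_HOUR = 0.44
ATHENA_PER_TB_SCANNED = 5.00


@dataclass
class StudentUsage:
    """Hypothetical per-student usage over one lab run."""
    etl_dpu_hours: float       # Glue Spark job: DPUs x hours
    crawler_dpu_hours: float   # raw + curated crawls combined
    athena_tb_scanned: float   # total data scanned by the student's queries


def student_cost(u: StudentUsage) -> float:
    glue = (u.etl_dpu_hours + u.crawler_dpu_hours) * GLUE_PER_DPU_HOUR
    athena = u.athena_tb_scanned * ATHENA_PER_TB_SCANNED
    return glue + athena


def cohort_cost(students: int, u: StudentUsage) -> float:
    # Ignores S3 storage/requests and per-run minimum billing durations.
    return students * student_cost(u)


if __name__ == "__main__":
    usage = StudentUsage(etl_dpu_hours=0.5, crawler_dpu_hours=0.5, athena_tb_scanned=0.001)
    total = cohort_cost(30, usage)
    print(f"estimated cohort cost: ${total:.2f}")  # estimated cohort cost: $13.35
    print("within $50 budget:", total < 50)
```

With small lab datasets, Glue DPU time usually dominates the Athena scan charge. The DPU-hour inputs are the figures most worth measuring for a real cohort.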

Tell us about your lab

Discovery call, no obligation. We’ll tell you whether to hire us, fork the repo, or use a SaaS platform, whichever fits.
