Data Warehouse and Analytics Project

Data Warehouse and Analytics Project
June 29, 2023
DataWarehouse
SQLproject
ETLpipeline
DataEngineering

This project showcases a complete end-to-end solution for building a modern data warehouse using SQL Server, transforming raw data into business-ready insights. Designed as a portfolio project, it demonstrates my hands-on skills in data engineering, ETL pipeline development, and analytics.

๐Ÿ“Œ Overview

This repository presents a structured approach to data warehousing, adhering to the Medallion Architecture:

Project image

High Level Architecture

  • ๐ŸŸค Bronze Layer โ€“ Stores raw data as-is from the source systems. Data is ingested from CSV Files into SQL Server Database.
  • โšช Silver Layer โ€“ This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
  • ๐ŸŸก Gold Layer โ€“ Houses business-ready data modeled into a star schema required for reporting and analytics.

With data sourced from ERP and CRM systems (CSV format), I implemented ETL pipelines, created fact and dimension tables, and generated insights on customer behavior, product performance, and sales trends.

๐Ÿงฑ Features

  • ๐Ÿ—๏ธ Medallion Architecture applied in SQL Server
  • ๐Ÿ”„ ETL Pipelines for ingestion, cleaning, transformation, and modeling
  • ๐ŸŒ Star Schema Design (Fact & Dimension tables)
  • ๐Ÿ“Š Analytics-ready tables supporting reporting needs
  • ๐Ÿ“š Documentation for architecture, flow, and schema
  • ๐Ÿ’ก SQL-based reporting for key business metrics

๐Ÿ“‚ Project Structure

sql-data-warehouse-project/
โ”‚
โ”œโ”€โ”€ datasets/ # Raw CSV files from ERP and CRM
โ”‚
โ”œโ”€โ”€ docs/ # All documentation and visuals
โ”‚ โ”œโ”€โ”€ data_architecture_diagram.png # Visual representation of architecture
โ”‚ โ”œโ”€โ”€ data_catalog.md # Gold Layer: Tables and column descriptions
โ”‚ โ”œโ”€โ”€ etl.drawio # ETL flow diagram
โ”‚ โ”œโ”€โ”€ data_models.drawio # Star schema model
โ”‚ โ”œโ”€โ”€ naming-conventions.md # Standards for naming
โ”‚
โ”œโ”€โ”€ scripts/
โ”‚ โ”œโ”€โ”€ bronze/ # Ingest raw data
โ”‚ โ”œโ”€โ”€ silver/ # Transform & clean data
โ”‚ โ”œโ”€โ”€ gold/ # Create business-level models
โ”‚
โ”œโ”€โ”€ tests/ # SQL validation and quality checks
โ”‚
โ”œโ”€โ”€ README.md
โ””โ”€โ”€ requirements.txt # Tools & dependencies

๐Ÿงช Tech Stack & Tools

  • SQL Server Express โ€“ Backend DBMS
  • SSMS (SQL Server Management Studio) โ€“ GUI for development
  • Draw.io โ€“ Architecture and data modeling diagrams
  • Git & GitHub โ€“ Version control and collaboration
  • Notion โ€“ Project planning and documentation

๐Ÿ“Š Key Insights Delivered

  • ๐Ÿ“ˆ Customer Behavior Analysis
  • ๐Ÿ›’ Product Performance Metrics
  • ๐Ÿ’ฐ Sales Trend Insights

Each of these was enabled through optimised SQL queries over the Gold Layer, empowering stakeholders with actionable intelligence.

โœ… What I Learned

  • Real-world implementation of Medallion Data Architecture
  • Building efficient and reusable ETL processes
  • Hands-on practice with dimensional modeling
  • Writing performant and readable SQL queries for analytics
  • Importance of clear documentation and naming conventions

Diagrams

Data Flow Diagram

Data Flow Diagram

Data Flow Diagram

Data Integration

Data Integration Diagram

Data Integration Diagram

Data Model

Data Model

Data Model

ETL Methods

ETL Methods

ETL Methods

View the files and Project Here: LINK