NGUYEN MINH DUY

@minhduy200316_20429
Developer
Programming & Querying: Python, SQL, R, Java, C/C++
Data Engineering & Orchestration: Azure Data Factory, Databricks, Apache Airflow, dbt, GitHub Actions
Streaming & Real-Time Processing: Apache Kafka, Debezium CDC, Spark Structured Streaming
Cloud & Lakehouse Platforms: Azure, Snowflake, Delta Lake, MinIO, MongoDB
Linh Trung Ward, Thu Duc City, Ho Chi Minh City
Joined: 13/10/2025 22:45:53
Aspiring Data Engineer with hands-on experience in designing, building, and optimizing data pipelines and cloud-based data warehousing solutions. Proficient in developing scalable, reliable data infrastructure that integrates batch and real-time workflows, enabling advanced analytics, business intelligence, and data-driven decision-making.

Education

VNU-HCM University of Information Technology (UIT)
Expected Graduation: 2025
Degree: Bachelor of Information Technology, Major in Information Systems
GPA: 3.1/4.0

Work Experience

Modern Data Stack Pipeline (Sep. 2025 - )
- Built a containerized real-time pipeline simulating banking transactions with PostgreSQL as the operational source.
- Implemented Change Data Capture (CDC) using Debezium and streamed events via Apache Kafka to MinIO as Parquet files.
- Orchestrated ingestion with Apache Airflow and applied the Medallion Architecture in Snowflake.
- Developed dbt staging, dimension, and fact models with tests and SCD Type 2 snapshots for history tracking.
- Automated validation and deployment using GitHub Actions (CI/CD) to ensure reliability and reproducibility.
Data Engineer
Azure E-Commerce ETL Pipeline & Analytics (Jun. 2025 - )
- Designed and developed a production-grade ETL pipeline on Azure using Azure Data Factory (ADF), Databricks, Azure Synapse Analytics, and Power BI, following the Medallion Lakehouse Architecture.
- Automated ingestion from MySQL, MongoDB, and HTTP/CSV APIs through dynamic, parameterized ADF pipelines.
- Transformed and optimized data in Databricks with schema enforcement, surrogate keys, partitioning, and Z-Ordering for query performance.
- Modeled a star schema with one fact table, eight dimension tables, and a bridge table to support analytical workloads.
- Exposed curated data through Synapse external tables and views, powering Power BI DirectQuery dashboards with 20+ KPIs covering sales, customer insights, and logistics.
Data Engineer
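Surrogate keys and a bridge table, as used in the star schema above, can be illustrated with a small plain-Python sketch. The helper functions and column names (`product_code`, `product_sk`, and the order/payment-type pairing used for the bridge) are hypothetical; in the project this modeling was done in Databricks, not in plain Python.

```python
from itertools import count
from typing import Dict, Iterable, List, Tuple


def build_dimension(natural_keys: Iterable[str]) -> Dict[str, int]:
    """Map each distinct natural key to a sequential integer surrogate key."""
    seq = count(1)
    # Sorting makes the key assignment deterministic for the same input.
    return {nk: next(seq) for nk in sorted(set(natural_keys))}


def build_fact(orders: List[dict], product_dim: Dict[str, int]) -> List[dict]:
    """Replace natural product codes with surrogate keys in fact rows."""
    return [
        {"order_id": o["order_id"],
         "product_sk": product_dim[o["product_code"]],
         "amount": o["amount"]}
        for o in orders
    ]


def build_bridge(order_payments: Iterable[Tuple[int, str]],
                 payment_dim: Dict[str, int]) -> List[Tuple[int, int]]:
    """One bridge row per distinct (order, payment type) pair: a many-to-many link."""
    return sorted({(oid, payment_dim[p]) for oid, p in order_payments})


orders = [
    {"order_id": 1, "product_code": "P2", "amount": 10.0},
    {"order_id": 2, "product_code": "P1", "amount": 5.0},
]
product_dim = build_dimension(o["product_code"] for o in orders)
fact = build_fact(orders, product_dim)
payment_dim = build_dimension(["card", "voucher"])
bridge = build_bridge([(1, "card"), (1, "voucher"), (2, "card"), (1, "card")], payment_dim)
print(product_dim, fact, bridge)
```

Numbering from scratch on every run is only safe for a full rebuild; an incremental pipeline would also need to keep existing keys stable when new dimension members arrive.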
Movie Data Warehouse (ETL & OLAP) (Jul. 2024 - )
- Designed a star schema with one fact table and eight dimension tables derived from raw records.
- Built SSIS ETL pipelines to process the raw data and improve data quality.
- Developed SSAS cubes with measures and MDX queries to support multidimensional analytics.
- Created SSRS and Power BI reports highlighting revenue trends, top-performing movies, and production distribution by country.
- Extended the project with a Python forecasting module for predicting movie revenue.
Data Engineer
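The revenue and top-movie reports above come down to roll-up aggregations over the fact table. A minimal sketch, assuming fact rows already joined to hypothetical `title` and `country` dimension attributes; in the project these measures were defined in SSAS cubes and queried with MDX, not computed in Python.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def revenue_by(fact_rows: List[dict], dim_attr: str) -> Dict[str, float]:
    """Roll up the revenue measure along one dimension attribute."""
    totals: Dict[str, float] = defaultdict(float)
    for row in fact_rows:
        totals[row[dim_attr]] += row["revenue"]
    return dict(totals)


def top_movies(fact_rows: List[dict], n: int) -> List[Tuple[str, float]]:
    """Return the n titles with the highest total revenue, largest first."""
    totals = revenue_by(fact_rows, "title")
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


rows = [
    {"title": "A", "country": "US", "revenue": 100.0},
    {"title": "B", "country": "VN", "revenue": 40.0},
    {"title": "A", "country": "US", "revenue": 20.0},
    {"title": "C", "country": "VN", "revenue": 70.0},
]
print(revenue_by(rows, "country"))
print(top_movies(rows, 2))
```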
