Course DP-600: Microsoft Fabric Analytics Engineer

 

My Samples

https://tinyurl.com/cbmctsamples

 

Delivery Hints

 

Lab Hints

Use the T symbol to send text into Notepad, then copy and paste from there into the various tools.

 

General Notes

Terminology

https://learn.microsoft.com/en-us/fabric/get-started/fabric-terminology

Warehouse or Lakehouse?

Lakehouses are for all types of data. The primary interaction with them will be through Spark notebooks and jobs or through BI tools. There is read-only access to them using a SQL analytics endpoint. Under the hood they are Delta tables (Parquet files plus a transaction log) stored in OneLake.

Warehouses are for structured and semi-structured data. The primary interaction will be through SQL. They are our traditional columnstore data warehouses, for Kimball-style star- and snowflake-schema models. They support multi-table transactions (which lakehouses do not). Under the hood they are also Delta tables.

https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-lakehouse-warehouse
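Both store Delta tables, which on disk are just Parquet data files plus a _delta_log directory of ordered JSON commit files. A minimal pure-Python sketch of that on-disk shape (the table name, file names, and commit contents are hypothetical and simplified):

```python
import json
import tempfile
from pathlib import Path

# Sketch: the on-disk shape of a Delta table.
# A Delta table is Parquet data files plus a _delta_log directory
# of ordered JSON commit files that record changes to the table.
root = Path(tempfile.mkdtemp()) / "sales"
(root / "_delta_log").mkdir(parents=True)

# A commit file records actions such as "add" (a new data file).
commit = {"add": {"path": "part-00000.parquet", "dataChange": True}}
(root / "_delta_log" / "00000000000000000000.json").write_text(json.dumps(commit))

# The current table state is derived by replaying the commit log in order.
log_files = sorted((root / "_delta_log").glob("*.json"))
print(len(log_files))  # 1 commit so far
```

This is why both item types can expose the same data to different engines: the format underneath is identical; only the compute layer on top differs.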

Or Eventhouse? Or Fabric SQL database? Or Power BI datamart?

Fabric SQL databases are our even-more-traditional databases for OLTP operations. The primary interaction will be through SQL. Under the hood they are Azure SQL Database resources.

Eventhouses are for real-time and near-real-time event data. The primary interaction will be through KQL or SQL, though the document below does also mention no-code options.

https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-data-store

Or…?

Azure Cache for Redis, Microsoft Dataverse, the Synapse Analytics data stores, and of course Azure SQL Database, Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL.

Sheesh.

 

Learning Path: Get started with Microsoft Fabric

OneLake

Note that there is only one OneLake per tenant. This is to avoid data silos.

Scala

Scala is not a "Java-based scripting language". It's a statically typed, compiled, object-oriented programming language that runs on the Java Virtual Machine.

"In practice, most data engineering and analytics workloads are accomplished using a combination of PySpark and Spark SQL." [citation required]

Module: Work with Delta Lake tables in Microsoft Fabric

Review question 1

The answer they give is a horrible description of Delta Lake.

From delta.io:
Delta Lake is an open-source storage framework that enables building a format agnostic Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, Hive, Snowflake, Google BigQuery, Athena, Redshift, Databricks, Azure Fabric and APIs for Scala, Java, Rust, and Python. With Delta Universal Format aka UniForm, you can read now Delta tables with Iceberg and Hudi clients.

Optimize delta tables

https://delta.io/blog/2023-01-25-delta-lake-small-file-compaction-optimize/
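In Spark SQL the compaction described in that post is the OPTIMIZE command (e.g. OPTIMIZE sales). Conceptually it bin-packs many small data files into fewer files near a target size (commonly around 1 GB). A rough pure-Python sketch of that bin-packing idea, where the file sizes and target are illustrative, not Delta's actual planner:

```python
# Sketch of the idea behind OPTIMIZE's small-file compaction:
# greedily pack small files into fewer, larger output files.
# Sizes are in MB; the 1024 MB target is an illustrative default.
def plan_compaction(file_sizes_mb, target_mb=1024):
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)          # close the full bin
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

small_files = [8] * 200                   # 200 tiny 8 MB files
plan = plan_compaction(small_files)
print(len(plan))                          # 200 small files -> 2 output files
```

Fewer, larger files mean fewer file-open operations and less metadata to process per query, which is why compaction helps read performance.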

 

Learning Path: Implement a data warehouse with Microsoft Fabric

 

Learning Path: Work with semantic models in Microsoft Fabric

 

Learning Path: Administer and govern Microsoft Fabric

 

Lab 05: Analyze data in a data warehouse

Coding Style

Note that the "v" prefix in the view name "vSalesByRegion" is Systems Hungarian notation, which is widely considered poor practice: the object type is already clear from context, and the name becomes misleading if the object is later reimplemented as something other than a view.