My Samples
https://tinyurl.com/cbmctsamples
Delivery Hints
Lab Hints
Use the T symbol to send text into Notepad, then copy and paste from there into the various tools.
General Notes
Terminology
https://learn.microsoft.com/en-us/fabric/get-started/fabric-terminology
Warehouse or Lakehouse?
Lakehouses are for all types of data. The primary interaction with them is through Spark notebooks and jobs or through BI tools; there is also read-only access via a SQL analytics endpoint. Under the hood they are Delta-format Parquet files in a data lake.
Warehouses are for structured and semi-structured data. The primary interaction is through SQL. They are our traditional columnstore data warehouses, for Kimball-style star- and snowflake-schema models, and they support multi-table transactions (which lakehouses do not). Under the hood they are also Delta files.
https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-lakehouse-warehouse
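Since both lakehouses and warehouses store their tables as Parquet files plus a Delta transaction log, a toy sketch can make "under the hood" concrete. This is a hypothetical, heavily simplified pure-Python illustration of the layout (no Spark required); the file names and the one-key-per-action schema are invented for the demo, and real Delta commit files carry far more metadata.

```python
import json
import os
import tempfile

# Toy illustration of a Delta table's on-disk layout: data lives in
# Parquet files, and the _delta_log folder records which files are
# "live" at each version. (Hypothetical, simplified action schema.)

def write_commit(table_path, version, actions):
    """Append one commit file to the table's _delta_log."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    commit_path = os.path.join(log_dir, f"{version:020d}.json")
    with open(commit_path, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")

def live_files(table_path):
    """Replay commits in order; 'add' makes a file live, 'remove' retires it."""
    log_dir = os.path.join(table_path, "_delta_log")
    files = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)

table = tempfile.mkdtemp()
# Version 0: two small Parquet files are added.
write_commit(table, 0, [{"add": {"path": "part-0001.parquet"}},
                        {"add": {"path": "part-0002.parquet"}}])
# Version 1: a compaction replaces them with one larger file.
write_commit(table, 1, [{"remove": {"path": "part-0001.parquet"}},
                        {"remove": {"path": "part-0002.parquet"}},
                        {"add": {"path": "part-0003.parquet"}}])
print(live_files(table))  # ['part-0003.parquet']
```

The replay step is the point: readers never scan the data folder directly; they reconstruct the current file list from the log, which is what makes ACID commits and time travel possible on plain object storage.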
Or EventHouse? Or Fabric SQL database? Or PowerBI Datamart?
Fabric SQL databases are our even-more-traditional databases for OLTP operations. The primary interaction will be through SQL. Under the hood they are Azure SQL Database resources.
Eventhouses are for real-time and near-real-time event data. The primary interaction will be through KQL or SQL, though the document below also mentions no-code options.
https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-data-store
Or…?
Azure Cache for Redis, Microsoft Dataverse, the Synapse Analytics data stores, and of course Azure SQL Database, Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL.
Sheesh.
Learning Path: Get started with Microsoft Fabric
OneLake
Note that there is only one OneLake per tenant. This is to avoid data silos.
Scala
Scala is not a "Java-based scripting language". It's a statically typed, compiled, object-oriented (and functional) programming language that runs on the Java Virtual Machine.
"In practice, most data engineering and analytics workloads are accomplished using a combination of PySpark and Spark SQL." [citation required]
Module: Work with Delta Lake tables in Microsoft Fabric
Review question 1
The answer they give is a horrible description of Delta Lake
From delta.io:
Delta Lake is an open-source storage framework that enables building a format agnostic Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, Hive, Snowflake, Google BigQuery, Athena, Redshift, Databricks, Azure Fabric and APIs for Scala, Java, Rust, and Python. With Delta Universal Format aka UniForm, you can read now Delta tables with Iceberg and Hudi clients.
Optimize delta tables
https://delta.io/blog/2023-01-25-delta-lake-small-file-compaction-optimize/
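The small-file-compaction post above boils down to bin packing: group many small files into batches of roughly a target size and rewrite each batch as one larger file. A minimal sketch of that idea, assuming a greedy largest-first strategy (the 128 MB target and the file sizes are illustrative; in Fabric you would just run `OPTIMIZE table_name` and let the engine choose):

```python
# Greedy sketch of small-file compaction planning. Sizes are in MB.
TARGET_MB = 128  # illustrative batch target, not Delta's actual default

def plan_compaction(file_sizes_mb, target_mb=TARGET_MB):
    """Group file sizes into batches totalling roughly target_mb each."""
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):  # largest first
        if current and current_size + size > target_mb:
            batches.append(current)       # close the full batch
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

small_files = [8, 4, 120, 16, 32, 64, 12]  # seven small Parquet files
plan = plan_compaction(small_files)
print(plan)  # [[120], [64, 32, 16, 12], [8, 4]]
```

Seven files become three, which is the whole benefit: fewer files means fewer listing and open operations per query.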
Learning Path: Implement a data warehouse with Microsoft Fabric
Learning Path: Work with semantic models in Microsoft Fabric
Learning Path: Administer and govern Microsoft Fabric
Lab 05: Analyze data in a data warehouse
Coding Style
Note that the "v" prefix in the view name "vSalesByRegion" (a type-prefix convention known as Systems Hungarian) is poor practice.