Course DP-600: Microsoft Fabric Analytics Engineer

 

My Samples

https://tinyurl.com/cbmctsamples

 

Delivery Hints

 

Lab Hints

Use the T symbol to send text into Notepad, then copy and paste from there into the various tools.

 

General Notes

Terminology

https://learn.microsoft.com/en-us/fabric/get-started/fabric-terminology

Warehouse or Lakehouse?

Lakehouses are for all types of data. The primary interaction with them will be through Spark notebooks and jobs or through BI tools. There is read-only access to them using a SQL analytics endpoint. Under the hood they are Delta tables (Parquet files plus a transaction log) stored in OneLake.

Warehouses are for structured and semi-structured data. The primary interaction will be through SQL. They are our traditional columnstore data warehouses, for Kimball-style star- and snowflake-schema models. They support multi-table transactions (which lakehouses do not). Under the hood they are also Delta tables.

https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-lakehouse-warehouse
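Both store Delta tables, which on disk are just Parquet data files plus a _delta_log directory of ordered JSON commit files. A minimal pure-Python sketch of that on-disk shape (the table name, file names, and commit contents are hypothetical and simplified):

```python
import json
import tempfile
from pathlib import Path

# Sketch: the on-disk shape of a Delta table.
# A Delta table is Parquet data files plus a _delta_log directory
# of ordered JSON commit files that record changes to the table.
root = Path(tempfile.mkdtemp()) / "sales"
(root / "_delta_log").mkdir(parents=True)

# A commit file records actions such as "add" (a new data file).
commit = {"add": {"path": "part-00000.parquet", "dataChange": True}}
(root / "_delta_log" / "00000000000000000000.json").write_text(json.dumps(commit))

# The current table state is derived by replaying the commit log in order.
log_files = sorted((root / "_delta_log").glob("*.json"))
print(len(log_files))  # 1 commit so far
```

This is why both item types can expose the same data to different engines: the format underneath is identical; only the compute layer on top differs.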

Or Eventhouse? Or Fabric SQL database? Or Power BI datamart?

Fabric SQL databases are our even-more-traditional databases for OLTP operations. The primary interaction will be through SQL. Under the hood they are Azure SQL Database resources.

Eventhouses are for real-time and near-real-time event data. The primary interaction will be through KQL or SQL, though the document below does also mention no-code options.

https://learn.microsoft.com/en-nz/fabric/get-started/decision-guide-data-store

Or…?

Azure Cache for Redis, Microsoft Dataverse, the Synapse Analytics data stores, and of course Azure SQL Database, Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL.

Sheesh.

 

Learning Path: Get started with Microsoft Fabric

OneLake

Note that there is only one OneLake per tenant. This is to avoid data silos.

Scala

Scala is not a "Java-based scripting language". It's a statically typed, compiled, object-oriented programming language that runs on the Java Virtual Machine.

"In practice, most data engineering and analytics workloads are accomplished using a combination of PySpark and Spark SQL." [citation required]

Module: Work with Delta Lake tables in Microsoft Fabric

Review question 1

The answer they give is a horrible description of Delta Lake.

From delta.io:
Delta Lake is an open-source storage framework that enables building a format agnostic Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, Hive, Snowflake, Google BigQuery, Athena, Redshift, Databricks, Azure Fabric and APIs for Scala, Java, Rust, and Python. With Delta Universal Format aka UniForm, you can read now Delta tables with Iceberg and Hudi clients.

Optimize delta tables

https://delta.io/blog/2023-01-25-delta-lake-small-file-compaction-optimize/
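In Spark SQL the compaction described in that post is the OPTIMIZE command (e.g. OPTIMIZE sales). Conceptually it bin-packs many small data files into fewer files near a target size (commonly around 1 GB). A rough pure-Python sketch of that bin-packing idea, where the file sizes and target are illustrative, not Delta's actual planner:

```python
# Sketch of the idea behind OPTIMIZE's small-file compaction:
# greedily pack small files into fewer, larger output files.
# Sizes are in MB; the 1024 MB target is an illustrative default.
def plan_compaction(file_sizes_mb, target_mb=1024):
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)          # close the full bin
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

small_files = [8] * 200                   # 200 tiny 8 MB files
plan = plan_compaction(small_files)
print(len(plan))                          # 200 small files -> 2 output files
```

Fewer, larger files mean fewer file-open operations and less metadata to process per query, which is why compaction helps read performance.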

 

Learning Path: Implement a data warehouse with Microsoft Fabric

 

Learning Path: Work with semantic models in Microsoft Fabric

 

Learning Path: Administer and govern Microsoft Fabric

 

Lab 05: Analyze data in a data warehouse

Coding Style

Note that the "v" prefix in the view name "vSalesByRegion" is Systems Hungarian notation, which is widely considered poor practice: the object type is already clear from context, and the name becomes misleading if the object is later reimplemented as something other than a view.