The Microsoft Fabric labs in GitHub are used by multiple courses, including DP-600 and DP-700. I've put my notes on the labs in this blog post rather than in the posts on those courses.
Lab Hints
Use the T symbol to send text into Notepad, then copy and paste from there into the various tools. Seriously. If you don't do this, autocorrect will mess up your lab.
General Notes
Links
https://github.com/MicrosoftLearning/mslearn-fabric
Note that DP-600 and DP-700 use different names for the labs. The names below are the names used in the GitHub source.
Gotchas
The New SQL Query button does not put the focus in the editor.
Case-sensitivity
New warehouses and all SQL analytics endpoints are configured based on the workspace's Data Warehouse default collation setting, which by default is the case-sensitive collation Latin1_General_100_BIN2_UTF8.
https://learn.microsoft.com/en-us/fabric/data-warehouse/collation
Lakehouses are always case-sensitive (boooooo!). You can use the COLLATE option in T-SQL queries for case-insensitive comparisons.
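A minimal sketch of a case-insensitive comparison, assuming a hypothetical salesorders table with an Item column:
SELECT *
FROM salesorders
WHERE Item COLLATE Latin1_General_100_CI_AS_KS_WS_SC_UTF8 = 'road bike';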
SQL Coding Style
The coding style in the labs is generally good (especially the "SELECT 1" in the SQL EXISTS subquery!) though I would like to see more use of the AS keyword.
However, using a "v" prefix in a SQL view (for example "vSalesByRegion") is poor practice. This naming convention is technically called Systems Hungarian and scornfully called "tibbling".
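A sketch with made-up names showing the style I'd prefer: keep the SELECT 1 in EXISTS, use AS for every alias, and drop the prefix from the view name:
CREATE VIEW SalesByRegion  -- not vSalesByRegion
AS
SELECT O.Region, SUM(O.Amount) AS TotalSales
FROM dbo.Orders AS O
WHERE EXISTS (SELECT 1 FROM dbo.Regions AS R WHERE R.Region = O.Region)
GROUP BY O.Region;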
Python Coding Style
The labs almost always use backslashes to span Python code across multiple lines. The Python community generally recommends using parentheses instead.
https://stackoverflow.com/questions/53162/how-can-i-do-a-line-break-line-continuation-in-python-split-up-a-long-line-of
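A minimal before/after sketch (the CSV read chain is hypothetical; spark is the notebook's built-in session):
# Backslash continuation (the labs' style)
df = spark.read.format("csv") \
    .option("header", "true") \
    .load("Files/orders.csv")

# Implicit continuation inside parentheses (generally preferred)
df = (spark.read.format("csv")
    .option("header", "true")
    .load("Files/orders.csv"))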
Imports
It seems to be a rule for the lab authors that every notebook cell has to import its own libraries. The worst of this is lab 6 where there are over 20 instances of from delta.tables import *. You do not have to do this. Once a library is imported in one cell in a Spark session, it is available to all the code running in that session.
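A minimal sketch, assuming the notebook's built-in spark session:
# Cell 1 - import once per Spark session (an explicit import also beats import *)
from delta.tables import DeltaTable

# Cell 2 - any later cell in the same session can use DeltaTable directly
deltaTable = DeltaTable.forPath(spark, 'Tables/sales_silver')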
Labs
Create a Microsoft Fabric Lakehouse
(01-lakehouse.md) - DP-600 lab 1, DP-700 lab 1
A very straightforward lab. You might spend longer on the CAPTCHA than on the lab itself. :-)
I ran the lab with schemas enabled. Nothing broke.
Analyze data with Apache Spark in Fabric
(02-analyze-spark.md) - DP-600 lab 4, DP-700 lab 2
Delivery hint: IL.
df.show(10) prints the first 10 rows to the console (it is an action).
df.head(10) returns the first 10 rows as a Python list of Row objects (it is also an action).
df.limit(10) returns the first 10 rows as a new dataframe (it is a transformation). We usually use something like display(df.limit(10)) - see the sketch below.
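A minimal sketch of the three, assuming the notebook's built-in spark session (display() is the Fabric notebook helper):
df = spark.createDataFrame([(i, i * 10) for i in range(100)], ["id", "value"])

df.show(10)           # action: prints 10 rows to the console, returns None
rows = df.head(10)    # action: returns a Python list of 10 Row objects
top10 = df.limit(10)  # transformation: returns a new, still-lazy dataframe
display(top10)        # renders the dataframe as a rich table in the notebook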
Note that in Spark SQL you use backticks around identifiers containing nonstandard characters, not square brackets as in T-SQL.
"SELECT * FROM `lab 4 lh`.salesorders LIMIT 1000"
instead of
"SELECT * FROM [lab 4 lh].salesorders LIMIT 1000"
If you chose to use schemas in the Lakehouse, then you will have to modify one of the queries.
Instead of:
df = spark.sql("SELECT * FROM [your_lakehouse].salesorders LIMIT 1000")
Use:
df = spark.sql("SELECT * FROM [your_lakehouse].dbo.salesorders LIMIT 1000")
Use Delta Tables in Apache Spark
(03-delta-lake.md) - DP-700 lab 3
Delivery hint: IL.
Make sure you deselect the schemas option. If you don't, the SQL CREATE TABLE statement will fail.
Click the Saved ∨ dropdown to rename the notebook.
Create a medallion architecture in a Microsoft Fabric lakehouse
(03b-medallion-lakehouse.md) - DP-700 lab 6
Personal comment: Click here to get the lab to enter code. Repeat ad nauseam. Wooo.
Personal comment: The naming conventions in this code make my head hurt. Prefix, postfix, underscores, no underscores, lowercase, camelCase, PascalCase, arrgghh make up your mind!
If you chose to use schemas in the Lakehouse, then you will have to modify some of the queries.
But seriously, don't. It's not worth the effort of making all these changes.
Instead of:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.sales_silver") \
Use:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.dbo.sales_silver") \
Instead of:
deltaTable = DeltaTable.forPath(spark, 'Tables/sales_silver')
Use:
deltaTable = DeltaTable.forPath(spark, 'Tables/dbo/sales_silver')
Instead of:
SELECT YEAR(OrderDate) AS Year
, CAST (SUM(Quantity * (UnitPrice + Tax)) AS DECIMAL(12, 2)) AS TotalSales
FROM sales_silver
Use:
SELECT YEAR(OrderDate) AS Year
, CAST (SUM(Quantity * (UnitPrice + Tax)) AS DECIMAL(12, 2)) AS TotalSales
FROM dbo.sales_silver
Instead of:
SELECT TOP 10 CustomerName, SUM(Quantity) AS TotalQuantity
FROM sales_silver
Use:
SELECT TOP 10 CustomerName, SUM(Quantity) AS TotalQuantity
FROM dbo.sales_silver
Instead of:
df = spark.read.table("Sales.sales_silver")
Use:
df = spark.read.table("Sales.dbo.sales_silver")
Instead of:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.dimdate_gold") \
Use:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.dbo.dimdate_gold") \
Instead of:
deltaTable = DeltaTable.forPath(spark, 'Tables/dimdate_gold')
Use:
deltaTable = DeltaTable.forPath(spark, 'Tables/dbo/dimdate_gold')
Instead of:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.dimcustomer_gold") \
Use:
DeltaTable.createIfNotExists(spark) \
f.tableName("sales.dbo.dimcustomer_gold") \
Instead of:
dfdimCustomer_temp = spark.read.table("Sales.dimCustomer_gold")
Use:
dfdimCustomer_temp = spark.read.table("Sales.dbo.dimCustomer_gold")
Instead of:
deltaTable = DeltaTable.forPath(spark, 'Tables/dimcustomer_gold')
Use:
deltaTable = DeltaTable.forPath(spark, 'Tables/dbo/dimcustomer_gold')
Instead of:
DeltaTable.createIfNotExists(spark) \
tableName("sales.dimproduct_gold") \
Use:
DeltaTable.createIfNotExists(spark) \
tableName("sales.dbo.dimproduct_gold") \
Instead of:
dfdimProduct_temp = spark.read.table("Sales.dimProduct_gold")
Use:
dfdimProduct_temp = spark.read.table("Sales.dbo.dimProduct_gold")
Instead of:
deltaTable = DeltaTable.forPath(spark, 'Tables/dimproduct_gold')
Use:
deltaTable = DeltaTable.forPath(spark, 'Tables/dbo/dimproduct_gold')
Instead of:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.factsales_gold") \
Use:
DeltaTable.createIfNotExists(spark) \
.tableName("sales.dbo.factsales_gold") \
Instead of:
dfdimCustomer_temp = spark.read.table("Sales.dimCustomer_gold")
dfdimProduct_temp = spark.read.table("Sales.dimProduct_gold")
Use:
dfdimCustomer_temp = spark.read.table("Sales.dbo.dimCustomer_gold")
dfdimProduct_temp = spark.read.table("Sales.dbo.dimProduct_gold")
Instead of:
deltaTable = DeltaTable.forPath(spark, 'Tables/factsales_gold')
Use:
deltaTable = DeltaTable.forPath(spark, 'Tables/dbo/factsales_gold')
Ingest data with a pipeline in Microsoft Fabric
(04-ingest-pipeline.md) - DP-600 lab 3, DP-700 lab 5
Make sure you enter a new unique connection name. Don't just accept the default (the URL), otherwise you will get a "Connection name already exists" error.
Note that the name for the Copy data activity is randomised. Mine was called "Copy_ind"; yours will be different. You can, of course, rename it.
Rant: Why do I have to save pipelines manually when notebooks autosave?
Create and use Dataflows (Gen2) in Microsoft Fabric
(05-dataflows-gen2.md) - DP-600 lab 2, DP-700 lab 4
Make sure you enter a new unique connection name. Don't just accept the default (the URL), otherwise you will get a "Connection name already exists" error.
You will need to remove the default destination (bottom right-hand corner of the screen) before you can add a custom one.
Analyze data in a data warehouse
(06-data-warehouse.md) - DP-600 lab 5, DP-700 lab 12
Delivery Hint: I can demo the last part in my work Fabric environment.
Load data into a warehouse using T-SQL
(06a-data-warehouse-load.md) - DP-600 lab 6, DP-700 lab 13
As an aside, I created a Lakehouse using schemas and did this lab using Staging.Sales. As expected, there were no issues.
Query a data warehouse in Microsoft Fabric
(06b-data-warehouse-query.md) - DP-600 lab 7
A very straightforward lab with a couple of gotchas.
The Sample Warehouse (NYC taxi data) is case-sensitive.
The columns in the Date dimension table are all varchar! I tried an ORDER BY Month and did not get the results I was expecting. This has to be a mistake, surely? The whole point of separate Month and MonthName columns is to make sorting and displaying months work correctly.
Update: Since I wrote that comment, the sample data has been modified slightly. The Month column is still a varchar, but it contains "01" instead of "1". Better, I guess, but still not great.
One of the queries includes WHERE D.Month = 1, which requires a CONVERT_IMPLICIT in the execution plan because Month is varchar. I tried WHERE D.Month = '01' and the query executed in a third of the time!
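A minimal sketch of the difference (the table and column names follow the NYC taxi sample as I remember it, so treat them as assumptions):
-- The int literal forces CONVERT_IMPLICIT on every row because Month is varchar
SELECT D.MonthName, COUNT(*) AS TotalTrips
FROM dbo.Trip AS T
JOIN dbo.[Date] AS D ON T.DateID = D.DateID
WHERE D.Month = 1
GROUP BY D.MonthName;
-- Matching the column's type avoids the conversion:
-- WHERE D.Month = '01'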
Monitor a data warehouse in Microsoft Fabric
(06c-monitor-data-warehouse.md) - DP-600 lab 8, DP-700 lab 14
A very simple lab. Run a few queries, do nothing with them.
The Sample Warehouse (NYC taxi data) is case-sensitive.
Note the "Results:1" dropdown in the results pane when the results consists of more than one recordset.
Secure data in a data warehouse
(06d-secure-data-warehouse.md) - DP-600 lab 9, DP-700 lab 15
This lab sets up security features, but doesn't use any of them. Given Microsoft Fabric's reliance on Microsoft Entra identities, the ALH (Authorized Lab Hoster) environment doesn't have the capability to create/use different users in this lab.
However, see the lab "Secure data access in Microsoft Fabric" below.
Get started with Real-Time Intelligence in Microsoft Fabric
(07-real-time-Intelligence.md) - DP-600 lab 14, DP-700 lab 7
The Real-Time Hub has changed since the lab instructions were written, as has the Stocks sample application. Lab update TBA.
Get started with data science in Microsoft Fabric
(08-data-science-get-started.md)
Explore data for data science with notebooks in Microsoft Fabric
(08a-data-science-explore-data.md)
Preprocess data with Data Wrangler in Microsoft Fabric
(08b-data-science-preprocess-data-wrangler.md)
Train and track machine learning models with MLflow in Microsoft Fabric
(08c-data-science-train.md)
Generate batch predictions using a deployed model in Microsoft Fabric
(08d-data-science-batch.md)
Ingest real-time data with Eventstream in Microsoft Fabric
(09-real-time-analytics-eventstream.md) - DP-700 lab 8
A quick lab that nicely illustrates a lot of concepts.
Ingest data with Spark and Microsoft Fabric notebooks
(10-ingest-notebooks.md)
Use Activator in Fabric
(11-data-activator.md) - DP-700 Lab 11
Work with data in a Microsoft Fabric eventhouse
(12-query-data-in-kql-database.md) - DP-700 lab 9
Delivery hint: IL.
Another "click here to run code" lab, but with useful code and examples. Worth doing.
Get started with Real-Time Dashboards in Microsoft Fabric
(13-real-time-dashboards.md) - DP-700 Lab 10
Create and explore a semantic model
(14-create-a-star-schema-model.md)
Create DAX calculations in Power BI Desktop
(14-create-dax-calculations.md) - DP-600 lab 10
A nice lab.
This lab does not use Microsoft Fabric, or any other cloud resource, just Power BI Desktop.
Note the tab-completion when entering DAX formulas.
Design scalable semantic models
(15-design-scalable-semantic-models.md) - DP-600 lab 11
Another nice lab. Some parts might need to be IL.
This lab does not use Microsoft Fabric, or any other cloud resource, just Power BI Desktop.
Work with model relationships
(15-work-with-model-relationships.md)
Create reusable Power BI assets
(16-create-reusable-power-bi-assets.md) - DP-600 lab 12
Note that if you modify a template, no changes are made to existing reports. There is no "linking" between a report and a template like there is for styles in Microsoft Word templates. Creating an item from a template is a one-time load. As is using Import Theme.
Use tools to optimize Power BI performance
(16-use-tools-to-optimize-power-bi-performance.md)
Enforce semantic model security
(17-enforce-model-security.md) - DP-600 lab 13
Note that the colours of the visualisations persist (light blue for Australia, dark blue for Canada, and so on).
As above, the ALH doesn't have the capability to create/use different users in this lab.
Monitor Fabric activity in the monitoring hub
(18-monitor-hub.md) - DP-700 lab 17
Secure data access in Microsoft Fabric
(19-secure-data-access.md) - DP-600 lab 15, DP-700 lab 18
A nice lab.
In the Apply item access control section, it takes a few minutes for the promised behaviour ("You can no longer browse workspaces on the left navigation pane to find the warehouse.") to take effect. For those few minutes I could use the Workspaces item to see the workspace, but I could only see the warehouse in the list of resources. Refreshing the web browser after a few minutes removed the workspace entirely from user-2's view, leaving the OneLake Catalog as the only way to see the warehouse.
Work with SQL Database in Microsoft Fabric
(20-work-with-database.md)
Work with API for GraphQL in Microsoft Fabric
(20a-work-with-graphql.md)
Implement deployment pipelines in Microsoft Fabric
(21-implement-cicd.md) - DP-700 lab 16
Work smarter with Copilot in Microsoft Fabric Dataflow Gen2
(22a-copilot-fabric-dataflow-gen2.md)
Analyze data with Apache Spark and Copilot in Microsoft Fabric notebooks
(22b-copilot-fabric-notebooks.md)
Use Copilot in Microsoft Fabric data warehouse
(22c-copilot-fabric-data-warehouse.md)
Chat with your data using Microsoft Fabric data agents
(22d-copilot-fabric-data-agents.md)