In this part of the tutorial, you set up another type of sample data: a real-time data stream of sample bus data that includes time series information about bus locations. You stream the sample data into an eventhouse, perform some transformations on the data, then create a shortcut to get the eventhouse data into the sample data lakehouse that you created in the previous section. Digital twin builder requires data to be in a lakehouse.
Create an eventhouse
- Browse to the workspace in which you want to create your tutorial resources. You must create all resources in the same workspace.
- Select + New item.
- In the Filter by item type search box, enter Eventhouse.
- Select the Eventhouse item.
- Enter Tutorial as the eventhouse name. A KQL database with the same name is created automatically along with the eventhouse.
- Select Create. When provisioning is complete, the eventhouse System overview page is shown.
Create the eventstream
In this section, you create an eventstream to send sample bus streaming data to the eventhouse.
Add source
Follow these steps to create the eventstream and add the Buses sample data as the source.
From the KQL databases pane in the eventhouse, select the new Tutorial database.
From the menu ribbon, select Get data and choose Eventstream > New eventstream.
Name your eventstream BusEventstream. When the eventstream finishes creating, it opens.
Select Use sample data.
In the Add source page, enter BusDataSource for the source name. Under Sample data, select Buses. Select Add.
When the new eventstream is ready, it opens in the authoring canvas.
Transform data
In this section, you add one transformation to the incoming sample data. This step casts the string fields ScheduleTime and Timestamp to DateTime type, and renames Timestamp to ActualTime for clarity. Timestamp fields need to be in DateTime format for digital twin builder (preview) to use them as time series data.
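If it helps to see the transformation expressed as a query, here's a minimal KQL sketch of the same cast and rename. It's for illustration only: the eventstream applies this change through the Manage fields operation rather than KQL, and the literal timestamp values below are made up.

```kusto
// Illustration only: the cast and rename that the Manage fields operation performs,
// written as a KQL query. The sample values are invented.
print Timestamp = "2024-01-01T08:30:00Z", ScheduleTime = "2024-01-01T08:28:00Z"
| extend ActualTime = todatetime(Timestamp), ScheduleTime = todatetime(ScheduleTime)
| project-away Timestamp
```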
Follow these steps to add the data transformation.
Select the down arrow on the Transform events or add destination tile, then select the Manage fields predefined operation. The tile is renamed to ManageFields.
Select the edit icon (shaped like a pencil) on the ManageFields tile, which opens the Manage fields pane.
Select Add all fields. This action ensures that all fields from the source data are present through the transformation.
Select the Timestamp field. Toggle Change type to Yes. For Converted Type, select DateTime from the dropdown list. For Name, enter the new name of ActualTime.
Without saving, select the ScheduleTime field. Toggle Change type to Yes. For Converted Type, select DateTime from the dropdown list. Leave the name as ScheduleTime.
Now select Save.
The Manage fields pane closes. The ManageFields tile continues to display an error until you connect it to a destination.
Add destination
From the menu ribbon, select Add destination, then select Eventhouse.
Enter the following information in the Eventhouse pane:
| Field | Value |
| --- | --- |
| Data ingestion mode | Event processing before ingestion |
| Destination name | TutorialDestination |
| Workspace | Select the workspace in which you created your resources. |
| Eventhouse | Tutorial |
| KQL Database | Tutorial |
| KQL Destination table | Create new - Enter bus_data_raw as the table name |
| Input data format | Json |

Ensure that the box Activate ingestion after adding the data source is checked.
Select Save.
In the authoring canvas, select the ManageFields tile and drag the arrow to the TutorialDestination tile to connect them. This action resolves all error messages in the flow.
From the menu ribbon, select Publish. The eventstream now begins sending the sample streaming data to your eventhouse.
After a few minutes, the TutorialDestination card in the eventstream view displays sample data in the Data preview tab. You might need to refresh the preview a few times while you wait for the data to arrive.
Verify that the data table is active in your eventhouse. Go to your Tutorial KQL database and refresh the view. It now contains a table called bus_data_raw that holds the incoming data.
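If you prefer to check from the query editor, a quick query like the following (assuming the table name bus_data_raw from the destination step) returns a sample of the ingested rows:

```kusto
// Quick sanity check: return a few rows from the raw ingestion table
bus_data_raw
| take 10
```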
Transform the data using update policies
Now that your bus streaming data is in a KQL database, you can use functions and a Kusto update policy to further transform the data. The transformations that you perform in this section prepare the data for use in digital twin builder (preview), and include the following actions:
- Break apart the JSON field Properties into separate columns for each of its contained data items, BusState and TimeToNextStation. Digital twin builder doesn't have JSON parsing capabilities, so you need to separate these values before the data goes to digital twin builder.
- Add column StopCode, which is a unique key representing each bus stop. The purpose of this step is just to complete the sample data set to support this tutorial scenario. Joinable entities from separate data sources must contain a common column that digital twin builder can use to link them together, so this step adds a simulated set of int values that matches the Stop_Code field in the static bus stops data set. In the real world, related data sets already contain some kind of commonality. (The sketch after this list illustrates both of these transformations.)
- Create a new table called bus_data_processed that contains the transformed bus data.
- Enable OneLake availability for the new table, so that you can use a shortcut to access the data in your Tutorial lakehouse.
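The following sketch shows how the first two transformations behave on a single record. It isn't part of the tutorial steps; the real work happens in the extractBusData function and update policy that you create next, and the row in the datatable is a made-up example.

```kusto
// Illustration only: split the Properties JSON into columns and derive StopCode
// from BusLine and StationNumber. The row below is an invented example record.
datatable (BusLine: string, StationNumber: string, Properties: string) [
    "12", "4", '{"BusState":"EnRoute","TimeToNextStation":"90"}'
]
| extend BusState = tostring(todynamic(Properties).BusState),
         TimeToNextStation = tostring(todynamic(Properties).TimeToNextStation),
         StopCode = toint(10000 + abs(((toint(BusLine) * 100) + toint(StationNumber)) % 750))
| project-away Properties
```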
To run the transformation queries, follow these steps.
Select the Tutorial KQL database inside your eventhouse. From the menu ribbon, select Query with code, which opens the KQL query editor.
Copy and paste the following code into the query editor. Run each code block in order.
```kusto
// Set columns
.create-or-alter function extractBusData () {
    bus_data_raw
    | extend BusState = tostring(todynamic(Properties).BusState)
        , TimeToNextStation = tostring(todynamic(Properties).TimeToNextStation)
        , StopCode = toint(10000 + abs(((toint(BusLine) * 100) + toint(StationNumber)) % 750))
    | project-away Properties
}
```
```kusto
// Create table
.create table bus_data_processed (
    ActualTime: datetime,
    TripId: string,
    BusLine: string,
    StationNumber: string,
    ScheduleTime: datetime,
    BusState: string,
    TimeToNextStation: string,
    StopCode: int
)
```
````kusto
// Load data into table
.alter table bus_data_processed policy update
```[{
    "IsEnabled": true,
    "Source": "bus_data_raw",
    "Query": "extractBusData",
    "IsTransactional": false,
    "PropagateIngestionProperties": true
}]```
````
```kusto
// Enable OneLake availability
.alter-merge table bus_data_processed policy mirroring dataformat=parquet with (IsEnabled=true, TargetLatencyInMinutes=5)
```
Tip
You can also enable OneLake availability for the new table through the UI instead of using code. Select the table and toggle on OneLake availability.
With the UI option, the default latency ranges from 15 minutes to several hours, depending on the volume of data. To reduce the latency to five minutes, use the .alter-merge table command as shown in the previous code block.
Optionally, save the query tab as Bus data processing so you can identify it later.
A new table is created in your database called bus_data_processed. After a short wait, it begins to populate with the processed bus data.
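To confirm everything is wired up, you can optionally run a couple of quick checks in the same query tab: one shows the update policy attached to the new table, and one samples its rows. Run each statement separately.

```kusto
// Confirm the update policy is attached to the processed table
.show table bus_data_processed policy update
```

```kusto
// Sample the processed rows once data starts flowing
bus_data_processed
| take 10
```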
Create lakehouse shortcut
Finally, create a shortcut that exposes the processed bus data in the Tutorial lakehouse, which holds sample data for digital twin builder (preview). This step is necessary because digital twin builder requires its data source to be a lakehouse.
Go to your Tutorial lakehouse (you created it earlier in part one, Upload contextual data). From the menu ribbon, select Get data > New shortcut.
Under Internal sources, select Microsoft OneLake. Then, choose the Tutorial KQL database.
Expand the list of Tables and check the box next to bus_data_processed. Select Next.
Review your shortcut details and select Create.
The bus_data_processed table is now available in your lakehouse. Verify that it contains data (this might take a few minutes).
Next, you use this lakehouse data as a source to build an ontology in digital twin builder.