Design Analytical Stores – Microsoft Azure Data Engineering Associate (DP-203) Study Guide

Create a Pipeline to Convert XLSX to Parquet – Data Sources and Ingestion

Posted on 2024-08-05 by Benjamin Goodwin

Power Query: This option closely resembles what you might find in Power BI. The same engine that runs Power BI likely runs the Power Query plug-in. The feature provides an interface for viewing the data from a selected dataset. You can then run through some transformation ideas and see how the data…

Create an Azure Data Factory – Data Sources and Ingestion

Posted on 2024-02-17, updated 2024-08-05 by Benjamin Goodwin

Provisioning an Azure data factory is straightforward. The one item you have not seen before is on the Advanced tab: the Enable Encryption Using a Customer Managed Key check box. As you might have read in the text on that tab, the data stored in Azure Data Factory is encrypted by default using…

Configure Azure Synapse Analytics Data Hub SQL Pool Staging Tables – Data Sources and Ingestion

Posted on 2023-11-12, updated 2024-08-05 by Benjamin Goodwin

FIGURE 3.48 Azure Synapse Analytics Data SQL database
In addition to creating tables, you can also create external tables, external resources, views, stored procedures, and schemas, and implement security. All the features and capabilities of an Azure SQL database or a dedicated SQL pool are available. These kinds of databases incur costs when idle; consider…

Source Control – Data Sources and Ingestion

Posted on 2023-10-28, updated 2024-08-05 by Benjamin Goodwin

If you will be writing code or queries to manage your data analytics solution running on Azure Synapse Analytics, you should consider storing them in a repository. Source repositories like Azure DevOps and GitHub provide features like protection against losing code, storage of change history, and branching. After spending hours, days, and sometimes precious…

Managed Private Endpoints – Data Sources and Ingestion

Posted on 2023-09-18, updated 2024-08-05 by Benjamin Goodwin

In Exercise 3.3 you enabled managed virtual networking during the provisioning of the Azure Synapse Analytics workspace. This feature enables you to configure outbound workspace connectivity with products, applications, and other services that exist outside of the managed virtual network. When you click the Managed Private Endpoints link, you will see some existing private endpoints…

Design the Serving/Data Exploration Layer – Data Sources and Ingestion

Posted on 2023-06-30, updated 2024-08-05 by Benjamin Goodwin

What is a serving/data exploration layer? Don’t confuse it with something called the servicing layer, which is common in a service‐oriented architecture (SOA). For an illustration of the serving layer, see Figure 3.13. The serving layer is one component of a larger architecture that includes a speed layer and batch layer. The Big Data architecture…
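To make the relationship between the three layers concrete, here is a minimal sketch of how a serving layer might answer a query by combining a precomputed batch view with a small, recent speed view. The view contents, the key names, and the `serve` function are hypothetical and chosen only for illustration; they are not taken from the study guide.

```python
from collections import Counter

# Hypothetical batch view: totals precomputed by the batch layer.
batch_view = Counter({"sensor-a": 120, "sensor-b": 75})

# Hypothetical speed view: increments seen since the last batch run.
speed_view = Counter({"sensor-a": 3, "sensor-c": 1})


def serve(batch: Counter, speed: Counter, key: str) -> int:
    """Answer a query by merging the batch result with recent speed-layer data."""
    return batch.get(key, 0) + speed.get(key, 0)


total_a = serve(batch_view, speed_view, "sensor-a")
total_c = serve(batch_view, speed_view, "sensor-c")
```

The design point is that consumers query one place (the serving layer) and do not need to know which layer produced each part of the answer.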

Configure an Azure Synapse Analytics Workspace Package – Data Sources and Ingestion

Posted on 2023-04-27, updated 2024-08-05 by Benjamin Goodwin

FIGURE 3.42 Adding a workspace package in Azure Synapse Analytics
FIGURE 3.43 Consuming a workspace package in Azure Synapse Analytics
Configuring the workspace is a very powerful aspect of the platform. As long as your custom code runs with the default installed components, you can run just about any computation. You are limited only by…

Configure an Azure Synapse Analytics Workspace with GitHub – Data Sources and Ingestion

Posted on 2023-03-04, updated 2024-08-05 by Benjamin Goodwin

FIGURE 3.45 Azure Synapse Analytics configure GitHub
FIGURE 3.46 Azure Synapse Analytics configure GitHub repository
FIGURE 3.47 Azure Synapse Analytics configure GitHub saved
You should not make public the repository where you store the Azure Synapse Analytics content, because it contains some sensitive information, for example your Azure subscription number and the general configuration of your…

Configure Azure Synapse Analytics Data Hub with Azure Cosmos DB – Data Sources and Ingestion

Posted on 2023-02-22, updated 2024-08-05 by Benjamin Goodwin

FIGURE 3.50 Azure Synapse Analytics Data connect Azure Cosmos DB
You can use this feature to try out queries and discover what data you have in the container. Then use those findings to perform data transformations or gather business insights.
Integration Dataset: The purpose of integration datasets is in their name. Integration datasets provide an interface…

Create an Azure Synapse Analytics Linked Service – Data Sources and Ingestion

Posted on 2022-12-30, updated 2024-08-05 by Benjamin Goodwin

FIGURE 3.33 Azure Synapse Analytics External connections Linked services
FIGURE 3.34 Azure Synapse Analytics Linked Azure SQL Database
FIGURE 3.35 Azure Synapse Analytics linked services
spark.sql.hive.metastore.version 0.13
spark.hadoop.hive.synapse.externalmetastore.linkedservice.name BrainjammerAzureSQL
spark.sql.hive.metastore.jars /opt/hive-metastore/lib-0.13/:/usr/hdp/current/hadoop-client/lib/
Upload the file using the File Upload text box in the Apache Spark configuration section, and then click Upload. Note that there must…
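Before uploading a configuration file like the one in the excerpt above, it can help to check that each line is a well-formed key and value pair. The following is a minimal sketch that assumes the file holds one whitespace-separated `key value` pair per line; the `parse_spark_conf` helper is hypothetical and not part of any Azure or Spark API. The jars path is copied from the excerpt with the stray space removed, which is itself an assumption about the original file.

```python
def parse_spark_conf(text: str) -> dict[str, str]:
    """Parse 'key value' lines into a dict, skipping blank lines and # comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"missing value for key: {line!r}")
        key, value = parts
        conf[key] = value.strip()
    return conf


# Values taken from the excerpt; the layout (one pair per line) is assumed.
SAMPLE = """\
spark.sql.hive.metastore.version 0.13
spark.hadoop.hive.synapse.externalmetastore.linkedservice.name BrainjammerAzureSQL
spark.sql.hive.metastore.jars /opt/hive-metastore/lib-0.13/:/usr/hdp/current/hadoop-client/lib/
"""

conf = parse_spark_conf(SAMPLE)
```

A line with a key but no value raises a `ValueError`, which catches the kind of wrapping damage that splits a long path across lines.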