From a strategic point of view, data is the central element of an organization: more data collected means more processing and storage space. Choosing a data lake allows to centralize all data before enriching and structuring it. In the implementation of a modern BI architecture, the transition from a data lake to a data warehouse is the second step before data visualization.
In this article you will see :
- The relationship between Data Lake and Data Warehouse
- How to export data between them
I. The relationship between Data Lake and Data Warehouse
In Microsoft Azure, a Data Lake is dedicated solely to storage while a Data Warehouse is a space for storing, processing and transforming large volumes of data. These two spaces allow you to manage various functionalities related to Big Data analysis.
Some benefits of a Data Lake (Example: Azure Data Lake Storage Gen 2):
- Scaling and creating file-level security
- Low cost hierarchical storage management
- Follow-up of a file system semantics
In this post, we will focus on the relationship between Data Lake and Data Warehouse (Azure SQL DW). Here is an overview of the steps of a modern BI architecture:
The different relationships between the two spaces:
Depending on the source of your data, several types of architecture are available. Among the most practical and modern are a combination of the three parts below:
- Data Lake (Azure Data Lake Storage Gen2): Storage of data from different sources (ERP, CRM, etc...)
- Data Warehouse (Azure Synapse): Data transformation and enrichment
- Data Visualization (Power BI): Data analysis and visualization
Let's take an example to see how to export data from a Data Lake to a Data Warehouse (Azure Synapse Analytics).
II. How to export data from a data lake to a data warehouse
Case study: Our objective is to export data from a Data Lake (Azure Data Store Gen 2) to a Data Warehouse (Azure Synapse). The data goes from a semi-structured format (csv) to a structured format (table):
Here are the steps to follow to create a :
1. In Azure Data Lake:
1.1. To start, open a new resource in Azure Services :
1.2 In the selected resource, select a "Storage account" and then a "Data Lake Storage":
1.3. In this Data Lake Storage, you will find the folders related to the export from Dynamics365 Finance&Operations. We will focus on the folder containing "tables":
1.4. In this "Tables" folder, you will find the subfolders related to each functional area. We are going to look at the area related to the Supply Chain:
1.5. Select the "VendTable" table which corresponds to the suppliers table:
1.6. To continue, you must edit the file to display the flat data (csv format):
1.7. Here is the semi-structured format in which the data is stored in the Data Lake:
2. In Azure Data Synapse Analytics:
2.1. To start the transformation, we will choose the "SQL Pool Serverless" mode in Synapse Analytics :
2.2. For automatic creation of tables in the Synapse database, it is necessary to set up and run the CDMUtil module and use the two applications:
2.3. Here is a setting of the CDMUtil:
2.4. After running the CDMUtil module, the various tables in the Data Lake are visible in Synapse Studio :
2.5. We find the table "VendTable": :
2.6. To simplify table management, it is possible to work directly with SQL Server Management Studio. When you log in, you will find all the tables associated with the Data Lake :
2.7. We find the table "VendTable":
2.8. By displaying the contents of the table, you get structured data ready for visualization:
Conclusion:
In summary, we can say that Synapse (Data Warehouse) is the mirror of the Data Lake. Moreover, exporting data from a data lake to a data warehouse facilitates large-scale data management. After this step, users can move directly to data visualization and analysis in Power BI.
See more :
- Data visualization : Link 01
- Data governance : Link 02
- Data modelling : Link 03
- BI Architecture (D365 to Data Lake) : Link 04
- Customer Insights : Link 05
- Analyzing data with Power BI : Link 06
Thank you for reading,
Fodé DIABY
---
#dynamics365
#datalake
#datawarehouse
#azure
#businessintelligence
#architecture
#storage
#azuresynapse