DATA PRACTICE

Practice #6 - BI Architecture : From Data Lake to Data Warehouse

From a strategic point of view, data is the central element of an organization: more data collected means more processing and storage space. Choosing a data lake allows to centralize all data before enriching and structuring it. In the implementation of a modern BI architecture, the transition from a data lake to a data warehouse is the second step before data visualization.

In this article you will see :

  • The relationship between Data Lake and Data Warehouse
  • How to export data between them

I. The relationship between Data Lake and Data Warehouse

In Microsoft Azure, a Data Lake is dedicated solely to storage while a Data Warehouse is a space for storing, processing and transforming large volumes of data. These two spaces allow you to manage various functionalities related to Big Data analysis.

Figure 1 - Source(Dataedo / Data Cartoon)

Some benefits of a Data Lake (Example: Azure Data Lake Storage Gen 2):

  • Scaling and creating file-level security
  • Low cost hierarchical storage management
  • Follow-up of a file system semantics

In this post, we will focus on the relationship between Data Lake and Data Warehouse (Azure SQL DW). Here is an overview of the steps of a modern BI architecture:

Figure 2 - Source(Overview Data Lake Storage)

The different relationships between the two spaces:

Figure 3 - Source(Data Lake vs Data Warehouse)

Depending on the source of your data, several types of architecture are available. Among the most practical and modern are a combination of the three parts below:

  • Data Lake (Azure Data Lake Storage Gen2): Storage of data from different sources (ERP, CRM, etc...)
  • Data Warehouse (Azure Synapse): Data transformation and enrichment
  • Data Visualization (Power BI): Data analysis and visualization

Let's take an example to see how to export data from a Data Lake to a Data Warehouse (Azure Synapse Analytics).

II. How to export data from a data lake to a data warehouse

Case study: Our objective is to export data from a Data Lake (Azure Data Store Gen 2) to a Data Warehouse (Azure Synapse). The data goes from a semi-structured format (csv) to a structured format (table):

Here are the steps to follow to create a :

1.       In Azure Data Lake:

1.1. To start, open a new resource in Azure Services :

1.2 In the selected resource, select a "Storage account" and then a "Data Lake Storage":

1.3. In this Data Lake Storage, you will find the folders related to the export from Dynamics365 Finance&Operations. We will focus on the folder containing "tables":

1.4.  In this "Tables" folder, you will find the subfolders related to each functional area. We are going to look at the area related to the Supply Chain:

1.5.  Select the "VendTable" table which corresponds to the suppliers table:

1.6. To continue, you must edit the file to display the flat data (csv format):

1.7. Here is the semi-structured format in which the data is stored in the Data Lake:

2.       In Azure Data Synapse Analytics:

2.1. To start the transformation, we will choose the "SQL Pool Serverless" mode in Synapse Analytics :

2.2. For automatic creation of tables in the Synapse database, it is necessary to set up and run the CDMUtil module and use the two applications:

2.3. Here is a setting of the CDMUtil:

2.4. After running the CDMUtil module, the various tables in the Data Lake are visible in Synapse Studio :

2.5. We find the table "VendTable": :

2.6. To simplify table management, it is possible to work directly with SQL Server Management Studio. When you log in, you will find all the tables associated with the Data Lake :

2.7. We find the table "VendTable":

2.8. By displaying the contents of the table, you get structured data ready for visualization:

Conclusion:

In summary, we can say that Synapse (Data Warehouse) is the mirror of the Data Lake. Moreover, exporting data from a data lake to a data warehouse facilitates large-scale data management. After this step, users can move directly to data visualization and analysis in Power BI.

See more :

Thank you for reading,

Fodé DIABY

---

#dynamics365

#datalake

#datawarehouse

#azure

#businessintelligence

#architecture

#storage

#azuresynapse