
Enhancing Performance: Optimizing Updates in Azure Synapse

Discover effective strategies to improve the speed of table updates in Azure Synapse Data Warehouse, especially with large datasets.
---
This video is based on the question https://stackoverflow.com/q/65337474/ asked by the user 'Steve Powell' ( https://stackoverflow.com/u/2439493/ ) and on the answer https://stackoverflow.com/a/65359118/ provided by the same user at 'Stack Overflow'. Thanks to these users and the Stack Exchange community for their contributions.

Visit those links for the original content and further details, such as alternate solutions, the latest developments on the topic, comments, and revision history. For reference, the original title of the question was: Very Slow Updates on tables

All content (except music) is licensed under CC BY-SA: https://meta.stackexchange.com/help/licensing
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/by-sa/4.0/ ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/by-sa/4.0/ ) license.

If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
Enhancing Performance: Optimizing Updates in Azure Synapse

When working with Azure Synapse Analytics (formerly Azure SQL Data Warehouse), users often hit performance problems, particularly when updating tables. A common scenario involves updating a flag across several massive tables (sometimes exceeding a billion rows), which can be frustratingly slow. In this guide, we unpack this problem and provide actionable ways to optimize your update operations in Azure Synapse.

Understanding the Problem

The user reported widely varying update performance across tables:

Large Dataset Scenario: A table with 1.8 billion rows was updated in just 5 minutes, while a smaller table with 700 million rows took nearly an hour.

Columnstore Architecture: All tables used columnstores, which were expected to speed up this kind of operation, since they work on entire columns at a time.

Incremental Data Load: The application marked rows with a flag for processing, and an evening maintenance job updated those flags (a minimal sketch of such an update follows below).
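As a concrete illustration, the nightly maintenance job boils down to a statement like the following. This is a minimal sketch: the table name dbo.ODS_Orders is invented for illustration, while the ROW_SENT_TO_EDW flag comes from the findings described later.

-- Nightly maintenance job (sketch): mark rows as already sent to the EDW
-- so the next incremental load skips them.
-- dbo.ODS_Orders is a hypothetical table name.
UPDATE dbo.ODS_Orders
SET ROW_SENT_TO_EDW = 1
WHERE ROW_SENT_TO_EDW = 0;

When the WHERE clause matches hundreds of millions of rows, this single statement generates a correspondingly large amount of transaction log, which turns out to be exactly the bottleneck identified below.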

Despite operating in a quiet environment without competing queries, the performance of the updates did not align with expectations. The main question: why were these updates so slow?

Analyzing the Solution

On closer examination, a key finding emerged: the test was not replicating the existing behavior correctly, so the updates were touching far more rows than intended. The takeaways below unpack the problem and the potential fixes.

Identifying the Root Cause

ROW_SENT_TO_EDW Flag: Many of the ODS tables had their ROW_SENT_TO_EDW flag set to 0, so the job was updating hundreds of millions of rows at once.

Transaction Logging: The slow performance was largely due to the transaction logging overhead of updating that many rows, which dominated the time the operations took.
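A cheap way to anticipate this is to count the rows an update will touch before running it. A minimal sketch against the same hypothetical table as above:

-- How many rows will the nightly UPDATE touch (and log)?
SELECT COUNT_BIG(*) AS rows_to_update
FROM dbo.ODS_Orders
WHERE ROW_SENT_TO_EDW = 0;

An unexpectedly large count here is the first hint that the job is about to rewrite, and log, far more data than intended.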

Alternative Solutions to Improve Performance

The following strategies were explored to mitigate slow updates:

1. Switching to HEAPs

Impact of Storage Structures: Since update performance was poorer on CCI (Clustered Columnstore Index) tables than on HEAPs, consider switching certain tables to HEAP.
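In a Synapse dedicated SQL pool, the storage structure is declared when the table is created. Here is a minimal sketch of a HEAP table; the table and column names are invented, and the ROUND_ROBIN distribution is chosen purely for illustration:

-- HEAP table: no columnstore, so small and row-level updates avoid
-- the overhead of rewriting columnstore segments.
CREATE TABLE dbo.ODS_Orders_Heap
(
    OrderId         BIGINT NOT NULL,
    ROW_SENT_TO_EDW INT    NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);

HEAPs trade slower large scans for cheaper writes, which tends to suit staging and ODS tables that are updated frequently.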

2. Leveraging CTAS (Create Table As Select)

Create Table As Select: Rather than updating rows in place, CTAS rebuilds the table with the refreshed values and can be significantly quicker:

For CCI: 56 minutes.

For HEAP: 1 hour and 35 minutes, still faster than direct updates.
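CTAS is minimally logged in dedicated SQL pools, which is where the speedup over a fully logged UPDATE comes from. A minimal sketch, with hypothetical table names; the new flag value is computed in the SELECT instead of being updated in place:

-- Rewrite the table with the flag already set.
CREATE TABLE dbo.ODS_Orders_New
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    OrderId,
    1 AS ROW_SENT_TO_EDW  -- the "updated" value is produced here
FROM dbo.ODS_Orders;

-- Swap the new table in and drop the old one.
RENAME OBJECT dbo.ODS_Orders TO ODS_Orders_Old;
RENAME OBJECT dbo.ODS_Orders_New TO ODS_Orders;
DROP TABLE dbo.ODS_Orders_Old;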

3. Optimizing Resource Classes

Resource Class Differences: Running the updates under different resource classes showed some performance variation. Although the effect was not as significant as expected, trying different classes may still yield better results.
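In a dedicated SQL pool, a resource class is assigned by adding a user to one of the built-in resource-class database roles, which controls how much memory each of that user's queries receives. A sketch, assuming a hypothetical user named loaduser:

-- Move the user into a larger static resource class
-- (more memory per query, fewer concurrent queries).
EXEC sp_addrolemember 'largerc', 'loaduser';

-- Revert to the default once testing is done.
EXEC sp_droprolemember 'largerc', 'loaduser';

The new resource class applies to the user's subsequent sessions, so it is straightforward to test the same update under different classes.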

4. The Sneaky Full Table Update Method

Dropping and Adding Columns: Instead of updating the values, drop the column and add it back with a default value.

Time Efficiency: Dropping took approximately 15 to 20 seconds, and adding back was completed in around 1 second.

Execution Example

Here is a sketch of the SQL that drops and re-adds the flag column. The table name dbo.ODS_Orders is hypothetical; the column name ROW_SENT_TO_EDW comes from the answer:

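-- Drop the flag column entirely (reported to take roughly 15-20 seconds).
ALTER TABLE dbo.ODS_Orders
DROP COLUMN ROW_SENT_TO_EDW;

-- Add it back with a constant default, so every row effectively gets the
-- new value at once (reported to complete in about 1 second).
-- The default of 1 is illustrative; use whichever value the job would set.
ALTER TABLE dbo.ODS_Orders
ADD ROW_SENT_TO_EDW INT NOT NULL DEFAULT 1;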

Final Thoughts

By understanding the underlying cost of transaction logging and using creative methods to perform updates, such as switching to HEAP tables or using CTAS, you can significantly improve update speeds in Azure Synapse. Every case is different, so testing multiple configurations is crucial to finding the optimal solution for your data processing needs.

Embracing these techniques can lead to more efficient data handling and improved performance of your Azure Synapse environment, allowing for smoother, quicker operations in your data warehousing processes.
