Handling HDFS File Movement in Apache NiFi: A Step-by-Step Guide to Use Flowfile Attributes

Learn how to use Apache NiFi’s `MoveHDFS` processor, together with `FetchHDFS` and `PutHDFS`, to move files within HDFS to attribute-defined directories.
---
This video is based on the question https://stackoverflow.com/q/59571396/ asked by the user 'alaskanloops' ( https://stackoverflow.com/u/2930025/ ) and on the answer https://stackoverflow.com/a/67324306/ provided by the user 'Vinay Annayya' ( https://stackoverflow.com/u/14210365/ ) at 'Stack Overflow' website. Thanks to these great users and Stackexchange community for their contributions.

Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: NiFi - move files in hdfs to a file directory attribute

Also, Content (except music) licensed under CC BY-SA https://meta.stackexchange.com/help/licensing
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/by-sa/4.0/ ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/by-sa/4.0/ ) license.

If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---

When working with Apache NiFi, moving files within HDFS can be awkward when the destination depends on dynamically set attributes. A common scenario is moving Parquet files from one HDFS directory to another directory whose name was determined by earlier processing steps. Here, we'll walk through one way to move files in HDFS to directories defined by flowfile attributes.

The Problem

You might find that the MoveHDFS processor does not send your files to the intended destination: instead of landing in the directory named by a flowfile attribute, they end up in the root / directory. The cause is how the processor handles its output directory. That property supports Expression Language, but, as the original answer indicates, it does not resolve flowfile attributes, so the destination has to be a path that is defined in advance. An expression that refers to an attribute therefore does not produce the expected directory.

The question arises: Is there a way to use NiFi's built-in features to move HDFS files to dynamically set directories based on attributes?
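The behaviour described above can be modelled with a small Python sketch. This is not NiFi code; it only mimics the idea that an expression referring to a flowfile attribute resolves to an empty string when the property is evaluated without access to flowfile attributes. The attribute name `target.dir` and all sample values are hypothetical.

```python
import re

def evaluate(expression, scope):
    """Replace each ${name} with its value from scope; unknown names become ''."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: scope.get(m.group(1), ""), expression)

# Hypothetical sample values, for illustration only.
flowfile_attributes = {
    "target.dir": "/data/processed/2020-01-02",
    "filename": "part-0001.parquet",
}
predefined_values = {}  # nothing named target.dir is defined outside the flowfile

# Evaluated with flowfile attributes in scope: the directory resolves as intended.
with_attributes = evaluate("${target.dir}", {**predefined_values, **flowfile_attributes})

# Evaluated without flowfile attributes: the expression resolves to nothing,
# so no meaningful target directory is left.
without_attributes = evaluate("${target.dir}", predefined_values)

print(repr(with_attributes))
print(repr(without_attributes))
```

In this model the second result is an empty string, which is consistent with the files ending up directly under / rather than in the attribute-defined directory.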

The Solution

To navigate this challenge, you can adopt a multi-step approach that leverages Apache NiFi's processors effectively. Here’s a detailed breakdown of the steps you can follow:

Step 1: Move Files to a Temporary Location

Use MoveHDFS Processor:

Configure the MoveHDFS processor to move the target files to a temporary directory (let's call it path "X").

Important: The Input Directory or File property of the MoveHDFS processor accepts flowfile attributes, so the source path can be set dynamically even though the temporary output path X is fixed.

Step 2: Fetch the Files

Connect to FetchHDFS:

After files are successfully moved, connect the success relationship of the MoveHDFS processor to a new processor: FetchHDFS.

Step 3: Configure FetchHDFS

Set HDFS Filename:

In the FetchHDFS processor, you will now define the HDFS Filename property.

Write the expression as ${absolute.hdfs.path}/${filename} to ensure the specific files are fetched correctly from temporary path X into the flowfile content.
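To see what that expression produces, here is a minimal Python sketch of the substitution. It is not how NiFi evaluates Expression Language internally; the staging path and file name are hypothetical, and raising an error on a missing attribute is a choice made for this sketch only, so that a misconfigured flow is easy to spot.

```python
import re

def resolve(expression, attributes):
    """Substitute ${name} references; raise if an attribute is missing."""
    def lookup(match):
        name = match.group(1)
        if name not in attributes:
            # Sketch-only behaviour: fail loudly instead of substituting nothing.
            raise KeyError(f"flowfile attribute not set: {name}")
        return attributes[name]
    return re.sub(r"\$\{([^}]+)\}", lookup, expression)

# Hypothetical attributes as they might look after the move to temporary path X.
attributes = {
    "absolute.hdfs.path": "/tmp/nifi_staging",
    "filename": "part-0001.parquet",
}

hdfs_filename = resolve("${absolute.hdfs.path}/${filename}", attributes)
print(hdfs_filename)  # /tmp/nifi_staging/part-0001.parquet
```

If either attribute were absent from the flowfile, the fetch would have no valid path to read, which is why the sketch treats a missing attribute as an error.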

Step 4: Move to Final Destination

Connect to PutHDFS:

Afterwards, connect the success relationship of the FetchHDFS processor to the PutHDFS processor.

Step 5: Final Configuration in PutHDFS

Set Directory Property:

In the PutHDFS processor, set the Directory property with an expression that references your flowfile attribute. This property accepts flowfile attributes, so this is where the files are written to their final, dynamically chosen destination.
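The five steps can be summarised as a small Python data structure. This is only a planning aid, not a NiFi API or template format. The property names are the ones discussed in the steps; the attribute names `source.dir` and `target.dir` and the staging path are hypothetical placeholders.

```python
# Each entry: (processor, properties). All values are hypothetical examples.
flow = [
    ("MoveHDFS", {
        "Input Directory or File": "${source.dir}",   # may use flowfile attributes
        "Output Directory": "/tmp/nifi_staging",      # fixed temporary path "X"
    }),
    ("FetchHDFS", {
        "HDFS Filename": "${absolute.hdfs.path}/${filename}",
    }),
    ("PutHDFS", {
        "Directory": "${target.dir}",                 # attribute-defined destination
    }),
]

def connections(flow):
    """Pair each processor with the next one, linked by its success relationship."""
    names = [name for name, _ in flow]
    return list(zip(names, names[1:]))

for upstream, downstream in connections(flow):
    print(f"{upstream} --success--> {downstream}")
```

Writing the chain down this way makes it easy to check that only the first processor uses a fixed output path, while the final destination is the one driven by an attribute.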

Considerations

While this method is effective, be mindful of one downside:

Duplicate Copies: Because FetchHDFS reads the file from the temporary location and PutHDFS writes a new file at the final destination, a duplicate copy is left behind in the temporary location. You will need a separate flow to delete these leftover files regularly if they are not required for future use.
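One way to think about the cleanup is sketched below in Python. It is an assumption-laden illustration rather than part of the original answer: the directory listing is hard-coded, the one-day retention period is arbitrary, and the script only builds `hdfs dfs -rm` commands without running them.

```python
from datetime import datetime, timedelta

def stale_paths(listing, now, max_age):
    """Return the paths whose modification time is older than max_age."""
    return [path for path, modified in listing if now - modified > max_age]

def removal_command(path):
    """Build (but do not run) the HDFS shell command that deletes one file."""
    return ["hdfs", "dfs", "-rm", path]

# Hypothetical listing of the temporary directory: (path, modification time).
now = datetime(2020, 1, 3, 12, 0)
listing = [
    ("/tmp/nifi_staging/part-0001.parquet", datetime(2020, 1, 1, 9, 0)),
    ("/tmp/nifi_staging/part-0002.parquet", datetime(2020, 1, 3, 11, 30)),
]

commands = [removal_command(p) for p in stale_paths(listing, now, timedelta(days=1))]
for command in commands:
    print(" ".join(command))
```

Keeping a retention window, instead of deleting immediately, leaves room for files that have been moved to the temporary location but not yet fetched.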

Conclusion

By following these steps, you can move files within HDFS to dynamically set directories in Apache NiFi. The approach lets flowfile attributes decide the destination, at the cost of an extra copy in a temporary location that has to be cleaned up. So the next time MoveHDFS sends your files to the wrong place, remember this step-by-step method!
