NiFi Tuning

Raghavan Chockalingam
4 min read · Mar 26, 2023

NiFi ships with reasonable defaults for processor settings. These defaults need tuning when performance matters, or when the environment or use case differs from a typical setup.

A few important parameters:

  • Scheduling (Concurrent Tasks, Run Schedule)
  • Controller Threads
  • General Settings (Penalty Duration, Yield Duration)
  • Connection Settings (Backpressure)

Scheduling

Concurrent Tasks:

Default is 1 task or thread.

The number of threads the processor requests in order to process FlowFiles in parallel. NiFi allocates up to this many threads when they are available. Set it above 1 to process FlowFiles concurrently.

Configure Concurrent Tasks

Run Schedule:

Default is 0s

The time between two consecutive executions of a processor. At 0s, the processor runs again with no delay between executions.

There are situations where a higher value makes sense. For example, ConsumeKafkaRecord can pull a batch of records (say 50) in a single execution, so it does not need to run continuously.
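As a rough way to reason about this trade-off, the sketch below estimates an upper bound on throughput for a processor that pulls records in batches. It assumes the run schedule acts as a fixed delay after each execution. The function name, the 0.25 s execution time, and the batch size are illustrative values, not NiFi APIs or measurements.

```python
def max_records_per_second(batch_size: int,
                           run_schedule_s: float,
                           execution_s: float,
                           concurrent_tasks: int = 1) -> float:
    """Rough upper bound on records pulled per second.

    Assumption: each task waits run_schedule_s after an execution that
    takes execution_s, and every execution pulls a full batch.
    """
    cycle_s = run_schedule_s + execution_s
    if cycle_s <= 0:
        raise ValueError("execution_s must be positive when run_schedule_s is 0")
    return concurrent_tasks * batch_size / cycle_s


# Hypothetical numbers: batches of 50 records, 0.25 s per execution.
no_delay = max_records_per_second(50, run_schedule_s=0.0, execution_s=0.25)
one_second = max_records_per_second(50, run_schedule_s=1.0, execution_s=0.25)
print(no_delay, one_second)  # 200.0 40.0
```

If the second figure is still well above what the downstream processors can absorb, the longer run schedule costs nothing in end-to-end throughput.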

Controller Threads

Default: 5 (Event), 10 (Timer)

Ensure the controller has enough threads for the concurrency levels (Concurrent Tasks) configured across all processors and process groups. The NiFi instance should also have access to enough CPU cores for the threads allocated.
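One way to sanity-check this is to add up the Concurrent Tasks configured across the flow and compare the total with the thread pool size. The sketch below does that on plain data. The processor names, task counts, and core count are hypothetical, and the helper is not part of NiFi.

```python
from typing import Dict, Optional


def thread_budget(concurrent_tasks: Dict[str, int],
                  timer_driven_threads: int = 10,
                  cpu_cores: Optional[int] = None) -> Dict[str, object]:
    """Compare requested concurrency with the timer-driven thread pool."""
    requested = sum(concurrent_tasks.values())
    report: Dict[str, object] = {
        "requested": requested,
        "pool": timer_driven_threads,
        # More requested tasks than pool threads means processors
        # will compete for threads when all of them are busy.
        "oversubscribed": requested > timer_driven_threads,
    }
    if cpu_cores is not None:
        report["threads_per_core"] = timer_driven_threads / cpu_cores
    return report


# Hypothetical flow: names and task counts are made up for illustration.
flow = {"ConsumeKafkaRecord": 2, "InvokeHTTP": 8, "LogAttribute": 1}
report = thread_budget(flow, timer_driven_threads=10, cpu_cores=4)
print(report)
```

Here the flow requests 11 tasks against a pool of 10, so either the pool or the per-processor settings would need adjusting.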

General Settings

Penalty Duration

Default: 30s

A FlowFile that cannot be processed successfully can be penalized. While penalized, the FlowFile is not picked up from the downstream queue until the penalty period expires. 30s could be on the higher side if you would like the FlowFile to be retried sooner. Also note that not all failures are penalized; whether a failure penalizes the FlowFile is coded in the processor's logic.
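The effect of a penalty can be pictured with a small simulation. This is a simplified model on a simulated clock, not NiFi's implementation; the class and function names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedFlowFile:
    name: str
    penalty_expires_at: float = 0.0  # seconds on a simulated clock


def penalize(flow_file: QueuedFlowFile, now: float,
             penalty_duration_s: float = 30.0) -> None:
    """Mark the FlowFile as unavailable until the penalty period has passed."""
    flow_file.penalty_expires_at = now + penalty_duration_s


def available(queue: List[QueuedFlowFile], now: float) -> List[str]:
    """Names of FlowFiles that may be picked up at time `now`."""
    return [ff.name for ff in queue if ff.penalty_expires_at <= now]


queue = [QueuedFlowFile("a"), QueuedFlowFile("b")]
penalize(queue[1], now=0.0)        # "b" failed at t=0 and is penalized for 30 s
print(available(queue, now=10.0))  # ['a']
print(available(queue, now=30.0))  # ['a', 'b']
```

Lowering the penalty duration shortens the window in which "b" is skipped, which is the retry-sooner effect described above.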

Yield Duration

Default: 1s

When a processor cannot successfully process a FlowFile, it may temporarily stop working, or yield, for the configured duration. The value can be lowered if the underlying problem is known to resolve quickly.

A typical example of why this feature exists: an InvokeHTTP processor posts messages taken from a Kafka (message broker) topic to an API server. When the API server is temporarily unavailable or overloaded, the processor should back off for a fixed interval and try again.
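The back-off behaviour can be sketched as a tiny scheduling model. This is a simplification that ignores execution time and assumes a failed execution is followed by a wait of exactly the yield duration; the function name is hypothetical.

```python
from typing import List


def trigger_times(outcomes: List[bool],
                  run_schedule_s: float = 0.0,
                  yield_duration_s: float = 1.0) -> List[float]:
    """Simulated start time of each execution.

    outcomes[i] is True when execution i succeeds and False when the
    processor yields (for example, the API server is unavailable).
    Execution time itself is ignored to keep the model small.
    """
    clock = 0.0
    starts = []
    for succeeded in outcomes:
        starts.append(clock)
        # After a yield the processor waits for the yield duration;
        # otherwise it waits for the normal run schedule.
        clock += run_schedule_s if succeeded else yield_duration_s
    return starts


# Two failed calls in a row delay the following executions by 1 s each.
print(trigger_times([True, False, False, True]))  # [0.0, 0.0, 1.0, 2.0]
```

With a shorter yield duration the retries bunch closer together, which helps only when the API server recovers quickly.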

Backpressure Settings

It is important to fine-tune backpressure settings so that NiFi does not exhaust the RAM it needs to hold FlowFiles that are being processed or are queued in the flow.

The example flow picks up messages from a Kafka topic and passes them to an InvokeHTTP processor that calls an API endpoint. The settings below are from the configuration dialog of the connection (queue) for the success relationship:

  • Default backpressure objects: 10000

> Number of FlowFiles that can be held in the queue. When the FlowFile count reaches or exceeds this threshold, the upstream processor is no longer scheduled to run.

  • Default size threshold: 1GB

> Total size of all FlowFiles held in the queue.
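The two thresholds combine into a simple check: backpressure applies as soon as either one is reached. The sketch below models that rule with the default values. Treating "1GB" as 2**30 bytes is an assumption of this example, and the function is illustrative rather than a NiFi API.

```python
GIB = 1024 ** 3  # assumption: "1GB" is treated as 2**30 bytes in this sketch


def backpressure_applied(queued_count: int,
                         queued_bytes: int,
                         object_threshold: int = 10_000,
                         size_threshold_bytes: int = 1 * GIB) -> bool:
    """True when either threshold is reached, so the upstream processor is held back."""
    return (queued_count >= object_threshold
            or queued_bytes >= size_threshold_bytes)


print(backpressure_applied(9_999, 200 * 1024 ** 2))   # False
print(backpressure_applied(10_000, 200 * 1024 ** 2))  # True: object threshold reached
print(backpressure_applied(500, 2 * GIB))             # True: size threshold reached
```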

These defaults are sometimes too high. For example, a ConsumeKafka-like processor can pull messages into the queue much faster than the downstream InvokeHTTP processor can handle them. This uses RAM unnecessarily to hold FlowFiles in the queue, when the messages could simply have been fetched from the Kafka topic later.

Start with small thresholds, but make sure the downstream processor (InvokeHTTP here) always has enough work and is not left waiting. One way to check is to watch the active thread count shown on the processor's title bar. If no thread count appears, or it is lower than the configured Concurrent Tasks, the downstream processor is waiting for work.
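If the thread count is noted down a few times while the flow is under load, the observation can be turned into a number. The sketch below is a hypothetical helper that works on manually collected readings; it does not query NiFi, and the sample values are made up.

```python
from typing import List


def waiting_fraction(active_thread_samples: List[int],
                     concurrent_tasks: int) -> float:
    """Share of samples in which the processor used fewer threads than configured."""
    if not active_thread_samples:
        raise ValueError("need at least one sample")
    below = sum(1 for n in active_thread_samples if n < concurrent_tasks)
    return below / len(active_thread_samples)


# Hypothetical readings of the thread count shown on InvokeHTTP,
# which is configured with 4 concurrent tasks.
samples = [4, 4, 1, 0, 4, 2, 4, 4]
print(waiting_fraction(samples, concurrent_tasks=4))  # 0.375
```

A fraction that stays well above zero suggests the thresholds on the incoming connection are too small for the downstream processor to stay busy.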
