NiFi Tuning

NiFi ships with decent defaults for processor settings. These defaults assume a typical scenario, so they need tuning when you are after performance or when your environment or use case differs.
A few important parameters:
- Scheduling (Concurrent Tasks, Run Schedule)
- Controller Threads
- General Settings (Penalty Duration, Yield Duration)
- Connection Settings (Backpressure)

Scheduling
Concurrent Tasks:
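Default is 1 task (thread).
This is the number of threads the processor requests to process FlowFiles in parallel. NiFi allocates up to this many threads to the processor when they are available in the thread pool. Set it to a value greater than 1 to process FlowFiles in parallel.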

Run Schedule:
Default is 0s.
This is the time between two executions of a processor. Setting it to 0s avoids any delay between executions.
There are situations where you may want a higher value. For example, ConsumeKafkaRecord can pull a batch of records (say 50) in a single execution, so it does not need to run continuously.
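
To get a feel for how Run Schedule limits throughput, here is a rough back-of-the-envelope sketch. It assumes each execution pulls one full batch and that a new execution starts one Run Schedule interval after the previous one finishes. The function name and the 0.25s execution time are made up for illustration; this is not NiFi code.

```python
def max_records_per_second(batch_size, run_schedule_s, execution_time_s,
                           concurrent_tasks=1):
    """Rough upper bound on records pulled per second by one processor.

    Assumption: one cycle = execution time + Run Schedule delay, and
    every execution returns a full batch.
    """
    cycle_s = run_schedule_s + execution_time_s
    if cycle_s <= 0:
        raise ValueError("run schedule plus execution time must be positive")
    return concurrent_tasks * batch_size / cycle_s


if __name__ == "__main__":
    # Batch of 50 records, hypothetical 0.25s per execution.
    for run_schedule_s in (0.0, 1.0, 5.0):
        rate = max_records_per_second(50, run_schedule_s, 0.25)
        print(f"run schedule {run_schedule_s}s: at most {rate:.1f} records/s")
```

If the bound is still well above what the downstream processors can handle, a higher Run Schedule costs nothing in throughput and frees threads for other processors.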

Controller Threads
Default: 5 (Event), 10 (Timer)
Ensure there are enough threads to cover the concurrency levels (concurrent tasks) configured across all processors and process groups. The NiFi instance should also have access to enough CPU cores for the threads allocated.
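
The check described above can be sketched as a small script. It assumes you have collected the Concurrent Tasks value of each processor by hand; the processor names, the dictionary layout and the `thread_budget` helper are hypothetical, and nothing here talks to NiFi.

```python
import os


def thread_budget(concurrent_tasks, max_timer_driven_threads=10, cpu_cores=None):
    """Compare requested concurrent tasks with the controller thread pool.

    concurrent_tasks: dict mapping processor name -> Concurrent Tasks value.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    requested = sum(concurrent_tasks.values())
    return {
        "requested": requested,
        "pool_size": max_timer_driven_threads,
        "cpu_cores": cores,
        # True when processors ask for more threads than the pool can give.
        "pool_oversubscribed": requested > max_timer_driven_threads,
        "threads_per_core": max_timer_driven_threads / cores,
    }


if __name__ == "__main__":
    flow = {"ConsumeKafkaRecord": 2, "InvokeHTTP": 12}
    print(thread_budget(flow, max_timer_driven_threads=10, cpu_cores=4))
```

An oversubscribed pool is not an error, since processors share the pool, but it does mean some processors will not get all the threads they ask for at the same time.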

General Settings
Penalty Duration
Default: 30s
A FlowFile that cannot be processed successfully can be penalized. A penalized FlowFile is not made available to the downstream processor until the penalty period expires. 30s could be on the higher side if you would like the FlowFile to be retried sooner. Also note that not all failures are penalized; whether a failure penalizes the FlowFile is coded in the processor's logic.
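
The effect of a penalty can be pictured with a small model. This is a simplified sketch, not NiFi's implementation: the `FlowFile` class, the `penalize` and `eligible` helpers and the timestamps are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0  # time (seconds) when the penalty expires


def penalize(flow_file, now, penalty_duration_s=30.0):
    """Mark a FlowFile as penalized for penalty_duration_s seconds."""
    flow_file.penalized_until = now + penalty_duration_s


def eligible(queue, now):
    """Return the FlowFiles the downstream processor may pick up at 'now'."""
    return [ff for ff in queue if ff.penalized_until <= now]


if __name__ == "__main__":
    a, b = FlowFile("a"), FlowFile("b")
    penalize(a, now=100.0)  # 'a' failed at t=100s, default 30s penalty
    print([ff.name for ff in eligible([a, b], now=110.0)])  # only 'b'
    print([ff.name for ff in eligible([a, b], now=130.0)])  # 'a' and 'b'
```

Note that only the penalized FlowFile waits; other FlowFiles in the same queue keep flowing.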

Yield Duration
Default: 1s
When a processor cannot successfully process a FlowFile, it can temporarily stop working, or yield, for the configured duration. The value could be lower if you know the underlying situation can resolve more quickly.
A typical example of why this feature exists: an InvokeHTTP processor posts messages taken from a Kafka (message broker) topic to an API server. If the API server is temporarily unavailable or overloaded, the processor should back off for a fixed interval and try again.
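
The fixed-interval backoff in this example can be simulated in a few lines. This is only a sketch with a simulated clock; `attempt_times` and the recovery time of 2.5s are assumptions for illustration, and real processors decide for themselves when to yield.

```python
def attempt_times(available_from_s, yield_duration_s=1.0, max_attempts=100):
    """Simulate a processor that yields after each failed call.

    Returns the times (seconds) at which the processor tried the API.
    The API is assumed to fail before available_from_s and succeed after.
    """
    if yield_duration_s <= 0:
        raise ValueError("yield duration must be positive")
    attempts = []
    now = 0.0
    while len(attempts) < max_attempts:
        attempts.append(now)
        if now >= available_from_s:
            break  # call succeeded, no more yielding
        now += yield_duration_s  # processor yields, then tries again
    return attempts


if __name__ == "__main__":
    # API server recovers 2.5s after the first failure.
    print(attempt_times(2.5, yield_duration_s=1.0))
    print(attempt_times(2.5, yield_duration_s=0.5))
```

With a 1s yield the processor resumes at 3.0s after four attempts; with 0.5s it resumes at 2.5s but makes six attempts, so a shorter yield trades faster recovery for more calls against a struggling server.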

Backpressure Settings
It is important to fine-tune the backpressure settings so that the FlowFiles currently being processed or waiting in the flow do not overload the RAM available to NiFi.

Above is an example where messages from a Kafka topic are passed to an InvokeHTTP processor, which calls an API endpoint. Find below the configuration dialog for the connection (queue) for the success relationship.

- Default backpressure object threshold: 10000
> Number of FlowFiles that can be held in the queue. When the FlowFile count ≥ this threshold, the upstream processor is no longer scheduled to run.
- Default size threshold: 1GB
> Total size of all FlowFiles held in the queue. Backpressure is likewise applied when this size is reached.
These defaults are sometimes too high. For example, a processor like ConsumeKafka can pull messages into the queue faster than InvokeHTTP (the downstream processor) can process them. This wastes RAM on FlowFiles sitting in the queue, when the messages could have been fetched from the Kafka topic later.
Start with small threshold values, but ensure the downstream processor, such as InvokeHTTP, has enough work and is not left waiting. One way to check is to observe the thread count shown on the processor's title bar. If no thread count, or a lower count than configured, shows up, the downstream processor is waiting for work.
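
The two thresholds combine as an either-or condition, which can be sketched as follows. The function name is made up, and treating 1GB as 1024^3 bytes is an assumption; the snippet only models the rule and does not query NiFi.

```python
ONE_GB = 1024 ** 3  # assumption: 1GB interpreted as 1024^3 bytes


def backpressure_applied(queued_count, queued_bytes,
                         object_threshold=10_000,
                         size_threshold_bytes=ONE_GB):
    """Return True when the upstream processor should stop being scheduled.

    Backpressure kicks in as soon as either threshold is reached.
    """
    return (queued_count >= object_threshold
            or queued_bytes >= size_threshold_bytes)


if __name__ == "__main__":
    # 500 small messages of 2 KB each, against the defaults.
    print(backpressure_applied(500, 500 * 2048))
    # Same queue against a smaller, tuned object threshold of 200.
    print(backpressure_applied(500, 500 * 2048, object_threshold=200))
```

With small messages the object threshold is usually the one that triggers first, so that is the value to lower when the queue holds more FlowFiles than the downstream processor can work through.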
