Chandan Kumar

Optimizing NiFi workloads in Azure with AKS (K8s) cluster and operator tuning


NiFi Architecture Overview


Apache NiFi serves as a key data transformation tool within our cloud architecture, handling data integration for a range of processes. The NiFi cluster runs in a Kubernetes (K8s) environment managed by our Site Reliability Engineering (SRE) team. This shared K8s cluster also hosts other significant applications, including Kafka and various microservices.



Recommendations for Improved Performance and Scalability


Infrastructure Enhancements


  • Azure Virtual Machines: For the NiFi cluster, the use of Azure F-series VMs is advised. These are compute-optimized VMs with a high CPU-to-memory ratio, ideal for compute-intensive applications like data processing.

  • Node Isolation: Given NiFi's resource-intensive nature, run it on dedicated nodes within the K8s cluster, or give it prioritized resource allocation over other applications, to prevent resource contention during intensive data processing periods (a taint/toleration sketch follows this list).
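
One way to enforce node isolation is to taint a dedicated node pool and let only NiFi pods tolerate the taint. A minimal sketch, assuming a dedicated AKS node pool for NiFi; the taint key, node pool name, and node name are illustrative, not taken from any existing setup:

YAML

# Taint the dedicated nodes once (hypothetical node name and taint key):
#   kubectl taint nodes aks-nifi-node-1 dedicated=nifi:NoSchedule

# Pod spec fragment for the NiFi pods (e.g. in the operator or Helm values):
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "nifi"
  effect: "NoSchedule"
nodeSelector:
  agentpool: "nifi"   # assumes a dedicated AKS node pool named "nifi"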


Architectural Improvements


  • Storage Affinity: Provision persistent volumes close to their respective NiFi pods to improve I/O throughput, which is crucial for efficient data handling (see the StorageClass sketch after this list).

  • Process Segregation: Differentiate heavy batch processing tasks into distinct NiFi instances. This segregation helps manage load spikes and optimizes performance by distributing the processing load.

  • Network Segregation: Implement dedicated network interfaces for backend applications (like NiFi and Kafka) and frontend applications (microservices) to streamline traffic management and enhance performance.
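
One common way to keep volumes close to the pods that use them is topology-aware volume binding. A minimal sketch, assuming Azure Premium SSD managed disks through the Azure Disk CSI driver; the class name is illustrative:

YAML

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nifi-premium-ssd           # illustrative name
provisioner: disk.csi.azure.com    # Azure Disk CSI driver used by AKS
parameters:
  skuName: Premium_LRS
volumeBindingMode: WaitForFirstConsumer   # bind the disk in the zone where the NiFi pod is scheduled
reclaimPolicy: Retain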


Multi-Application K8s Cluster Strategy


Running multiple applications like NiFi, Kafka, and other microservices on the same K8s cluster in Azure provides a robust, scalable environment. This setup benefits from Kubernetes' ability to efficiently manage resources and maintain application isolation, crucial for ensuring that services do not interfere with each other's operations.


Running Kafka and Microservices


Kafka, which handles real-time data streaming, and various microservices can coexist on the same Kubernetes cluster. Namespace isolation and resource quotas keep each workload operating efficiently without degrading NiFi's performance, as sketched below.
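
A minimal sketch of namespace isolation plus a resource quota, assuming a dedicated kafka namespace; the names and sizes are illustrative:

YAML

apiVersion: v1
kind: Namespace
metadata:
  name: kafka
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: kafka-quota
  namespace: kafka
spec:
  hard:
    requests.cpu: "8"       # cap the total CPU requested by Kafka pods
    requests.memory: 32Gi
    limits.cpu: "16"
    limits.memory: 64Gi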



NiFiKop Operator Customizations


CPU and Memory Limits/Requests


Setting appropriate CPU and memory limits and requests is crucial for ensuring that the NiFi pods have enough resources to handle their workload without affecting other pods in the cluster.

In the Helm chart values file, you might adjust these settings as follows:

YAML

resources:
  requests:
    memory: "32Gi"
    cpu: "16"
  limits:
    memory: "32Gi"
    cpu: "16"

CPU Affinity


Affinity rules keep NiFi pods on designated nodes and spread them apart from one another, reducing contention and noisy-neighbour effects. True per-core CPU pinning is handled by the kubelet's CPU Manager rather than by pod affinity (see the note after the example).

Example of setting node affinity and pod anti-affinity in the Helm chart:

YAML

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "kubernetes.io/hostname"   # example label key; adjust to match your dedicated NiFi nodes
          operator: In
          values:
          - "node-name"
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "app"
          operator: In
          values:
          - "nifi"
      topologyKey: "kubernetes.io/hostname"   # spread NiFi pods across nodes
   

Storage Configuration


Fast and local storage can significantly improve I/O performance for NiFi, particularly when handling large volumes of data.

Example to use SSD storage in the Helm chart:

YAML

persistence:
  enabled: true
  storageClass: "fast-ssd"
  accessMode: ReadWriteOnce
  size: 200Gi

Configure JVM Parameters


To set NiFi JVM parameters using the NiFiKop operator, specify them in the custom resource manifest (nifi.yaml). The exact field layout depends on the CRD version you are running; the following is an illustrative example:

nifi.yaml

YAML

apiVersion: <API version>
kind: NiFi
metadata:
  name: my-nifi
spec:
  component:
    nifi:
      config:
        java:
          args:
            - "-Xms1024m"
            - "-Xmx2048m"
            - "-XX:NewSize=256m"
            - "-XX:MaxNewSize=256m"
            - "-XX:MetaspaceSize=128m"
            - "-XX:MaxMetaspaceSize=128m"
            - "-Djava.security.egd=file:/dev/./urandom"

In this example, we're setting the following JVM parameters:

  • -Xms1024m and -Xmx2048m to set the initial and maximum heap size to 1024MB and 2048MB, respectively.

  • -XX:NewSize=256m and -XX:MaxNewSize=256m to set the initial and maximum size of the young generation to 256MB.

  • -XX:MetaspaceSize=128m and -XX:MaxMetaspaceSize=128m to set the initial and maximum size of the metaspace to 128MB.

  • -Djava.security.egd=file:/dev/./urandom to point the JVM's entropy gathering device (EGD) at /dev/urandom so that secure random number generation does not block waiting for entropy.


Note: Make sure to adjust the values according to your specific requirements and available resources.


Applying the configuration


To apply this configuration, create a nifi.yaml file with the above content and then run the following command:

kubectl apply -f nifi.yaml

This will create a new NiFi instance with the specified JVM parameters.



Verifying the configuration


To verify that the JVM parameters have been applied, check the NiFi logs or inspect the running Java process inside the pod. For example, the following command lists the Java process along with its arguments:

kubectl exec -it <nifi-pod-name> -- ps -ef | grep java

The output should include the JVM arguments specified in the nifi.yaml file.



Terraform/OpenTofu Deployment Script for AKS Using F-Series VMs


Below is an example Terraform script to deploy an Azure Kubernetes Service (AKS) cluster using F-series VMs. It assumes you already have the azurerm provider configured and the necessary permissions. Note that the single-vCPU Standard_F1 size generally falls below the AKS node minimum of 2 vCPUs and 4 GiB of memory, so the node pool below uses the compute-optimized Standard_F2s_v2.

HCL

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "myResourceGroup"
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myakscluster"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_F2s_v2" # compute-optimized F-series size that meets the AKS node minimum
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "production"
  }
}

output "client_certificate" {
  value = azurerm_kubernetes_cluster.aks_cluster.kube_config.0.client_certificate
}

output "kube_config" {
  value = azurerm_kubernetes_cluster.aks_cluster.kube_config_raw
  sensitive = true
}

This script sets up an AKS cluster with:

  • A resource group in East US.

  • A cluster named myAKSCluster with DNS prefix myakscluster.

  • A default node pool of three Standard_F2s_v2 VMs for cost-efficient, compute-optimized resources.


Monitoring NiFi


NiFi can be monitored by using Node Exporter, a Prometheus exporter, to collect and expose metrics in a format that Prometheus can scrape. Here's a step-by-step explanation:


Step 1: Install Node Exporter

Install Node Exporter on the same machine (or node) as NiFi. You can download the binary from the Prometheus website or install it with a package manager such as apt-get or yum. Since NiFi here runs on Kubernetes, running Node Exporter as a DaemonSet is a common alternative; a minimal sketch follows.
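
The DaemonSet below is a minimal sketch of that alternative, assuming a monitoring namespace and the prom/node-exporter image; the namespace, resource figures, and image tag are illustrative:

YAML

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring            # illustrative namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true            # expose the exporter on each node's IP at :9100
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.8.1   # illustrative tag
        ports:
        - containerPort: 9100
          name: metrics
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi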


Step 2: Create a Scrape Configuration

Create a scrape configuration file (e.g., node_exporter.yml) that targets the NiFi instance. The keys follow the Prometheus scrape-configuration format:

YAML

scrape_configs:
  - job_name: 'nifi'
    metrics_path: /metrics
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:8080']
        labels:
          instance: 'nifi-instance'

This scrape configuration:


  • Reads metrics from the /metrics endpoint

  • Scrapes every 10 seconds

  • Targets the NiFi instance running on localhost:8080 and labels it as nifi-instance


Step 3: Prometheus Node Exporter Parameters in the NiFi Helm Chart

YAML

nodeExporter:
  enabled: true
  port: 9100
  path: "/metrics"
  params:
    - "fs.nifi.metrics.path=/metrics"
    - "fs.nifi.metrics.scrape_interval=10s"
    - "fs.nifi.metrics.static_configs=- targets: ['localhost:8080']"

Here's an explanation of the configuration:

  • enabled: true enables the Node Exporter

  • port: 9100 specifies the port number for the Node Exporter

  • path: "/metrics" specifies the path for the metrics endpoint

  • params is a list of parameters to pass to the Node Exporter

  • fs.nifi.metrics.path=/metrics specifies the path for the NiFi metrics

  • fs.nifi.metrics.scrape_interval=10s specifies the scrape interval for NiFi metrics

  • fs.nifi.metrics.static_configs=- targets: ['localhost:8080'] specifies the static configuration for NiFi metrics, targeting the NiFi instance on localhost:8080


Once you've added this configuration to the values.yaml file, you can install or upgrade the NiFi Helm chart using the following command:

helm upgrade --install nifi <nifi-chart> --values values.yaml

Alternatively, you can also specify these parameters as command-line arguments when installing or upgrading the chart:

helm install nifi <nifi-chart> --set nodeExporter.enabled=true --set nodeExporter.port=9100 --set nodeExporter.path=/metrics --set nodeExporter.params[0]="fs.nifi.metrics.path=/metrics" --set nodeExporter.params[1]="fs.nifi.metrics.scrape_interval=10s" --set nodeExporter.params[2]="fs.nifi.metrics.static_configs=- targets: ['localhost:8080']"

This will enable the Node Exporter with the specified parameters, allowing Prometheus to scrape metrics from the NiFi instance.


Step 4: Start Node Exporter

Start Node Exporter on the NiFi host. The scrape configuration from Step 2 is loaded by Prometheus, not by Node Exporter, which is configured through command-line flags:

./node_exporter --web.listen-address=":9100"

Step 5: Scrape Metrics with Prometheus

Configure Prometheus to scrape metrics from Node Exporter:

YAML

scrape_configs:
  - job_name: 'nifi-metrics'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'nifi-instance'

Prometheus will now scrape metrics from Node Exporter, which exposes NiFi metrics.

Examples of NiFi Metrics

Some examples of NiFi metrics that can be monitored with Prometheus include (exact metric names depend on how NiFi metrics are exported):

  • nifi_flowfile_queue_size: The number of FlowFiles in the queue

  • nifi_flowfile_processed_count: The number of FlowFiles processed

  • nifi_byte_count: The number of bytes transferred

  • nifi_error_count: The number of errors encountered

These metrics can be used to monitor NiFi performance, identify bottlenecks, and optimize workflows.
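
As an illustration of how such metrics might be used, here is a minimal Prometheus alerting-rule sketch built on the example nifi_flowfile_queue_size metric named above; the metric name and threshold are illustrative:

YAML

groups:
- name: nifi-alerts
  rules:
  - alert: NiFiQueueBacklog
    expr: nifi_flowfile_queue_size > 10000   # illustrative threshold
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "NiFi FlowFile queue is backing up"
      description: "The FlowFile queue on {{ $labels.instance }} has stayed above 10,000 items for 10 minutes."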


Visualizing Metrics with Grafana

Use Grafana to visualize the metrics collected by Prometheus. Create dashboards and charts to display NiFi metrics, such as:

  • FlowFile queue size over time

  • FlowFile processing rate

  • Byte transfer rate

  • Error rate

This provides a comprehensive view of NiFi performance and helps identify areas for optimization.

