NiFi Architecture Overview
Apache NiFi is the primary data transformation tool in our cloud architecture, handling data integration for a range of processes. The NiFi cluster runs in a Kubernetes (K8s) environment managed by our Site Reliability Engineering (SRE) team. This shared K8s cluster also hosts other significant applications, including Kafka and various microservices.
Recommendations for Improved Performance and Scalability
Infrastructure Enhancements
Azure Virtual Machines: For the NiFi cluster, the use of Azure F-series VMs is advised. These are compute-optimized VMs with a high CPU-to-memory ratio, ideal for compute-intensive applications like data processing.
Node Isolation: Given NiFi's resource-intensive nature, it should operate on dedicated nodes within the K8s cluster or have prioritized resource allocation over other applications to prevent resource contention during intensive data processing periods.
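Dedicated nodes of this kind are typically enforced with taints and tolerations: taint the NiFi node pool so other workloads cannot schedule there, and give the NiFi pods a matching toleration. A minimal sketch, assuming the nodes are labelled and tainted with an illustrative workload=nifi key:
YAML
# First taint the dedicated nodes:
#   kubectl taint nodes <node-name> workload=nifi:NoSchedule
# Then, in the NiFi pod spec:
spec:
  nodeSelector:
    workload: nifi            # assumes the dedicated nodes carry this label
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "nifi"
      effect: "NoSchedule"    # allows NiFi pods onto the tainted nodes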
Architectural Improvements
Storage Affinity: Assign persistent volumes in close proximity to their respective NiFi pods to enhance I/O throughput, crucial for efficient data handling.
Process Segregation: Separate heavy batch processing tasks into distinct NiFi instances. This segregation helps absorb load spikes and optimizes performance by distributing the processing load.
Network Segregation: Implement dedicated network interfaces for backend applications (like NiFi and Kafka) and frontend applications (microservices) to streamline traffic management and enhance performance.
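The storage-affinity recommendation above is usually realized with a StorageClass that delays volume binding until a pod is scheduled, so the volume is provisioned in the same zone as its NiFi pod. A sketch using the Azure Disk CSI driver (the class name is illustrative):
YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nifi-premium-ssd      # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS        # premium SSD-backed managed disks
volumeBindingMode: WaitForFirstConsumer   # bind the volume where the pod lands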
Multi-Application K8s Cluster Strategy
Running multiple applications like NiFi, Kafka, and other microservices on the same K8s cluster in Azure provides a robust, scalable environment. This setup benefits from Kubernetes' ability to efficiently manage resources and maintain application isolation, crucial for ensuring that services do not interfere with each other's operations.
Running Kafka and Microservices
Kafka, which facilitates real-time data streaming, and various microservices can coexist effectively on the same Kubernetes cluster by utilizing namespace isolation and resource quotas to ensure efficient operation without impacting the performance of NiFi.
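As a sketch, namespace isolation with a resource quota might look like this (the namespace and limits are illustrative and should be sized to your cluster):
YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: kafka-quota
  namespace: kafka            # assumes Kafka runs in its own namespace
spec:
  hard:
    requests.cpu: "32"        # caps the total CPU the namespace can request
    requests.memory: 64Gi
    limits.cpu: "48"
    limits.memory: 96Gi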
NiFiKop Operator Customizations
CPU and Memory Limits/Requests
Setting appropriate CPU and memory limits and requests is crucial for ensuring that the NiFi pods have enough resources to handle their workload without affecting other pods in the cluster.
In the Helm chart values file, you might adjust these settings as follows:
YAML
resources:
  requests:
    memory: "32Gi"
    cpu: "16"
  limits:
    memory: "32Gi"
    cpu: "16"
CPU Affinity
Affinity rules tie NiFi pods to specific nodes and keep replicas on separate hosts, reducing contention from co-located workloads. True per-core pinning is a separate mechanism: the kubelet's static CPU Manager policy pins cores for Guaranteed pods, i.e. pods whose CPU requests equal their limits, as in the resource settings above.
Example of setting node and pod anti-affinity in the Helm chart:
YAML
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "kubernetes.io/hostname"
              operator: In
              values:
                - "node-name"
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: "app"
              operator: In
              values:
                - "nifi"
        topologyKey: "kubernetes.io/hostname"
Storage Configuration
Fast and local storage can significantly improve I/O performance for NiFi, particularly when handling large volumes of data.
Example to use SSD storage in the Helm chart:
YAML
persistence:
  enabled: true
  storageClass: "fast-ssd"
  accessMode: ReadWriteOnce
  size: 200Gi
Configure JVM Parameters
To set NiFi JVM parameters with the NiFiKop operator, override entries in NiFi's bootstrap.conf through the cluster resource: NiFi reads its JVM arguments as numbered java.arg.N properties in that file. The field names below follow NiFiKop's NifiCluster CRD; verify the apiVersion and the argument indices (which must simply be unique, and where 2 and 3 are conventionally the heap settings) against your operator and NiFi versions. Here's an example:
nifi.yaml
YAML
apiVersion: <API version>
kind: NifiCluster
metadata:
  name: my-nifi
spec:
  readOnlyConfig:
    bootstrapProperties:
      overrideConfigs: |
        java.arg.2=-Xms1024m
        java.arg.3=-Xmx2048m
        java.arg.20=-XX:NewSize=256m
        java.arg.21=-XX:MaxNewSize=256m
        java.arg.22=-XX:MetaspaceSize=128m
        java.arg.23=-XX:MaxMetaspaceSize=128m
        java.arg.24=-Djava.security.egd=file:/dev/./urandom
In this example, we're setting the following JVM parameters:
-Xms1024m and -Xmx2048m to set the initial and maximum heap size to 1024MB and 2048MB, respectively.
-XX:NewSize=256m and -XX:MaxNewSize=256m to set the initial and maximum size of the young generation to 256MB.
-XX:MetaspaceSize=128m and -XX:MaxMetaspaceSize=128m to set the initial and maximum size of the metaspace to 128MB.
-Djava.security.egd=file:/dev/./urandom to point the JVM's SecureRandom at the non-blocking /dev/urandom device, avoiding startup stalls when system entropy is low.
Note: Make sure to adjust the values according to your specific requirements and available resources.
Applying the configuration
To apply this configuration, create a nifi.yaml file with the above content and then run the following command:
kubectl apply -f nifi.yaml
This will create a new NiFi instance with the specified JVM parameters.
Verifying the configuration
To verify that the JVM parameters have been applied, inspect the rendered bootstrap.conf or the running Java process inside the pod (replace <pod-name> with your NiFi pod; the path shown is the default for the official apache/nifi image):
kubectl exec -it <pod-name> -- cat /opt/nifi/nifi-current/conf/bootstrap.conf
kubectl exec -it <pod-name> -- ps -ef
The process listing shows the full Java command line, including the arguments specified in the nifi.yaml file.
Terraform/OpenTofu Deployment Script for AKS Using F-Series VMs
Below is an example Terraform/OpenTofu script to deploy an Azure Kubernetes Service (AKS) cluster on F-series VMs. It assumes the azurerm provider is already configured and you have the necessary permissions.
HCL
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "myResourceGroup"
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myakscluster"

  default_node_pool {
    name       = "default"
    node_count = 3
    # 16 vCPU / 32 GiB RAM, sized to fit the NiFi resource requests above.
    # Standard_F1 (1 vCPU / 2 GiB) would be far too small for this workload.
    vm_size    = "Standard_F16s_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "production"
  }
}

output "client_certificate" {
  value     = azurerm_kubernetes_cluster.aks_cluster.kube_config.0.client_certificate
  sensitive = true
}

output "kube_config" {
  value     = azurerm_kubernetes_cluster.aks_cluster.kube_config_raw
  sensitive = true
}
This script sets up an AKS cluster with:
A resource group in East US.
A cluster named myAKSCluster with DNS prefix myakscluster.
A default node pool using Standard_F16s_v2 VMs (16 vCPU, 32 GiB), sized to match the NiFi resource requests recommended earlier.
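To provide the dedicated NiFi nodes recommended earlier, a separate tainted node pool can be added alongside the default pool (the pool name and taint key are illustrative choices):
HCL
resource "azurerm_kubernetes_cluster_node_pool" "nifi_pool" {
  name                  = "nifipool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  vm_size               = "Standard_F16s_v2"
  node_count            = 3

  # Keep other workloads off these nodes; NiFi pods must carry a matching toleration.
  node_taints = ["workload=nifi:NoSchedule"]
}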
Monitoring NiFi
NiFi can be monitored with Prometheus from two angles: Node Exporter collects host-level metrics (CPU, memory, disk, network) on the NiFi nodes, while NiFi application metrics are exposed by NiFi itself, typically via a PrometheusReportingTask. Here's a step-by-step explanation:
Step 1: Install Node Exporter
Install Node Exporter on the same machine as NiFi to expose host-level metrics. You can download the binary from the Prometheus website or use a package manager like apt-get or yum.
Step 2: Configure Prometheus to scrape NiFi
Node Exporter itself is configured through command-line flags rather than a configuration file; scrape behaviour belongs in Prometheus. To collect NiFi application metrics, add a scrape job for NiFi's metrics endpoint to prometheus.yml (the PrometheusReportingTask listens on port 9092 by default):
YAML
scrape_configs:
  - job_name: 'nifi'
    metrics_path: /metrics
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9092']
        labels:
          instance: 'nifi-instance'
This configuration tells Prometheus to:
Collect metrics from the /metrics endpoint
Scrape metrics every 10 seconds
Target the NiFi metrics endpoint on localhost:9092 and label it as nifi-instance
Step 3: Enable Node Exporter in the Helm chart
The exact keys depend on the NiFi Helm chart in use; the block below assumes a chart that exposes a nodeExporter section in its values.yaml (check your chart's documented values for the real names):
YAML
nodeExporter:
  enabled: true
  port: 9100
  path: "/metrics"
Here's an explanation of the configuration:
enabled: true enables the Node Exporter
port: 9100 specifies the port number for the Node Exporter
path: "/metrics" specifies the path for the metrics endpoint
Once you've added this configuration to the values.yaml file, you can install or upgrade the NiFi Helm chart (the release name and chart reference below are placeholders):
helm install nifi <chart-repo>/nifi --values values.yaml
Alternatively, you can specify these parameters as command-line arguments when installing or upgrading the chart:
helm install nifi <chart-repo>/nifi --set nodeExporter.enabled=true --set nodeExporter.port=9100 --set nodeExporter.path=/metrics
This will enable the Node Exporter with the specified parameters, allowing Prometheus to scrape host-level metrics alongside the NiFi application metrics.
Step 4: Start Node Exporter
If Node Exporter runs outside the cluster, start the binary directly; it takes command-line flags rather than a configuration file:
node_exporter --web.listen-address=":9100"
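Where Node Exporter should survive reboots, it is commonly run as a systemd service; a minimal unit sketch (the binary path and service user are conventional choices, not requirements), saved as /etc/systemd/system/node_exporter.service:
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter --web.listen-address=":9100"
Restart=on-failure

[Install]
WantedBy=multi-user.target
Then enable and start it with: systemctl daemon-reload && systemctl enable --now node_exporter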
Step 5: Scrape Metrics with Prometheus
Configure Prometheus to scrape host-level metrics from Node Exporter:
YAML
scrape_configs:
  - job_name: 'nifi-metrics'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'nifi-instance'
Prometheus will now scrape host-level metrics from Node Exporter for the machines running NiFi.
Examples of NiFi Metrics
Exact metric names vary with the NiFi version and the reporting task in use; illustrative examples of the kinds of NiFi metrics that can be monitored include:
nifi_flowfile_queue_size: The number of FlowFiles in the queue
nifi_flowfile_processed_count: The number of FlowFiles processed
nifi_byte_count: The number of bytes transferred
nifi_error_count: The number of errors encountered
These metrics can be used to monitor NiFi performance, identify bottlenecks, and optimize workflows.
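Building on these metrics, a Prometheus alerting rule can flag sustained queue backlogs automatically. A sketch (the metric name follows the illustrative examples above; adjust the name and threshold to your NiFi version and workload):
YAML
groups:
  - name: nifi-alerts
    rules:
      - alert: NiFiQueueBacklog
        expr: nifi_flowfile_queue_size > 10000   # illustrative metric name and threshold
        for: 10m                                 # only fire on sustained backlog
        labels:
          severity: warning
        annotations:
          summary: "Sustained FlowFile backlog on {{ $labels.instance }}"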
Visualizing Metrics with Grafana
Use Grafana to visualize the metrics collected by Prometheus. Create dashboards and charts to display NiFi metrics, such as:
FlowFile queue size over time
FlowFile processing rate
Byte transfer rate
Error rate
This provides a comprehensive view of NiFi performance and helps identify areas for optimization.