Troubleshooting
Check the issues below for common problems and suggested ways of handling them.
What to Do if Events Are Not Collected
NetApp 7-mode
-
In the relevant vfiler context in NetApp, run the command:
fpolicy show [whitebox_cifs_policy]
or
fpolicy show [whitebox_nfs_policy]
depending on the application type.
-
Verify that the Activity Monitor server is connected as an FPolicy server.
-
If the FPolicy server is registered, simulate some activity and run the command again.
-
Look at the counters at the end of the output of the command. They should increase.
-
If they don't increase, there might be an issue with the definition of the included volumes.
-
If the name of the included volume is incorrect, no events will be sent by NetApp.
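The counter check in the steps above can be scripted once the counter values from two runs of the command have been noted. The sketch below is illustrative only: the counter names and the dictionary representation are assumptions, not the actual `fpolicy show` output format, so the values have to be copied from the real output by hand.

```python
def stalled_counters(before, after):
    """Return the names of counters that did not increase between two runs."""
    return sorted(
        name for name, old in before.items()
        # A counter missing from the second run is treated as not increased.
        if after.get(name, old) <= old
    )


# Hypothetical counter names and values, noted before and after simulating activity.
first_run = {"requests_received": 120, "requests_screened": 118}
second_run = {"requests_received": 135, "requests_screened": 118}
print(stalled_counters(first_run, second_run))  # ['requests_screened']
```

An empty list means every counter increased. Any name that is returned points back to the included-volume definition as the first thing to check.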
-
-
If the Activity Monitor is not registered as an FPolicy server, stop the Activity Monitor service, wait 60 seconds, and then start the Activity Monitor service again.
Note
After an error, it might take NetApp a while to de-register the FPolicy server.
-
Run the command again and make sure the FPolicy server is registered.
-
If the FPolicy server is not registered, verify the following:
-
The Activity Monitor service is running as a domain user who is a local administrator on the server running the Activity Monitor.
-
The user running the Activity Monitor service is a member of the Backup Operators local group on the filer/vfiler.
-
The Activity Monitor server is in the same domain as the server running the Activity Monitor service.
-
The clock of the server running the Activity Monitor and the NetApp clock are within 5 minutes of each other. A larger difference might cause the RPC Kerberos authentication process to fail.
-
There is no firewall between the NetApp and the server running the Activity Monitor, and the Windows Firewall is off on the server running the Activity Monitor.
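The 5-minute clock tolerance in the list above can be checked with a short comparison once both clock readings are known. This is a minimal sketch: how the two readings are obtained is left open, and the timestamps below are made-up sample values. Both values must be timezone-aware (or both naive) for the subtraction to work.

```python
from datetime import datetime, timedelta, timezone

# Tolerance taken from the prerequisite above.
MAX_SKEW = timedelta(minutes=5)


def clocks_within_tolerance(server_time, filer_time, max_skew=MAX_SKEW):
    """Return True if the two clock readings differ by no more than max_skew."""
    return abs(server_time - filer_time) <= max_skew


# Hypothetical readings taken at the same moment on both systems.
server = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
filer = datetime(2024, 1, 1, 12, 6, 30, tzinfo=timezone.utc)
print(clocks_within_tolerance(server, filer))  # False: the clocks are 6.5 minutes apart
```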
-
-
If all prerequisites are set, look for errors in the Activity Monitor logs indicating connection issues with the FPolicy server. Also, check for messages in the NetApp logs indicating whether the FPolicy server failed to register or disconnected after a while.
-
If there are authentication failures to the ONTAPI API in the Activity Monitor or Permissions Collector logs:
-
Ensure all the prerequisites listed in the Permissions section were configured correctly.
-
If the Activity Monitor seems to connect to the NetApp but disconnects after a few seconds, check if the NetApp filer and Activity Monitor server use the same SMB version.
Note
SMB1 is no longer supported on most modern systems. Use SMB2 or higher when possible.
-
Run the following command on the NetApp side to verify whether SMB2 is enabled:
options cifs.smb2.enable
-
If SMB2 is not enabled, run the following command to enable it:
options cifs.smb2.enable on
-
If for some internal reason you cannot or are not allowed to enable SMB2, use one of the following methods to enable SMB1 on the Windows side:
-
On Windows Server 2012 to 2016, run the following PowerShell command:
Set-SmbServerConfiguration -EnableSMB1Protocol $true
-
For Windows Server 2016 or lower, check the registry value SMB1 under:
HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
-
If it exists and is set to 0, SMB1 is disabled. To enable it, set the value to 1.
-
-
NetApp Cluster Mode
If not all events are collected, perform the following steps:
-
Run the command:
fpolicy show-engine
-
Locate the line representing the FPolicy engine for the Vserver you are analyzing. Verify that the IP address of the FPolicy server matches the IP address of the server where the Activity Monitor is installed, and that the Server Status is connected.
-
If the Server Status is disconnected, run the following command:
fpolicy show-engine -node <node-name> -instance
This will indicate the reason for the disconnection.
-
If the disconnect reason is TCP failure, make sure the port configured in the Application configuration matches the port configured in the FPolicy configuration, and that the IP address of the external-engine configuration is the same as the IP address of the server running the Activity Monitor.
-
Verify that there is no firewall between the Activity Monitor server and the cluster nodes, and that the Windows firewall is off on the Activity Monitor server.
Note
The firewall should only be off during troubleshooting. Turning off the firewall should not be a permanent solution. Refer to the Physical Filer 7-Mode Communications Requirements for more information.
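The two checks described for a TCP failure (matching IP address and port in both configurations, and an open network path) can be sketched as follows. The dictionary keys, the sample addresses, and the port numbers are assumptions for illustration. The values have to be copied by hand from the Application configuration and from the FPolicy external-engine configuration.

```python
import socket


def config_mismatches(app_config, engine_config, keys=("ip", "port")):
    """Return the keys whose values differ between the two configurations."""
    return [k for k in keys if app_config.get(k) != engine_config.get(k)]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical values copied by hand from the two configurations.
application = {"ip": "192.0.2.10", "port": 2002}
external_engine = {"ip": "192.0.2.10", "port": 2003}
print(config_mismatches(application, external_engine))  # ['port']
```

Calling `can_connect` with the Activity Monitor server address and port, from a host on the same network segment as the cluster nodes, gives a rough indication of whether a firewall is in the way. A successful connection from another host does not prove that the cluster nodes themselves can connect.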
-
If there are authentication failures to the ONTAP API in the Activity Monitor or Permissions Collector logs, check the following:
-
Ensure all the prerequisites in the Permissions section were configured correctly.
-
Ensure the letter case of the domain configured in the application matches the domain value for the user configured in the Permissions section.
-
Ensure the username configured in the Permissions section matches the username in Active Directory and the user defined in the Application configuration.
-
-
Ensure the NetApp internal firewall is not blocking communications with the Activity Monitor. If it is blocking them, run the following commands to allow communication with the Activity Monitor:
system services firewall policy clone -vserver <vserver_name> -policy data -destination-policy fp_siq1 -destination-vserver <vserver_name>
system services firewall policy create -vserver <vserver_name> -policy fp_siq -service http -allow-list <am_server_ip_address_with_mask>
-
If the Crawler hits unexpected "access denied" errors or misses entire shares due to access issues, this might be related to a known NetApp bug, which is documented in their knowledgebase (you need a NetApp account to view the entire entry):
Backups failing even though user is part of BUILTIN\Backup Operators group for ONTAP 9
-
The bug affects Data ONTAP 9.x, and according to the document should be fixed in version 9.4. It “causes backup intent permissions to be incorrectly checked”. This means the Backup Operators membership used to gain access to the filesystem doesn’t work, and “access denied” errors are sent back.
-
Fortunately, there’s a workaround provided in the knowledgebase entry, which is to “disable fake open capability” by running the following commands on the NetApp console or an SSH connection to the management interface (replace SVM01 with the relevant Vserver):
set diag
cifs options modify -vserver SVM01 -is-fake-open-enabled false
-
SSL Connection Failure
If an error is received in the Permissions Collector or Activity Monitor about an SSL connection which can’t be established:
-
Verify the certificate key length on the NetApp. In older NetApp versions, the default certificate is created with a 512-bit key. Use this command to create a certificate with a key of at least 1024 bits:
secureadmin setup ssl
-
Data ONTAP up to version 8.2.3 operating in 7-mode only supports security protocols up to TLSv1.0, with the following cipher suites supported when using TLSv1.0:
-
TLS_RSA_WITH_RC4_128_MD5
-
TLS_RSA_WITH_RC4_128_SHA
-
TLS_RSA_WITH_3DES_EDE_CBC_SHA
-
-
Removing support for cipher suites that use RC4 or 3DES as their bulk cipher (the algorithm used to encrypt the data) means that the filer has no available cipher suites to use for secure communications.
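The effect described above amounts to an empty intersection between the suites the filer supports and the suites the connecting server still offers. The sketch below only illustrates that reasoning. The filer list is taken from the text above, while the client list is a made-up example of a server with RC4 and 3DES disabled, not the output of any real tool.

```python
# Cipher suites supported by Data ONTAP 7-mode up to 8.2.3 (from the list above).
FILER_SUITES = {
    "TLS_RSA_WITH_RC4_128_MD5",
    "TLS_RSA_WITH_RC4_128_SHA",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
}


def usable_suites(client_suites):
    """Return the filer suites that the client still offers, sorted by name."""
    return sorted(FILER_SUITES & set(client_suites))


# Hypothetical server on which RC4 and 3DES have been disabled.
hardened_client = ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_WITH_AES_256_CBC_SHA"]
print(usable_suites(hardened_client))  # []
```

An empty result means the two sides share no cipher suite, so a secure connection cannot be negotiated until at least one of the filer's suites is re-enabled on the server.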
-
Any server trying to communicate securely with the filer must support one of the above cipher suites, preferably 3DES, because it has been deprecated most recently and is still allowed for use. If you know that these ciphers or TLSv1.0 are blocked in your organization, you must unblock them on the servers running Permission Collection and Activity Monitoring. If you don’t know how to unblock them, contact your organization’s security team, because these ciphers and protocols are not blocked by default. For more information, refer to the links below:
-
According to a NetApp security advisory, Data ONTAP 8.2.5 operating in 7-mode has the option to turn off TLSv1.0 entirely, and it supports TLSv1.1 and TLSv1.2, plus extra cipher suites that are supported by them, so this version should not be affected by removing support for cipher suites using RC4 or 3DES. The advisory is linked here:
If no events are collected, refer to What to Do if Events Are Not Collected.