Creating Watcher Alerts for Machine Learning jobs

This appendix describes how to create Watcher alerts for machine learning jobs and how to send Watcher alerts to email and remote Syslog servers.

Watcher Alert Workaround

DMF 8.1 uses Elasticsearch 7.2.0, where inter-container function calls are HTTP-based. DMF 8.3, however, uses Elasticsearch 7.13.0, which requires HTTPS-based calls. Supporting this requires extensive changes to the system calls used by the Analytics Node (AN), and engineering is working on this effort. Until those fixes are released, Arista recommends the following workaround.
Workaround Summary:
  • Create a Watcher manually using the provided template.
  • Configure the Watcher to select the job ID for the ML job that needs to send alerts.
  • Use ‘webhook’ as the alerting mechanism within the Watcher to send alerts to third-party tools such as Slack.
  1. Access the AN's ML job page and click Manage Jobs to list the ML jobs.
  2. If the data feed column shows stopped, skip to Step 3. If it shows started, click the three dots for that ML job and stop its data feed.
    Figure 1. Stop Data Feed
  3. After the data feed has stopped, click the 3 dots and start the data feed.
    Figure 2. Start Data Feed
  4. Select the options as shown in the diagram below.
    Figure 3. Job Time Options
  5. Confirm that the data feed has started. Note down the job ID of this ML job.
    Figure 4. ML Job Characteristics
  6. Access the Watchers page.
    Figure 5. Access Watchers
  7. Create an advanced Watcher.
    Figure 6. Create Advanced Watcher
  8. Configure the name of the Watcher (can include whitespace characters), e.g., Latency ML.
  9. Configure the ID of the Watcher (can be alphanumeric, but without whitespace characters), e.g., ml_latency.
  10. Delete the code from the Watch JSON section.
  11. Copy and paste the following code into the Watcher. Replace the placeholder text in angle brackets (the job ID, the Slack path, and the Slack channel name) according to your environment and your ML job parameters.
    {
      "trigger": {
        "schedule": {
          "interval": "107s"
        }
      },
      "input": {
        "search": {
          "request": {
            "search_type": "query_then_fetch",
            "indices": [
              ".ml-anomalies-*"
            ],
            "rest_total_hits_as_int": true,
            "body": {
              "size": 0,
              "query": {
                "bool": {
                  "filter": [
                    {
                      "term": {
                        "job_id": "<use the id of the ML job retrieved in step 5.>"
                      }
                    },
                    {
                      "range": {
                        "timestamp": {
                          "gte": "now-30m"
                        }
                      }
                    },
                    {
                      "terms": {
                        "result_type": [
                          "bucket",
                          "record",
                          "influencer"
                        ]
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "bucket_results": {
                  "filter": {
                    "range": {
                      "anomaly_score": {
                        "gte": 75
                      }
                    }
                  },
                  "aggs": {
                    "top_bucket_hits": {
                      "top_hits": {
                        "sort": [
                          {
                            "anomaly_score": {
                              "order": "desc"
                            }
                          }
                        ],
                        "_source": {
                          "includes": [
                            "job_id",
                            "result_type",
                            "timestamp",
                            "anomaly_score",
                            "is_interim"
                          ]
                        },
                        "size": 1,
                        "script_fields": {
                          "start": {
                            "script": {
                              "lang": "painless",
                              "source": "LocalDateTime.ofEpochSecond((doc[\"timestamp\"].value.getMillis()-((doc[\"bucket_span\"].value * 1000)\n * params.padding)) / 1000, 0,ZoneOffset.UTC).toString()+\":00.000Z\"",
                              "params": {
                                "padding": 10
                              }
                            }
                          },
                          "end": {
                            "script": {
                              "lang": "painless",
                              "source": "LocalDateTime.ofEpochSecond((doc[\"timestamp\"].value.getMillis()+((doc[\"bucket_span\"].value * 1000)\n * params.padding)) / 1000, 0,ZoneOffset.UTC).toString()+\":00.000Z\"",
                              "params": {
                                "padding": 10
                              }
                            }
                          },
                          "timestamp_epoch": {
                            "script": {
                              "lang": "painless",
                              "source": """doc["timestamp"].value.getMillis()/1000"""
                            }
                          },
                          "timestamp_iso8601": {
                            "script": {
                              "lang": "painless",
                              "source": """doc["timestamp"].value"""
                            }
                          },
                          "score": {
                            "script": {
                              "lang": "painless",
                              "source": """Math.round(doc["anomaly_score"].value)"""
                            }
                          }
                        }
                      }
                    }
                  }
                },
                "influencer_results": {
                  "filter": {
                    "range": {
                      "influencer_score": {
                        "gte": 3
                      }
                    }
                  },
                  "aggs": {
                    "top_influencer_hits": {
                      "top_hits": {
                        "sort": [
                          {
                            "influencer_score": {
                              "order": "desc"
                            }
                          }
                        ],
                        "_source": {
                          "includes": [
                            "result_type",
                            "timestamp",
                            "influencer_field_name",
                            "influencer_field_value",
                            "influencer_score",
                            "is_interim"
                          ]
                        },
                        "size": 3,
                        "script_fields": {
                          "score": {
                            "script": {
                              "lang": "painless",
                              "source": """Math.round(doc["influencer_score"].value)"""
                            }
                          }
                        }
                      }
                    }
                  }
                },
                "record_results": {
                  "filter": {
                    "range": {
                      "record_score": {
                        "gte": 75
                      }
                    }
                  },
                  "aggs": {
                    "top_record_hits": {
                      "top_hits": {
                        "sort": [
                          {
                            "record_score": {
                              "order": "desc"
                            }
                          }
                        ],
                        "_source": {
                          "includes": [
                            "result_type",
                            "timestamp",
                            "record_score",
                            "is_interim",
                            "function",
                            "field_name",
                            "by_field_value",
                            "over_field_value",
                            "partition_field_value"
                          ]
                        },
                        "size": 3,
                        "script_fields": {
                          "score": {
                            "script": {
                              "lang": "painless",
                              "source": """Math.round(doc["record_score"].value)"""
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": {
          "ctx.payload.aggregations.bucket_results.doc_count": {
            "gt": 0
          }
        }
      },
      "actions": {
        "log": {
          "logging": {
            "level": "info",
            "text": "Alert for job [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}}] at [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.timestamp_iso8601.0}}] score [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.score.0}}]"
          }
        },
        "my_webhook": {
          "webhook": {
            "scheme": "https",
            "host": "hooks.slack.com",
            "port": 443,
            "method": "post",
            "path": "<path for slack>",
            "params": {},
            "headers": {
              "Content-Type": "application/json"
            },
            "body": """{"channel": "#<slack channel name>", "username": "webhookbot", "text":"Alert for job [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}}] at [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.timestamp_iso8601.0}}] score [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.score.0}}]", "icon_emoji": ":exclamation:"}"""
          }
        }
      }
    }
    
  12. Click Create Watch to create the Watcher.
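
To see what the condition and the log action in the template evaluate, the following Python sketch replays them against a hand-written sample payload. The sample values (job ID, timestamp, score) are hypothetical, and the functions are illustrative stand-ins rather than Watcher's implementation; only the field paths come from the template above.

```python
# Illustrative only: mimics the template's "condition" and "log" action in
# plain Python. The sample payload values below are hypothetical.

def should_alert(payload: dict) -> bool:
    """Mirror of the compare condition: bucket_results.doc_count > 0."""
    return payload["aggregations"]["bucket_results"]["doc_count"] > 0


def render_alert(payload: dict) -> str:
    """Mirror of the logging text, reading the top bucket hit (hits.hits.0)."""
    hit = payload["aggregations"]["bucket_results"]["top_bucket_hits"]["hits"]["hits"][0]
    return "Alert for job [{}] at [{}] score [{}]".format(
        hit["_source"]["job_id"],
        hit["fields"]["timestamp_iso8601"][0],
        hit["fields"]["score"][0],
    )


# Hypothetical search response, reduced to the fields the watch reads.
sample_payload = {
    "aggregations": {
        "bucket_results": {
            "doc_count": 1,
            "top_bucket_hits": {
                "hits": {
                    "hits": [
                        {
                            "_source": {"job_id": "ml_latency_job"},
                            "fields": {
                                "timestamp_iso8601": ["2024-01-01T12:00:00.000Z"],
                                "score": [91],
                            },
                        }
                    ]
                }
            },
        }
    }
}

if should_alert(sample_payload):
    print(render_alert(sample_payload))
```

When doc_count is 0, the condition is false and neither the log action nor the webhook action runs.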

Email Alerts and Remote Syslog Server

Previously, sending Watcher alerts by email required editing configuration files from the command line and restarting the Elasticsearch container.

An update to the Watcher alerts feature provides a simpler configuration method using the Analytics Node UI and adds support for sending Watcher alerts to remote Syslog servers.

Configuring a Kibana Email Connector

Select an existing Kibana email connector to send email alerts or create a connector by navigating to Stack Management > Rules and Connectors > Connectors > Create Connectors. Complete the following steps:

Figure 7. Rules and Connectors
  1. Configure the fields in the Configuration tab.
  2. Verify the connector works in the Test tab.
    Figure 8. Editing Connector
    Figure 9. Editing Connector to create action

Configuring a Watch

Configure a Watch using the Create threshold alert or Create advanced watch option, described in the following instructions.
Figure 10. Watcher

Create Threshold Alert

  1. Navigate to Stack Management > Watcher > Create > Create threshold alert and configure the alert conditions.
    Figure 11. Creating threshold alert
  2. Add a webhook action with the following fields.
    • Method: POST
    • Scheme: HTTP
    • Host: 169.254.16.1
    • Port: 8000
    • Specify the Body field as follows:
      • Sending Watcher alerts by email: Enter the required fields: to, subject, message, and kibana_email_connector. Multiple entries in the to field require a comma-separated list of email addresses wrapped in quotes. The kibana_email_connector field references an existing Kibana email connector.
      • Sending Watcher alerts to a remote Syslog server: Enter the required fields: message, protocol, primary_syslog_ip, and primary_syslog_port. If a second Syslog server should receive alerts, include backup_syslog_ip and backup_syslog_port.
        Figure 12. Performing Action for Webhook
    • The Path, Username, and Password fields do not need to be specified.
  3. Test the webhook action using Send Request before selecting Create alert. Depending on the configuration:
    • Verify the receipt of an email at the configured recipient address.
    • Verify the receipt of a syslog message on the remote Syslog server.
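
As a hedged illustration of the Body field, the Python sketch below prints one body for email alerts and one for a remote Syslog server. The field names come from the list above; the addresses, subject, IP address, and port are placeholder values, and the helper names are hypothetical.

```python
import json


def email_body(recipients, subject, message, connector):
    """Body for email alerts; 'to' is one comma-separated string."""
    return {
        "to": ",".join(recipients),
        "subject": subject,
        "message": message,
        "kibana_email_connector": connector,
    }


def syslog_body(message, primary_ip, primary_port, backup_ip=None, backup_port=None):
    """Body for remote Syslog alerts; backup fields are added only if given."""
    body = {
        "message": message,
        "protocol": "UDP",
        "primary_syslog_ip": primary_ip,
        "primary_syslog_port": primary_port,
    }
    if backup_ip is not None and backup_port is not None:
        body["backup_syslog_ip"] = backup_ip
        body["backup_syslog_port"] = backup_port
    return body


# Placeholder values; substitute your own recipients, connector, and servers.
print(json.dumps(email_body(
    ["ops@example.com", "noc@example.com"],
    "Watcher threshold alert",
    "Threshold exceeded",
    "<existing-email-connector>",
), indent=2))
print(json.dumps(syslog_body("Threshold exceeded", "192.0.2.10", 514), indent=2))
```

Paste the printed JSON into the Body field of the webhook action, replacing the connector placeholder with the name of an existing Kibana email connector.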

Create Advanced Watch

  1. Navigate to Stack Management > Watcher > Create > Create advanced watch and fill out the Name and ID of the Watch.
    Figure 13. Editing Advanced Watch
  2. In the Watch JSON field, use the following JSON template to forward alerts to email and remote Syslog servers. The input and condition fields define the alert condition; replace their values with any custom alert condition, using the Elastic Painless scripting language for scripted conditions. The actions field holds the configuration for forwarding alerts to email and remote Syslog servers.
    {
      "trigger": {
        "schedule": {
          "interval": "1m"
        }
      },
      "input": {
        "http": {
          "request": {
            "scheme": "https",
            "host": "<host>",
            "port": 443,
            "method": "get",
            "path": "/_cluster/health",
            "params": {},
            "headers": {
              "Content-Type": "application/json"
            },
            "auth": {
              "basic": {
                "username": "<user>",
                "password": "<password>"
              }
            }
          }
        }
      },
      "condition": {
        "script": {
          "source": "ctx.payload.status == 'green'",
          "lang": "painless"
        }
      },
      "actions": {
        "webhook_1": {
          "webhook": {
            "host": "169.254.16.1",
            "port": 8000,
            "method": "post",
            "scheme": "http",
            "body": "{\"message\": \"The Elasticsearch cluster status is {{ctx.payload.status}}\", \"kibana_email_connector\": \"<existing-email-connector>\", \"to\": \"<recipient-email-address>\", \"subject\": \"Elasticsearch cluster status alert\", \"protocol\": \"UDP\", \"primary_syslog_ip\": \"<remote-syslog-ip>\", \"primary_syslog_port\": <remote-syslog-port>, \"backup_syslog_ip\": \"<remote-syslog-ip>\", \"backup_syslog_port\": <remote-syslog-port>}"
          }
        }
      }
    }
  3. (Optional) To simulate the Watch, configure the fields in the Simulate tab. Set the webhook action mode to force_execute.
    Figure 14. Simulating Advanced Watch
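
The body value in the template is a JSON document escaped inside a JSON string, which is easy to get wrong by hand. One way to avoid manual escaping, sketched here in Python, is to build the body as a dictionary and let json.dumps do the quoting twice: once for the body and once for the enclosing JSON. The helper name and the sample recipient are hypothetical; the field names and the webhook address follow the template above.

```python
import json


def webhook_action(body: dict) -> dict:
    """Build the webhook action; the body is serialized to a JSON string."""
    return {
        "webhook": {
            "host": "169.254.16.1",
            "port": 8000,
            "method": "post",
            "scheme": "http",
            "body": json.dumps(body),
        }
    }


# Placeholder values; the mustache expression is left for Watcher to resolve.
body = {
    "message": "The Elasticsearch cluster status is {{ctx.payload.status}}",
    "kibana_email_connector": "<existing-email-connector>",
    "to": "ops@example.com",
    "subject": "Elasticsearch cluster status alert",
}

actions = {"actions": {"webhook_1": webhook_action(body)}}

# Prints an "actions" section with the inner quotes already escaped.
print(json.dumps(actions, indent=2))
```

The printed actions section can replace the one in the Watch JSON field; the trigger, input, and condition sections stay as they are.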

Troubleshooting

  • If the email alert fails, verify that the value of the kibana_email_connector field matches the name of a Kibana email connector and that this email connector works in the Test tab.

Limitations

  • Remote Syslog messages require UDP. TCP is not supported currently.

Enabling Secure Email Alerts through SMTP Setting

Refresh the page to view the updated SMTP Settings fields.

The following is an example of the UI SMTP Settings in previous releases:
Figure 15. SMTP Setting
After upgrading the Analytics Node from an earlier version to the DMF 8.6.* version, the following changes apply:
  • Server Name, User, Password, Sender, and Timezone no longer appear in the SMTP Settings.
  • A new field, Kibana Email Connector Name, has been added to SMTP Settings.
  • The system retains Recipients and Dedupe Interval and their respective values in SMTP Settings.
  • If previously configured SMTP settings exist:
    • The system automatically creates a Kibana email connector named SMTPForAlerts using the values previously specified in the fields Server Name, User (optional), Password (optional), and Sender.
    • The Kibana Email Connector Name field automatically becomes SMTPForAlerts.
The following settings appear in the UI after the upgrade to the DMF 8.6.* version:
Figure 16. Upgraded SMTP Setting

Troubleshooting

If Apply & Test does not send an email to the designated recipients, verify that the recipient email addresses are comma-separated and spelled correctly. If it still doesn’t work, verify that the designated Kibana email connector matches the name of an existing Kibana email connector. Test that connector by navigating to Stack Management > Rules and Connectors > Connectors, selecting the connector's name, and sending a test email in the Test tab.
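
The recipient list can be checked before retrying Apply & Test with a short Python sketch like the one below. The function name and the regular expression are illustrative assumptions. The pattern only catches obvious mistakes, such as a missing comma, a space inside an address, or a missing @, and is not a full email address validator.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces or commas.
_ADDRESS = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def invalid_recipients(recipients: str) -> list:
    """Return the entries of a comma-separated list that do not look like emails."""
    entries = [entry.strip() for entry in recipients.split(",")]
    return [entry for entry in entries if not _ADDRESS.match(entry)]


# Well-formed list: both entries pass the check.
print(invalid_recipients("ops@example.com, noc@example.com"))
# Missing comma: the two addresses are read as one malformed entry.
print(invalid_recipients("ops@example.com noc@example.com"))
```

An empty result means every entry passed the loose check; it does not confirm that the mailbox exists or that the connector can deliver to it.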