Consul
Monitor application health and performance with Consul proxy metrics
Consul helps you securely connect applications running in any environment, at any scale. Consul observability features enhance your service mesh capabilities with enriched metrics, logs, and distributed traces so you can improve performance and debug your distributed services with precision.
Consul proxy metrics give you detailed health and performance information about your service mesh applications. This includes upstream/downstream network traffic metrics, ingress/egress request details, error rates, and additional performance information that you can use to understand your distributed applications. Once you enable proxy metrics in Consul, you do not need to configure or instrument your applications in the service mesh to leverage proxy metrics.
In this tutorial, you will enable proxy metrics for your Consul data plane. You will use Grafana to explore dashboards that provide information regarding health, performance, and operations for your service mesh applications. In the process, you will learn how using these features can provide you with deep insights, reduce operational overhead, and contribute to a more holistic view of your service mesh applications.
Scenario overview
HashiCups is a coffee shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. At the beginning of this tutorial, you will use Terraform to deploy the HashiCups microservices, a self-managed Consul cluster, and an observability suite on Elastic Kubernetes Service (EKS).
The Consul proxy sidecar container can collect Layer 7 (L7) metrics (HTTP status codes, request latency, transaction volume, etc.) for your service mesh applications. Consul can also collect metrics from the Consul management plane and gateways. By configuring the Consul Helm chart, you can direct the proxies to send this data to Prometheus and then visualize it with Grafana.
In this tutorial, you will:
- Deploy the following resources with Terraform:
- Elastic Kubernetes Service (EKS) cluster
- A self-managed Consul datacenter on EKS
- Grafana and Prometheus on EKS
- HashiCups demo application
- Perform the following Consul data plane procedures:
- Review and enable proxy metrics features
- Explore the demo application
- Explore dashboards with Grafana
Prerequisites
The tutorial assumes that you are familiar with Consul and its core functionality. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
For this tutorial, you will need:
- An AWS account configured for use with Terraform
- (Optional) An HCP account
- aws-cli >= 2.0
- terraform >= 1.0
- consul >= 1.16.0
- consul-k8s >= 1.2.0
- helm >= 3.0
- git >= 2.0
- kubectl > 1.24
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-proxy-metrics
Change into the directory that contains the complete configuration files for this tutorial.
$ cd learn-consul-proxy-metrics/self-managed/eks
Review repository contents
This repository contains Terraform configuration to spin up the initial infrastructure and all files to deploy Consul, the demo application, and the observability suite resources.
Here, you will find the following Terraform configuration:
- aws-vpc.tf defines the AWS VPC resources
- eks-cluster.tf defines the Amazon EKS cluster deployment resources
- eks-consul.tf defines the self-managed Consul deployment
- eks-hashicups-with-consul.tf defines the HashiCups resources
- eks-observability.tf defines the Prometheus and Grafana resources
- outputs.tf defines the outputs you will use to authenticate and connect to your Kubernetes cluster
- providers.tf defines the AWS and Kubernetes provider definitions for Terraform
- variables.tf defines the variables you can use to customize the tutorial
Additionally, you will find the following directories and subdirectories:
- dashboards contains the JSON configuration files for the example Grafana dashboards
- api-gw contains the Kubernetes configuration files for the Consul API gateway
- config contains the Kubernetes configuration files for the Consul telemetry collector intentions
- hashicups contains the Kubernetes configuration files for HashiCups
- helm contains the Helm charts for Consul, Grafana, and Prometheus
Deploy infrastructure and demo application
With these Terraform configuration files, you are ready to deploy your infrastructure.
Initialize your Terraform configuration to download the necessary providers and modules.
$ terraform init
Initializing the backend...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
Then, deploy the resources. Confirm the run by entering yes.
$ terraform apply
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 97 added, 0 changed, 0 destroyed.
Note
The Terraform deployment could take up to 15 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for the environment to complete initialization.
Connect to your infrastructure
Now that you have deployed the Kubernetes cluster, configure kubectl to interact with it.
$ aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw kubernetes_cluster_id)
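Optionally, confirm that kubectl can reach the new cluster by listing its nodes. If the nodes report a Ready status, your kubeconfig is configured correctly.
$ kubectl get nodes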
Enable Consul proxy metrics
In this section, you will review the parameters that enable Consul proxy metrics, upgrade your Consul installation to apply the new configuration, and restart your service mesh sidecar proxies.
Review the Consul values file
Consul lets you expose metrics for your service mesh applications and sidecars so they may be scraped by a Prometheus service that is outside of your service mesh. Review the highlighted lines in the values file below to see the parameters that enable this feature.
helm/consul-v2.yaml
global:
## ...
# Exposes Prometheus metrics for the Consul service mesh and sidecars.
metrics:
enabled: true
# Enables Consul servers and clients metrics.
enableAgentMetrics: true
# Configures the retention time for metrics in Consul servers and clients.
agentMetricsRetentionTime: "59m"
ui:
## ...
# Enables displaying metrics in the Consul UI.
metrics:
enabled: true
# The metrics provider specification.
provider: "prometheus"
# The URL of the prometheus metrics server.
baseURL: http://prometheus-server.observability.svc.cluster.local
connectInject:
## ...
# Enables metrics for Consul Connect sidecars.
metrics:
defaultEnabled: true
Refer to the Consul metrics for Kubernetes documentation to learn more about metrics configuration options and details.
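The connectInject setting above enables metrics for every sidecar proxy by default. If you need to tune this behavior per workload, Consul also honors pod annotations such as consul.hashicorp.com/enable-metrics and consul.hashicorp.com/prometheus-scrape-port. The following is a minimal sketch of such an override in a deployment's pod template; the port value is illustrative.
  template:
    metadata:
      annotations:
        # Enable or disable proxy metrics for this pod only.
        consul.hashicorp.com/enable-metrics: "true"
        # Expose the Prometheus metrics on a non-default port.
        consul.hashicorp.com/prometheus-scrape-port: "20300"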
Deploy the updated Consul values file
Update Consul in your Kubernetes cluster with the Consul K8S CLI. Confirm the run by entering y.
$ consul-k8s upgrade -config-file=helm/consul-v2.yaml
Refer to the Consul K8S CLI documentation to learn more about additional settings.
Note
The upgrade could take up to 5 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for your updated Consul environment to become available.
Review the official Helm chart values to learn more about these settings.
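Once the upgrade finishes, you can optionally confirm that the Consul installation is healthy before moving on. This check assumes Consul is running in its default namespace.
$ consul-k8s status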
Restart sidecar proxies
You need to restart your sidecar proxies to retrieve the updated proxy configuration. To do so, redeploy your HashiCups application.
$ kubectl rollout restart deployment --namespace default
deployment.apps/api-gateway restarted
deployment.apps/frontend restarted
deployment.apps/nginx restarted
deployment.apps/payments restarted
deployment.apps/product-api restarted
deployment.apps/product-api-db restarted
deployment.apps/public-api restarted
deployment.apps/traffic-generator restarted
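Optionally, wait for the restarted deployments to become ready before continuing. For example, to watch the frontend deployment from the output above:
$ kubectl rollout status deployment/frontend --namespace default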
Prometheus will now begin scraping the /metrics endpoint for all proxy sidecars on port 20200. Refer to the Consul metrics for Kubernetes documentation to learn more about changing these default parameters.
Confirm sidecar configuration
Confirm that your sidecar proxy configuration has been successfully updated by viewing the Envoy admin interface. You can connect to the Envoy admin interface by port-forwarding port 19000 from a service that has a sidecar proxy.
$ kubectl port-forward deploy/frontend 19000:19000
Open http://localhost:19000/config_dump in your browser to find the Envoy configuration. Search for 20200, the default endpoint port for Prometheus metrics. You should find two different stanzas that reference this port. One of them is included next for reference.
{
  "name": "envoy_prometheus_metrics_listener",
  "address": {
    "socket_address": {
      "address": "0.0.0.0",
      "port_value": 20200
    }
  }
}
The presence of these stanzas confirms that Consul has configured the Envoy sidecar to expose Prometheus metrics.
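As an additional, optional check, you can port-forward the metrics port and fetch the endpoint directly. The port and path below are the defaults described earlier; the exact metric names in the output will vary.
$ kubectl port-forward deploy/frontend 20200:20200
Then, in a separate terminal, request the metrics endpoint.
$ curl -s http://localhost:20200/metrics | head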
Explore the demo application
In this section, you will visit your demo application to explore the HashiCups UI.
Retrieve the Consul API Gateway public DNS address.
$ export CONSUL_APIGW_ADDR=http://$(kubectl get svc/api-gateway -o json | jq -r '.status.loadBalancer.ingress[0].hostname') && echo $CONSUL_APIGW_ADDR
http://a4cc3e77d86854fe4bbcc9c62b8d381d-221509817.us-west-2.elb.amazonaws.com
Open the Consul API Gateway's URL in your browser and explore the HashiCups UI.
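You can also verify from the terminal that the gateway is serving the application by checking for a successful HTTP status code. This is an optional, illustrative check; a 200 response indicates the HashiCups frontend is reachable through the gateway.
$ curl -s -o /dev/null -w "%{http_code}\n" $CONSUL_APIGW_ADDR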
Explore health insights dashboard
Consul proxy metrics help you monitor the health of your service mesh applications with information including requests by status code, upstream/downstream connections, rejected connections, and Envoy cluster state. Most of these metrics are available for any service mesh application and require no additional application configuration.
Navigate to the HashiCups health monitoring Grafana dashboard.
$ export GRAFANA_HEALTH_DASHBOARD=http://$(kubectl get svc/grafana --namespace observability -o json | jq -r '.status.loadBalancer.ingress[0].hostname')/d/data-plane-health/ && echo $GRAFANA_HEALTH_DASHBOARD
http://a20fb6f2d1d3e4be296d05452a378ad2-428040929.us-west-2.elb.amazonaws.com/d/data-plane-health/
Note
The example dashboards take a few minutes to populate with data after the proxy metrics feature is enabled.
Notice that the example dashboard panes provide detailed health insights for HashiCups.
For example, the Upstream Rq by Status Code proxy statistic gives you a high-level overview of the HTTP requests throughout your service mesh. The Total active upstream connections graph shows how many upstream hosts are currently receiving requests and returning responses. These graphs can be useful for analyzing the health of the upstream hosts in your service mesh and identifying any anomalies in behavior.
Tip
Consul proxy metrics contain a large set of statistics that you can use to create custom dashboards for monitoring your service mesh applications according to your production environment's unique requirements. Refer to the Envoy proxy statistics overview for a complete list of available metrics.
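If you want to experiment with custom panels, you can query Prometheus directly for the same statistics the dashboards use. The following is a hedged example: it assumes the Prometheus Helm chart's default prometheus-server service listening on port 80 and uses the standard Envoy counter envoy_cluster_upstream_rq_xx grouped by response code class.
$ kubectl port-forward svc/prometheus-server --namespace observability 9090:80
Then, in a separate terminal, run an instant query against the Prometheus HTTP API.
$ curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=sum by (envoy_response_code_class) (rate(envoy_cluster_upstream_rq_xx[5m]))'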
Explore performance insights dashboard
In addition to monitoring service health, you can use Consul proxy metrics to monitor the performance of your service mesh applications. These metrics include network traffic statistics, CPU/memory usage by pod, data plane latency, and upstream/downstream connection data.
Navigate to the HashiCups performance monitoring Grafana dashboard.
$ export GRAFANA_PERFORMANCE_DASHBOARD=http://$(kubectl get svc/grafana --namespace observability -o json | jq -r '.status.loadBalancer.ingress[0].hostname')/d/data-plane-performance/ && echo $GRAFANA_PERFORMANCE_DASHBOARD
http://a20fb6f2d1d3e4be296d05452a378ad2-428040929.us-west-2.elb.amazonaws.com/d/data-plane-performance/
Note
The example dashboards take a few minutes to populate with data after the proxy metrics feature is enabled.
Notice that the example dashboard panes provide detailed performance insights for HashiCups.
For example, the Dataplane latency proxy statistics help you understand network performance for the respective percentiles of network traffic. In this example, p50 shows you the typical (median) performance and p99.9 shows you the worst-case performance for a given period of time. The Memory/CPU Usage % by pod limits panes can be useful for analyzing the performance of the pods in your service mesh so you can adjust resource allocations for any services that are over-provisioned or under-provisioned.
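If the Kubernetes metrics-server is installed in your cluster, you can also cross-check the dashboard's CPU and memory panes against live pod usage with kubectl. This is an optional check.
$ kubectl top pods --namespace default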
Clean up resources
Destroy the Terraform resources to clean up your environment. Confirm the destroy operation by entering yes.
$ terraform destroy
## ...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
## ...
Destroy complete! Resources: 0 added, 0 changed, 97 destroyed.
Note
Due to race conditions with the cloud resources in this tutorial, you may need to run the destroy operation twice to remove all the resources.
Next steps
In this tutorial, you enabled proxy metrics in the Consul service mesh to enhance the health and performance monitoring of your service mesh applications. You did not need to configure or instrument your applications to enable these features, which leads to a quick time-to-value. This integration offers faster incident resolution, increased application understanding, and reduced operational overhead.
For more information about the topics covered in this tutorial, refer to the following resources: