Configuring Nomad Client Metrics

Nomad Metrics is a helper service that collects metrics data from the Nomad server and clients, which run on the Services machine and the Nomad instances respectively. Metrics are collected using the DogStatsD protocol and sent to the Services machine.

Nomad Metrics Server

The Nomad Metrics container is run on the services host using the server flag and is installed as part of the CircleCI Server installation process, requiring no additional configuration.

Nomad Metrics Client

The Nomad Metrics client is installed and run on all Nomad client instances. You will need to update your AWS Launch Configuration in order to install and configure it. Additionally, you will need to modify the AWS security group to ensure that UDP port 8125 is open on the Services machine. Steps for both configuration changes are explained below.

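Before proceeding, you should be logged in to the EC2 Service section of the AWS Console. Make sure you are logged in to the region in which you run CircleCI Server.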

Updating the Services machine Security Group

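  1. Select the Instances link located under the Instances group in the sidebar to the left.

  2. Select the Services Box instance. The name tag typically resembles circleci_services.

  3. In the description box at the bottom, select the users security group link located next to the Security Groups section. It typically resembles *_users_sg.

  4. This takes you directly to the Security Groups page with the users security group highlighted. In the description box at the bottom, select the Inbound tab followed by the Edit button.

  5. Select the Add Rule button. From the drop-down, select Custom UDP Rule. Enter 8125 in the Port Range field.

  6. The Source field gives you a few options. Which one you choose ultimately depends on how you have configured your VPC and subnets. Below are some of the more common scenarios.

    1. (Suggested) Allow traffic from the Nomad client subnet. You can usually match the entry used for port 4647 or 3001. For example, 10.0.0.0/24.

    2. Allow all traffic to UDP port 8125 using 0.0.0.0/0.

  7. Press the Save button.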
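The console steps above have a command-line counterpart in the AWS CLI. The sketch below only prints the command for review rather than running it. The security group ID and CIDR range shown are placeholders, not values from this guide, and the helper name build_ingress_cmd is hypothetical.

```shell
#!/bin/sh
# Sketch: print (do not run) the AWS CLI call that opens UDP port 8125
# on a security group. The function name and sample values are assumptions.

build_ingress_cmd() {
  sg_id="$1"   # ID of the users security group (name resembles *_users_sg)
  cidr="$2"    # source range, for example the Nomad client subnet
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol udp --port 8125 --cidr %s\n' \
    "$sg_id" "$cidr"
}

# Placeholder values; substitute your own group ID and subnet.
build_ingress_cmd "sg-0123456789abcdef0" "10.0.0.0/24"
```

Reviewing the printed command before running it makes it easier to confirm that the rule targets the intended group and that the source range is no wider than needed.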

Updating the AWS Launch Configuration

Prerequisites

AWS EC2 Launch Configuration ID
  1. Select the Auto Scaling Groups (ASG) link in the sidebar to the left.

  2. Locate the ASG with a name tag similar to *_nomad_clients_asg.

  3. The Launch Configuration name is shown next to the ASG name, e.g. terraform-20180814231555427200000001.
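If you prefer the command line, the same lookup can be sketched with the AWS CLI. The helper below is a hypothetical filter that reads tab-separated "ASG name, Launch Configuration name" pairs and prints the Launch Configuration of any ASG whose name ends in _nomad_clients_asg. It assumes the ASG name itself follows that pattern, which may not hold in every installation.

```shell
#!/bin/sh
# Sketch: pick the Launch Configuration name of the Nomad clients ASG.
# Input: lines of "ASG_NAME<TAB>LAUNCH_CONFIGURATION_NAME" on stdin.
# find_nomad_lc is a hypothetical helper name, not part of any tool.

find_nomad_lc() {
  awk -F '\t' '$1 ~ /_nomad_clients_asg$/ { print $2 }'
}

# Possible usage (requires configured AWS credentials, so it is left commented out):
#   aws autoscaling describe-auto-scaling-groups \
#     --query 'AutoScalingGroups[].[AutoScalingGroupName,LaunchConfigurationName]' \
#     --output text | find_nomad_lc
```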

AWS EC2 Services Box Private IP Address
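  1. Select the Instances link located under the Instances group in the sidebar to the left.

  2. Select the Services Box instance. The name tag typically resembles circleci_services.

  3. In the description box at the bottom of the page, make a note of the private IP address.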

Updating the Launch Configuration

  1. Select the Launch Configurations link located under Auto Scaling in the sidebar to the left. Select the Launch Configuration you retrieved in the previous steps.

  2. In the description pane at the bottom, select the Copy launch configuration button.

  3. Once the configuration page opens, select the 3. Configure details link located at the top of the page.

  4. Update the Name field to something meaningful, e.g. nomad-builder-with-metrics-lc-DATE.

  5. Select the Advanced Details drop-down.

  6. Copy and paste the launch configuration script from below into the text field next to User data.

  7. IMPORTANT: Enter the private IP address of the Services box on the export SERVICES_PRIVATE_IP line near the top of the script. For example, export SERVICES_PRIVATE_IP="192.168.1.2"

  8. Select the Skip to review button and then the Create launch configuration button.

#!/bin/bash

set -exu

export http_proxy=""
export https_proxy=""
export no_proxy=""
export aws_instance_metadata_url="http://169.254.169.254"
export PUBLIC_IP="$(curl $aws_instance_metadata_url/latest/meta-data/public-ipv4)"
export PRIVATE_IP="$(curl $aws_instance_metadata_url/latest/meta-data/local-ipv4)"
export DEBIAN_FRONTEND=noninteractive
UNAME="$(uname -r)"
export CONTAINER_NAME="nomad_metrics"
export CONTAINER_IMAGE="circleci/nomad-metrics:0.1.198-5f5befe"
export SERVICES_PRIVATE_IP=""
export NOMAD_METRICS_PORT="8125"

echo "-------------------------------------------"
echo "     Performing System Updates"
echo "-------------------------------------------"
apt-get update && apt-get -y upgrade

echo "--------------------------------------"
echo "        Installing NTP"
echo "--------------------------------------"
apt-get install -y ntp

# Use the AWS NTP config for EC2 instances and the default for non-AWS hosts
if [ -f /sys/hypervisor/uuid ] && [ `head -c 3 /sys/hypervisor/uuid` == ec2 ]; then
cat <<EOT > /etc/ntp.conf
driftfile /var/lib/ntp/ntp.drift
disable monitor

restrict default ignore
restrict 127.0.0.1 mask 255.0.0.0
restrict 169.254.169.123 nomodify notrap

server 169.254.169.123 prefer iburst
EOT
else
  echo "USING DEFAULT NTP CONFIGURATION"
fi

service ntp restart

echo "--------------------------------------"
echo "        Installing Docker"
echo "--------------------------------------"
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get install -y "linux-image-$UNAME"
apt-get update
apt-get -y install docker-ce=5:18.09.9~3-0~ubuntu-xenial

# Force Docker to use userns-remap to mitigate CVE 2019-5736
apt-get -y install jq
mkdir -p /etc/docker
[ -f /etc/docker/daemon.json ] || echo '{}' > /etc/docker/daemon.json
tmp=$(mktemp)
cp /etc/docker/daemon.json /etc/docker/daemon.json.orig
jq '.["userns-remap"]="default"' /etc/docker/daemon.json > "$tmp" && mv "$tmp" /etc/docker/daemon.json

sudo echo 'export http_proxy="${http_proxy}"' >> /etc/default/docker
sudo echo 'export https_proxy="${https_proxy}"' >> /etc/default/docker
sudo echo 'export no_proxy="${no_proxy}"' >> /etc/default/docker
sudo service docker restart
sleep 5

echo "--------------------------------------"
echo " Populating /etc/circleci/public-ipv4"
echo "--------------------------------------"
if ! (echo $PUBLIC_IP | grep -qP "^[\d.]+$")
then
  echo "Setting the IPv4 address below in /etc/circleci/public-ipv4."
  echo "This address will be used in builds with \"Rebuild with SSH\"."
  mkdir -p /etc/circleci
  echo $PRIVATE_IP | tee /etc/circleci/public-ipv4
fi

echo "--------------------------------------"
echo "         Installing nomad"
echo "--------------------------------------"
apt-get install -y zip
curl -o nomad.zip https://releases.hashicorp.com/nomad/0.9.3/nomad_0.9.3_linux_amd64.zip
unzip nomad.zip
mv nomad /usr/bin

echo "--------------------------------------"
echo "      Creating config.hcl"
echo "--------------------------------------"
export INSTANCE_ID="$(curl $aws_instance_metadata_url/latest/meta-data/instance-id)"
mkdir -p /etc/nomad
cat <<EOT > /etc/nomad/config.hcl
log_level = "DEBUG"
name = "$INSTANCE_ID"
data_dir = "/opt/nomad"
datacenter = "default"
advertise {
    http = "$PRIVATE_IP"
    rpc = "$PRIVATE_IP"
    serf = "$PRIVATE_IP"
}
client {
    enabled = true
    # Expecting to have DNS record for nomad server(s)
    servers = ["$SERVICES_PRIVATE_IP:4647"]
    node_class = "linux-64bit"
    options = {"driver.raw_exec.enable" = "1"}
}
telemetry {
    publish_node_metrics = true
    statsd_address = "$SERVICES_PRIVATE_IP:8125"
}
EOT

echo "--------------------------------------"
echo "      Creating nomad.conf"
echo "--------------------------------------"
cat <<EOT > /etc/systemd/system/nomad.service
[Unit]
Description="nomad"
[Service]
Restart=always
RestartSec=30
TimeoutStartSec=1m
ExecStart=/usr/bin/nomad agent -config /etc/nomad/config.hcl
[Install]
WantedBy=multi-user.target
EOT

echo "--------------------------------------"
echo "   Creating ci-privileged network"
echo "--------------------------------------"
docker network create --driver=bridge --opt com.docker.network.bridge.name=ci-privileged ci-privileged

echo "--------------------------------------"
echo "      Starting Nomad service"
echo "--------------------------------------"
service nomad restart

echo "--------------------------------------"
echo "      Setting up Nomad metrics"
echo "--------------------------------------"
docker pull $CONTAINER_IMAGE
docker rm -f $CONTAINER_NAME || true

docker run -d --name $CONTAINER_NAME \
    --rm \
    --net=host \
    --userns=host \
    $CONTAINER_IMAGE \
    start --nomad-uri=http://localhost:4646 --statsd-host=$SERVICES_PRIVATE_IP --statsd-port=$NOMAD_METRICS_PORT --client
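Because the script uses SERVICES_PRIVATE_IP both for the Nomad server address and for the StatsD target, an empty or mistyped value leaves the client unable to reach the Services machine. The following is a minimal sketch of a sanity check you could run locally before pasting the value into the User data field. The function name is_ipv4 is hypothetical and the check only validates the dotted-quad format, not reachability.

```shell
#!/bin/sh
# Sketch: succeed only if the argument looks like a dotted-quad IPv4 address
# with each octet in the range 0-255. is_ipv4 is a hypothetical helper.

is_ipv4() {
  octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  printf '%s\n' "$1" | grep -Eq "^(${octet}\.){3}${octet}\$"
}

# Example value taken from the step above; replace with your Services box IP.
if is_ipv4 "192.168.1.2"; then
  echo "SERVICES_PRIVATE_IP looks valid"
else
  echo "SERVICES_PRIVATE_IP is empty or malformed" >&2
fi
```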

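Updating the Auto Scaling Group

  1. Select the Auto Scaling Groups (ASG) link in the sidebar to the left.

  2. Select the ASG with a name tag similar to *_nomad_clients_asg.

  3. In the description box at the bottom, select the Edit button.

  4. Select the newly created Launch Configuration from the drop-down.

  5. Press the Save button.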

  6. At this point, the older Nomad client instances will begin shutting down. They will be replaced with newer Nomad clients running Nomad Metrics.

StatsD Metrics

Metrics sent via StatsD will be updated every 10s.
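For orientation, a DogStatsD gauge travels as a single line of text of the form name:value|g, optionally followed by |# and a comma-separated list of tags. The sketch below formats such a line. The helper name format_gauge and the sample value 4600 are illustrative assumptions, and the exact datagrams nomad-metrics emits may differ.

```shell
#!/bin/sh
# Sketch: build a DogStatsD gauge datagram "name:value|g|#tag1,tag2".
# format_gauge is a hypothetical helper; the sample value is made up.

format_gauge() {
  name="$1"
  value="$2"
  shift 2
  tags=""
  for tag in "$@"; do
    # Join the tags with commas, without a leading comma.
    tags="${tags:+$tags,}$tag"
  done
  if [ -n "$tags" ]; then
    printf '%s:%s|g|#%s\n' "$name" "$value" "$tags"
  else
    printf '%s:%s|g\n' "$name" "$value"
  fi
}

format_gauge circle.nomad.client_agent.resources.total.cpu 4600 drain:false status:ready
```

A line produced this way could be sent to UDP port 8125 on the Services machine with a tool such as netcat to confirm that the security group change took effect.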

--server

The number of jobs in a terminal state (complete and dead) will typically increase until Nomad garbage-collects the jobs from its state.

Name | Type | Description
circle.nomad.server_agent.poll_failure | Gauge | 1 if the last poll of the Nomad agent failed; 0 otherwise. This gauge is set independently of circle.nomad.client_agent.poll_failure when nomad-metrics is operating in --client and --server modes simultaneously.
circle.nomad.server_agent.jobs.pending | Gauge | Total number of pending jobs across the cluster.
circle.nomad.server_agent.jobs.running | Gauge | Total number of running jobs across the cluster.
circle.nomad.server_agent.jobs.complete | Gauge | Total number of complete jobs across the cluster.
circle.nomad.server_agent.jobs.dead | Gauge | Total number of dead jobs across the cluster.

--client

Name | Type | Description
circle.nomad.client_agent.poll_failure | Gauge | 1 if the last poll of the Nomad agent failed; 0 otherwise.
circle.nomad.client_agent.resources.total.cpu | Gauge | (see below)
circle.nomad.client_agent.resources.used.cpu | Gauge | (see below)
circle.nomad.client_agent.resources.available.cpu | Gauge | (see below)
circle.nomad.client_agent.resources.total.memory | Gauge | (see below)
circle.nomad.client_agent.resources.used.memory | Gauge | (see below)
circle.nomad.client_agent.resources.available.memory | Gauge | (see below)
circle.nomad.client_agent.resources.total.disk | Gauge | (see below)
circle.nomad.client_agent.resources.used.disk | Gauge | (see below)
circle.nomad.client_agent.resources.available.disk | Gauge | (see below)
circle.nomad.client_agent.resources.total.iops | Gauge | (see below)
circle.nomad.client_agent.resources.used.iops | Gauge | (see below)
circle.nomad.client_agent.resources.available.iops | Gauge | (see below)

  • CPU resources are reported in units of MHz. Memory resources are reported in units of MB. Disk (capacity) resources are reported in units of MB.

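  • Resource metrics are scoped to the Nomad node that nomad-metrics has been configured to poll. Figures from a single nomad-metrics job operating in --client mode are not representative of the entire cluster (though these time series can be aggregated by an external mechanism to arrive at a cluster-wide view).

  • All metrics in the circle.nomad.client_agent.resources namespace will be accompanied by the following tags when writing to DogStatsD:

    • drain: true if the Nomad node has been marked as draining; false otherwise.

    • status: One of initializing, ready, or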