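Argo Rollouts Canary Releases in Practice

Argo Rollouts is one of the progressive delivery tools in the Kubernetes ecosystem. It provides several deployment strategies, including canary releases and blue-green deployments. Drawing on hands-on project experience, this article shows how to implement canary releases with Argo Rollouts.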

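1. Overview

Argo Rollouts is part of the Argo project and is dedicated to progressive delivery of Kubernetes applications. It supports several deployment strategies, including:

  • Canary release: gradually shift traffic from the old version to the new version
  • Blue-green deployment: run two versions side by side and switch between them quickly
  • A/B testing: split traffic based on user attributes

This article focuses on the canary strategy and walks through a complete configuration for two traffic management options, Istio and Nginx Ingress.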

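2. Architecture

According to the official Argo Rollouts architecture documentation, Argo Rollouts consists of the following core components:

[Figure: argo-rollout (Argo Rollouts architecture diagram)]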

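2.1 Components in Detail

Argo Rollouts Controller

  • Watches the cluster for changes to Rollout resources
  • Reads each Rollout definition and reconciles the cluster state to match it
  • Does not interfere with ordinary Deployment resources

Rollout Resource

  • A custom Kubernetes resource that is compatible with the native Deployment
  • Adds fields that control the stages, thresholds, and methods of canary and blue-green deployments
  • A Deployment must be migrated to a Rollout before Argo Rollouts can manage it

ReplicaSets

  • Instances of the standard Kubernetes ReplicaSet resource
  • Argo Rollouts attaches extra metadata to them to track the different versions
  • Managed entirely by the controller; they should not be modified by hand

Ingress/Service

  • The mechanism by which user traffic enters the cluster and is directed to the appropriate version
  • Multiple Services can be used: new version only, old version only, or both
  • Several service meshes and Ingress solutions are supported for traffic splitting

AnalysisTemplate and AnalysisRun

  • Connect a Rollout to a metrics provider
  • Define thresholds on specific metrics that decide whether an update succeeded
  • Can automatically promote, roll back, or pause a Rollout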

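2.2 Canary Release Workflow

  1. Create the Rollout resource: define the deployment strategy and its steps
  2. Split traffic: control traffic distribution through Istio or Nginx Ingress
  3. Release progressively: increase the new version's share of traffic step by step, following the predefined steps
  4. Analyze automatically: use metrics to decide whether to continue
  5. Roll back automatically: if the metrics fail, return to the stable version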
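The five steps above can be sketched in a single manifest. This is a minimal illustration rather than the Helm chart used in the rest of this article: the resource names, the image, and the container port are placeholders, and the Nginx Ingress `demo-app-stable`, both Services, and an AnalysisTemplate named `success-rate` are assumed to exist already.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                       # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: registry.example.com/demo/app:v1
        ports:
        - name: http
          containerPort: 8080          # assumed port
  strategy:
    canary:
      canaryService: demo-app-canary   # step 2: Services used for traffic splitting
      stableService: demo-app-stable
      trafficRouting:
        nginx:
          stableIngress: demo-app-stable
      steps:                           # step 3: progressive weights
      - setWeight: 20
      - pause: {duration: 5m}
      - analysis:                      # step 4: metric check; a failed run aborts the rollout (step 5)
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: demo-app-canary
      - setWeight: 50
      - pause: {}                      # wait for a manual promote
```

Changing `spec.template` (for example the image tag) is what starts a new pass through the steps; the first deployment of a Rollout skips them.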

3. Configuration in Practice

This article uses an example Helm chart to show how Argo Rollouts can be integrated into the deployment workflow.

3.1 Project Structure

argocd-demo/
├── charts/
│   └── rollout/                      # unified rollout chart
│       ├── templates/
│       │   ├── _helpers.tpl          # template helper functions
│       │   ├── app/                  # application resources
│       │   │   ├── deployment.yaml
│       │   │   ├── rollout.yaml      # Argo Rollouts configuration
│       │   │   └── registry-secret.yaml
│       │   └── traffic/              # traffic management resources
│       │       ├── tls-secret.yaml
│       │       ├── istio/
│       │       │   ├── gateway.yaml
│       │       │   ├── services.yaml
│       │       │   └── virtualservice.yaml
│       │       ├── nginx-ingress/
│       │       │   ├── ingress.yaml
│       │       │   └── services.yaml
│       │       └── service/
│       │           └── loadbalancer.yaml
│       └── values.yaml
└── values.yaml

3.2 Rollout Resource Definition

In our Helm template, the Rollout resource is configured dynamically from chart values:

{{- if .Values.enabled }}
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: {{ include "rollout.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    canary:
      {{- if .Values.traffic.nginxIngress.enabled }}
      canaryService: {{ include "rollout.canaryServiceName" . }}
      stableService: {{ include "rollout.stableServiceName" . }}
      trafficRouting:
        nginx:
          stableIngress: {{ include "rollout.nginxIngressName" . }}
          additionalIngressAnnotations:
            canary-by-header: "X-Canary"
            canary-by-header-value: "true"
      {{- end }}
      {{- if .Values.traffic.istio.enabled }}
      canaryService: {{ include "rollout.canaryServiceName" . }}
      stableService: {{ include "rollout.stableServiceName" . }}
      trafficRouting:
        istio:
          virtualService:
            name: {{ include "rollout.virtualServiceName" . }}
            routes:
            - primary
      {{- end }}
      steps:
      - setWeight: 20
      - pause: {}
      {{- if or .Values.traffic.nginxIngress.enabled .Values.traffic.istio.enabled }}
      - setCanaryScale:
          weight: 50
      - pause: {}
      - setWeight: 50
      - pause: {}
      - setCanaryScale:
          matchTrafficWeight: true
      - pause: {}
      {{- end }}
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      {{- include "rollout.selectorLabels" . | nindent 6 }}
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "rollout.fullname" . }}
{{- end }}
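Because the Rollout above uses workloadRef, its pod template comes from the referenced Deployment (templates/app/deployment.yaml in the project tree), which this article does not show. The following is a hypothetical sketch of what such a file might contain, not the chart's actual template. The container port and the `replicas: 0` setting are assumptions; the latter follows the usual migration advice of scaling the Deployment down so that only the Rollout runs pods.

```yaml
{{- if .Values.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "rollout.fullname" . }}   # must match workloadRef.name in the Rollout
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
spec:
  # Assumption: scaled to zero so the Rollout, not the Deployment, owns the running pods.
  replicas: 0
  selector:
    matchLabels:
      {{- include "rollout.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "rollout.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ include "rollout.appName" . }}
        image: "{{ .Values.image }}:{{ .Values.imageTag }}"
        ports:
        - name: http            # matches targetPort: http in the Services
          containerPort: 8080   # assumed container port
{{- end }}
```

With this layout, a new image tag changes the Deployment's pod template, and the Rollout controller picks that change up and starts a canary update.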

3.3 Configuration Management

Example values.yaml

rollout:
  enabled: true
  image: registry.example.com/demo/app
  imageTag: v1
  replicaCount: 3

  # Application name settings
  nameOverride: ""
  fullnameOverride: ""

  app:
    name: "demo-app"
  secrets:
    registry:
      enabled: true
      name: registry-secret

  # Traffic management settings
  traffic:
    # Istio traffic management
    istio:
      enabled: false
      host: demo-app.example.com
      port: 8080
      gateway:
        name: common-inbound-gateway
      tlsName: tls-secret
      tlsNamespace: istio-system

    # Nginx Ingress traffic management
    nginxIngress:
      enabled: false
      host: demo-app.example.com
      port: 8080
      tlsName: tls-secret
      tlsNamespace: "" # use the release namespace

    # Layer-4 Service traffic management
    service:
      enabled: false
      port: 8080
      lbId: lb-xxxxxxxxxxxxxxxxx

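3.4 Traffic Management Configuration

Our Helm templates support three traffic management options: Istio, Nginx Ingress, and a layer-4 Service.

3.4.1 Istio Traffic Management

VirtualService configuration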

{{- if and .Values.enabled .Values.traffic.istio.enabled }}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: {{ include "rollout.virtualServiceName" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
    component: virtualservice
spec:
  gateways:
  - {{ .Values.traffic.istio.gateway.name }}
  hosts:
  - {{ .Values.traffic.istio.host }}
  http:
  - match:
    - headers:
        X-Canary:
          exact: 'true'
    name: canary
    route:
    - destination:
        host: {{ include "rollout.canaryServiceName" . }}
        port:
          number: {{ .Values.traffic.istio.port }}
      weight: 100
  - match:
    - headers:
        X-Canary:
          exact: 'false'
    name: stable
    route:
    - destination:
        host: {{ include "rollout.stableServiceName" . }}
        port:
          number: {{ .Values.traffic.istio.port }}
      weight: 100
  - name: primary
    route:
    - destination:
        host: {{ include "rollout.stableServiceName" . }}
        port:
          number: {{ .Values.traffic.istio.port }}
      headers:
        request:
          set:
            X-Canary: 'false'
      weight: 100
    - destination:
        host: {{ include "rollout.canaryServiceName" . }}
        port:
          number: {{ .Values.traffic.istio.port }}
      headers:
        request:
          set:
            X-Canary: 'true'
      weight: 0
{{- end }}
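During a rollout the controller rewrites the weights of the route named in the Rollout (`primary` here) and leaves the header-matched `canary` and `stable` routes alone. As an illustration, after a `setWeight: 20` step the primary route might look roughly like the fragment below. The hosts assume `app.name` is `demo-app`, and the port and header rules are omitted for brevity.

```yaml
# Illustrative state of the "primary" route after setWeight: 20
- name: primary
  route:
  - destination:
      host: demo-app-stable
    weight: 80    # remaining share stays on the stable Service
  - destination:
      host: demo-app-canary
    weight: 20    # share set by the current canary step
```

The 100/0 weights in the template are therefore only the initial state, and the two weights always add up to 100.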

3.4.2 Nginx Ingress Traffic Management

Ingress configuration

{{- if and .Values.enabled .Values.traffic.nginxIngress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "rollout.nginxIngressName" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
    component: ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - {{ .Values.traffic.nginxIngress.host }}
    secretName: {{ .Values.traffic.nginxIngress.tlsName }}
  rules:
  - host: {{ .Values.traffic.nginxIngress.host }}
    http:
      paths:
      - backend:
          service:
            name: {{ include "rollout.stableServiceName" . }}
            port:
              number: {{ .Values.traffic.nginxIngress.port }}
        path: /
        pathType: Prefix
{{- end }}
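Only the stable Ingress is templated in the chart. In Nginx mode the Argo Rollouts controller generates a second, canary Ingress from it, named `<rollout-name>-<ingress-name>-canary`, and steers traffic through canary annotations. The sketch below shows roughly what that generated object might look like at a 20% step. The resource names assume a Rollout called `demo-app-rollout` and a stable Ingress called `demo-app-stable`, and the exact output can differ between controller versions.

```yaml
# Generated by the controller, not by the chart (illustrative)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app-rollout-demo-app-stable-canary
  namespace: demo
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"          # follows the current setWeight step
    # Copied from additionalIngressAnnotations in the Rollout:
    nginx.ingress.kubernetes.io/canary-by-header: X-Canary
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: demo-app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-app-canary    # canary Service instead of the stable one
            port:
              number: 8080
```

Because the controller owns this object, manual edits to it are likely to be overwritten on the next reconciliation.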

3.4.3 Service Configuration

The canary and stable Services for Istio mode are defined as follows:

{{- if and .Values.enabled .Values.traffic.istio.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "rollout.canaryServiceName" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
    component: canary
spec:
  ports:
  - port: {{ .Values.traffic.istio.port }}
    targetPort: http
    protocol: TCP
    name: http
  selector:
    {{- include "rollout.selectorLabels" . | nindent 4 }}

---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "rollout.stableServiceName" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "rollout.labels" . | nindent 4 }}
    component: stable
spec:
  ports:
  - port: {{ .Values.traffic.istio.port }}
    targetPort: http
    protocol: TCP
    name: http
  selector:
    {{- include "rollout.selectorLabels" . | nindent 4 }}
{{- end }}
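Both Services are templated with the same selector. At runtime the Argo Rollouts controller tells them apart by adding a `rollouts-pod-template-hash` entry to each selector, so that each Service matches the pods of exactly one ReplicaSet. The fragment below sketches what the canary Service's selector might look like during an update; the `app: demo-app` label and the hash value are placeholders, not output from a real cluster.

```yaml
# Canary Service selector as managed by the controller (illustrative)
spec:
  selector:
    app: demo-app                            # from rollout.selectorLabels (assumed label)
    rollouts-pod-template-hash: 7bf84f9696   # placeholder hash of the canary ReplicaSet
```

The stable Service gets the same key with the hash of the stable ReplicaSet, which is why the chart does not need to distinguish the two selectors itself.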

3.5 Template Helper Functions

Our Helm chart uses _helpers.tpl to keep the naming rules in one place:

{{/*
Canary Service Name
*/}}
{{- define "rollout.canaryServiceName" -}}
{{- printf "%s-canary" (include "rollout.appName" .) -}}
{{- end -}}

{{/*
Stable Service Name
*/}}
{{- define "rollout.stableServiceName" -}}
{{- printf "%s-stable" (include "rollout.appName" .) -}}
{{- end -}}

{{/*
Virtual Service Name
*/}}
{{- define "rollout.virtualServiceName" -}}
{{- printf "%s-vs" (include "rollout.appName" .) -}}
{{- end -}}

{{/*
Application Name
*/}}
{{- define "rollout.appName" -}}
{{- .Values.app.name -}}
{{- end -}}

4. Deploying with Helm

4.1 Environment Preparation

Installing Argo Rollouts

# Install the Argo Rollouts controller
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install the Argo Rollouts CLI (kubectl plugin)
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x ./kubectl-argo-rollouts-linux-amd64
sudo mv ./kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

4.2 Helm Deployment Workflow

The chart can be deployed with Helm directly or through Argo CD. The steps below cover deployment and verification with Helm.

Step 1: Deploy with Helm

Istio mode

# Deploy the canary release in Istio mode with Helm
helm install demo-app . \
  --namespace demo \
  --create-namespace \
  --set rollout.enabled=true \
  --set rollout.traffic.istio.enabled=true \
  --set rollout.traffic.nginxIngress.enabled=false \
  --set rollout.traffic.service.enabled=false \
  --set rollout.app.name=demo-app \
  --set rollout.traffic.istio.host=demo-app.example.com

Nginx Ingress mode

# Deploy the canary release in Nginx Ingress mode with Helm
helm install demo-app . \
  --namespace demo \
  --create-namespace \
  --set rollout.enabled=true \
  --set rollout.traffic.istio.enabled=false \
  --set rollout.traffic.nginxIngress.enabled=true \
  --set rollout.traffic.service.enabled=false \
  --set rollout.app.name=demo-app \
  --set rollout.traffic.nginxIngress.host=demo-app.example.com

Step 2: Verify the deployment

# Check the Rollout status
kubectl get rollout -n demo

# Check the Pod status
kubectl get pods -n demo -l app=demo-app

# Check the Service status
kubectl get svc -n demo -l app=demo-app

# Check the traffic management resources
kubectl get virtualservice -n demo # Istio mode
kubectl get ingress -n demo # Nginx mode

Step 3: Trigger a canary release

# Option 1: update the image tag through Helm
helm upgrade demo-app . \
  --namespace demo \
  --set rollout.imageTag=v2

# Option 2: patch the Rollout resource directly
# Note: a merge patch replaces the whole containers list. If the Rollout uses
# workloadRef (as in section 3.2), the pod template lives in the referenced
# Deployment, so patch that Deployment instead.
kubectl patch rollout demo-app-rollout -n demo \
  --type='merge' -p='{"spec":{"template":{"spec":{"containers":[{"name":"demo-app","image":"registry.example.com/demo/app:v2"}]}}}}'

Step 4: Monitor the release

# Show the detailed Rollout status
kubectl describe rollout demo-app-rollout -n demo

# Show the rollout status and revisions with the Argo Rollouts CLI
kubectl argo rollouts get rollout demo-app-rollout -n demo

# Inspect the traffic split (Istio mode)
kubectl get virtualservice demo-app-vs -n demo -o yaml

# Inspect the Ingress configuration (Nginx mode)
kubectl get ingress demo-app-stable -n demo -o yaml

Step 5: Control the release manually

# Promote to the next step
kubectl argo rollouts promote demo-app-rollout -n demo

# Pause the rollout
kubectl argo rollouts pause demo-app-rollout -n demo

# Resume a paused rollout (current CLI releases do this with promote)
kubectl argo rollouts promote demo-app-rollout -n demo

# Abort the rollout and shift traffic back to the stable version
kubectl argo rollouts abort demo-app-rollout -n demo

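4.3 Deploying with Argo CD

When Argo CD is used, the traffic topology during a canary release looks as follows:

Nginx Ingress traffic management

Before the canary release
[Figure: argo_rollout_ingress_1]
During the canary release
[Figure: argo_rollout_ingress_2]
After the canary release
[Figure: argo_rollout_ingress_3]

Istio traffic management

Before the canary release
[Figure: argo_rollout_istio_1]
During the canary release
[Figure: argo_rollout_istio_2]
[Figure: argo_rollout_istio_3]
After the canary release
[Figure: argo_rollout_istio_4]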

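4.4 Traffic Testing

Testing the stable version

# Request without the special header (goes to the stable version by default)
curl http://demo-app.example.com/api/health

# Explicitly request the stable version
curl http://demo-app.example.com/api/health \
  -H "X-Canary: false"

Testing the canary version

# Explicitly request the canary version
curl http://demo-app.example.com/api/health \
  -H "X-Canary: true"

Verifying the traffic split

# Send requests in a loop and observe how they are distributed
for i in {1..10}; do curl -s http://demo-app.example.com/api/health; sleep 1; done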

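5. Advanced Features

5.1 Automated Analysis

Argo Rollouts supports automated metric analysis through AnalysisTemplate and AnalysisRun. This is the key feature for metric-driven canary releases.

AnalysisTemplate definition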

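apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: demo
spec:
  # Arguments used in the query or passed in by the Rollout are declared here
  args:
  - name: service-name
  - name: service-name-canary
  metrics:
  - name: success-rate
    # With an interval and no count, the metric keeps measuring until the
    # analysis is stopped; set count when using the template in an inline step
    interval: 30s
    successCondition: result[0] >= 0.95
    failureCondition: result[0] < 0.90
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{job="{{args.service-name}}",status!~"5.."}[5m])) /
          sum(rate(http_requests_total{job="{{args.service-name}}"}[5m]))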

Analysis configuration in the Rollout

spec:
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: demo-app-stable
        - name: service-name-canary
          value: demo-app-canary
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: demo-app-stable
          - name: service-name-canary
            value: demo-app-canary
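The `analysis` block directly under `canary` runs in the background for the duration of the update, while the `analysis` entry inside `steps` blocks the rollout at that point until it finishes. If the background analysis should not begin until the canary is already receiving traffic, its start can be delayed with `startingStep`. The sketch below is a variation on the configuration above, with an extra 50% step added purely for illustration.

```yaml
spec:
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 2            # zero-based index into steps: start at "setWeight: 50"
        args:
        - name: service-name
          value: demo-app-stable
        - name: service-name-canary
          value: demo-app-canary
      steps:
      - setWeight: 20              # index 0
      - pause: {duration: 10m}     # index 1
      - setWeight: 50              # index 2: background analysis starts here
      - pause: {duration: 10m}     # index 3
```

Delaying the start this way avoids judging the new version on metrics collected before it has served a meaningful amount of traffic.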

5.2 Progressive Release

spec:
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 5m}
      - setWeight: 20
      - pause: {duration: 5m}
      - setWeight: 40
      - pause: {duration: 5m}
      - setWeight: 60
      - pause: {duration: 5m}
      - setWeight: 80
      - pause: {duration: 5m}
      - setWeight: 100

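5.3 Automatic Rollback

spec:
  # Rolling back to one of the last 5 revisions skips the canary steps
  rollbackWindow:
    revisions: 5
  strategy:
    canary:
      # A failed background analysis aborts the update and returns traffic to stable
      analysis:
        templates:
        - templateName: error-rate
        args:
        - name: service-name
          value: demo-app-stable
        - name: service-name-canary
          value: demo-app-canary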
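The `error-rate` template referenced here is not defined anywhere in this article. A hypothetical version is sketched below, modeled on the `success-rate` template from section 5.1. The Prometheus address, the `http_requests_total` metric, the `job` label, and the 5% threshold are all assumptions carried over for illustration and would need to match the real monitoring setup.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
  namespace: demo
spec:
  args:
  - name: service-name
  - name: service-name-canary
  metrics:
  - name: error-rate
    interval: 1m
    failureLimit: 2                      # tolerate up to two failed measurements
    successCondition: result[0] <= 0.05  # assumed threshold: at most 5% 5xx responses
    provider:
      prometheus:
        address: http://prometheus:9090  # assumed address, as in section 5.1
        query: |
          sum(rate(http_requests_total{job="{{args.service-name-canary}}",status=~"5.."}[5m])) /
          sum(rate(http_requests_total{job="{{args.service-name-canary}}"}[5m]))
```

Once the number of failed measurements exceeds `failureLimit`, the analysis run fails, and with the configuration above that aborts the update and sends traffic back to the stable version.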

6. References