[{"data":1,"prerenderedAt":1372},["ShallowReactive",2],{"doc-en-\u002Fen\u002Fdocs\u002Fobservability\u002Fmetrics-logs":3,"docs-en-all":1301},{"id":4,"title":5,"body":6,"category":1285,"description":1286,"draft":1287,"extension":1288,"icon":1289,"lastReviewed":1290,"meta":1291,"navigation":344,"order":47,"path":1292,"prerequisites":1293,"readingTime":1294,"seo":1295,"stem":1296,"tags":1297,"__hash__":1300},"docs_en\u002Fen\u002Fdocs\u002Fobservability\u002Fmetrics-logs.md","Metrics and logs",{"type":7,"value":8,"toc":1266},"minimark",[9,13,16,21,26,34,70,73,81,85,153,156,160,167,238,254,265,268,272,275,291,297,301,305,308,312,395,399,506,510,513,588,591,595,598,791,794,798,801,1018,1021,1031,1035,1038,1089,1092,1095,1140,1144,1147,1236,1239,1243,1262],[10,11,12],"p",{},"Observability usually requires a software stack parallel to the cluster: a metric agent on each node, a central time-series database, a log aggregator, a dashboard, an alerter, and a tracer. Six components, each with its own configuration, its own update cycle, its own bill.",[10,14,15],{},"HeroCtl solves this internally. Metrics, logs, alerts, and tracing are already built into the control plane. 
You only plug in an external tool when the team has a concrete reason for it.",[17,18,20],"h2",{"id":19},"metrics","Metrics",[22,23,25],"h3",{"id":24},"the-default-endpoint","The default endpoint",[10,27,28,29,33],{},"Each server node exposes metrics in Prometheus format at ",[30,31,32],"code",{},"\u002Fv1\u002Fmetrics",":",[35,36,41],"pre",{"className":37,"code":38,"language":39,"meta":40,"style":40},"language-bash shiki shiki-themes github-dark-default","curl -H \"X-Heroctl-Token: $TOKEN\" https:\u002F\u002Fmanage.exemplo.com\u002Fv1\u002Fmetrics\n","bash","",[30,42,43],{"__ignoreMap":40},[44,45,48,52,56,60,64,67],"span",{"class":46,"line":47},"line",1,[44,49,51],{"class":50},"sQhOw","curl",[44,53,55],{"class":54},"sFSAA"," -H",[44,57,59],{"class":58},"s9uIt"," \"X-Heroctl-Token: ",[44,61,63],{"class":62},"sZEs4","$TOKEN",[44,65,66],{"class":58},"\"",[44,68,69],{"class":58}," https:\u002F\u002Fmanage.exemplo.com\u002Fv1\u002Fmetrics\n",[10,71,72],{},"Typical output (truncated):",[35,74,79],{"className":75,"code":77,"language":78},[76],"language-text","# HELP heroctl_node_cpu_usage_percent CPU usage per node\n# TYPE heroctl_node_cpu_usage_percent gauge\nheroctl_node_cpu_usage_percent{node=\"server-1\"} 23.4\nheroctl_node_cpu_usage_percent{node=\"server-2\"} 18.1\n\n# HELP heroctl_alloc_memory_bytes Memory used per allocation\n# TYPE heroctl_alloc_memory_bytes gauge\nheroctl_alloc_memory_bytes{job=\"api\",alloc=\"abc123\"} 285212672\n","text",[30,80,77],{"__ignoreMap":40},[22,82,84],{"id":83},"what-ships-ready","What ships ready",[86,87,88,101],"table",{},[89,90,91],"thead",{},[92,93,94,98],"tr",{},[95,96,97],"th",{},"Metric family",[95,99,100],{},"Examples",[102,103,104,113,121,129,137,145],"tbody",{},[92,105,106,110],{},[107,108,109],"td",{},"Nodes",[107,111,112],{},"CPU, RAM, disk, network, load average, uptime",[92,114,115,118],{},[107,116,117],{},"Allocations",[107,119,120],{},"CPU, RAM, restarts, status, 
age",[92,122,123,126],{},[107,124,125],{},"Jobs",[107,127,128],{},"Healthy replicas, pending allocations, active deploys",[92,130,131,134],{},[107,132,133],{},"Router",[107,135,136],{},"Requests\u002Fs, p50\u002Fp95\u002Fp99 latency, 5xx errors per host",[92,138,139,142],{},[107,140,141],{},"Ingress TLS",[107,143,144],{},"Certificate validity, renewal failures",[92,146,147,150],{},[107,148,149],{},"Internal API",[107,151,152],{},"Latency, throughput, error rate",[10,154,155],{},"On a freshly installed cluster, this already feeds a complete dashboard with no extra configuration.",[22,157,159],{"id":158},"custom-application-metrics","Custom application metrics",[10,161,162,163,166],{},"Your application exposes ",[30,164,165],{},"\u002Fmetrics"," on any port, in Prometheus format. Declare it in the spec:",[35,168,172],{"className":169,"code":170,"language":171,"meta":40,"style":40},"language-yaml shiki shiki-themes github-dark-default","job: api-pagamentos\nmetrics:\n  enabled: true\n  path: \u002Fmetrics\n  port: 9090\n  interval: 15s\n","yaml",[30,173,174,186,194,205,216,227],{"__ignoreMap":40},[44,175,176,180,183],{"class":46,"line":47},[44,177,179],{"class":178},"sPWt5","job",[44,181,182],{"class":62},": ",[44,184,185],{"class":58},"api-pagamentos\n",[44,187,189,191],{"class":46,"line":188},2,[44,190,19],{"class":178},[44,192,193],{"class":62},":\n",[44,195,197,200,202],{"class":46,"line":196},3,[44,198,199],{"class":178},"  enabled",[44,201,182],{"class":62},[44,203,204],{"class":54},"true\n",[44,206,208,211,213],{"class":46,"line":207},4,[44,209,210],{"class":178},"  path",[44,212,182],{"class":62},[44,214,215],{"class":58},"\u002Fmetrics\n",[44,217,219,222,224],{"class":46,"line":218},5,[44,220,221],{"class":178},"  port",[44,223,182],{"class":62},[44,225,226],{"class":54},"9090\n",[44,228,230,233,235],{"class":46,"line":229},6,[44,231,232],{"class":178},"  
interval",[44,234,182],{"class":62},[44,236,237],{"class":58},"15s\n",[10,239,240,241,243,244,246,247,246,250,253],{},"The cluster scrapes, aggregates, and serves them on the same ",[30,242,32],{}," endpoint. The metrics come labeled with ",[30,245,179],{},", ",[30,248,249],{},"alloc",[30,251,252],{},"node",", so querying is direct:",[35,255,259],{"className":256,"code":257,"language":258,"meta":40,"style":40},"language-promql shiki shiki-themes github-dark-default","rate(http_requests_total{job=\"api-pagamentos\",status=~\"5..\"}[5m])\n","promql",[30,260,261],{"__ignoreMap":40},[44,262,263],{"class":46,"line":47},[44,264,257],{},[10,266,267],{},"Prometheus client libraries, official or community-maintained, exist for Go, Python, Java, Node.js, Ruby, .NET, Rust, and PHP. In any of these languages, basic instrumentation of an application can take as little as fifteen minutes.",[17,269,271],{"id":270},"embedded-panel","Embedded panel",[10,273,274],{},"The admin panel (port 8443) ships a ready-made charts section:",[276,277,278,282,285,288],"ul",{},[279,280,281],"li",{},"Cluster view: aggregated CPU, RAM, network",[279,283,284],{},"Job view: replicas, restarts, router latency",[279,286,287],{},"Allocation view: log stream, individual metrics",[279,289,290],{},"Host view: detail of each node, allocations on it",[10,292,293,294,296],{},"For most teams, this panel replaces an externally hosted Grafana. When you need heavily customized dashboards or correlation with external sources, it's worth pointing Grafana at the ",[30,295,32],{}," endpoint as a regular Prometheus datasource.",[17,298,300],{"id":299},"logs","Logs",[22,302,304],{"id":303},"collection-model","Collection model",[10,306,307],{},"Each allocation has stdout and stderr captured by the local agent, compressed, and sent to the cluster's central log writer. 
There is no separate log agent to install, configure, or update.",[22,309,311],{"id":310},"real-time-tail","Real-time tail",[35,313,315],{"className":37,"code":314,"language":39,"meta":40,"style":40},"# stream the whole job (all allocations)\nheroctl logs -f --job api-pagamentos\n\n# a specific allocation\nheroctl logs -f --alloc abc123\n\n# stderr only\nheroctl logs -f --job api-pagamentos --stream stderr\n",[30,316,317,323,340,346,351,365,369,375],{"__ignoreMap":40},[44,318,319],{"class":46,"line":47},[44,320,322],{"class":321},"sH3jZ","# stream the whole job (all allocations)\n",[44,324,325,328,331,334,337],{"class":46,"line":188},[44,326,327],{"class":50},"heroctl",[44,329,330],{"class":58}," logs",[44,332,333],{"class":54}," -f",[44,335,336],{"class":54}," --job",[44,338,339],{"class":58}," api-pagamentos\n",[44,341,342],{"class":46,"line":196},[44,343,345],{"emptyLinePlaceholder":344},true,"\n",[44,347,348],{"class":46,"line":207},[44,349,350],{"class":321},"# a specific allocation\n",[44,352,353,355,357,359,362],{"class":46,"line":218},[44,354,327],{"class":50},[44,356,330],{"class":58},[44,358,333],{"class":54},[44,360,361],{"class":54}," --alloc",[44,363,364],{"class":58}," abc123\n",[44,366,367],{"class":46,"line":229},[44,368,345],{"emptyLinePlaceholder":344},[44,370,372],{"class":46,"line":371},7,[44,373,374],{"class":321},"# stderr only\n",[44,376,378,380,382,384,386,389,392],{"class":46,"line":377},8,[44,379,327],{"class":50},[44,381,330],{"class":58},[44,383,333],{"class":54},[44,385,336],{"class":54},[44,387,388],{"class":58}," api-pagamentos",[44,390,391],{"class":54}," --stream",[44,393,394],{"class":58}," stderr\n",[22,396,398],{"id":397},"filtering","Filtering",[35,400,402],{"className":37,"code":401,"language":39,"meta":40,"style":40},"# between two timestamps\nheroctl logs --job api-pagamentos \\\n  --since \"2026-04-25 10:00\" \\\n  --until \"2026-04-25 11:00\"\n\n# text search\nheroctl logs --job api-pagamentos --since 1h | grep 
\"panic\"\n\n# structured output for processing with jq\nheroctl logs --job api-pagamentos --since 1h --format json\n",[30,403,404,409,423,433,441,445,450,475,479,485],{"__ignoreMap":40},[44,405,406],{"class":46,"line":47},[44,407,408],{"class":321},"# between two timestamps\n",[44,410,411,413,415,417,419],{"class":46,"line":188},[44,412,327],{"class":50},[44,414,330],{"class":58},[44,416,336],{"class":54},[44,418,388],{"class":58},[44,420,422],{"class":421},"suJrU"," \\\n",[44,424,425,428,431],{"class":46,"line":196},[44,426,427],{"class":54},"  --since",[44,429,430],{"class":58}," \"2026-04-25 10:00\"",[44,432,422],{"class":421},[44,434,435,438],{"class":46,"line":207},[44,436,437],{"class":54},"  --until",[44,439,440],{"class":58}," \"2026-04-25 11:00\"\n",[44,442,443],{"class":46,"line":218},[44,444,345],{"emptyLinePlaceholder":344},[44,446,447],{"class":46,"line":229},[44,448,449],{"class":321},"# text search\n",[44,451,452,454,456,458,460,463,466,469,472],{"class":46,"line":371},[44,453,327],{"class":50},[44,455,330],{"class":58},[44,457,336],{"class":54},[44,459,388],{"class":58},[44,461,462],{"class":54}," --since",[44,464,465],{"class":58}," 1h",[44,467,468],{"class":421}," |",[44,470,471],{"class":50}," grep",[44,473,474],{"class":58}," \"panic\"\n",[44,476,477],{"class":46,"line":377},[44,478,345],{"emptyLinePlaceholder":344},[44,480,482],{"class":46,"line":481},9,[44,483,484],{"class":321},"# structured output for processing with jq\n",[44,486,488,490,492,494,496,498,500,503],{"class":46,"line":487},10,[44,489,327],{"class":50},[44,491,330],{"class":58},[44,493,336],{"class":54},[44,495,388],{"class":58},[44,497,462],{"class":54},[44,499,465],{"class":58},[44,501,502],{"class":54}," --format",[44,504,505],{"class":58}," json\n",[22,507,509],{"id":508},"retention","Retention",[10,511,512],{},"Default: 30 days for active allocations, 7 days after the allocation terminates. 
Configurable in the cluster spec:",[35,514,516],{"className":169,"code":515,"language":171,"meta":40,"style":40},"logs:\n  retention:\n    active_days: 30\n    terminated_days: 7\n  storage:\n    type: local\n    path: \u002Fvar\u002Flib\u002Fheroctl\u002Flogs\n    max_size_gb: 100\n",[30,517,518,524,531,541,551,558,568,578],{"__ignoreMap":40},[44,519,520,522],{"class":46,"line":47},[44,521,299],{"class":178},[44,523,193],{"class":62},[44,525,526,529],{"class":46,"line":188},[44,527,528],{"class":178},"  retention",[44,530,193],{"class":62},[44,532,533,536,538],{"class":46,"line":196},[44,534,535],{"class":178},"    active_days",[44,537,182],{"class":62},[44,539,540],{"class":54},"30\n",[44,542,543,546,548],{"class":46,"line":207},[44,544,545],{"class":178},"    terminated_days",[44,547,182],{"class":62},[44,549,550],{"class":54},"7\n",[44,552,553,556],{"class":46,"line":218},[44,554,555],{"class":178},"  storage",[44,557,193],{"class":62},[44,559,560,563,565],{"class":46,"line":229},[44,561,562],{"class":178},"    type",[44,564,182],{"class":62},[44,566,567],{"class":58},"local\n",[44,569,570,573,575],{"class":46,"line":371},[44,571,572],{"class":178},"    path",[44,574,182],{"class":62},[44,576,577],{"class":58},"\u002Fvar\u002Flib\u002Fheroctl\u002Flogs\n",[44,579,580,583,585],{"class":46,"line":377},[44,581,582],{"class":178},"    max_size_gb",[44,584,182],{"class":62},[44,586,587],{"class":54},"100\n",[10,589,590],{},"For longer retention, export to external storage (next section).",[22,592,594],{"id":593},"export-outside","Export outside",[10,596,597],{},"When you need retention in years, or correlation with logs from systems that don't run on the cluster, ready-made outputs are available:",[35,599,601],{"className":169,"code":600,"language":171,"meta":40,"style":40},"logs:\n  export:\n    - type: syslog\n      destination: logs.empresa.com.br:514\n      protocol: tcp\n      tls: true\n\n    - type: loki\n      url: https:\u002F\u002Floki.empresa.com.br\n     
 tenant: heroctl-prod\n\n    - type: cloudwatch\n      region: us-east-1\n      log_group: \u002Fheroctl\u002Fprod\n      credentials: ${secret.aws_logs}\n\n    - type: elasticsearch\n      url: https:\u002F\u002Felastic.empresa.com.br\n      index: heroctl-%Y.%m.%d\n      credentials: ${secret.es_creds}\n",[30,602,603,609,616,629,639,649,658,662,673,683,693,698,710,721,732,743,748,760,770,781],{"__ignoreMap":40},[44,604,605,607],{"class":46,"line":47},[44,606,299],{"class":178},[44,608,193],{"class":62},[44,610,611,614],{"class":46,"line":188},[44,612,613],{"class":178},"  export",[44,615,193],{"class":62},[44,617,618,621,624,626],{"class":46,"line":196},[44,619,620],{"class":62},"    - ",[44,622,623],{"class":178},"type",[44,625,182],{"class":62},[44,627,628],{"class":58},"syslog\n",[44,630,631,634,636],{"class":46,"line":207},[44,632,633],{"class":178},"      destination",[44,635,182],{"class":62},[44,637,638],{"class":58},"logs.empresa.com.br:514\n",[44,640,641,644,646],{"class":46,"line":218},[44,642,643],{"class":178},"      protocol",[44,645,182],{"class":62},[44,647,648],{"class":58},"tcp\n",[44,650,651,654,656],{"class":46,"line":229},[44,652,653],{"class":178},"      tls",[44,655,182],{"class":62},[44,657,204],{"class":54},[44,659,660],{"class":46,"line":371},[44,661,345],{"emptyLinePlaceholder":344},[44,663,664,666,668,670],{"class":46,"line":377},[44,665,620],{"class":62},[44,667,623],{"class":178},[44,669,182],{"class":62},[44,671,672],{"class":58},"loki\n",[44,674,675,678,680],{"class":46,"line":481},[44,676,677],{"class":178},"      url",[44,679,182],{"class":62},[44,681,682],{"class":58},"https:\u002F\u002Floki.empresa.com.br\n",[44,684,685,688,690],{"class":46,"line":487},[44,686,687],{"class":178},"      
tenant",[44,689,182],{"class":62},[44,691,692],{"class":58},"heroctl-prod\n",[44,694,696],{"class":46,"line":695},11,[44,697,345],{"emptyLinePlaceholder":344},[44,699,701,703,705,707],{"class":46,"line":700},12,[44,702,620],{"class":62},[44,704,623],{"class":178},[44,706,182],{"class":62},[44,708,709],{"class":58},"cloudwatch\n",[44,711,713,716,718],{"class":46,"line":712},13,[44,714,715],{"class":178},"      region",[44,717,182],{"class":62},[44,719,720],{"class":58},"us-east-1\n",[44,722,724,727,729],{"class":46,"line":723},14,[44,725,726],{"class":178},"      log_group",[44,728,182],{"class":62},[44,730,731],{"class":58},"\u002Fheroctl\u002Fprod\n",[44,733,735,738,740],{"class":46,"line":734},15,[44,736,737],{"class":178},"      credentials",[44,739,182],{"class":62},[44,741,742],{"class":58},"${secret.aws_logs}\n",[44,744,746],{"class":46,"line":745},16,[44,747,345],{"emptyLinePlaceholder":344},[44,749,751,753,755,757],{"class":46,"line":750},17,[44,752,620],{"class":62},[44,754,623],{"class":178},[44,756,182],{"class":62},[44,758,759],{"class":58},"elasticsearch\n",[44,761,763,765,767],{"class":46,"line":762},18,[44,764,677],{"class":178},[44,766,182],{"class":62},[44,768,769],{"class":58},"https:\u002F\u002Felastic.empresa.com.br\n",[44,771,773,776,778],{"class":46,"line":772},19,[44,774,775],{"class":178},"      index",[44,777,182],{"class":62},[44,779,780],{"class":58},"heroctl-%Y.%m.%d\n",[44,782,784,786,788],{"class":46,"line":783},20,[44,785,737],{"class":178},[44,787,182],{"class":62},[44,789,790],{"class":58},"${secret.es_creds}\n",[10,792,793],{},"Multiple destinations can run at the same time. 
The cluster keeps the local copy for the retention period and replicates to the configured destinations.",[17,795,797],{"id":796},"alerts","Alerts",[10,799,800],{},"An alert is an expression over metrics that fires a webhook when true for a configured duration:",[35,802,804],{"className":169,"code":803,"language":171,"meta":40,"style":40},"alerts:\n  - name: api-erro-alto\n    expr: |\n      sum(rate(http_requests_total{job=\"api-pagamentos\",status=~\"5..\"}[5m]))\n        \u002F sum(rate(http_requests_total{job=\"api-pagamentos\"}[5m])) > 0.05\n    for: 5m\n    severity: critical\n    annotations:\n      summary: \"Error rate above 5% on api-pagamentos\"\n      runbook: https:\u002F\u002Fwiki.empresa.com.br\u002Frunbook\u002Fapi-pagamentos\n\n    notify:\n      - type: slack\n        webhook: ${secret.slack_oncall}\n      - type: pagerduty\n        routing_key: ${secret.pagerduty_critical}\n\n  - name: certificado-expirando\n    expr: heroctl_ingress_cert_expiry_days \u003C 14\n    for: 1h\n    severity: warning\n    notify:\n      - type: discord\n        webhook: ${secret.discord_ops}\n",[30,805,806,812,825,835,840,845,855,865,872,882,892,896,903,915,925,936,946,950,961,970,979,989,996,1008],{"__ignoreMap":40},[44,807,808,810],{"class":46,"line":47},[44,809,796],{"class":178},[44,811,193],{"class":62},[44,813,814,817,820,822],{"class":46,"line":188},[44,815,816],{"class":62},"  - ",[44,818,819],{"class":178},"name",[44,821,182],{"class":62},[44,823,824],{"class":58},"api-erro-alto\n",[44,826,827,830,832],{"class":46,"line":196},[44,828,829],{"class":178},"    expr",[44,831,182],{"class":62},[44,833,834],{"class":421},"|\n",[44,836,837],{"class":46,"line":207},[44,838,839],{"class":58},"      sum(rate(http_requests_total{job=\"api-pagamentos\",status=~\"5..\"}[5m]))\n",[44,841,842],{"class":46,"line":218},[44,843,844],{"class":58},"        \u002F sum(rate(http_requests_total{job=\"api-pagamentos\"}[5m])) > 
0.05\n",[44,846,847,850,852],{"class":46,"line":229},[44,848,849],{"class":178},"    for",[44,851,182],{"class":62},[44,853,854],{"class":58},"5m\n",[44,856,857,860,862],{"class":46,"line":371},[44,858,859],{"class":178},"    severity",[44,861,182],{"class":62},[44,863,864],{"class":58},"critical\n",[44,866,867,870],{"class":46,"line":377},[44,868,869],{"class":178},"    annotations",[44,871,193],{"class":62},[44,873,874,877,879],{"class":46,"line":481},[44,875,876],{"class":178},"      summary",[44,878,182],{"class":62},[44,880,881],{"class":58},"\"Error rate above 5% on api-pagamentos\"\n",[44,883,884,887,889],{"class":46,"line":487},[44,885,886],{"class":178},"      runbook",[44,888,182],{"class":62},[44,890,891],{"class":58},"https:\u002F\u002Fwiki.empresa.com.br\u002Frunbook\u002Fapi-pagamentos\n",[44,893,894],{"class":46,"line":695},[44,895,345],{"emptyLinePlaceholder":344},[44,897,898,901],{"class":46,"line":700},[44,899,900],{"class":178},"    notify",[44,902,193],{"class":62},[44,904,905,908,910,912],{"class":46,"line":712},[44,906,907],{"class":62},"      - ",[44,909,623],{"class":178},[44,911,182],{"class":62},[44,913,914],{"class":58},"slack\n",[44,916,917,920,922],{"class":46,"line":723},[44,918,919],{"class":178},"        webhook",[44,921,182],{"class":62},[44,923,924],{"class":58},"${secret.slack_oncall}\n",[44,926,927,929,931,933],{"class":46,"line":734},[44,928,907],{"class":62},[44,930,623],{"class":178},[44,932,182],{"class":62},[44,934,935],{"class":58},"pagerduty\n",[44,937,938,941,943],{"class":46,"line":745},[44,939,940],{"class":178},"        
routing_key",[44,942,182],{"class":62},[44,944,945],{"class":58},"${secret.pagerduty_critical}\n",[44,947,948],{"class":46,"line":750},[44,949,345],{"emptyLinePlaceholder":344},[44,951,952,954,956,958],{"class":46,"line":762},[44,953,816],{"class":62},[44,955,819],{"class":178},[44,957,182],{"class":62},[44,959,960],{"class":58},"certificado-expirando\n",[44,962,963,965,967],{"class":46,"line":772},[44,964,829],{"class":178},[44,966,182],{"class":62},[44,968,969],{"class":58},"heroctl_ingress_cert_expiry_days \u003C 14\n",[44,971,972,974,976],{"class":46,"line":783},[44,973,849],{"class":178},[44,975,182],{"class":62},[44,977,978],{"class":58},"1h\n",[44,980,982,984,986],{"class":46,"line":981},21,[44,983,859],{"class":178},[44,985,182],{"class":62},[44,987,988],{"class":58},"warning\n",[44,990,992,994],{"class":46,"line":991},22,[44,993,900],{"class":178},[44,995,193],{"class":62},[44,997,999,1001,1003,1005],{"class":46,"line":998},23,[44,1000,907],{"class":62},[44,1002,623],{"class":178},[44,1004,182],{"class":62},[44,1006,1007],{"class":58},"discord\n",[44,1009,1011,1013,1015],{"class":46,"line":1010},24,[44,1012,919],{"class":178},[44,1014,182],{"class":62},[44,1016,1017],{"class":58},"${secret.discord_ops}\n",[10,1019,1020],{},"Channels supported out of the box: Slack, Discord, PagerDuty, Opsgenie, generic webhook. For anyone wanting a custom integration (Telegram, email, SMS), the generic webhook covers everything.",[1022,1023,1024],"blockquote",{},[10,1025,1026,1030],{},[1027,1028,1029],"strong",{},"Warning:"," Start with a few critical alerts. Twenty noisy alerts become zero alerts — the team learns to ignore them. 
Five alerts that always indicate a real problem are useful.",[17,1032,1034],{"id":1033},"distributed-tracing","Distributed tracing",[10,1036,1037],{},"Tracing is opt-in, enabled in the job spec:",[35,1039,1041],{"className":169,"code":1040,"language":171,"meta":40,"style":40},"job: api-pagamentos\ntracing:\n  enabled: true\n  protocol: otlp\n  sample_rate: 0.1   # 10% of requests\n",[30,1042,1043,1051,1058,1066,1076],{"__ignoreMap":40},[44,1044,1045,1047,1049],{"class":46,"line":47},[44,1046,179],{"class":178},[44,1048,182],{"class":62},[44,1050,185],{"class":58},[44,1052,1053,1056],{"class":46,"line":188},[44,1054,1055],{"class":178},"tracing",[44,1057,193],{"class":62},[44,1059,1060,1062,1064],{"class":46,"line":196},[44,1061,199],{"class":178},[44,1063,182],{"class":62},[44,1065,204],{"class":54},[44,1067,1068,1071,1073],{"class":46,"line":207},[44,1069,1070],{"class":178},"  protocol",[44,1072,182],{"class":62},[44,1074,1075],{"class":58},"otlp\n",[44,1077,1078,1081,1083,1086],{"class":46,"line":218},[44,1079,1080],{"class":178},"  sample_rate",[44,1082,182],{"class":62},[44,1084,1085],{"class":54},"0.1",[44,1087,1088],{"class":321},"   # 10% of requests\n",[10,1090,1091],{},"An application instrumented with OpenTelemetry sends its spans to the embedded collector. 
The panel shows traces correlated with logs and metrics from the same allocation.",[10,1093,1094],{},"For advanced visualization (span timeline, trace comparison, tail analysis), export to Jaeger, Tempo, or a SaaS like Honeycomb:",[35,1096,1098],{"className":169,"code":1097,"language":171,"meta":40,"style":40},"tracing:\n  export:\n    - type: otlp\n      endpoint: tempo.empresa.com.br:4317\n      tls: true\n",[30,1099,1100,1106,1112,1122,1132],{"__ignoreMap":40},[44,1101,1102,1104],{"class":46,"line":47},[44,1103,1055],{"class":178},[44,1105,193],{"class":62},[44,1107,1108,1110],{"class":46,"line":188},[44,1109,613],{"class":178},[44,1111,193],{"class":62},[44,1113,1114,1116,1118,1120],{"class":46,"line":196},[44,1115,620],{"class":62},[44,1117,623],{"class":178},[44,1119,182],{"class":62},[44,1121,1075],{"class":58},[44,1123,1124,1127,1129],{"class":46,"line":207},[44,1125,1126],{"class":178},"      endpoint",[44,1128,182],{"class":62},[44,1130,1131],{"class":58},"tempo.empresa.com.br:4317\n",[44,1133,1134,1136,1138],{"class":46,"line":218},[44,1135,653],{"class":178},[44,1137,182],{"class":62},[44,1139,204],{"class":54},[17,1141,1143],{"id":1142},"cost-comparison","Cost comparison",[10,1145,1146],{},"For a typical cluster — 4 nodes, 30 jobs, 100 million requests\u002Fmonth — a commercial SaaS observability stack runs between R$ 1,000 and R$ 2,000 per month. 
An equivalent self-hosted stack (Prometheus + Loki + Grafana + Alertmanager + Tempo) has low direct cost, but requires half a day of operations per week.",[86,1148,1149,1165],{},[89,1150,1151],{},[92,1152,1153,1156,1159,1162],{},[95,1154,1155],{},"Item",[95,1157,1158],{},"Internal stack",[95,1160,1161],{},"Commercial SaaS",[95,1163,1164],{},"Self-hosted stack",[102,1166,1167,1181,1195,1209,1223],{},[92,1168,1169,1172,1175,1178],{},[107,1170,1171],{},"Direct cost\u002Fmonth",[107,1173,1174],{},"R$ 0",[107,1176,1177],{},"R$ 1,000–2,000",[107,1179,1180],{},"R$ 100–300 (infra)",[92,1182,1183,1186,1189,1192],{},[107,1184,1185],{},"Setup time",[107,1187,1188],{},"0 (already running)",[107,1190,1191],{},"1 day",[107,1193,1194],{},"1 to 2 weeks",[92,1196,1197,1200,1203,1206],{},[107,1198,1199],{},"Maintenance",[107,1201,1202],{},"Alongside cluster",[107,1204,1205],{},"Zero",[107,1207,1208],{},"A few hours\u002Fweek",[92,1210,1211,1214,1217,1220],{},[107,1212,1213],{},"Limits",[107,1215,1216],{},"For teams up to ~50 jobs",[107,1218,1219],{},"Practically unlimited",[107,1221,1222],{},"Whatever your infra holds",[92,1224,1225,1228,1230,1233],{},[107,1226,1227],{},"Dashboard customization",[107,1229,271],{},[107,1231,1232],{},"High",[107,1234,1235],{},"Total",[10,1237,1238],{},"Practical recommendation: start with the internal stack. When operations grow beyond what it covers — usually past 50 jobs or log retention beyond 6 months — export to self-hosted Loki and Grafana. 
Commercial SaaS only pays off when team time is more expensive than the bill.",[17,1240,1242],{"id":1241},"next-steps","Next steps",[276,1244,1245,1254],{},[279,1246,1247,1248,1253],{},"Configure ",[1249,1250,1252],"a",{"href":1251},"#alerts","alerts wired to Slack or PagerDuty"," before the first critical deploy.",[279,1255,1256,1257,1261],{},"Review ",[1249,1258,1260],{"href":1259},"\u002Fen\u002Fdocs\u002Fsecurity\u002Frbac","RBAC"," to limit who sees which logs (logs may contain sensitive data).",[1263,1264,1265],"style",{},"html pre.shiki code .sQhOw, html code.shiki .sQhOw{--shiki-default:#FFA657}html pre.shiki code .sFSAA, html code.shiki .sFSAA{--shiki-default:#79C0FF}html pre.shiki code .s9uIt, html code.shiki .s9uIt{--shiki-default:#A5D6FF}html pre.shiki code .sZEs4, html code.shiki .sZEs4{--shiki-default:#E6EDF3}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sPWt5, html code.shiki .sPWt5{--shiki-default:#7EE787}html pre.shiki code .sH3jZ, html code.shiki .sH3jZ{--shiki-default:#8B949E}html pre.shiki code .suJrU, html code.shiki 
.suJrU{--shiki-default:#FF7B72}",{"title":40,"searchDepth":188,"depth":188,"links":1267},[1268,1273,1274,1281,1282,1283,1284],{"id":19,"depth":188,"text":20,"children":1269},[1270,1271,1272],{"id":24,"depth":196,"text":25},{"id":83,"depth":196,"text":84},{"id":158,"depth":196,"text":159},{"id":270,"depth":188,"text":271},{"id":299,"depth":188,"text":300,"children":1275},[1276,1277,1278,1279,1280],{"id":303,"depth":196,"text":304},{"id":310,"depth":196,"text":311},{"id":397,"depth":196,"text":398},{"id":508,"depth":196,"text":509},{"id":593,"depth":196,"text":594},{"id":796,"depth":188,"text":797},{"id":1033,"depth":188,"text":1034},{"id":1142,"depth":188,"text":1143},{"id":1241,"depth":188,"text":1242},"observabilidade","Collect metrics, logs, and traces without standing up an external observability stack. When it's worth it, and when to integrate with an outside tool.",false,"md","i-lucide-activity","2026-04-26",{},"\u002Fen\u002Fdocs\u002Fobservability\u002Fmetrics-logs",[],"10 min read",{"title":5,"description":1286},"en\u002Fdocs\u002Fobservability\u002Fmetrics-logs",[19,299,1298,1299,796],"prometheus","opentelemetry","w_6cwcTz1ZG-xxdtOSCvdLKVWQ8lnaUBd2mN3Im5Gtc",[1302,1308,1314,1319,1325,1330,1335,1336,1342,1347,1352,1356,1361,1366],{"path":1303,"title":1304,"description":1305,"category":1306,"order":47,"icon":1307},"\u002Fen\u002Fdocs\u002Fapi\u002Fapi-reference","REST API reference","Endpoints, JWT authentication, curl examples, and error patterns of the HeroCtl API.","api","i-lucide-code",{"path":1309,"title":1310,"description":1311,"category":1312,"order":47,"icon":1313},"\u002Fen\u002Fdocs\u002Fdeploy\u002Ffirst-deploy","Deploy your first app","Bring up a Node.js application with a Postgres database in 50 lines of YAML. 
Includes health check, rolling deploy, and rollback.","deploy","i-lucide-rocket",{"path":1315,"title":1316,"description":1317,"category":1312,"order":188,"icon":1318},"\u002Fen\u002Fdocs\u002Fdeploy\u002Frolling-canary-blue-green","Rolling, canary, blue-green, and rainbow","Four deploy strategies. When to use each, with complete examples and honest trade-offs.","i-lucide-git-branch",{"path":1320,"title":1321,"description":1322,"category":1323,"order":188,"icon":1324},"\u002Fen\u002Fdocs\u002Fnetworking\u002Ffirewall","Firewall configuration","Which ports HeroCtl uses, which need to stay open, and which should never be exposed to the internet.","rede","i-lucide-shield",{"path":1326,"title":1327,"description":1328,"category":1323,"order":47,"icon":1329},"\u002Fen\u002Fdocs\u002Fnetworking\u002Fingress-tls","Ingress and automatic TLS","How to expose applications on port 443 with certificates issued and renewed automatically, without operating an external router.","i-lucide-globe",{"path":1331,"title":1332,"description":1333,"category":1285,"order":188,"icon":1334},"\u002Fen\u002Fdocs\u002Fobservability\u002Fbackup-restore","Backup and restore of cluster state","How to save, schedule, and restore HeroCtl control plane snapshots. Disaster recovery strategy.","i-lucide-archive",{"path":1292,"title":5,"description":1286,"category":1285,"order":47,"icon":1289},{"path":1337,"title":1338,"description":1339,"category":1340,"order":196,"icon":1341},"\u002Fen\u002Fdocs\u002Foperations\u002Fcli-reference","Complete CLI reference","All heroctl commands with synopsis, flags, and example. Use as a desk reference.","operacoes","i-lucide-terminal",{"path":1343,"title":1344,"description":1345,"category":1340,"order":188,"icon":1346},"\u002Fen\u002Fdocs\u002Foperations\u002Ffirst-cluster","Bring up a 3-node cluster","Form a cluster with 3 servers in under 10 minutes. 
Tolerates 1-node failure with no downtime.","i-lucide-network",{"path":1348,"title":1349,"description":1350,"category":1340,"order":47,"icon":1351},"\u002Fen\u002Fdocs\u002Foperations\u002Finstallation","Installation","Install HeroCtl on any Linux server with Docker in a single command. Covers prerequisites, bootstrap, and verification.","i-lucide-download",{"path":1353,"title":1354,"description":1355,"category":1340,"order":207,"icon":1329},"\u002Fen\u002Fdocs\u002Foperations\u002Fmulti-region","Multi-region (planned for Q4 2026)","What to expect from multi-region in HeroCtl, how to run across regions today, and the roadmap through 2027.",{"path":1259,"title":1357,"description":1358,"category":1359,"order":188,"icon":1360},"RBAC and access control (Business+)","Role, policy, and token model to limit who can submit, read, and operate the cluster.","seguranca","i-lucide-users",{"path":1362,"title":1363,"description":1364,"category":1359,"order":47,"icon":1365},"\u002Fen\u002Fdocs\u002Fsecurity\u002Fsecrets","Secret management","How to keep passwords, tokens, and keys outside the job spec, with encryption at rest and versioned rotation.","i-lucide-key",{"path":1367,"title":1368,"description":1369,"category":1370,"order":47,"icon":1371},"\u002Fen\u002Fdocs\u002Ftroubleshooting\u002Fcommon-problems","Troubleshooting common problems","The 12 most frequent problems in HeroCtl clusters, with symptom, diagnosis, and step-by-step fix.","troubleshooting","i-lucide-alert-triangle",1777362181778]