[Bug]: Streaming child nodes, parent is not catching up #18100

Open
Forza-tng opened this issue Jul 10, 2024 · 13 comments
Labels: bug, cannot reproduce, need feedback, needs triage


@Forza-tng

Bug description

The data from the streaming child node is not updated in real time, and it never catches up: the delay keeps growing over time.

Expected behavior

The child node's charts on the parent update in real time.

Steps to reproduce

Using Alpine Linux 3.20.1

  1. Install two nodes: one parent and one child
  2. Set up streaming from the child to the parent
  3. Start the parent
  4. Start the child
  5. Watch the child's charts in the parent's web UI.
    ...

Installation method

from git

System info

Linux 6.6.33-0-virt #1-Alpine SMP PREEMPT_DYNAMIC Thu, 13 Jun 2024 07:49:22 +0000 x86_64 Linux
/etc/alpine-release:3.20.1
/etc/os-release:NAME="Alpine Linux"
/etc/os-release:ID=alpine
/etc/os-release:VERSION_ID=3.20.1
/etc/os-release:PRETTY_NAME="Alpine Linux v3.20"

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.46.0-103-nightly
    Installation Type __________________________________________ : custom
    Package Architecture _______________________________________ : unknown
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /opt/netdata/etc/netdata
    Stock Configurations _______________________________________ : /opt/netdata/usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /opt/netdata/var/cache/netdata
    Permanent Databases ________________________________________ : /opt/netdata/var/lib/netdata
    Plugins ____________________________________________________ : /opt/netdata/usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /opt/netdata/usr/share/netdata/web
    Log Files __________________________________________________ : /opt/netdata/var/log/netdata
    Lock Files _________________________________________________ : /opt/netdata/var/lib/netdata/lock
    Home _______________________________________________________ : /opt/netdata/var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.6.33-0-virt
    Operating System ___________________________________________ : Alpine Linux
    Operating System ID ________________________________________ : alpine
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : unknown
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 4
    CPU Frequency ______________________________________________ : 2794000000
    RAM Bytes __________________________________________________ : 986451968
    Disk Capacity ______________________________________________ : 6442450944
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : unknown
    Virtualization Detection ___________________________________ : none
Container:
    Container __________________________________________________ : unknown
    Container Detection ________________________________________ : none
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip brotli)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine (compression) _____________________________________ : YES (zstd lz4)
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Brotli (generic-purpose lossless compression algorithm) ____ : YES
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : YES
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : NO
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : NO
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

I have two nodes on the same network. The parent has IP 10.12.9.4 and the child is a VM with IP 10.12.9.11. No firewall is enabled between them.

I use this script to build netdata from git sources:

#!/bin/bash
# Build flags: light optimisation plus debug symbols
CPUFLAGS="-march=x86-64"
export CFLAGS="-O1 -ggdb $CPUFLAGS"
export CXXFLAGS="-O1 -ggdb $CPUFLAGS"
export NETDATA_CMAKE_OPTIONS="-DCMAKE_BUILD_TYPE=Debug"
export TMPDIR="/var/tmp/"

# Fresh, empty build directory for every run
ND=$(mktemp -d -p "$TMPDIR" netdata-build.XXXX) || exit 1
echo "Working dir is: $ND"
pushd "$ND" > /dev/null
# Clone into the (empty) build directory, then run the installer from it
git clone https://github.com/netdata/netdata.git --depth=100 --recursive "$ND"
./netdata-installer.sh --enable-lto --enable-ml --build-json-c --disable-telemetry --install-prefix /opt --dont-start-it

The parent configs

netdata.conf


[directories]
    log = /var/log/netdata
    cache = /var/cache/netdata
    home = /opt/netdata

[web]
    default port = 19999
    bind to = *
    allow badges from = *
    allow streaming from = *
    enable gzip compression = no
    web server threads = 6
    web server max sockets = 1024

[logs]
    errors flood protection period = 120
    errors to trigger flood protection = 1000

[plugins]
    netdata monitoring extended = yes

stream.conf

[stream]
    enabled = no

[91DCD32F-8BF7-4573-816B-56468CBEF079]
    enabled = yes
    allow from = *
    default postpone alarms on connect seconds = 120

The child configs

netdata.conf

[global]
    run as user = netdata
    hostname = dxsrv11
[db]
    mode = ram
    retention = 3600
[ml]
    enabled = no

[health]
    enabled = no

[web]
    mode = none

[directories]
    config = /opt/netdata/etc/netdata
    log = /var/log/netdata
    cache = /var/cache/netdata
    home = /opt/netdata

[logs]
    errors flood protection period = 1200
    errors to trigger flood protection = 200

stream.conf

[stream]
    enabled = yes
    destination = 10.12.9.4
    api key = 91DCD32F-8BF7-4573-816B-56468CBEF079
    default port = 19999
    send charts matching = *
    # 10MiB buffer
    buffer size bytes = 10485760
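The "buffer size bytes" value above is simply 10 MiB expressed in bytes (10 × 1024 × 1024). The sketch below is a hypothetical helper, not part of Netdata. It computes the byte value for other sizes, in case you want to experiment with the sender buffer.

```shell
# Hypothetical helper: convert a size in MiB to the byte value used by
# "buffer size bytes" (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 10   # prints 10485760, the value used in the child's stream.conf
```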

Logs

The logs do not show much.

On the parent node, only the following is logged once the child node connects:

time=2024-07-10T12:45:42.001+02:00 comm=netdata source=daemon level=info tid=21251 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=52436 src_capabilities="VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " msg="STREAM dxsrv11 [10.12.9.11]:52436: receive thread started"
time=2024-07-10T12:45:42.001+02:00 comm=netdata source=daemon level=info tid=21251 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=52436 src_capabilities="VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8d4ea1a4-3ea9-11ef-9a24-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-nightly', update every 1, memory mode dbengine, history entries 0, streaming disabled (to '' with api key ''), health enabled, cache_dir '(null)', alarms default handler '', alarms default recipient ''"
time=2024-07-10T12:45:42.024+02:00 comm=netdata source=daemon level=info tid=21251 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=52436 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " msg="STREAM dxsrv11 [receive from [10.12.9.11]:52436]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T12:45:42.024+02:00 comm=netdata source=daemon level=info tid=21251 thread=RCVR[dxsrv11] msg_id=ed4cdb8f1beb4ad3b57cb3cae2d162fa node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=52436 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " code=CONNECTED msg="STREAM_RECEIVER for 'dxsrv11': connected and ready to receive data "

The child log shows:

time=2024-07-10T12:45:41.220+02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=34203  msg="Netdata agent version 'v1.46.0-120-nightly' is starting"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="IEEE754: system is using IEEE754 DOUBLE PRECISION values"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34203  msg="TIMEZONE: using strftime(): 'CEST'"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="TIMEZONE: fixed as 'CEST'"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: next: initialize signals"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize signals - next: initialize static threads"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize static threads - next: initialize web server"
time=2024-07-10T12:45:41.221+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize web server - next: initialize ML"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize ML - next: initialize h2o server"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize h2o server - next: set resource limits"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="resources control: allowed file descriptors: soft = 1024, max = 4096"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, set resource limits - next: become daemon"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="Out-Of-Memory (OOM) score is already set to the wanted value 0"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=34203  msg="Cannot adjust netdata scheduling policy to batch (3), with priority 0. Falling back to nice."
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=34203  msg="Cannot get my current process scheduling policy."
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="netdata started on pid 34203."
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, become daemon - next: initialize threads after fork"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize threads after fork - next: initialize registry"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="Registry is disabled - use the central netdata"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=34203  msg="NETDATA STARTUP: in       0 ms, initialize registry - next: fork the spawn server"
time=2024-07-10T12:45:41.222+02:00 comm=netdata source=daemon level=info tid=34203  msg="Initializing spawn client."
time=2024-07-10T12:45:41.223+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       1 ms, fork the spawn server - next: collecting system info"
time=2024-07-10T12:45:41.291+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in      67 ms, collecting system info - next: initialize RRD structures"
time=2024-07-10T12:45:41.291+02:00 comm=netdata source=daemon level=info tid=34203  msg="SQLite database /var/cache/netdata/netdata-meta.db initialization"
time=2024-07-10T12:45:41.298+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34203  msg="SQLite database initialization completed"
time=2024-07-10T12:45:41.298+02:00 comm=netdata source=daemon level=info tid=34203  msg="SQLite database /var/cache/netdata/context-meta.db initialization"
time=2024-07-10T12:45:41.303+02:00 comm=netdata source=daemon level=info tid=34203  msg="STREAM: added streaming destination No 1: '10.12.9.4' to host 'dxsrv11'"
time=2024-07-10T12:45:41.303+02:00 comm=netdata source=daemon level=info tid=34203  msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8d4ea1a4-3ea9-11ef-9a24-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-nightly', update every 1, memory mode ram, history entries 4096, streaming enabled (to '10.12.9.4' with api key '91DCD32F-8BF7-4573-816B-56468CBEF079'), health enabled, cache_dir '/var/cache/netdata', alarms default handler '', alarms default recipient ''"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info tid=34203  msg="Creating archived hosts"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info tid=34203  msg="Created 0 archived hosts"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info tid=34203  msg="ACLK sync initialization completed"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info tid=34348 thread=ACLKSYNC msg="Starting ACLK synchronization thread"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34203  msg="NETDATA STARTUP: in      14 ms, initialize RRD structures - next: check for incomplete shutdown"
time=2024-07-10T12:45:41.305+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34203  msg="NETDATA STARTUP: in       0 ms, check for incomplete shutdown - next: collect claiming info"
time=2024-07-10T12:45:41.306+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34203  msg="Unable to load '/opt/netdata/var/lib/netdata/cloud.d/claimed_id', setting state to AGENT_UNCLAIMED"
time=2024-07-10T12:45:41.306+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       0 ms, collect claiming info - next: collect host labels"
time=2024-07-10T12:45:41.308+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       2 ms, collect host labels - next: start the static threads"
time=2024-07-10T12:45:41.309+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=34381 thread=EXPORTING msg="CONFIG: cannot load user exporting config '/opt/netdata/etc/netdata/exporting.conf'. Will try the stock version."
time=2024-07-10T12:45:41.309+02:00 comm=netdata source=daemon level=info tid=34383 thread=ACLK_MAIN msg="Waiting for Cloud to be enabled"
time=2024-07-10T12:45:41.309+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       1 ms, start the static threads - next: initialize commands API"
time=2024-07-10T12:45:41.311+02:00 comm=netdata source=daemon level=info tid=34203  msg="Initializing command server."
time=2024-07-10T12:45:41.311+02:00 comm=netdata source=daemon level=info tid=34381 thread=EXPORTING msg="No connector instances to activate"
time=2024-07-10T12:45:41.311+02:00 comm=netdata source=daemon level=info tid=34381 thread=EXPORTING msg="EXPORTING: no exporting connectors configured"
time=2024-07-10T12:45:41.311+02:00 comm=netdata source=daemon level=info tid=34381 thread=EXPORTING msg="cleaning up..."
time=2024-07-10T12:45:41.312+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: in       2 ms, initialize commands API - next: ready"
time=2024-07-10T12:45:41.314+02:00 comm=netdata source=daemon level=info tid=34203  msg="NETDATA STARTUP: completed in 93 ms. Enjoy real-time performance monitoring!"
time=2024-07-10T12:45:41.379+02:00 comm=netdata source=daemon level=error tid=34388 thread=P[tc] msg="child pid 34399 exited with code 1."
time=2024-07-10T12:45:41.379+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=34203  msg="SIGNAL: waitid(34399): failed - it seems the child is already reaped"
time=2024-07-10T12:45:41.391+02:00 comm=netdata source=daemon level=info tid=34400 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T12:45:41.391+02:00 comm=netdata source=daemon level=error tid=34400 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 34434 exited with code 1."
time=2024-07-10T12:45:41.391+02:00 comm=netdata source=daemon level=error tid=34400 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 34434) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T12:45:41.392+02:00 comm=netdata source=daemon level=info tid=34408 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T12:45:41.392+02:00 comm=netdata source=daemon level=error tid=34408 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 34432 exited with code 1."
time=2024-07-10T12:45:41.392+02:00 comm=netdata source=daemon level=error tid=34408 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 34432) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T12:45:41.393+02:00 comm=netdata source=daemon level=error tid=34407 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11   src_transport=pluginsd request="'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM'" msg="PLUGINSD: parser_action('ERROR') failed on line 1: { 'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM' } (quotes added to show parsing)"
time=2024-07-10T12:45:41.393+02:00 comm=netdata source=daemon level=error tid=34407 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 34436 exited with code 1."
time=2024-07-10T12:45:41.393+02:00 comm=netdata source=daemon level=error tid=34407 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 34436) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T12:45:41.499+02:00 comm=netdata source=daemon level=info tid=34406 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T12:45:41.500+02:00 comm=netdata source=daemon level=info tid=34406 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/charts.d.plugin' (pid 34430) does not generate useful output but it reports success (exits with 0). Will not start it again - it is now disabled.."
time=2024-07-10T12:45:42.000+02:00 comm=netdata source=daemon level=notice tid=34377 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-10T12:45:42.000+02:00 comm=netdata source=daemon level=info tid=34584 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="V1 V2 VN VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP BROTLI PROGRESS " dst_transport=http   msg="STREAM dxsrv11 [send]: thread created (task id 34584)"
time=2024-07-10T12:45:42.024+02:00 comm=netdata source=daemon level=info tid=34584 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T12:45:42.311+02:00 comm=netdata source=daemon level=info tid=34394 thread="P[proc netdev]" msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-10T12:45:42.314+02:00 comm=netdata source=daemon level=info tid=34390 thread=P[proc] module=proc.plugin[/proc/uptime] msg="Using now_boottime_usec() for uptime (dt is 5 ms)"
time=2024-07-10T12:45:53.336+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34599 exited with code 1."
time=2024-07-10T12:45:53.353+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34617 exited with code 1."
time=2024-07-10T12:45:53.360+02:00 comm=netdata source=daemon level=info tid=34203  msg="SIGNAL: reap_child(34629) exited with code: 0"
time=2024-07-10T12:45:53.371+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34633 exited with code 1."
time=2024-07-10T12:45:53.378+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=34203  msg="SIGNAL: waitid(34647): failed - it seems the child is already reaped"
time=2024-07-10T12:45:53.388+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34651 exited with code 1."
time=2024-07-10T12:45:53.394+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=34203  msg="SIGNAL: waitid(34663): failed - it seems the child is already reaped"
time=2024-07-10T12:45:53.405+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34667 exited with code 1."
time=2024-07-10T12:45:53.428+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34683 exited with code 1."
time=2024-07-10T12:45:53.435+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=34203  msg="SIGNAL: waitid(34707): failed - it seems the child is already reaped"
time=2024-07-10T12:45:53.459+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34711 exited with code 1."
time=2024-07-10T12:45:53.476+02:00 comm=netdata source=daemon level=error tid=34402 thread=P[cgroupsdisc] msg="child pid 34749 exited with code 1."
time=2024-07-10T12:46:43.829+02:00 comm=netdata source=daemon level=info tid=34387 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 790 replication requests, 0 charts pending replication"
time=2024-07-10T12:47:41.947+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T12:47:41.948+02:00 comm=netdata source=daemon level=info tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 34395) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T12:49:52.604+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T12:49:52.604+02:00 comm=netdata source=daemon level=info tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 34789) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T12:49:52.604+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=34203  msg="SIGNAL: waitid(34789): failed - it seems the child is already reaped"
time=2024-07-10T12:52:03.258+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T12:52:03.258+02:00 comm=netdata source=daemon level=info tid=34393 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 34822) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
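Most of the child log above concerns plugins. The streaming-related entries are the few lines that mention STREAM or REPLICATION. The sketch below pulls out just those lines. It assumes the logfmt layout shown above: time=... comes first and the text is in msg="...". The function name stream_events is hypothetical, and so is the log file name in the example comment. Adjust the path to the log directory configured in netdata.conf.

```shell
# Hypothetical helper: print "<timestamp> <message>" for every log line that
# mentions STREAM or REPLICATION. Reads logfmt lines on stdin.
stream_events() {
  grep -E 'STREAM|REPLICATION' |
    sed -n 's/^time=\([^ ]*\) .*msg="\([^"]*\)".*$/\1 \2/p'
}

# Example (file name is an assumption, adjust to your setup):
#   stream_events < /var/log/netdata/daemon.log
```

Running this on the child and parent logs at intervals shows whether any new streaming or replication events appear while the delay grows.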

Charts

This is how the charts behave for the child node.

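Immediately after starting the child:

[screenshot of the child's charts on the parent]

After a few minutes:

[screenshot of the child's charts on the parent]

After some more time:

[screenshot of the child's charts on the parent]

After some more time:

[screenshot of the child's charts on the parent]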

Parent node

The parent node's own charts have no gaps and behave normally, with no apparent issues:
[screenshot of the parent node's charts]
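A number for the growing delay is easier to track than a series of screenshots. One option is to compare the newest timestamp the parent holds for the child with the current time. The sketch below is built on several assumptions. First, the parent's data API can return CSV with a header line and Unix-epoch timestamps in the first column. Second, the function name latest_lag is hypothetical. Third, the query URL in the example comment may not match your Netdata version: check the /host/dxsrv11/ prefix, the chart name and the parameters. The result is only meaningful if both machines' clocks are in sync.

```shell
# Hypothetical helper: latest_lag NOW reads CSV on stdin (first line a header,
# first column a Unix timestamp in seconds) and prints NOW minus the newest
# timestamp found. Exits with status 1 if there are no data rows.
latest_lag() {
  awk -F, -v now="$1" '
    NR > 1 && NF > 0 {
      gsub(/"/, "", $1)            # tolerate quoted timestamps
      t = $1 + 0
      if (!found || t > max) { max = t; found = 1 }
    }
    END {
      if (!found) exit 1           # nothing to measure
      print now - max
    }
  '
}

# Example (not run here; endpoint and parameters are assumptions):
#   curl -s 'http://10.12.9.4:19999/host/dxsrv11/api/v1/data?chart=system.cpu&after=-3600&format=csv&options=seconds' \
#     | latest_lag "$(date +%s)"
```

Running the example every few minutes would show whether the lag grows steadily, as the screenshots suggest, or jumps in steps.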

@Forza-tng Forza-tng added the bug and needs triage labels Jul 10, 2024
@ilyam8
Member

ilyam8 commented Jul 10, 2024

Hi, @Forza-tng. Why do you use your script instead of the netdata installer? Can you try the same setup using static build for example?

@Forza-tng
Author

Forza-tng commented Jul 10, 2024

> Hi, @Forza-tng. Why do you use your script instead of the netdata installer? Can you try the same setup using static build for example?

What do you mean? The script follows the steps described here: https://learn.netdata.cloud/docs/developer-and-contributor-corner/install-the-netdata-agent-from-a-git-checkout

I see build instructions for static builds, but where would I download the ones you publish? https://learn.netdata.cloud/docs/netdata-agent/installation/linux/static-binary-linux-packages

ilyam8 (Member) commented Jul 10, 2024

You use kickstart with --static-only.

where would I download the ones you publish

GitHub. Check v1.46.1 assets.

Forza-tng (Author)

You use kickstart with --static-only.

where would I download the ones you publish

GitHub. Check v1.46.1 assets.

Thank you.

I did rm -rf /opt/netdata and then ./kickstart.sh --static-only --no-updates --dont-start-it --disable-telemetry

Unfortunately, the problem is exactly the same. Is there a way to debug the streaming parts to see why it is falling behind with the updates? Perhaps by switching to another method of sending the data?

[screenshot]
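To measure how far behind the parent is without eyeballing charts, one option is to ask the parent for a child chart's newest stored timestamp and compare it with the current time. The sketch below assumes that the parent serves the child's API under `/host/<child>/api/v1/chart?chart=<id>` and that the response carries a `last_entry` field in epoch seconds; both are assumptions to check against your agent version's API docs. The parent IP, port and hostname are taken from the logs in this thread, `system.cpu` is just an example chart, and the result is only meaningful if the machine running it has a clock in sync with the child.

```python
import json
import time
import urllib.request
from typing import Optional

# Hypothetical defaults taken from the logs in this thread; adjust as needed.
PARENT_URL = "http://10.12.9.4:19999"
CHILD_HOST = "dxsrv11"
CHART_ID = "system.cpu"


def fetch_chart_info(parent_url: str = PARENT_URL,
                     child_host: str = CHILD_HOST,
                     chart_id: str = CHART_ID) -> dict:
    """Fetch the child's chart metadata from the parent (assumed endpoint)."""
    url = f"{parent_url}/host/{child_host}/api/v1/chart?chart={chart_id}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def lag_seconds(chart_info: dict, now: Optional[float] = None) -> float:
    """Wall-clock time minus the newest stored point (assumed 'last_entry')."""
    if now is None:
        now = time.time()
    return now - chart_info["last_entry"]
```

Logging `lag_seconds(fetch_chart_info())` every minute would show whether the gap stays constant or keeps growing.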

Log on the parent after starting the child:

time=2024-07-10T13:50:06.001 02:00 comm=netdata source=daemon level=info tid=2298 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=58106 src_capabilities="VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " msg="STREAM dxsrv11 [10.12.9.11]:58106: receive thread started"
time=2024-07-10T13:50:06.001 02:00 comm=netdata source=daemon level=info tid=2298 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=58106 src_capabilities="VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8c3f660a-3eb2-11ef-a617-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-g0cce79dfd', update every 1, memory mode dbengine, history entries 0, streaming disabled (to '' with api key ''), health enabled, cache_dir '(null)', alarms default handler '', alarms default recipient ''"
time=2024-07-10T13:50:06.001 02:00 comm=netdata source=daemon level=info tid=2298 thread=RCVR[dxsrv11] node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=58106 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " msg="STREAM dxsrv11 [receive from [10.12.9.11]:58106]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T13:50:06.001 02:00 comm=netdata source=daemon level=info tid=2298 thread=RCVR[dxsrv11] msg_id=ed4cdb8f1beb4ad3b57cb3cae2d162fa node=dxsrv11 src_transport=http src_ip=10.12.9.11 src_port=58106 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " code=CONNECTED msg="STREAM_RECEIVER for 'dxsrv11': connected and ready to receive data "
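Comparing the capabilities the child offered (`src_capabilities` on the "receive thread started" line) with the negotiated list shows what the link dropped. A small sketch using the strings from the parent log above:

```python
# Capability strings copied from the parent's receiver log above.
OFFERED = ("VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY "
           "INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS")
NEGOTIATED = ("VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY "
              "INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS")


def dropped_capabilities(offered: str, negotiated: str) -> list[str]:
    """Capabilities advertised by the child that the negotiated link did not keep."""
    return sorted(set(offered.split()) - set(negotiated.split()))


print(dropped_capabilities(OFFERED, NEGOTIATED))  # ['GZIP', 'LZ4']
```

Only LZ4 and GZIP are missing, so the link kept ZSTD as its single compression algorithm. A compressed stream would be consistent with a raw TCP capture of it looking binary.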
# /opt/netdata/usr/sbin/netdata -W buildinfo
Packaging:
    Netdata Version ____________________________________________ : v1.46.0-120-g0cce79dfd
    Installation Type __________________________________________ : kickstart-static
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /opt/netdata/etc/netdata
    Stock Configurations _______________________________________ : /opt/netdata/usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /opt/netdata/var/cache/netdata
    Permanent Databases ________________________________________ : /opt/netdata/var/lib/netdata
    Plugins ____________________________________________________ : /opt/netdata/usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /opt/netdata/usr/share/netdata/web
    Log Files __________________________________________________ : /opt/netdata/var/log/netdata
    Lock Files _________________________________________________ : /opt/netdata/var/lib/netdata/lock
    Home _______________________________________________________ : /opt/netdata/var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.6.33-0-virt
    Operating System ___________________________________________ : Alpine Linux
    Operating System ID ________________________________________ : alpine
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : unknown
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 4
    CPU Frequency ______________________________________________ : 2794000000
    RAM Bytes __________________________________________________ : 986451968
    Disk Capacity ______________________________________________ : 6442450944
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : unknown
    Virtualization Detection ___________________________________ : none
Container:
    Container __________________________________________________ : unknown
    Container Detection ________________________________________ : none
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine (compression) _____________________________________ : YES (zstd lz4)
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Brotli (generic-purpose lossless compression algorithm) ____ : NO
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libyaml (library for parsing and emitting YAML) ____________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : NO
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : NO
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : NO
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

stelfrag (Collaborator)

@Forza-tng, thank you -- will try to reproduce.

In the meantime, is it possible (as a test) to run the child with mode = dbengine?
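The child's own startup log, posted later in this thread, reports `memory mode ram, history entries 4096` on its "Host 'dxsrv11' ... initialized" line. With `update every 1`, that is roughly 4096 s (about 68 minutes) of history per metric in RAM. A small sketch for pulling those two fields out of that line, to confirm which mode a restart actually picked up. The regex assumes the field wording shown in these logs:

```python
import re
from typing import Optional, Tuple

# Matches e.g. "memory mode ram, history entries 4096" as printed in these logs.
HOST_INIT = re.compile(r"memory mode (?P<mode>\w+), history entries (?P<entries>\d+)")


def memory_mode(line: str) -> Optional[Tuple[str, int]]:
    """Return (mode, history_entries) from a 'Host ... initialized' log line."""
    m = HOST_INIT.search(line)
    if m is None:
        return None
    return m.group("mode"), int(m.group("entries"))
```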

stelfrag self-assigned this on Jul 10, 2024
Forza-tng (Author)

@Forza-tng, thank you -- will try to reproduce.

In the meantime, is it possible (as a test) to run the child with mode = dbengine?

Sure. I'll check it when I'm back at work again tomorrow.

ilyam8 added the cannot reproduce label on Jul 10, 2024
ilyam8 (Member) commented Jul 10, 2024

@stelfrag I can't reproduce the issue:

  • Created 2 Alpine 3.20.1 VMs.
  • Configured Parent/Child (static) as described in the OP.

No issues.

Forza-tng (Author) commented Jul 10, 2024

@stelfrag I can't reproduce the issue:

  • created 2 Alpine 3.20.1 VMs.
  • configure Parent/Child as described in the OP.

No issues.

Thank you for checking. The host here is a bare-metal storage server with a bonded NIC. I have no drops logged at the hardware level, but something in the network could still be affecting things. I attempted to save the TCP payload, but it's binary. Is there a way to send the data as plaintext, or to decode the communication some other way?

Update: I enabled dbengine on the child this morning.

This is the complete log since I started the child yesterday:
time=2024-07-10T13:50:04.925 02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=36313  msg="Netdata agent version 'v1.46.0-120-g0cce79dfd' is starting"
time=2024-07-10T13:50:04.925 02:00 comm=netdata source=daemon level=info tid=36313  msg="IEEE754: system is using IEEE754 DOUBLE PRECISION values"
time=2024-07-10T13:50:04.925 02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=36313  msg="TIMEZONE: using strftime(): 'CEST'"
time=2024-07-10T13:50:04.925 02:00 comm=netdata source=daemon level=info tid=36313  msg="TIMEZONE: fixed as 'CEST'"
time=2024-07-10T13:50:04.926 02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=36313  msg="NETDATA STARTUP: next: initialize signals"
time=2024-07-10T13:50:04.926 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize signals - next: initialize static threads"
time=2024-07-10T13:50:04.926 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize static threads - next: initialize web server"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize web server - next: initialize ML"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize ML - next: initialize h2o server"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize h2o server - next: set resource limits"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="resources control: allowed file descriptors: soft = 1024, max = 4096"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, set resource limits - next: become daemon"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="Out-Of-Memory (OOM) score is already set to the wanted value 0"
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=36313  msg="Cannot adjust netdata scheduling policy to batch (3), with priority 0. Falling back to nice."
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=36313  msg="Cannot get my current process scheduling policy."
time=2024-07-10T13:50:04.927 02:00 comm=netdata source=daemon level=info tid=36313  msg="netdata started on pid 36315."
time=2024-07-10T13:50:04.928 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, become daemon - next: initialize threads after fork"
time=2024-07-10T13:50:04.928 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize threads after fork - next: initialize registry"
time=2024-07-10T13:50:04.928 02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize registry - next: fork the spawn server"
time=2024-07-10T13:50:04.928 02:00 comm=netdata source=daemon level=info tid=36313  msg="Initializing spawn client."
time=2024-07-10T13:50:04.928 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, fork the spawn server - next: collecting system info"
time=2024-07-10T13:50:04.984 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in      55 ms, collecting system info - next: initialize RRD structures"
time=2024-07-10T13:50:04.984 02:00 comm=netdata source=daemon level=info tid=36313  msg="SQLite database /var/cache/netdata/netdata-meta.db initialization"
time=2024-07-10T13:50:04.991 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36313  msg="SQLite database initialization completed"
time=2024-07-10T13:50:04.991 02:00 comm=netdata source=daemon level=info tid=36313  msg="SQLite database /var/cache/netdata/context-meta.db initialization"
time=2024-07-10T13:50:04.996 02:00 comm=netdata source=daemon level=info tid=36313  msg="STREAM: added streaming destination No 1: '10.12.9.4' to host 'dxsrv11'"
time=2024-07-10T13:50:04.996 02:00 comm=netdata source=daemon level=info tid=36313  msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8c3f660a-3eb2-11ef-a617-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-g0cce79dfd', update every 1, memory mode ram, history entries 4096, streaming enabled (to '10.12.9.4' with api key '91DCD32F-8BF7-4573-816B-56468CBEF079'), health enabled, cache_dir '/var/cache/netdata', alarms default handler '', alarms default recipient ''"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info tid=36313  msg="Creating archived hosts"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info tid=36313  msg="Created 0 archived hosts"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info tid=36313  msg="ACLK sync initialization completed"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info tid=36460 thread=ACLKSYNC msg="Starting ACLK synchronization thread"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36313  msg="NETDATA STARTUP: in      13 ms, initialize RRD structures - next: check for incomplete shutdown"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36313  msg="NETDATA STARTUP: in       0 ms, check for incomplete shutdown - next: collect claiming info"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36313  msg="Unable to load '/opt/netdata/var/lib/netdata/cloud.d/claimed_id', setting state to AGENT_UNCLAIMED"
time=2024-07-10T13:50:04.997 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, collect claiming info - next: collect host labels"
time=2024-07-10T13:50:04.999 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       1 ms, collect host labels - next: start the static threads"
time=2024-07-10T13:50:05.000 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36492 thread=EXPORTING msg="CONFIG: cannot load user exporting config '/opt/netdata/etc/netdata/exporting.conf'. Will try the stock version."
time=2024-07-10T13:50:05.000 02:00 comm=netdata source=daemon level=info tid=36495 thread=ACLK_MAIN msg="Waiting for Cloud to be enabled"
time=2024-07-10T13:50:05.000 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36493 thread=WEB[1] msg="To use encryption it is necessary to set \"ssl certificate\" and \"ssl key\" in [web] !\u000A"
time=2024-07-10T13:50:05.000 02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="starting worker 2"
time=2024-07-10T13:50:05.000 02:00 comm=netdata source=daemon level=info tid=36492 thread=EXPORTING msg="No connector instances to activate"
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36492 thread=EXPORTING msg="EXPORTING: no exporting connectors configured"
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36492 thread=EXPORTING msg="cleaning up..."
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       1 ms, start the static threads - next: initialize commands API"
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36313  msg="Initializing command server."
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="starting worker 3"
time=2024-07-10T13:50:05.001 02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="starting worker 4"
time=2024-07-10T13:50:05.002 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: in       0 ms, initialize commands API - next: ready"
time=2024-07-10T13:50:05.002 02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA STARTUP: completed in 77 ms. Enjoy real-time performance monitoring!"
time=2024-07-10T13:50:05.039 02:00 comm=netdata source=daemon level=error tid=36499 thread=P[tc] msg="child pid 36511 exited with code 1."
time=2024-07-10T13:50:05.040 02:00 comm=netdata source=daemon level=info tid=36521 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=error tid=36521 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36607 exited with code 1."
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=error tid=36521 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 36607) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=36313  msg="SIGNAL: waitid(36511): failed - it seems the child is already reaped"
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=info tid=36515 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=error tid=36515 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36608 exited with code 1."
time=2024-07-10T13:50:05.043 02:00 comm=netdata source=daemon level=error tid=36515 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 36608) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T13:50:05.046 02:00 comm=netdata source=daemon level=info tid=36313  msg="SIGNAL: reap_child(36626) exited with code: 1"
time=2024-07-10T13:50:05.046 02:00 comm=netdata source=daemon level=error tid=36522 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11   src_transport=pluginsd request="'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM'" msg="PLUGINSD: parser_action('ERROR') failed on line 1: { 'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM' } (quotes added to show parsing)"
time=2024-07-10T13:50:05.046 02:00 comm=netdata source=daemon level=error tid=36522 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36626 exited with code 1."
time=2024-07-10T13:50:05.046 02:00 comm=netdata source=daemon level=error tid=36522 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 36626) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-10T13:50:05.073 02:00 comm=netdata source=daemon level=info tid=36500 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-10T13:50:05.073 02:00 comm=netdata source=daemon level=info tid=36500 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/charts.d.plugin' (pid 36519) does not generate useful output but it reports success (exits with 0). Will not start it again - it is now disabled.."
time=2024-07-10T13:50:06.000 02:00 comm=netdata source=daemon level=notice tid=36488 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-10T13:50:06.000 02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="V1 V2 VN VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " dst_transport=http   msg="STREAM dxsrv11 [send]: thread created (task id 36698)"
time=2024-07-10T13:50:06.002 02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T13:50:06.004 02:00 comm=netdata source=daemon level=info tid=36486 thread=P[idlejitter] msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-10T13:50:06.253 02:00 comm=netdata source=daemon level=info tid=36502 thread=P[proc] module=proc.plugin[/proc/uptime] msg="Using now_boottime_usec() for uptime (dt is 5 ms)"
time=2024-07-10T13:50:06.303 02:00 comm=netdata source=daemon level=error tid=36512 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 instance=system.softirq_latency context=system.softirq_latency src_transport=pluginsd  msg="PARSER: read failed: POLLHUP."
time=2024-07-10T13:50:06.418 02:00 comm=netdata source=daemon level=error tid=36512 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36606 killed by signal 8."
time=2024-07-10T13:50:06.418 02:00 comm=netdata source=daemon level=info tid=36512 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin' (pid 36606) was killed with SIGTERM. Disabling it."
time=2024-07-10T13:50:16.307 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36706 exited with code 1."
time=2024-07-10T13:50:16.324 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36724 exited with code 1."
time=2024-07-10T13:50:16.340 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36740 exited with code 1."
time=2024-07-10T13:50:16.355 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36758 exited with code 1."
time=2024-07-10T13:50:16.370 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36774 exited with code 1."
time=2024-07-10T13:50:16.392 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36790 exited with code 1."
time=2024-07-10T13:50:16.417 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36820 exited with code 1."
time=2024-07-10T13:50:16.432 02:00 comm=netdata source=daemon level=error tid=36605 thread=P[cgroupsdisc] msg="child pid 36854 exited with code 1."
time=2024-07-10T13:51:06.510 02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 778 replication requests, 0 charts pending replication"
time=2024-07-10T13:52:05.674 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T13:52:05.674 02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 36609) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T13:54:16.311 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T13:54:16.311 02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 36889) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T13:56:26.963 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T13:56:26.963 02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 36915) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T13:56:26.963 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=36313  msg="SIGNAL: waitid(36915): failed - it seems the child is already reaped"
time=2024-07-10T13:58:37.616+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T13:58:37.616+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37078) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:00:48.262+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:00:48.262+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37105) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:02:58.916+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:02:58.916+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37158) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:05:09.563+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:05:09.563+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37188) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:07:20.207+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:07:20.207+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37215) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:09:30.863+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:09:30.863+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37241) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:11:41.519+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:11:41.519+02:00 comm=netdata source=daemon level=info tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37271) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-10T14:13:52.169+02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-10T14:13:52.169+02:00 comm=netdata source=daemon level=error tid=36520 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:'dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 37297) does not generate useful output, although it reports success (exits with 0).We have tried to collect something 11 times - unsuccessfully. Disabling it."
time=2024-07-10T14:13:52.169+02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=36313  msg="SIGNAL: waitid(37297): failed - it seems the child is already reaped"
time=2024-07-10T18:04:25.659+02:00 comm=netdata source=daemon level=error tid=36698 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: buffer full (allocated 100000000 bytes) after sending 100233568 bytes. Restarting connection"
time=2024-07-10T18:04:26.000+02:00 comm=netdata source=daemon level=notice tid=36488 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-10T18:06:25.798+02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T18:06:26.000+02:00 comm=netdata source=daemon level=info tid=36488 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-10T18:42:59.765+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 7626 replication requests, 174 charts pending replication"
time=2024-07-10T18:44:49.029+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 174 charts pending replication"
time=2024-07-10T18:45:53.290+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 174 charts pending replication"
time=2024-07-10T18:51:00.232+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 171 replication requests, 98 charts pending replication"
time=2024-07-10T18:54:09.502+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-10T18:55:59.765+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-10T19:05:32.399+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 95 replication requests, 98 charts pending replication"
time=2024-07-10T19:11:41.685+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-10T19:15:17.961+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-10T19:33:55.007+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 95 replication requests, 98 charts pending replication"
time=2024-07-10T19:45:55.325+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-10T19:52:54.617+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-10T20:29:13.503+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 95 replication requests, 98 charts pending replication"
time=2024-07-10T23:03:13.339+02:00 comm=netdata source=daemon level=error tid=36698 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: buffer full (allocated 100000000 bytes) after sending 115288279 bytes. Restarting connection"
time=2024-07-10T23:03:13.747+02:00 comm=netdata source=daemon level=notice tid=36486 thread=P[idlejitter] msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-10T23:05:13.479+02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-10T23:05:14.000+02:00 comm=netdata source=daemon level=info tid=36488 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-10T23:41:33.766+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 7447 replication requests, 117 charts pending replication"
time=2024-07-10T23:42:54.101+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 117 charts pending replication"
time=2024-07-10T23:44:53.366+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 117 charts pending replication"
time=2024-07-10T23:49:42.918+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 24 replication requests, 98 charts pending replication"
time=2024-07-10T23:52:17.365+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-10T23:56:03.640+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T00:05:20.004+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 5 replication requests, 98 charts pending replication"
time=2024-07-11T00:10:26.642+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-11T00:17:47.936+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T00:35:49.375+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 5 replication requests, 98 charts pending replication"
time=2024-07-11T00:45:52.450+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-11T01:00:09.788+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T01:35:15.474+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 5 replication requests, 98 charts pending replication"
time=2024-07-11T04:00:41.419+02:00 comm=netdata source=daemon level=error tid=36698 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: buffer full (allocated 100000000 bytes) after sending 114777253 bytes. Restarting connection"
time=2024-07-11T04:00:42.000+02:00 comm=netdata source=daemon level=notice tid=36488 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-11T04:02:41.557+02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-11T04:02:41.750+02:00 comm=netdata source=daemon level=info tid=36486 thread=P[idlejitter] msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-11T04:39:15.763+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 7444 replication requests, 149 charts pending replication"
time=2024-07-11T04:41:04.259+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 5 replication requests, 149 charts pending replication"
time=2024-07-11T04:42:14.637+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 149 charts pending replication"
time=2024-07-11T04:47:33.407+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 52 replication requests, 98 charts pending replication"
time=2024-07-11T04:49:59.671+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-11T04:50:49.971+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 3 replication requests, 98 charts pending replication"
time=2024-07-11T04:53:01.483+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-11T05:03:01.794+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T05:07:46.073+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-11T05:09:22.415+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 3 replication requests, 98 charts pending replication"
time=2024-07-11T05:13:46.228+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-11T05:33:13.589+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T05:42:23.900+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 2 replication requests, 98 charts pending replication"
time=2024-07-11T05:45:33.340+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 3 replication requests, 98 charts pending replication"
time=2024-07-11T05:54:09.700+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 92 replication requests, 98 charts pending replication"
time=2024-07-11T06:32:03.170+02:00 comm=netdata source=daemon level=info tid=36497 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 98 charts pending replication"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info errno="4, Interrupted system call" tid=36313  msg="SIGNAL: Received SIGTERM. Cleaning up to exit..."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36313  msg="Shutting down command server."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=36508 thread=DAEMON_COMMAND msg="Shutting down command event loop."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36508 thread=DAEMON_COMMAND msg="Shutting down command loop complete."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36313  msg="Command server has stopped."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36313  msg="NETDATA SHUTDOWN: initializing shutdown with code 0..."
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=error tid=36316 thread=P[WATCHER] msg="Shutdown process started"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 1 services [ MAINTENANCE ] to exit: 'SERVICE' (36490)"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [1/25] - 'create shutdown file' finished in 0 milliseconds"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [2/25] - 'dbengine exit mode' finished in 0 milliseconds"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [3/25] - 'close webrtc connections' finished in 0 milliseconds"
time=2024-07-11T08:27:32.400+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [4/25] - 'disable maintenance, new queries, new web requests, new streaming connections and aclk' finished in 0 milliseconds"
time=2024-07-11T08:27:33.069+02:00 comm=netdata source=daemon level=info tid=36460 thread=ACLKSYNC msg="ACLK SYNC: Shutting down ACLK synchronization event loop"
time=2024-07-11T08:27:33.252+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 5 services [ WEB_SERVER HEALTH ] to exit: 'HEALTH' (36487), 'WEB[1]' (36493), 'WEB[2]' (36507), 'WEB[3]' (36509), 'WEB[4]' (36514)"
time=2024-07-11T08:27:33.252+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [5/25] - 'stop maintenance thread' finished in 851 milliseconds"
time=2024-07-11T08:27:33.278+02:00 comm=netdata source=daemon level=info tid=36507 thread=WEB[2] msg="stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends"
time=2024-07-11T08:27:33.281+02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends"
time=2024-07-11T08:27:33.281+02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="closing all web server sockets..."
time=2024-07-11T08:27:33.281+02:00 comm=netdata source=daemon level=info tid=36493 thread=WEB[1] msg="all static web threads stopped."
time=2024-07-11T08:27:33.302+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 3 services [ WEB_SERVER HEALTH ] to exit: 'HEALTH' (36487), 'WEB[3]' (36509), 'WEB[4]' (36514)"
time=2024-07-11T08:27:33.329+02:00 comm=netdata source=daemon level=error errno="88, Not a socket" tid=36514 thread=WEB[4] msg="POLLFD: LISTENER: accept() failed."
time=2024-07-11T08:27:33.330+02:00 comm=netdata source=daemon level=info tid=36514 thread=WEB[4] msg="stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends"
time=2024-07-11T08:27:33.330+02:00 comm=netdata source=daemon level=error errno="88, Not a socket" tid=36509 thread=WEB[3] msg="POLLFD: LISTENER: accept() failed."
time=2024-07-11T08:27:33.330+02:00 comm=netdata source=daemon level=info tid=36509 thread=WEB[3] msg="stopped after 0 connects, 0 disconnects (max concurrent 0), 0 receptions and 0 sends"
time=2024-07-11T08:27:33.352+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 1 services [ HEALTH ] to exit: 'HEALTH' (36487)"
time=2024-07-11T08:27:34.209+02:00 comm=netdata source=daemon level=info tid=36487 thread=HEALTH msg="cleaning up..."
time=2024-07-11T08:27:34.254+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 17 services [ COLLECTORS STREAMING ] to exit: "
time=2024-07-11T08:27:34.254+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [6/25] - 'stop exporters, health and web servers threads' finished in 1001 milliseconds"
time=2024-07-11T08:27:34.256+02:00 comm=netdata source=daemon level=info tid=36494 thread=PD[apps] module=plugins.d[apps.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36498 killed by SIGTERM"
time=2024-07-11T08:27:34.266+02:00 comm=netdata source=daemon level=error errno="125, Operation canceled" tid=36513 thread=PD[go.d] module=plugins.d[go.d.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: thread cancelled while waiting for data."
time=2024-07-11T08:27:34.280+02:00 comm=netdata source=daemon level=info tid=36698 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send]: sending thread exits DISCONNECTED SHUTDOWN REQUESTED"
time=2024-07-11T08:27:34.289+02:00 comm=netdata source=daemon level=info tid=36489 thread=PLUGINSD msg="PLUGINSD: cleaning up..."
time=2024-07-11T08:27:34.289+02:00 comm=netdata source=daemon level=info tid=36489 thread=PLUGINSD msg="PLUGINSD: 'host:dxsrv11', stopping plugin thread: plugin:network-viewer"
time=2024-07-11T08:27:34.289+02:00 comm=netdata source=daemon level=info tid=36489 thread=PLUGINSD msg="PLUGINSD: 'host:dxsrv11', stopping plugin thread: plugin:debugfs"
time=2024-07-11T08:27:34.289+02:00 comm=netdata source=daemon level=info tid=36489 thread=PLUGINSD msg="PLUGINSD: cleanup completed."
time=2024-07-11T08:27:34.289+02:00 comm=netdata source=daemon level=info tid=36510 thread=PD[debugfs] module=plugins.d[debugfs.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36600 killed by SIGTERM"
time=2024-07-11T08:27:34.304+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 10 services [ COLLECTORS ] to exit: 'P[idlejitter]' (36486), 'STATS_GLOBAL' (36488), 'P[diskspace]' (36501), 'P[proc]' (36502), 'P[diskspace slo' (36503), 'P[cgroups]' (36504), 'P[proc netdev]' (36505), 'P[timex]' (36506), 'PD[network-view' (36516), 'STATSD_IN[1]' (36603)"
time=2024-07-11T08:27:34.342+02:00 comm=netdata source=daemon level=error errno="125, Operation canceled" tid=36516 thread=PD[network-view module=plugins.d[network-viewer.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: thread cancelled while waiting for data."
time=2024-07-11T08:27:34.342+02:00 comm=netdata source=daemon level=info tid=36516 thread=PD[network-view module=plugins.d[network-viewer.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 36611 killed by SIGTERM"
time=2024-07-11T08:27:34.354+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 7 services [ COLLECTORS ] to exit: 'P[idlejitter]' (36486), 'STATS_GLOBAL' (36488), 'P[diskspace]' (36501), 'P[proc]' (36502), 'P[diskspace slo' (36503), 'P[proc netdev]' (36505), 'P[timex]' (36506)"
time=2024-07-11T08:27:35.000+02:00 comm=netdata source=daemon level=info tid=36488 thread=STATS_GLOBAL msg="cleaning up..."
time=2024-07-11T08:27:35.005+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 6 services [ COLLECTORS ] to exit: 'P[idlejitter]' (36486), 'P[diskspace]' (36501), 'P[proc]' (36502), 'P[diskspace slo' (36503), 'P[proc netdev]' (36505), 'P[timex]' (36506)"
time=2024-07-11T08:27:35.206+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 5 services [ COLLECTORS ] to exit: 'P[diskspace]' (36501), 'P[proc]' (36502), 'P[diskspace slo' (36503), 'P[proc netdev]' (36505), 'P[timex]' (36506)"
time=2024-07-11T08:27:35.251+02:00 comm=netdata source=daemon level=info tid=36506 thread=P[timex] msg="cleaning up..."
time=2024-07-11T08:27:35.256+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 1 services [ REPLICATION ] to exit: 'REPLAY[1]' (36497)"
time=2024-07-11T08:27:35.256+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [7/25] - 'stop collectors and streaming threads' finished in 1002 milliseconds"
time=2024-07-11T08:27:35.857+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [8/25] - 'stop replication threads' finished in 601 milliseconds"
time=2024-07-11T08:27:35.858+02:00 comm=netdata source=daemon level=info tid=36313  msg="SERVICE CONTROL: waiting for the following 1 services [ CONTEXT ] to exit: 'RRDCONTEXT' (36496)"
time=2024-07-11T08:27:35.858+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [9/25] - 'prepare metasync shutdown' finished in 0 milliseconds"
time=2024-07-11T08:27:35.858+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [10/25] - 'disable ML detection and training threads' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36313  msg="All threads finished."
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [11/25] - 'stop context thread' finished in 400 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [12/25] - 'clear web client cache' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [13/25] - 'stop aclk threads' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [14/25] - 'stop all remaining worker threads' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [15/25] - 'cancel main threads' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [16/25] - 'flush dbengine tiers' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [17/25] - 'stop collection for all hosts' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36313  msg="No statements pending to finalize"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [18/25] - 'stop metasync threads' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [19/25] - 'wait for dbengine collectors to finish' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [20/25] - 'wait for dbengine main cache to finish flushing' finished in 0 milliseconds"
time=2024-07-11T08:27:36.259+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [21/25] - 'stop dbengine tiers' finished in 0 milliseconds"
time=2024-07-11T08:27:36.260+02:00 comm=netdata source=daemon level=info tid=36313  msg="CONTEXT: Closing sqlite database"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=info tid=36313  msg="METADATA: Closing sqlite database"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [22/25] - 'close SQL databases' finished in 2 milliseconds"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [23/25] - 'remove pid file' finished in 0 milliseconds"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [24/25] - 'free openssl structures' finished in 0 milliseconds"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=info tid=36316 thread=P[WATCHER] msg="shutdown step: [25/25] - 'remove incomplete shutdown file' finished in 0 milliseconds"
time=2024-07-11T08:27:36.262+02:00 comm=netdata source=daemon level=error tid=36316 thread=P[WATCHER] msg="Shutdown process ended in 3862 milliseconds"

########## RESTART - ENABLING dbengine #########

time=2024-07-11T08:28:35.452+02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="Netdata agent version 'v1.46.0-120-g0cce79dfd' is starting"
time=2024-07-11T08:28:35.452+02:00 comm=netdata source=daemon level=info tid=51043  msg="IEEE754: system is using IEEE754 DOUBLE PRECISION values"
time=2024-07-11T08:28:35.452+02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=51043  msg="TIMEZONE: using strftime(): 'CEST'"
time=2024-07-11T08:28:35.452+02:00 comm=netdata source=daemon level=info tid=51043  msg="TIMEZONE: fixed as 'CEST'"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=51043  msg="NETDATA STARTUP: next: initialize signals"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize signals - next: initialize static threads"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize static threads - next: initialize web server"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize web server - next: initialize ML"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize ML - next: initialize h2o server"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize h2o server - next: set resource limits"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="resources control: allowed file descriptors: soft = 1024, max = 4096"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, set resource limits - next: become daemon"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=info tid=51043  msg="Out-Of-Memory (OOM) score is already set to the wanted value 0"
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=51043  msg="Cannot adjust netdata scheduling policy to batch (3), with priority 0. Falling back to nice."
time=2024-07-11T08:28:35.454+02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=51043  msg="Cannot get my current process scheduling policy."
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="netdata started on pid 51045."
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       1 ms, become daemon - next: initialize threads after fork"
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize threads after fork - next: initialize registry"
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize registry - next: fork the spawn server"
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info tid=51043  msg="Initializing spawn client."
time=2024-07-11T08:28:35.455+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, fork the spawn server - next: collecting system info"
time=2024-07-11T08:28:35.519+02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      63 ms, collecting system info - next: initialize RRD structures"
time=2024-07-11T08:28:35.519+02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database /var/cache/netdata/netdata-meta.db initialization"
time=2024-07-11T08:28:35.519+02:00 comm=netdata source=daemon level=info tid=51043  msg="metadata database version is 18 (no migration needed)"
time=2024-07-11T08:28:35.522+02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database initialization completed"
time=2024-07-11T08:28:35.523+02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database /var/cache/netdata/context-meta.db initialization"
time=2024-07-11T08:28:35.523+02:00 comm=netdata source=daemon level=info tid=51043  msg="context database version is 1 (no migration needed)"
time=2024-07-11T08:28:35.525+02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine"
time=2024-07-11T08:28:35.525+02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine\"."
time=2024-07-11T08:28:35.525+02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine-tier2"
time=2024-07-11T08:28:35.525+02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine-tier2\"."
time=2024-07-11T08:28:35.525+02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine-tier1"
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine-tier2/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine-tier1\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 0, using 1 threads..."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine-tier2/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 2, using 1 threads..."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine-tier1/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine-tier1/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 1, using 1 threads..."
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 0 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 1 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 2 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="STREAM: added streaming destination No 1: '10.12.9.4' to host 'dxsrv11'"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8c3f660a-3eb2-11ef-a617-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-g0cce79dfd', update every 1, memory mode dbengine, history entries 0, streaming enabled (to '10.12.9.4' with api key '91DCD32F-8BF7-4573-816B-56468CBEF079'), health enabled, cache_dir '/var/cache/netdata', alarms default handler '', alarms default recipient ''"
time=2024-07-11T08:28:35.548 02:00 comm=netdata source=daemon level=info tid=51043  msg="Creating archived hosts"
time=2024-07-11T08:28:35.548 02:00 comm=netdata source=daemon level=info tid=51043  msg="Created 0 archived hosts"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="ACLK sync initialization completed"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      29 ms, initialize RRD structures - next: check for incomplete shutdown"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51043  msg="NETDATA STARTUP: in       0 ms, check for incomplete shutdown - next: collect claiming info"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51217 thread=ACLKSYNC msg="Starting ACLK synchronization thread"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51043  msg="Unable to load '/opt/netdata/var/lib/netdata/cloud.d/claimed_id', setting state to AGENT_UNCLAIMED"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, collect claiming info - next: collect host labels"
time=2024-07-11T08:28:35.551 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       2 ms, collect host labels - next: start the static threads"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51227 thread=EXPORTING msg="CONFIG: cannot load user exporting config '/opt/netdata/etc/netdata/exporting.conf'. Will try the stock version."
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51231 thread=ACLK_MAIN msg="Waiting for Cloud to be enabled"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="No connector instances to activate"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="EXPORTING: no exporting connectors configured"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51228 thread=WEB[1] msg="To use encryption it is necessary to set \"ssl certificate\" and \"ssl key\" in [web] !\u000A"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="cleaning up..."
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       1 ms, start the static threads - next: initialize commands API"
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51043  msg="Initializing command server."
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 2"
time=2024-07-11T08:28:35.554 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 3"
time=2024-07-11T08:28:35.554 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 4"
time=2024-07-11T08:28:35.609 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      55 ms, initialize commands API - next: ready"
time=2024-07-11T08:28:35.609 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: completed in 160 ms. Enjoy real-time performance monitoring!"
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51237 thread=P[tc] msg="child pid 51241 exited with code 1."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=info tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51366 exited with code 1."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 51366) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.613 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11   src_transport=pluginsd request="'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM'" msg="PLUGINSD: parser_action('ERROR') failed on line 1: { 'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM' } (quotes added to show parsing)"
time=2024-07-11T08:28:35.613 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51370) exited with code: 1"
time=2024-07-11T08:28:35.614 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51370 exited with code 1."
time=2024-07-11T08:28:35.614 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 51370) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=info tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=error tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51374 exited with code 1."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=error tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 51374) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51374) exited with code: 1"
time=2024-07-11T08:28:35.622 02:00 comm=netdata source=daemon level=info tid=51229 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.623 02:00 comm=netdata source=daemon level=info tid=51229 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/charts.d.plugin' (pid 51234) does not generate useful output but it reports success (exits with 0). Will not start it again - it is now disabled.."
time=2024-07-11T08:28:35.623 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51234): failed - it seems the child is already reaped"
time=2024-07-11T08:28:36.000 02:00 comm=netdata source=daemon level=notice tid=51222 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-11T08:28:36.000 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="V1 V2 VN VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " dst_transport=http   msg="STREAM dxsrv11 [send]: thread created (task id 51431)"
time=2024-07-11T08:28:36.001 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-11T08:28:36.304 02:00 comm=netdata source=daemon level=info errno="13, Permission denied" tid=51238 thread=P[diskspace] msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-11T08:28:36.306 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51239 thread=P[proc] module=proc.plugin[/proc/uptime] msg="Using now_boottime_usec() for uptime (dt is 7 ms)"
time=2024-07-11T08:28:36.396 02:00 comm=netdata source=daemon level=error tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 instance=system.softirq_latency context=system.softirq_latency src_transport=pluginsd  msg="PARSER: read failed: POLLHUP."
time=2024-07-11T08:28:36.528 02:00 comm=netdata source=daemon level=error tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51372 killed by signal 8."
time=2024-07-11T08:28:36.528 02:00 comm=netdata source=daemon level=info tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin' (pid 51372) was killed with SIGTERM. Disabling it."
time=2024-07-11T08:28:46.381 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51439 exited with code 1."
time=2024-07-11T08:28:46.398 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51457 exited with code 1."
time=2024-07-11T08:28:46.403 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51469): failed - it seems the child is already reaped"
time=2024-07-11T08:28:46.414 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51473 exited with code 1."
time=2024-07-11T08:28:46.419 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51487) exited with code: 0"
time=2024-07-11T08:28:46.429 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51491) exited with code: 1"
time=2024-07-11T08:28:46.429 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51491 exited with code 1."
time=2024-07-11T08:28:46.445 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51507 exited with code 1."
time=2024-07-11T08:28:46.466 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51523 exited with code 1."
time=2024-07-11T08:28:46.492 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51553 exited with code 1."
time=2024-07-11T08:28:46.507 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51587 exited with code 1."
time=2024-07-11T08:29:37.075 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 781 replication requests, 0 charts pending replication"
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51371) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51371): failed - it seems the child is already reaped"
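The log above uses a logfmt-like key=value layout. Most of its error-level lines come from collector plugins exiting at startup, not from the streaming threads. Below is a minimal Python sketch for tallying error entries per thread. It assumes each entry sits on one line, as pasted here. The sample lines are copied from the log. The helper names (`parse_line`, `errors_per_thread`) are illustrative and are not part of Netdata.

```python
import re
from collections import Counter

# key=value pairs; the value is either a double-quoted string (allowing
# backslash escapes such as \" inside it) or a bare token without spaces.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# A few entries copied from the child's log (plus one blank line).
SAMPLE = [
    'time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=51043  msg="Cannot get my current process scheduling policy."',
    'time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51237 thread=P[tc] msg="child pid 51241 exited with code 1."',
    'time=2024-07-11T08:28:46.381 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51439 exited with code 1."',
    'time=2024-07-11T08:28:46.398 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51457 exited with code 1."',
    'time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 0 is ready for data collection and queries"',
    "",
]


def parse_line(line: str) -> dict[str, str]:
    """Split one log line into its key=value fields, removing the outer quotes
    of quoted values (backslash escapes inside them are kept as-is)."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def errors_per_thread(lines) -> Counter:
    """Count level=error entries per thread; entries without a thread field
    (the main daemon thread in this log) are grouped under '<no thread>'."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        fields = parse_line(line)
        if fields.get("level") == "error":
            counts[fields.get("thread", "<no thread>")] += 1
    return counts


if __name__ == "__main__":
    for thread, n in errors_per_thread(SAMPLE).most_common():
        print(f"{n:3d}  {thread}")
```

To run it on the full log, feed the lines of the pasted log into `errors_per_thread` instead of `SAMPLE`.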

@stelfrag
Collaborator

Thank you for checking. The host here is a bare-metal storage server with a bonded NIC, but I have no drops logged at the hardware level; still, something in the network could be affecting things. I attempted to save the TCP payload, but it's binary. Is there a way to send plaintext data, or to decode the communication some other way?

In the parent node's netdata.conf, go to the [plugins] section and set:

[plugins]
        netdata monitoring extended = yes
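If netdata.conf is managed by scripts, the setting can be verified before restarting. Below is a small sketch using Python's standard `configparser`. It assumes the file parses as plain INI: netdata.conf is INI-like, but this is not an official Netdata tool. Treating a missing section or option as "not enabled" is also an assumption of this sketch.

```python
import configparser


def extended_monitoring_enabled(conf_text: str) -> bool:
    """Return True if [plugins] sets 'netdata monitoring extended' to a true value."""
    # interpolation=None: leave any '%' characters in values alone.
    # strict=False: tolerate duplicate sections/options instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Assumption: a missing section or option counts as "not enabled".
    return parser.getboolean("plugins", "netdata monitoring extended", fallback=False)


if __name__ == "__main__":
    example = "[plugins]\n        netdata monitoring extended = yes\n"
    print(extended_monitoring_enabled(example))  # True
```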

Restart the parent. In the dashboard, under the Netdata Monitoring section, there will be a new entry:

workers streaming receive

It charts the streaming commands exchanged between the parent and the child.

For example:

[image]
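On why a raw capture of the stream looks binary: the child's log (the SNDR[dxsrv11] lines above) shows both the capabilities it offered and the ones negotiated with 10.12.9.4. The negotiated set includes ZSTD, a compression algorithm, which suggests the stream is compressed and would not read as plain text. The short Python sketch below diffs the two lists. The capability strings are copied verbatim from the log. The compression names it checks are limited to the ones that appear in that log.

```python
# Capability strings copied verbatim from the child's SNDR[dxsrv11] log lines.
OFFERED = ("V1 V2 VN VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION "
           "BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS")
NEGOTIATED = ("VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY "
              "INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS")

# Only the compression names that actually appear in the offered list above.
COMPRESSION_CAPS = ("ZSTD", "LZ4", "GZIP")


def diff_capabilities(offered: str, negotiated: str) -> tuple[list[str], list[str]]:
    """Return (dropped, compression): capabilities offered but not negotiated,
    and the compression capabilities that ended up negotiated."""
    offered_set = set(offered.split())
    negotiated_set = set(negotiated.split())
    dropped = sorted(offered_set - negotiated_set)
    compression = [c for c in COMPRESSION_CAPS if c in negotiated_set]
    return dropped, compression


if __name__ == "__main__":
    dropped, compression = diff_capabilities(OFFERED, NEGOTIATED)
    print("dropped during negotiation:", " ".join(dropped))
    print("compression negotiated:", " ".join(compression) or "none")
```

In this log, only ZSTD survives negotiation; LZ4 and GZIP appear among the dropped capabilities.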

@Forza-tng
Author

This is how it looks:

[image]

@stelfrag
Collaborator

This is how it looks:

Nothing suspicious here. Streaming seems to send about 330 charts/second.

@Forza-tng
Author

Forza-tng commented Jul 11, 2024

This is how it looks:

Nothing suspicious here. Streaming seems to send about 330 charts/second.

It seems that enabling dbengine has allowed the child node to catch up; it is now up to date in real time. Before, with only RAM mode, it never caught up and there were gaps of missing data.

[image]
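The gaps described here can be quantified from the timestamps of a chart's stored points. Below is a minimal Python sketch. It assumes the points have already been exported as Unix timestamps in seconds; how to export them is out of scope here. It also assumes the expected interval is the child's "update every 1" from its log. The function names and sample timestamps are hypothetical.

```python
def find_gaps(timestamps: list[int], update_every: int = 1) -> list[tuple[int, int]]:
    """Return (last_present, next_present) pairs wherever two consecutive
    points are further apart than the expected collection interval."""
    ts = sorted(set(timestamps))
    return [(prev, cur) for prev, cur in zip(ts, ts[1:]) if cur - prev > update_every]


def missing_points(gaps: list[tuple[int, int]], update_every: int = 1) -> int:
    """Number of expected points strictly inside the gaps; assumes points are
    aligned to multiples of update_every (always true for update_every=1)."""
    return sum((cur - prev) // update_every - 1 for prev, cur in gaps)


if __name__ == "__main__":
    # Hypothetical samples: collected every second, with two holes.
    points = [100, 101, 102, 106, 107, 110]
    gaps = find_gaps(points)
    print(gaps)                  # [(102, 106), (107, 110)]
    print(missing_points(gaps))  # 5 (103, 104, 105, 108, 109)
```

If the parent keeps up with the child, `missing_points` should stay at zero after replication has finished.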

@Forza-tng
Author

Forza-tng commented Jul 11, 2024

Here is another picture showing the patchy data retention (top chart) from before dbengine was enabled this morning.
[image]

Here is the complete log for the child node since the restart this morning.
########## RESTART - ENABLING dbengine #########
time=2024-07-11T08:28:35.452 02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="Netdata agent version 'v1.46.0-120-g0cce79dfd' is starting"
time=2024-07-11T08:28:35.452 02:00 comm=netdata source=daemon level=info tid=51043  msg="IEEE754: system is using IEEE754 DOUBLE PRECISION values"
time=2024-07-11T08:28:35.452 02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=51043  msg="TIMEZONE: using strftime(): 'CEST'"
time=2024-07-11T08:28:35.452 02:00 comm=netdata source=daemon level=info tid=51043  msg="TIMEZONE: fixed as 'CEST'"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info errno="22, Invalid argument" tid=51043  msg="NETDATA STARTUP: next: initialize signals"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize signals - next: initialize static threads"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize static threads - next: initialize web server"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize web server - next: initialize ML"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize ML - next: initialize h2o server"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize h2o server - next: set resource limits"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="resources control: allowed file descriptors: soft = 1024, max = 4096"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, set resource limits - next: become daemon"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=info tid=51043  msg="Out-Of-Memory (OOM) score is already set to the wanted value 0"
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=51043  msg="Cannot adjust netdata scheduling policy to batch (3), with priority 0. Falling back to nice."
time=2024-07-11T08:28:35.454 02:00 comm=netdata source=daemon level=error errno="38, Function not implemented" tid=51043  msg="Cannot get my current process scheduling policy."
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="netdata started on pid 51045."
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       1 ms, become daemon - next: initialize threads after fork"
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize threads after fork - next: initialize registry"
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info errno="17, File exists" tid=51043  msg="NETDATA STARTUP: in       0 ms, initialize registry - next: fork the spawn server"
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info tid=51043  msg="Initializing spawn client."
time=2024-07-11T08:28:35.455 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, fork the spawn server - next: collecting system info"
time=2024-07-11T08:28:35.519 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      63 ms, collecting system info - next: initialize RRD structures"
time=2024-07-11T08:28:35.519 02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database /var/cache/netdata/netdata-meta.db initialization"
time=2024-07-11T08:28:35.519 02:00 comm=netdata source=daemon level=info tid=51043  msg="metadata database version is 18 (no migration needed)"
time=2024-07-11T08:28:35.522 02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database initialization completed"
time=2024-07-11T08:28:35.523 02:00 comm=netdata source=daemon level=info tid=51043  msg="SQLite database /var/cache/netdata/context-meta.db initialization"
time=2024-07-11T08:28:35.523 02:00 comm=netdata source=daemon level=info tid=51043  msg="context database version is 1 (no migration needed)"
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine"
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine-tier2"
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine-tier2\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: found 0 files in path /var/cache/netdata/dbengine-tier1"
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine-tier2/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: data files not found, creating in path \"/var/cache/netdata/dbengine-tier1\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51188 thread=DBENGINIT[0] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 0, using 1 threads..."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine-tier2/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.525 02:00 comm=netdata source=daemon level=info tid=51190 thread=DBENGINIT[2] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 2, using 1 threads..."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: created data file \"/var/cache/netdata/dbengine-tier1/datafile-1-0000000001.ndf\"."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: created journal file \"/var/cache/netdata/dbengine-tier1/journalfile-1-0000000001.njf\"."
time=2024-07-11T08:28:35.526 02:00 comm=netdata source=daemon level=info tid=51189 thread=DBENGINIT[1] msg="DBENGINE: populating retention to MRG from 1 journal files of tier 1, using 1 threads..."
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 0 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 1 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="DBENGINE: tier 2 is ready for data collection and queries"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="STREAM: added streaming destination No 1: '10.12.9.4' to host 'dxsrv11'"
time=2024-07-11T08:28:35.527 02:00 comm=netdata source=daemon level=info tid=51043  msg="Host 'dxsrv11' (at registry as 'dxsrv11') with guid '8c3f660a-3eb2-11ef-a617-def2eb0842f1' initialized, os 'linux', timezone 'CEST', program_name 'netdata', program_version 'v1.46.0-120-g0cce79dfd', update every 1, memory mode dbengine, history entries 0, streaming enabled (to '10.12.9.4' with api key '91DCD32F-8BF7-4573-816B-56468CBEF079'), health enabled, cache_dir '/var/cache/netdata', alarms default handler '', alarms default recipient ''"
time=2024-07-11T08:28:35.548 02:00 comm=netdata source=daemon level=info tid=51043  msg="Creating archived hosts"
time=2024-07-11T08:28:35.548 02:00 comm=netdata source=daemon level=info tid=51043  msg="Created 0 archived hosts"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="ACLK sync initialization completed"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      29 ms, initialize RRD structures - next: check for incomplete shutdown"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51043  msg="NETDATA STARTUP: in       0 ms, check for incomplete shutdown - next: collect claiming info"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51217 thread=ACLKSYNC msg="Starting ACLK synchronization thread"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51043  msg="Unable to load '/opt/netdata/var/lib/netdata/cloud.d/claimed_id', setting state to AGENT_UNCLAIMED"
time=2024-07-11T08:28:35.549 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       0 ms, collect claiming info - next: collect host labels"
time=2024-07-11T08:28:35.551 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       2 ms, collect host labels - next: start the static threads"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51227 thread=EXPORTING msg="CONFIG: cannot load user exporting config '/opt/netdata/etc/netdata/exporting.conf'. Will try the stock version."
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51231 thread=ACLK_MAIN msg="Waiting for Cloud to be enabled"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="No connector instances to activate"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="EXPORTING: no exporting connectors configured"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51228 thread=WEB[1] msg="To use encryption it is necessary to set \"ssl certificate\" and \"ssl key\" in [web] !\u000A"
time=2024-07-11T08:28:35.552 02:00 comm=netdata source=daemon level=info tid=51227 thread=EXPORTING msg="cleaning up..."
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in       1 ms, start the static threads - next: initialize commands API"
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51043  msg="Initializing command server."
time=2024-07-11T08:28:35.553 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 2"
time=2024-07-11T08:28:35.554 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 3"
time=2024-07-11T08:28:35.554 02:00 comm=netdata source=daemon level=info tid=51228 thread=WEB[1] msg="starting worker 4"
time=2024-07-11T08:28:35.609 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: in      55 ms, initialize commands API - next: ready"
time=2024-07-11T08:28:35.609 02:00 comm=netdata source=daemon level=info tid=51043  msg="NETDATA STARTUP: completed in 160 ms. Enjoy real-time performance monitoring!"
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51237 thread=P[tc] msg="child pid 51241 exited with code 1."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=info tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51366 exited with code 1."
time=2024-07-11T08:28:35.612 02:00 comm=netdata source=daemon level=error tid=51245 thread=PD[ioping] module=plugins.d[ioping.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 51366) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.613 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11   src_transport=pluginsd request="'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM'" msg="PLUGINSD: parser_action('ERROR') failed on line 1: { 'ERROR' 'python' 'IS' 'NOT' 'AVAILABLE' 'IN' 'THIS' 'SYSTEM' } (quotes added to show parsing)"
time=2024-07-11T08:28:35.613 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51370) exited with code: 1"
time=2024-07-11T08:28:35.614 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51370 exited with code 1."
time=2024-07-11T08:28:35.614 02:00 comm=netdata source=daemon level=error tid=51249 thread=PD[python.d] module=plugins.d[python.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 51370) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=info tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=error tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51374 exited with code 1."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=error tid=51248 thread=PD[perf] module=plugins.d[perf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 51374) exited with error code 1 and haven't collected any data. Disabling it."
time=2024-07-11T08:28:35.618 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51374) exited with code: 1"
time=2024-07-11T08:28:35.622 02:00 comm=netdata source=daemon level=info tid=51229 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11   src_transport=pluginsd request='DISABLE' msg="PLUGINSD: plugin called DISABLE. Disabling it."
time=2024-07-11T08:28:35.623 02:00 comm=netdata source=daemon level=info tid=51229 thread=PD[charts.d] module=plugins.d[charts.d.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/charts.d.plugin' (pid 51234) does not generate useful output but it reports success (exits with 0). Will not start it again - it is now disabled.."
time=2024-07-11T08:28:35.623 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51234): failed - it seems the child is already reaped"
time=2024-07-11T08:28:36.000 02:00 comm=netdata source=daemon level=notice tid=51222 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-11T08:28:36.000 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="V1 V2 VN VCAPS HLABELS CLAIM CLABELS LZ4 FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD GZIP PROGRESS " dst_transport=http   msg="STREAM dxsrv11 [send]: thread created (task id 51431)"
time=2024-07-11T08:28:36.001 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-11T08:28:36.304 02:00 comm=netdata source=daemon level=info errno="13, Permission denied" tid=51238 thread=P[diskspace] msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-11T08:28:36.306 02:00 comm=netdata source=daemon level=info errno="2, No such file or directory" tid=51239 thread=P[proc] module=proc.plugin[/proc/uptime] msg="Using now_boottime_usec() for uptime (dt is 7 ms)"
time=2024-07-11T08:28:36.396 02:00 comm=netdata source=daemon level=error tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 instance=system.softirq_latency context=system.softirq_latency src_transport=pluginsd  msg="PARSER: read failed: POLLHUP."
time=2024-07-11T08:28:36.528 02:00 comm=netdata source=daemon level=error tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="child pid 51372 killed by signal 8."
time=2024-07-11T08:28:36.528 02:00 comm=netdata source=daemon level=info tid=51233 thread=PD[ebpf] module=plugins.d[ebpf.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin' (pid 51372) was killed with SIGTERM. Disabling it."
time=2024-07-11T08:28:46.381 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51439 exited with code 1."
time=2024-07-11T08:28:46.398 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51457 exited with code 1."
time=2024-07-11T08:28:46.403 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51469): failed - it seems the child is already reaped"
time=2024-07-11T08:28:46.414 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51473 exited with code 1."
time=2024-07-11T08:28:46.419 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51487) exited with code: 0"
time=2024-07-11T08:28:46.429 02:00 comm=netdata source=daemon level=info tid=51043  msg="SIGNAL: reap_child(51491) exited with code: 1"
time=2024-07-11T08:28:46.429 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51491 exited with code 1."
time=2024-07-11T08:28:46.445 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51507 exited with code 1."
time=2024-07-11T08:28:46.466 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51523 exited with code 1."
time=2024-07-11T08:28:46.492 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51553 exited with code 1."
time=2024-07-11T08:28:46.507 02:00 comm=netdata source=daemon level=error tid=51369 thread=P[cgroupsdisc] msg="child pid 51587 exited with code 1."
time=2024-07-11T08:29:37.075 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 781 replication requests, 0 charts pending replication"
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51371) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:30:36.249 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51371): failed - it seems the child is already reaped"
time=2024-07-11T08:32:46.909 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:32:46.909 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51629) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:34:28.351 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 28 replication requests, 0 charts pending replication"
time=2024-07-11T08:34:57.558 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:34:57.558 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51661) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:34:57.558 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51661): failed - it seems the child is already reaped"
time=2024-07-11T08:37:08.205 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:37:08.205 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51691) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:39:18.855 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:39:18.855 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51718) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:41:29.506 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:41:29.507 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51744) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:41:29.507 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51744): failed - it seems the child is already reaped"
time=2024-07-11T08:43:40.147 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:43:40.147 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51781) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:43:40.147 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51781): failed - it seems the child is already reaped"
time=2024-07-11T08:45:50.797 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:45:50.798 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51811) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:48:01.443 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:48:01.444 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51839) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:48:01.444 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51839): failed - it seems the child is already reaped"
time=2024-07-11T08:50:12.084 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:50:12.084 02:00 comm=netdata source=daemon level=info tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51866) does not generate useful output but it reports success (exits with 0). Waiting a bit before starting it again.."
time=2024-07-11T08:52:22.732 02:00 comm=netdata source=daemon level=error errno="110, Operation timed out" tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11   src_transport=pluginsd  msg="PARSER: timeout while waiting for data."
time=2024-07-11T08:52:22.733 02:00 comm=netdata source=daemon level=error tid=51247 thread=PD[nfacct] module=plugins.d[nfacct.plugin] node=dxsrv11 src_transport=pluginsd msg="PLUGINSD: 'host:'dxsrv11', '/opt/netdata/usr/libexec/netdata/plugins.d/nfacct.plugin' (pid 51896) does not generate useful output, although it reports success (exits with 0).We have tried to collect something 11 times - unsuccessfully. Disabling it."
time=2024-07-11T08:52:22.733 02:00 comm=netdata source=daemon level=info errno="10, No child process" tid=51043  msg="SIGNAL: waitid(51896): failed - it seems the child is already reaped"
time=2024-07-11T12:33:56.319 02:00 comm=netdata source=daemon level=error tid=51431 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: buffer full (allocated 100000000 bytes) after sending 96690644 bytes. Restarting connection"
time=2024-07-11T12:33:56.359 02:00 comm=netdata source=daemon level=notice tid=51240 thread=P[cgroups] msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-11T12:35:56.462 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-11T12:35:57.000 02:00 comm=netdata source=daemon level=info tid=51222 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-11T14:12:42.668 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 9210 replication requests, 119 charts pending replication"
time=2024-07-11T14:15:57.370 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 4 replication requests, 115 charts pending replication"
time=2024-07-11T14:16:31.628 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 1 replication requests, 114 charts pending replication"
time=2024-07-11T14:17:47.261 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 7 replication requests, 107 charts pending replication"
time=2024-07-11T14:22:09.431 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 32 replication requests, 75 charts pending replication"
time=2024-07-11T14:24:07.416 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 14 replication requests, 61 charts pending replication"
time=2024-07-11T14:28:52.849 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 61 replication requests, 18 charts pending replication"
time=2024-07-11T14:59:32.657 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 18 replication requests, 15 charts pending replication"
time=2024-07-11T16:01:21.033 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 15 replication requests, 13 charts pending replication"
time=2024-07-11T18:07:33.430 02:00 comm=netdata source=daemon level=error tid=51431 thread=SNDR[dxsrv11] node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: buffer full (allocated 100000000 bytes) after sending 127938879 bytes. Restarting connection"
time=2024-07-11T18:07:34.000 02:00 comm=netdata source=daemon level=notice tid=51222 thread=STATS_GLOBAL msg="STREAM dxsrv11 [send]: not ready - collected metrics are not sent to parent."
time=2024-07-11T18:09:33.566 02:00 comm=netdata source=daemon level=info tid=51431 thread=SNDR[dxsrv11] msg_id=6e2e3839067648968b646045dbf28d66 node=dxsrv11 src_capabilities="VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS " dst_transport=http dst_ip=10.12.9.4 dst_port=19999 msg="STREAM dxsrv11 [send to 10.12.9.4]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "
time=2024-07-11T18:09:33.606 02:00 comm=netdata source=daemon level=info tid=51220 thread=P[idlejitter] msg="STREAM dxsrv11 [send]: sending metrics to parent..."
time=2024-07-11T18:10:11.031 02:00 comm=netdata source=daemon level=info tid=51236 thread=REPLAY[1] msg="REPLICATION SUMMARY: finished, executed 6257 replication requests, 0 charts pending replication"
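
The log has two `buffer full ... Restarting connection` errors, at 12:33:56 and 18:07:33, and a long tail of `REPLICATION SUMMARY` lines. These are easier to line up with the charts if they are pulled out of the log. Below is a minimal Python sketch for doing that. It is not part of Netdata, and the helper names (`parse_line`, `stream_events`, `reconnect_delays`) are hypothetical. The simplified logfmt regex assumes quoted values never contain escaped quotes. The `SAMPLE` lines are trimmed copies of lines from the log above, keeping only `time`, `level` and `msg`.

```python
import re
from datetime import datetime

# Simplified logfmt matcher: key=bare_value or key="quoted value".
# Assumption: quoted values never contain escaped double quotes (true for the
# lines in this log). This is not a complete logfmt parser.
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

BUFFER_FULL_RE = re.compile(
    r"buffer full \(allocated (\d+) bytes\) after sending (\d+) bytes")
REPLICATION_RE = re.compile(
    r"REPLICATION SUMMARY: finished, executed (\d+) replication requests, "
    r"(\d+) charts pending replication")


def parse_line(line):
    """Split one log line into a dict of fields (hypothetical helper)."""
    fields = {}
    for key, quoted, bare in PAIR_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def stream_events(lines):
    """Yield (timestamp, kind, numbers) for streaming and replication events."""
    for line in lines:
        fields = parse_line(line)
        if "time" not in fields:
            continue
        # Keep only "YYYY-MM-DDTHH:MM:SS.mmm"; every line shares the same UTC
        # offset, so the offset is ignored here.
        ts = datetime.strptime(fields["time"][:23], "%Y-%m-%dT%H:%M:%S.%f")
        msg = fields.get("msg", "")
        if m := BUFFER_FULL_RE.search(msg):
            # (allocated buffer size, bytes sent before the restart)
            yield ts, "buffer_full", (int(m.group(1)), int(m.group(2)))
        elif "established link" in msg:
            yield ts, "link_up", ()
        elif m := REPLICATION_RE.search(msg):
            # (replication requests executed, charts still pending)
            yield ts, "replication", (int(m.group(1)), int(m.group(2)))


def reconnect_delays(events):
    """Seconds from each 'buffer full' restart to the next established link."""
    delays, restarted_at = [], None
    for ts, kind, _ in events:
        if kind == "buffer_full":
            restarted_at = ts
        elif kind == "link_up" and restarted_at is not None:
            delays.append((ts - restarted_at).total_seconds())
            restarted_at = None
    return delays


# Trimmed copies of lines from the log above (only time, level and msg kept).
SAMPLE = [
    'time=2024-07-11T12:33:56.319 02:00 level=error msg="STREAM dxsrv11 [send to 10.12.9.4]: '
    'buffer full (allocated 100000000 bytes) after sending 96690644 bytes. Restarting connection"',
    'time=2024-07-11T12:35:56.462 02:00 level=info msg="STREAM dxsrv11 [send to 10.12.9.4]: '
    'established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS '
    'REPLICATION BINARY INTERPOLATED IEEE754 DYNCFG SLOTS ZSTD PROGRESS "',
    'time=2024-07-11T14:12:42.668 02:00 level=info msg="REPLICATION SUMMARY: finished, '
    'executed 9210 replication requests, 119 charts pending replication"',
]

if __name__ == "__main__":
    events = list(stream_events(SAMPLE))
    for ts, kind, numbers in events:
        print(ts.isoformat(), kind, numbers)
    print("reconnect delays (s):", reconnect_delays(events))
```

In the full log, both restarts (12:33:56 and 18:07:33) are followed by a re-established link about 120 seconds later.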

Here is an interesting chart showing the replication rate as seen on the parent.

[Image: replication rate chart as seen on the parent]

Zooming in on the 1 pm mark:
[Image: zoomed-in view of the same chart around 1 pm]

What is going on here?

[Image: Screenshot_20240712_003011_Opera]

EDIT: There is a small gap in the chart in the screenshot above. This chart is from the parent node, so does that mean the issue is on the parent side rather than on the child side?
