Tags

Tags give the ability to mark specific points in history as being important.
This project is mirrored from https://github.com/neondatabase/autoscaling.
  • v0.13.1
    v0.13.1
    
    Small release, primarily to fix a leak in the scheduler plugin.
    Also contains other bugfixes.
    
    Fixes:
    
    - plugin: Memory leak (#415)
    - plugin: Missing node metrics during initial load (#410)
    - neonvm/runner: Missing error logs (#401)
    - neonvm/runner: Various log.Printf calls with unnecessary trailing newline (#401)
    - neonvm/controller, informant: Typos in error messages (#407)
    
    Upgrade path from v0.12.x / v0.13.0:
    
    - No ordering requirements.
  • v0.13.0
    v0.13.0
    
    This relatively small release contains significant changes to existing
    behavior in both the autoscaler-agent and scheduler plugin.
    
    No breaking API changes (technically).
    
    Features:
    
    - agent: Memory-based scaling (#393)
      - Currently implemented in a similar manner to our load average-based
        scaling, via total memory usage, including the kernel.
    - plugin: Allow ignoring resource usage from namespaces (#399)
      - Carveout for 'overprovisioning' pods now that we're tracking
        everything.
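
One way to picture "similar to load average-based scaling" is to derive a goal number of compute units from each signal and take the larger. The sketch below is illustrative only: the function name, the per-compute-unit sizes, and the "take the maximum, then clamp" rule are assumptions, not the agent's actual algorithm.

```python
import math

def goal_compute_units(load_avg, mem_used_bytes, cpu_per_cu, mem_per_cu_bytes, min_cu, max_cu):
    """Hypothetical helper: how many compute units (CU) to aim for."""
    # CUs needed to cover the 1-minute load average.
    cu_for_cpu = math.ceil(load_avg / cpu_per_cu)
    # CUs needed to cover total memory usage (kernel included, per the note above).
    cu_for_mem = math.ceil(mem_used_bytes / mem_per_cu_bytes)
    # Whichever resource is under more pressure decides the goal.
    goal = max(cu_for_cpu, cu_for_mem)
    # Keep the result inside the VM's scaling bounds.
    return max(min_cu, min(goal, max_cu))
```

With one CPU and 1 GiB per compute unit, a VM at load 0.5 but using 3 GiB of memory would be steered toward 3 compute units under this rule.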
    
    Fixes:
    
    - plugin: Improve plugin method logs (#405)
      - Previously, some notable metrics were being increased without
        suitable accompanying log messages.
    
    No protocol changes.
    
    Other changes:
    
    - plugin: Track all pods (#399)
      - Should make our accounting & metrics reporting much more accurate.
    - plugin: Remove 'System' reserved resources (#399)
      - No longer necessary, because we're tracking everything.
  • v0.12.2
    v0.12.2
    
    Small release, just containing #395 - a fix for #234, where the
    autoscaler-agent's per-VM Runner will panic when the scaling bounds
    decrease below the current usage.
    
    This was fast-tracked for release because of the impact on VM pools.
    It's not hard-blocking, but is significant enough that it's worth
    fixing beforehand.
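
This class of bug tends to appear when code assumes the current usage always lies inside the scaling bounds. A minimal sketch of a tolerant approach, with hypothetical names and no claim to match the fix in #395, clamps the target and reports the out-of-bounds case instead of treating it as impossible:

```python
def next_target(current, desired, lower, upper):
    """Return (target, out_of_bounds) without assuming lower <= current <= upper."""
    if lower > upper:
        raise ValueError("invalid bounds: lower > upper")
    # The desired value is always clamped into the (possibly new) bounds.
    target = max(lower, min(desired, upper))
    # Right after the bounds shrink, the current value may legitimately fall
    # outside them; flag that rather than panicking.
    out_of_bounds = current < lower or current > upper
    return target, out_of_bounds
```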
  • v0.12.1
    v0.12.1
    
    This release contains bugfixes and new metrics (along with some changes
    to existing ones).
    
    No breaking API changes.
    
    Features:
    
    - plugin: New migration-related metrics (#387):
      - autoscaling_plugin_migrations_created_total
      - autoscaling_plugin_migrations_deleted_total
      - autoscaling_plugin_migration_create_fails_total
      - autoscaling_plugin_migration_delete_fails_total
    - plugin: Include node group in node resource metrics (#382)
    - agent: agent->informant request metrics now include the endpoint (#380)
    
    Fixes:
    
    - Add vmscrape.yaml to release assets (#392)
    - plugin: Fix spurious "updated scaling bounds" logs (#391)
      - Incidentally, this *also* entirely fixes our handling of scaling
        bounds changes.
    - plugin: Migration handling reliability improvements (#387)
    - informant: Fix parent process stall when child dies quickly (#389)
    - agent: Fix NeonVM downscaling not showing up in metrics (#381)
    
    No protocol changes.
    
    No other changes.
    
    Upgrade path from v0.12.0:
    
    - No ordering requirements.
  • v0.12.0
    v0.12.0
    
    This release contains bugfixes (lots of them!), new metrics, and
    BREAKING CHANGES TO OLD METRICS.
    
    No breaking API changes.
    
    Features:
    
    - neonvm: Propagate label/annotation changes to runner pod(s) (#279)
    - agent: Add scaling metrics! (#334)
      - All of:
        - autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total
        - autoscaling_agent_informant_{requested,approved}_{cpu,mem}_change_total
        - autoscaling_agent_neonvm_requested_{cpu,mem}_change_total
        - autoscaling_agent_neonvm_outbound_requests_total
    - plugin: Add per-node resource metrics (#363)
      - Two new metrics:
        - autoscaling_plugin_node_cpu_resources_current
        - autoscaling_plugin_node_mem_resources_current
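
The `{requested,approved}_{cpu,mem}` notation above reads as shell-style brace shorthand for several metric names. Assuming that reading, a small helper (hypothetical, not part of the project) can expand a pattern into the full list of names, which is handy when updating dashboards or alerts:

```python
import itertools
import re

def expand_braces(pattern):
    """Expand non-nested {a,b} groups into every combination, in order."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # Even indices are literal text; odd indices are comma-separated alternatives.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]

names = expand_braces(
    "autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total"
)
```

Here `names` holds four entries, starting with `autoscaling_agent_scheduler_plugin_requested_cpu_change_total`.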
    
    Fixes:
    
    - Add whereabouts.yaml to release assets (#348)
    - neonvm: Don't propagate kubectl's last-applied-configuration annotation (#344)
    - agent: Reset Runner endState on restart (#349)
      - This bug caused the agent's metrics to never show a
        previously-panicked Runner as recovered, even when it was.
    - agent/schedwatch: Fix spurious close (#352)
      - This bug was causing agents to be unable to recognize new
        schedulers.
    - plugin/watch: Remove redundant error wrapping (#358)
    - plugin: Fix filter cycle metrics (#356)
      - This REMOVES two metrics:
        - autoscaling_plugin_filter_cycle_successes_total
        - autoscaling_plugin_filter_cycle_rejections_total
      - See the PR for more details.
    - README: fix make commands to reflect kind/k3d (#365)
    - plugin: Cleanup state for deleted k8s Nodes (#361)
      - Should *hopefully* fix a particular memory leak, but it's not clear.
    - informant/filecache: Close DB connections (#367)
      - This was causing some users to be unable to connect to their
        database because the informant took all the connections.
      - This was already released as v0.11.1
    - agent/billing: Move push logic into separate thread (#368)
      - This was preventing us from having more reasonable request timeouts
        (like... anything above 2s)
    
    No protocol changes.
    
    Other changes:
    
    - util/watch: More logs! (#351)
    - agent: Record neon/endpoint-id for each Runner if/when assigned (#353)
    - agent: Improve help message for autoscaling_agent_tracked_vms_current (#354)
    - agent/billing: Log IdempotencyKey of events (#366)
    - billing: Add x-trace-id header to requests (#372)
    
    Upgrade path from v0.11.0:
    
    - No ordering requirements, but considering the fixes to the agent's
      scheduler detection, it's probably worthwhile to update any agents
      first.
  • v0.11.1
    v0.11.1
    
    Hotfix release, backporting #367 to fix a bug in the informant that
    caused it to never close DB connections when the file cache integration
    is enabled.
  • v0.11.0
    v0.11.0
    
    This release contains bugfixes, new features, and large changes to the
    NeonVM controller.
    
    Breaking API changes:
    
    - neonvm: VirtualMachine .spec.extraNetwork fields changed (#256)
      - Removed multusNetworkNoIP
      - Made multusNetwork omitempty
    - neonvm: VirtualMachineMigrations no longer have post-copy enabled by default (#256)
    
    Features:
    
    - neonvm: Two new VmPhase types: "PreMigrating" and "Scaling" (#256)
    - neonvm: Migration source runner pod now has an ownerref pointing back
      to the migration (#332)
    - ci: Added support for k3d (#340)
    - plugin: new metrics
      - autoscaling_plugin_filter_cycle_successes_total (#346)
      - autoscaling_plugin_filter_cycle_rejections_total (#346)
      - autoscaling_plugin_extension_call_fails_total (#347)
    
    Fixes:
    
    - scheduler: Fixed agent-handler log keys explosion (#338)
      - NB: this was already released as v0.10.1
    - scheduler: Fixed missing `continue` when skipping completed pods (#342)
      - NB: this was already released as v0.10.2
    - scheduler: Fixed outdated log line (#343)
      - Removed "[autoscale-enforcer] load state: " prefix from the message
    - agent: Do informant health checks even when suspended (#341)
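
The missing `continue` fixed in #342 is a common kind of slip: the "skip" branch runs, but control then falls through to the normal handling. The sketch below shows the general shape only. The pod dictionaries are a hypothetical stand-in for the real objects, and the actual plugin code is Go and is not reproduced here. "Succeeded" and "Failed" are the Kubernetes pod phases that indicate completion.

```python
COMPLETED_PHASES = {"Succeeded", "Failed"}

def pods_to_track(pods):
    """Return the names of pods that should count toward node usage."""
    tracked = []
    for pod in pods:
        if pod["phase"] in COMPLETED_PHASES:
            # Without this `continue`, the completed pod would fall through
            # and be tracked as if it were still consuming resources.
            continue
        tracked.append(pod["name"])
    return tracked
```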
    
    No protocol changes.
    
    Other changes:
    
    - ci: kind and kubectl versions tweaked (#336)
    - k8s deps upgraded to 1.25.11 (#339)
    - plugin: Capitalize pluginCalls metric labels (#345)
    
    There's even more changes to the NeonVM controller that aren't listed
    here. For more, see #256.
    
    Upgrade path from v0.10.x:
    
    - No ordering requirements.
  • v0.10.2
    v0.10.2
    
    Hotfix release, backporting #342 to fix the scheduler plugin's handling
    of completed pods on startup.
  • v0.10.1
    v0.10.1
    
    Hotfix release, backporting #338 to fix scheduler plugin logs for agent
    requests.
  • v0.10.0
    v0.10.0
    
    This release contains bugfixes, ???, and a breaking change to the
    agent<->informant protocol.
    
    Breaking API changes:
    
    - agent<->informant: Include AgentID in informant /downscale and /upscale (#316)
      - This bumps the agent<->informant protocol to v2.
      - The agent currently supports both versions, and will for the
        immediate future.
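
Supporting both protocol versions usually means each side advertises a range and the pair settles on the highest version they share. The release notes do not describe the actual negotiation, so the following is only a sketch of that common scheme, with hypothetical names:

```python
def negotiate_version(agent_min, agent_max, informant_min, informant_max):
    """Return the highest protocol version both sides support, or None."""
    low = max(agent_min, informant_min)
    high = min(agent_max, informant_max)
    if low > high:
        # The supported ranges do not overlap.
        return None
    return high
```

Under this scheme an agent supporting v1 through v2 can still talk to a v1-only informant, which is consistent with the upgrade order given for this release.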
    
    Features:
    
    - neonvm/builder: Make output prettier (#280)
    - Start switch from klog -> zap [agent/plugin/informant] (#323)
      - All kinds of dashboards need updating. It's for the best.
    
    Fixes:
    
    - agent/informant: Fix inverted condition for logs (#315)
    - plugin: Handle usage updates for non-autoscaling VMs (#312)
    - plugin: Fix Unreserve condition (#317)
    - util/watch: Set failingCurrent gauge to zero so it shows up (#320)
    - neonvm: Fix default ports from Go client (#257)
    
    Protocol changes:
    
    - See above, re: agent<->informant changes.
    
    Other changes:
    
    - deploy: Change metrics scrape interval 10s -> 60s (#321)
    - neonvm/runner: Set AutomountServiceAccountToken = false (#298)
    - agent/billing: Use NeonVM .status.cpus, not .spec.guest.cpus.use (#325)
    
    Upgrade path from v0.9.0:
    
    - All autoscaler-agents must be upgraded before any vm-informants.
    - No other requirements.
  • v0.9.0
    v0.9.0
    
    This release contains bugfixes and upgrades to Kubernetes 1.25.
    
    Breaking API changes:
    
    - Upgrading to K8s 1.25. NB: Autoscaling requires K8s control planes
      with a version equal or +1; i.e. K8s 1.25 OR 1.26 is now required.
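
The "equal or +1" rule can be written as a small pre-upgrade check. This is a sketch with hypothetical names. It compares minor versions only and assumes the components are built against Kubernetes 1.25, as stated above.

```python
def control_plane_supported(control_plane_minor, built_for_minor=25):
    """True if the control plane's minor version is equal to, or one above, the build target."""
    return control_plane_minor in (built_for_minor, built_for_minor + 1)
```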
    
    Features:
    
    - New metrics! (#306, #310)
      - Too many to cover here; refer to those PRs instead.
    
    Fixes:
    
    - util/watch: Fix race condition on k8s watch.Update events (#295)
    - agent/informant: Fix informant server exit logs (#286)
    - api: Fix ExtractVmInfo disallowing min > use or use > max (#303)
      - This one may be counterintuitive at first. See #249 for context.
    - agent: Fix vmEvent formatting (#307)
    - informant: Suspend old agent *before* new one (#308)
    - util/watch: Fix racy behavior with InitModeDefer (#305)
      - This was causing billing events to not be generated for VMs until an
        event *after* startup occurs for them.
    - plugin: Allow overcommitted nodes on startup (#313)
    - agent: Stop SchedulerWatch when Runner finishes (#314)
      - This was preventing the switchover to a new scheduler on upgrade or
        restart.
    
    Other changes:
    
    - Fix yaml formatting for autoscaler-agent config deploy (#300)
    
    No protocol changes.
    
    Upgrade path from v0.8.0:
    
    - No ordering requirements.
  • v0.8.0
    v0.8.0
    
    This release contains bugfixes, a new component, minor public-facing API
    changes, and significant changes to the deployed services, but no
    inter-component API changes.
    
    Breaking API changes:
    
    - NeonVM: restart policy no longer applies directly to the pod (#293)
    
    Features:
    
    - Add patch for cluster-autoscaler compatibility with VMs (#232)
    - NeonVM: implement RestartPolicy (#293)
    - NeonVM security and networking redesign (#245)
      - Runner pod no longer has Privileged: true
      - QEMU in the runner pod runs under its own user
      - Adapted generic-device-plugin for NeonVM, to give access to /dev/kvm
        and /dev/vhost-*
      - Switch from neonvm-vxlan-ipam to Whereabouts CNI
        -> Allows using overlay IP addresses in normal pods as well as VMs
      - Reconcile cycles improved
    - NeonVM/vm-builder: Add --enable-file-cache flag (default: off) (#265)
    - NeonVM: user RBAC roles (#284):
      - neonvm-virtualmachine-viewer-role
      - neonvm-virtualmachine-editor-role
      - neonvm-virtualmachinemigration-viewer-role
      - neonvm-virtualmachinemigration-editor-role
    - More logs for autoscaler-agent (#290, #291)
    - More autoscaler-agent metrics:
      - autoscaling_agent_runner_starts   (#273)
      - autoscaling_agent_runner_restarts (#273)
      - autoscaling_agent_runner_fatal_errors_total (#274)
      - autoscaling_errored_vm_runners_current      (#274)
    
    Fixes:
    
    - NeonVM/vm-builder: Fix command passthrough (#263)
    - NeonVM/vm-builder: Fix cgexec being ignored (#281)
    - NeonVM/vm-builder: Build without cgo (#255)
      - This removes the dependency on a dynamically loaded libc.
    - informant: Fix cgroup memory.high throttling (#223)
    - agent: Various logs fixes (#242, #267, #271, #272)
    - agent: Restart panicked/errored runners (#273)
    - agent/billing: Don't count VMs that aren't running (#278)
    - agent, sched: Add ports to pod spec for metrics (#282)
    - agent, sched: Fix logging of MilliCPU (#261)
    - sched: Don't output command help on error (#253)
    - plugin: Handle completed pods as if deleted (#260)
    
    No protocol changes.
    
    Other changes:
    
    - Many unused RBAC (and other) items removed:
      - Namespace autoscaler-config (#245)
      - ClusterRole vm-view (#284)
      - ClusterRole vm-patcher (#284)
      - ClusterRoleBinding kube-system/autoscaler-vm-view (#284)
      - ClusterRoleBinding kube-system/autoscale-scheduler-as-vm-patcher (#284)
      - Role kube-system/autoscale-scheduler-config-reader (#284)
      - RoleBinding kube-system/autoscale-scheduler-config-reader (#284)
    - NeonVM: Rename 'runner' container to 'neonvm-runner' (#277)
    - agent: Network error metrics include root cause (#287)
    
    Upgrade path from v0.7.2:
    
    - No ordering requirements.
    - You may wish to remove old items as mentioned above.
  • v0.7.3-alpha3
    v0.7.3-alpha3
    
    This is a pre-release just for building and distributing images.
    Do not deploy anything from this release.
  • v0.7.3-alpha2
    v0.7.3-alpha2
    
    This is a pre-release just for building and distributing images.
    Do not deploy anything from this release.
  • v0.7.3-alpha1
    v0.7.3-alpha1
    
    This is a pre-release just for building and distributing images.
    Do not deploy anything from this release.
  • v0.7.2
    v0.7.2
    
    This is a hotfix release that reverts a change in behavior from v0.7.0:
    Alongside the change to allow fractional CPU, #172 changed the billing
    value type to a float. This was incorrect, fixed by #244.
  • v0.7.1
    v0.7.1
    
    This is a hotfix release that fixes a bug with v0.7.0: On Kubernetes
    nodes with cgroups v1, the NeonVM runner was failing to read cgroup CPU
    information due to a bad path. This, in turn, prevented any successful
    reconciling for VMs on these nodes, which - among other things -
    prevented autoscaling from functioning for these VMs.
  • v0.7.0
    v0.7.0
    
    This release contains bugfixes, new features, major public-facing API
    changes, *and* inter-component API changes.
    
    Live-upgrading is possible but must be done carefully. Read the "Upgrade
    path from v0.6.0" section at the end for more info.
    
    Breaking API changes:
    
    - Upgraded to Kubernetes 1.24 (#132)
    - VMs may have fractional CPU values (#172)
    
    Features:
    
    - Improve scaling bounds validation (#190)
    - Make api.ScalingBounds (for scaling annotations) public (#181)
    - informant: Respect max file cache size (#182)
    - agent: Add runner panics metrics (#180)
    - agent: Rework (improve!) scaling algorithm (#195)
      - In general, scaling should be much smoother now. There's still some
        work to do in this area (particularly around downscaling), but
        overall this step should be fairly impactful.
    - agent->informant health checks (#203)
    - Support for fractional CPU (#172)
      - !!!
    - NeonVM: Add current usage annotation to runner pod (#231)
    - NeonVM: Allow disabling service links (#235)
    
    Fixes:
    
    - VirtualMachineSpec.PodResources now sets the pod's resources (#138)
    - autoscaler-agents no longer produce logs about VM updates that aren't
      on their node (#186)
    - Fix NeonVM CRD still including VirtualMachineSpec.ServiceAccountName (#188)
    - plugin: Fix Unreserve verdict format string in logs (#206)
    - agent: Stop informant server when context canceled (#214)
      - This was the cause of a pretty notable goroutine leak that should
        now be fixed. See #196
    - agent: Fix log for /unregister response (#224)
    - agent: Fix inverted 'ErrServerClosed' check (#225)
      - This may have been causing spurious error logs and silencing actual
        errors.
    - Add node affinity to NeonVM's kube-multus-ds DaemonSet (#236)
    - agent: Fix deadlock on invalid plugin response (#237)
    
    Protocol changes:
    
    - agent->informant health checks are now supported, but not required (#203)
    - NeonVM CRD now supports fractional CPU - all of min/use/max. (#172)
    - NeonVM controller -> runner makes requests to /cpu_current and
      /cpu_change endpoints to get/set fractional CPU via the runner's
      cgroup manipulations. (#172)
    - agent->plugin resource requests can now request fractional CPU (#172)
    - plugin->agent permits can now return fractional CPU (#172)
      - note: plugin does not return fractional CPU unless the agent
        supports it. This makes it possible to do upgrades without
        significant downtime. (#238)
    
    Other changes:
    
    - Upgraded to Go 1.20 (#130)
    - agent/metrics: Make request error labels self-consistent (#193)
    - Mark scheduler with `priorityClassName: system-cluster-critical` (#227)
    
    Upgrade path from v0.6.0:
    
      note: each step produces a "valid" state - the system will operate
      successfully. It is not recommended to stay in a partial upgrade for
      long, because partial upgrades have not been tested as much.
    
    1. Upgrade NeonVM controllers v0.6.0 -> v0.7.0
    2. Upgrade autoscale-scheduler v0.6.0 -> v0.7.0
      - note: it is ok to change to a compute unit with fractional CPU at
        this step! Old autoscaler-agents will be given a multiplied CU so it
        has an integer number of CPUs.
    3. Upgrade autoscaler-agent v0.6.0 -> v0.7.0
    
      note: Upgrading the vm-informant can be done at any point. Its
      protocol changes are opt-in.
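
The "multiplied CU" mentioned in step 2 can be pictured as scaling a fractional compute unit by the smallest whole number that makes its CPU count an integer. How the scheduler actually picks the multiplier is not stated here, so the function below is an assumption for illustration, and `mem_slots` is a hypothetical stand-in for the memory part of the compute unit:

```python
from fractions import Fraction

def multiplied_compute_unit(cpu, mem_slots):
    """Return (factor, cpu, mem_slots) for the smallest multiple with integer CPU."""
    # Parse via str() so that decimal inputs such as "0.25" stay exact.
    cpu = Fraction(str(cpu))
    # Multiplying a reduced fraction by its denominator yields an integer,
    # and no smaller positive multiplier does.
    factor = cpu.denominator
    return factor, int(cpu * factor), mem_slots * factor
```

For a compute unit of 0.25 CPU and 1 memory slot, this gives a factor of 4, so an old agent would see a compute unit of 1 CPU and 4 memory slots.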
  • v0.6.0
    v0.6.0
    
    This release contains bugfixes, new features, and minor public-facing
    API changes, but no inter-component API changes.
    
    Breaking API changes:
    
    - NeonVM: Removed VirtualMachineSpec.ServiceAccountName (#140)
    - NeonVM: Make vm-builder specific to Neon, with new vm-builder-generic
      for general-purpose use. vm-builder-generic is *almost* the same as
      the previous vm-builder, but it does not include vector by default (#133)
    - Require label "autoscaling.neon.tech/enabled=true" for autoscaling to
      be enabled (#38)
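
The opt-in label amounts to a simple predicate over a VM's labels. The sketch below uses the label key and value quoted above. The helper name and the exact, case-sensitive string match are assumptions.

```python
ENABLED_LABEL = "autoscaling.neon.tech/enabled"

def autoscaling_enabled(labels):
    """True only if the label is present and set to the string "true"."""
    # `labels` may be None when the object carries no labels at all.
    return (labels or {}).get(ENABLED_LABEL) == "true"
```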
    
    Features:
    
    - Allow annotation "autoscaling.neon.tech/bounds=..." to override
      scaling bounds (#128)
    - NeonVM: add --quiet flag to vm-builder[-generic], which is off by
      default. Builds are more verbose without it. (#169)
    - agent, plugin: Add prometheus metrics (#92, #174, #175)
    - agent: Better config validation (#177)
    
    Fixes:
    
    - agent: always log informant register errors (#165)
    - agent: fix runner log prefix (#159)
    - NeonVM: fix ENTRYPOINT, CMD handling when there's multiple strings (#184)
    
    No protocol changes.
    
    Upgrade path from v0.5.2:
    
    - No ordering requirements.
  • v0.5.2
    v0.5.2
    
    This release incorporates a handful of bugfixes and some new features.
    It is entirely inter-compatible with v0.5.1, with the exception of a
    minor change in the scheduler's "dump state" output.
    
    Features:
    
    - agent, plugin: Reimplement migration under load. (#112)
      - Note: The overlay network that allows VMs to preserve their IP
        addresses is not currently functional.
    
    Fixes:
    
    - plugin: Don't reject resource requests that aren't a multiple of the
      compute unit if the VM's resources are constrained to make satisfying
      that requirement impossible. (#108)
    - plugin: Fix missing JSON tags for Buffer and CapacityPressure in
      podResourceState. (#107)
      - Note: this changes the "dump state" JSON output
    - agent: Don't return from /suspend until NeonVM requests finished. This
      helps avoid possibilities of multiple autoscaler-agents acting at the
      same time.
    - agent/billing: panic if VM store unexpectedly stopped (#110)
    
    No protocol changes.
    
    Upgrade path from v0.5.1:
    
    - No ordering requirements.