Wait on network to be available before trying to run backups against remote repos#2176
Wait on network to be available before trying to run backups against remote repos#2176m3nu merged 14 commits intoborgbase:masterfrom
Conversation
2fe7de9 to
eaaadf0
Compare
Don't try and start jobs with remote repos until the network is up. This should prevent job failures when, for instance, waking from sleep. Mac implementation is a stub currently. Signed-off-by: Ryan Zeigler <zeiglerr@gmail.com>
eaaadf0 to
42e8bf9
Compare
Signed-off-by: Ryan Zeigler <zeiglerr@gmail.com>
cce8273 to
efc04e1
Compare
NetworkStatusMonitor is now a QObject with signal network_status_changed. The scheduler listens to this signal and handles rescheduling similar to the existing 'wake from suspend' logic works. Signed-off-by: Ryan Zeigler <zeiglerr@gmail.com>
It seems that sometimes the signal does't arrive. Signed-off-by: Ryan Zeigler <zeiglerr@gmail.com>
|
Would there be any hope of getting either feedback or a merge? |
|
Sorry, I missed this. There are too many stale PRs currently. Looks simple enough at first. |
m3nu
left a comment
There was a problem hiding this comment.
Review Summary
Purpose: Defer scheduled backups for remote repos when network is unavailable (e.g., after sleep/resume).
Positive Aspects
- Solves a real problem - Backup failures after sleep/resume due to network not being ready is a legitimate issue
- Good architecture - Extends existing
NetworkStatusMonitorpattern, uses Qt signals appropriately - Linux implementation is well done - Uses NetworkManager DBus API properly
- Fails gracefully - When status can't be determined, assumes network is up (won't block backups unnecessarily)
- Respects local repos - Only defers remote repository backups
Issues to Address
1. Bug: Inconsistent network state checking (Medium severity)
The code mixes two different NetworkManager enums:
In networkStateChanged slot (network_manager.py:135):
# This checks NMState (0-80 range)
self.network_status_changed.emit(state >= 60)In is_network_active() (network_manager.py:30):
# This checks NMConnectivityState (0-4 range)
return self._nm.get_connectivity_state() is not NMConnectivityState.NONEThese are different DBus properties:
StateChangedsignal emits NMState (values 0-80)Connectivityproperty returns NMConnectivityState (values 0-4)
This could cause the initial _net_up state (set via is_network_active()) to disagree with signal-based updates. Both should use the same approach.
Suggested fix - Option A (Use State consistently):
def is_network_active(self):
try:
state = read_dbus_property(self._nm._nm, 'State')
return state >= 60 # NM_STATE_CONNECTED_LOCAL or better
except DBusException:
logger.exception("Failed to check network state. Assuming connected")
return TrueOr Option B (Use Connectivity consistently):
@pyqtSlot("unsigned int")
def networkStateChanged(self, state):
logger.debug(f'network state changed: {state}')
# Re-check connectivity when state changes
try:
connectivity = self.get_connectivity_state()
self.network_status_changed.emit(connectivity is not NMConnectivityState.NONE)
except DBusException:
self.network_status_changed.emit(True)2. Typo (Trivial)
scheduler.py:93: "updating shcedule" → "updating schedule"
3. Unused import (Trivial)
scheduler.py:6: List is added to typing imports but never used in the file.
Notes
- macOS stub -
darwin.pyis_network_active()always returnsTrue. This is fine for now since it preserves existing behavior, and you've offered to implement it later. - Testing - Would be good to verify on Linux with NetworkManager by toggling airplane mode and confirming backups defer/resume correctly.
Overall this is a useful feature! Just needs the NMState vs NMConnectivityState inconsistency fixed before merge.
|
Suggestions are noted, I'll work on those updates. |
…Connectivity and NMState I chose NMState because there is no Connectivity signal and immediately polling for Connectivity on state change seems to return the old value.
|
Some changes made. I standardized on reading 'State' instead of 'Connectivity' due to what appears to be a delay in 'Connectivity' being updated on the nm side after the State change emits. |
Quick Review — Minor Issues FoundAll three original review comments from Jan 22 have been addressed. Found a few new minor items: 1. Wrong enum name for 2. Double "to" typo in Should be: 3. Misleading else-branch log message in if profile.schedule_make_up_missed and self._net_up:
# ... run the backup
elif profile.schedule_make_up_missed and not self._net_up:
logger.debug('Skipping catchup %s (%s), the network is not available', profile.name, profile.id)4. Trailing whitespace
(These would likely be caught by ruff/pre-commit.) |
|
I have addressed other notes with the exception of
This is incorrect. Per the docs ASLEEP and DISABLED both have the value 10 and ASLEEP is deprecated. |
|
Thanks for your patience with this PR, @rzeigler, and for working through all the review feedback! The three original review issues are all resolved, and the follow-up items have been addressed as well (and you were right about One remaining issue I noticed: Local repos blocked from catchup when network is downIn # scheduler.py ~line 314
if profile.schedule_make_up_missed and self._net_up:
# run the backup...
elif profile.schedule_make_up_missed and not self._net_up:
logger.debug('Skipping catchup %s (%s), the network is not available', ...)Meanwhile, elif not profile.repo.is_remote_repo() or self._net_up:
self.set_timer_for_profile(profile.id)So Suggested fix: needs_network = profile.repo is not None and profile.repo.is_remote_repo()
if profile.schedule_make_up_missed and (self._net_up or not needs_network):
# run backup...
elif profile.schedule_make_up_missed and not self._net_up and needs_network:
logger.debug('Skipping catchup %s (%s), the network is not available', profile.name, profile.id)Also two minor typos remaining:
|
|
Anything else on this? |
No, I think all good. I'll make a note to merge and test tomorrow. |
m3nu
left a comment
There was a problem hiding this comment.
🔍 Code Review
PR: #2176 - Wait on network to be available before trying to run backups against remote repos
Branch: wait-for-network → master
Changes: +113 / -5 lines across 4 files
📋 Summary
Adds reactive network monitoring to defer scheduled backups against remote repos when there is no network connection. On Linux, listens to NetworkManager's StateChanged DBus signal; on macOS, the implementation is stubbed to always return True.
✅ What's Good
- Clean approach to a real pain point (failed backups after resume from sleep)
- Proper use of DBus signals for reactive network monitoring on Linux
- Correct
NMStateenum mapping with documentation links - Graceful fallback:
NullNetworkStatusMonitor.is_network_active()returnsTrue - Thread-safe: properly integrates with existing lock pattern in scheduler
- Only affects remote repos, leaving local backups unhindered
- Good defensive re-fetch of network state on resume from suspend
🔍 Review Details
| Severity | Count | Details |
|---|---|---|
| 🔴 Error | 1 | Base class design |
| 🟡 Warning | 4 | Various |
| 🔵 Info | 2 | Style/minor |
1. 🔴 Base class is_network_active() raises NotImplementedError — If a new platform subclass is added and forgets to implement this method, scheduled backups will crash instead of gracefully degrading. Consider returning True by default (like NullNetworkStatusMonitor does) so the behavior falls back to "assume connected".
2. 🟡 macOS is_network_active() is stubbed — DarwinNetworkStatus.is_network_active() always returns True with a # Not yet implemented comment. macOS users won't benefit from this feature. Worth tracking as a follow-up issue.
3. 🟡 reload_all_timers() race condition — When network goes down, timers for remote repos are removed via remove_job(). When network comes back, networkStatusChanged signal fires and reschedules. However, if the signal is missed (e.g., during a brief connectivity blip), profiles stay unscheduled until the 15-minute reload timer fires. This is probably acceptable but worth noting.
4. 🟡 set_timer_for_profile() partial coverage — The needs_network check only guards the schedule_make_up_missed path. If a profile has schedule_make_up_missed=False and network is down, it will still schedule a future timer that may fire while network is still down — the backup then fails at BorgCreateJob.prepare() time (existing behavior, but this PR could address it for consistency).
5. 🟡 @pyqtSlot("unsigned int") fragility — The slot signature should match the DBus signal type. NM's StateChanged emits u (uint32). Verify this works reliably across different Qt/DBus versions.
6. 🔵 Naming convention — networkStatusChanged method in scheduler.py uses camelCase, inconsistent with the rest of the Python codebase which uses snake_case. Consider renaming to network_status_changed_handler or similar.
7. 🔵 NM_STATE_CONNECTING excluded — _is_network_connected() correctly excludes NM_STATE_CONNECTING (40), meaning a brief "connecting" phase after resume won't trigger premature backup attempts. Good decision.
📊 Verdict: COMMENT 💬
Solid approach to a real problem. The Linux/NetworkManager integration looks correct. Main concerns are the base class design (should default to True instead of NotImplementedError), the partial coverage in set_timer_for_profile(), and the macOS stub that should be tracked for follow-up. None of these are blockers, but addressing them would strengthen the PR.
🤖 Reviewed by Claude Code
|
I think we are good. The base class design follows existing convention and isn't a concern. Thanks again for the patience! |
Description
Reactively disable scheduled jobs against remote repositories when there is no network connection.
Related Issue
#2133
Motivation and Context
Running a backup against a remote repo will fail without a network connection.
This results in a notification on default settings, stress (because why are my backups failing) and could maybe result in backups not happening.
On my Linux systems configured to sleep, the first backup after resuming always seems to fail because the network has not finished activating yet.
This should allow that backup to just wait for a little while.
As a side benefit, this should also defer backups that happen in the normal course when there is no possible way they could work (i.e. when in airplane mode).
I thought about these changes for a little and its what I came up with. However, if the maintainers have a different approach they would like to try I'm more than willing to rework my PR or open a new one with that approach. I do also have a mac that I can in theory develop the stubbed mac implementation here, but I'm less familiar with how to programatically access the state of the mac network. If this approach is good, I can flesh that out also.
How Has This Been Tested?
I had a little bit of trouble figuring out how to manipulate vorta's database state so that I could actually start vorta, then put my system to sleep. Vorta always wanted to backup immediately. I assume this should be a reasonable facsimile of the behavior.
Regarding actual tests, I am not an experienced python developer and running nox locally seems to fail to collect tests in test_darwin.py which I wouldn't expect to execute at all on my system. That said, I don't think I changed anything covered by unit tests.
Screenshots (if appropriate):
Types of changes
Checklist:
I provide my contribution under the terms of the license of this repository and I affirm the Developer Certificate of Origin.