@drivera73
Contributor

drivera73 commented Dec 24, 2025

In some cases, when running primary zones with BIND as the DNS server, the journal files may not get properly synchronized into their primary zone databases, possibly because BIND isn't shut down cleanly via the OS's service scripts. As a result, on the next bootup, BIND 9 may refuse to start because those journal files are corrupted.

These scripts try to maximize the cases in which those journal files are cleaned out properly. The right way to do the cleanout is to run either service named stop or rndc sync -clean. Either command instructs BIND to sync the journal files to their zone databases and clean them out properly.

However, if that fails, then there are contingencies in place to forcibly remove those files - if they didn't get synchronized cleanly, then they're garbage and should be removed. If they did, then they disappear and there will be nothing to forcibly delete.
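To illustrate the forced-removal contingency, here is a minimal sketch, not the code from this PR. It assumes the journals carry BIND's default .jnl suffix and sit directly inside the zone directory; the function name purge_journals is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not the PR's script): remove leftover BIND journal
# files from a zone directory.
# Assumption: journals use BIND's default ".jnl" suffix.
purge_journals() {
    zone_dir="$1"
    # Nothing to do if the directory does not exist.
    [ -d "$zone_dir" ] || return 0
    # Delete only regular files ending in .jnl directly inside zone_dir;
    # the zone database files themselves are left untouched.
    find "$zone_dir" -maxdepth 1 -type f -name '*.jnl' -exec rm -f {} +
}
```

Called as purge_journals /path/to/zones, it is a no-op when a clean sync has already removed the journals, which matches the "nothing to forcibly delete" case above.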

Either way, the intent is to ensure that BIND has no issues starting up when running a master zone.

This is related to issue #5068, also reported by me (via a different account ... sorry :) ).

Member

@fichtner fichtner left a comment

Wouldn’t it be better to use a setup.sh script executed at each start? Not sure how the files are used across restarts.

@drivera73
Contributor Author

Wouldn’t it be better to use a setup.sh script executed at each start? Not sure how the files are used across restarts.

That's indeed part of the solution. The PR contains a patch for the existing setup.sh script, plus the new early and stop syshook scripts.

@fichtner
Member

Ok, sorry the path didn’t expand on mobile view very well.

I’d think we should try to minimize impact. Early could be reasonable in some cases maybe, but stop isn’t really useful as a trigger.

Cheers,
Franco

@drivera73
Contributor Author

drivera73 commented Dec 24, 2025

Fair.

I added the stop hook for consistency, and to maximize the chances of the issue being handled as cleanly as possible (i.e. with either rndc sync -clean or service named stop). The logic is this: by the time the stop hook is called, there are only two possible scenarios:

  • The named service is still running
    • The named service must be stopped, but first call rndc sync -clean to flush out any journal files
    • Then continue as in the other scenario
  • The named service has already been stopped, or was never started
    • Either the journal files have already been cleaned up as a result
    • Or they have not been cleaned up, but since named is no longer running, they're effectively garbage

Under either scenario, at the end of the stop hook, the journal files can safely be deleted if present.
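The two scenarios can be sketched roughly as follows. This is an illustration of the flow only, not the hook shipped in this PR: the function name stop_hook and the RNDC and SERVICE override variables are hypothetical, and the zone directory is passed in as an argument rather than hard-coded.

```shell
#!/bin/sh
# Sketch of the stop-hook flow described above; NOT the script from this PR.
# RNDC and SERVICE are hypothetical overrides so the commands can be swapped
# out (for example, when testing).
stop_hook() {
    zone_dir="$1"
    rndc_cmd="${RNDC:-rndc}"
    service_cmd="${SERVICE:-service}"

    # Scenario 1: named is still running, so let it flush its journals
    # into the zone databases before it is stopped.
    if "$service_cmd" named status >/dev/null 2>&1; then
        "$rndc_cmd" sync -clean || true
        "$service_cmd" named stop || true
    fi

    # Scenario 2 (and the tail of scenario 1): named is down, so any
    # journal still present is treated as garbage and removed.
    rm -f "$zone_dir"/*.jnl
}
```

Failures of rndc or the service command are deliberately ignored here, since the final removal step is meant to run regardless of how the clean path went.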

If we remove the stop hook, then those journal files would be cleaned up blindly on bootup by the early hook, which may not be as antiseptic as rndc sync -clean, since all we can do at that point is delete them directly. The goal of affording rndc the opportunity to do some cleanup is in case there's valuable information in those journals that we can still commit at the last minute. It may not happen very often, but might as well give it a shot ... who knows whose butt we'll be saving!

In reality, the early hook is just a contingency to maximize the chances of BIND starting up, since the expectation is that setup.sh and the stop hook together should be sufficient ... but since only the paranoid survive... :)

Cheers!

@farthinder

For what it is worth, this fixes the worst of my issues when I try to use k8s external-dns to publish records in OPNsense BIND via RFC 2136.

I still have the issue that whenever BIND in OPNsense is restarted (e.g. during manual config/record updates), all the records published via RFC 2136 get deleted. But this doesn't happen often in my use case, and I have set up external-dns to sync every 30s, so I can live with it.

For anyone else running into this, you can apply this patch with this command:

opnsense-patch -c plugins 8f4daa869f01ef203efa73ce4926f0945a0f1b11

Nice work! @drivera73

@drivera-armedia

drivera-armedia commented Jan 10, 2026

Nice work! @drivera73

Thanks!

I still have the issue that whenever BIND in OPNsense is restarted (e.g. during manual config/record updates), all the records published via RFC 2136 get deleted.

I'm fairly sure the issue here is that the OPNsense UI believes it's the only one feeding records into the zone. So when you click the Save button, it simply regenerates the zone from scratch based on the records it knows about, which don't include those RFC 2136 updates.

The fix for this would be to somehow tell the UI to "re-import the zone" so that it can adopt those new records into its own "knowledge". This would probably be a much larger undertaking, but it would be worth filing as a separate ticket, since the same feature could also be used to import existing zone files, which is currently not possible. Today, if one wishes to import existing zones, one must download a configuration backup, add the records to the XML, and then upload the updated configuration. This is cumbersome and error-prone, and the proposed feature would resolve it.

Cheers!
