Skip to content

Conversation

@fmstephe
Copy link
Owner

@fmstephe fmstephe commented Feb 3, 2025

We achieve this by removing the pair of addresses in RefPointer which point

           | dataAddress  -> []byte
RefPointer |
           | metaAddress -> metadata

Instead we arrange our addresses in a chain

RefPointer | metaAddress -> metadata | dataAddress -> []byte

We can still verify the generation in the RefPointer matches that in the metadata.

Shrinking the RefPointer down a single word allows us to perform the usual atomic actions, such as CAS etc. The complexity of attempting to support this with two words was too complex to really consider.

A potential future benefit would be to allow us to have a const flag which when enabled (using a compiler option to set a constant value) we can disable the metadata check completely and allow faster direct memory reads.

We make this move first because if we have a single-word in a
pointer-reference then we only have one place to keep the generation
tag. The single word will be the data-pointer.
This makes it clearer where the generation tag is being smuggled. It
also alerts the reader that directly using this field will use more data
than just a dataAddress.
This replaces the old 'gen uint8' we had before. The behaviour does not
change in this commit, we are simply storing the generation tag in a
larger field than before. The next steps will be to actually use this
new full address to do the data lookup.

Also added a helper function to test the validity of a
metadata/reference pair. This functionality was being duplicated between
Free() and DataPtr() methods.
This was an oversight, we were checking the dataAddress for nil but
didn't check the metadata.
The tagged address wraps up the address and generation tag into a single
type sized uint64. This cleans up some code and allows us to separate
out this tagging code from the RefPointer code.
Previously we separately checked that the generation tags were the same.
But this is redundant. Now we just check that the raw address values are
the same and print an explanatory error in the panic including the
taggedAddress values as hex. This should be enough to debug and error in
this case.

Just printing big raw hex values _could_ be harder to debug for users
who are not already familiar with the internals of the offheap package.
Maybe a more beginner friendly message could be used instead.
Because checkReference can fail if the generations _or_ the addresses
are different between a RefPointer and it's associated metadata, we
generate specific panic messages for each of these cases. Hopefully this
makes debugging easier for library users.
This makes no significant change, but attempts to make the code easier
to reason about. In the next steps the RefPointer code shouldn't know
about metadata explicitly at all.
This is a much simpler implementation. It also removed object meta-data
from the critical path of tracking free objects.

Now meta-data _only_ checks if the object is free or if the object has
the right generation tag for the accessing reference. These functions
are useful, but not essential, so the meta-data can be made optional.
The pointer store tracks a list of free RefPointers in a slice. This was
managed as a stack but using a slice directly. We have implemented a
very small stack implementation and use that instead. A little bit
cleaner.
We also add unit tests to demonstrate the expected reference and
metadata behaviour for the 'free()' and 'allocFromFree()'. This is a
unit testing hole which hand now been filled.
This shrinks the size of the generation tag to 7 bits. I think this is
likely fine in practice. The new maximum generation size is 127 or 0x7F
in hexadecimal.

You can set and read the is-free bit with specific methods on
taggedAddress.

This is not currently used. Will be actually used in the next commit.
We remove the 'isFree' field as it is no longer needed.

Fixed a big in taggedAddress.setGen(...), this was a sneaky one, as the
pointerMask being used was unintentionally clearing the is-free bit.
Move comments around to improve readability, and rename pointer*
constants to address* constants.
This simply clarifies that we don't check the is-free bit in this method
because it is never set in any RefPointers.
Next step will this field will become the _only_ address and it will
point to metadata only. The metadata will contain the actual data
address.
This method was bypassing the meta-data checks for data access. We
create a common method called by both 'Bytes()' and 'DataPtr()', which
does these checks.
The metadata now contains the address of the actual data. This means
that the RefPointer is only 64 bits wide. This also means that right now
you must do two dependent memory lookups to get to the data. This is
_probably_ slower than previously where we also did two memory lookups
but they were independent and speculative execuation _probably_
initiates the second lookup earlier.

This slow down is ok (and only theoretical, not benchmarked or
conclusively tested).

Making the RefPointer only 64 bits wide allows us to do vanilla atomic
operations like CAS etc. Previously doing CAS on two words looked
potentially very complex and likely much slower.

We will introduce a const flag in the future which can be switched off
at build time to bypass the metadata lookup and store the data address
directly in the RefPointer. This will speed up data access times, but
only when the flag is switched off at compile time. A good idea? A bad
idea? Ony time will tell. (It's for sure an idea)
Previously we weren't checking this, so any old Reference could
Realloc() an allocation at any time, even if that Reference was no
longer valid.

It's concerning that all this work has uncovered multiple holes in the
Reference generation checking logic. It's not clear how to improve this
beyond just thinking harder and reviewing code more carefully (neither
of these approaches are very reliable ways to find bugs in general). But
we will keep this problem in the back of our heads and see if we can
improve things.
It is not acceptable to call RefPointer.allocFromFree() when the
allocation is not free. This hole is fixed and tested.
This RefPointer method does all the generation/is-free checks on an
allocation. It's important that this method sits on the path of any
RefPointer method which accesses the data or meta-data.

Current methods which call this method are:

Realloc()
DataPtr()
Bytes()
free()

Also inlined the metadata.checkReference() method.  The check in
metadata _only_ checked that the generation matches, which doesn't
really warrant a separate method. The code seems more readable with the
generation check inlined into the accessibleActiveAddress() method.
Previously the code would update the method receiver in place. Now we
create a new RefPointer with the correct generation for the now active
allocation.

This is preparation for eliminating all of the pointer receiver methods
for RefPointer.
Because the RefPointer is only one word wide, and no methods update the
receiver, they return new modified RefPointers instead, we don't need
pointer receiver semantics.

This is a significant simplification of this part of the code-base.
Previously all RefPointer, and all Ref-types in general, are used as
values but have methods with pointer receivers. This caused some
unexpected allocations which could be fixed, but with odd-looking
contortions in the code and long comments to explain them.

Now there are no pointer receivers, no subtle allocations. Happy day.
Clarify that metadata contains both the generation _and_ the is-free
tag, which are used to check the integrity of RefPointer data accesses.

Writing these comments it strikes me that metadata is responsible for
the integrity checks when accessing data through Refpointers, but the
check themselves are done inside RefPointer methods. It seems like
moving this code into metadata will improve code clarity.
This is just a getter for the address field, and can be replaced by a
simple field access. It serves no clear purpose.
This method name more clearly reflects that a uintptr is being returned.
It also more clearly indicates that we are switching from a
taggedAddress, which is almost a pointer, but with a different name to
avoid confusing it with a pointer, to an actual pointer type uintptr.
This method simply creates a new taggedAddress with an incremented
generation tag. Only metadata increments the generation tag, while on
RefPointer the generation tag is set directly. This is because when we
allocFromFree a RefPointer the generation tag could be any legal value.

This is not a critical change, but tidies up a little bit of code.
Hopefully it is easier to read now.
@fmstephe fmstephe merged commit 796774f into master Feb 7, 2025
1 check passed
@fmstephe fmstephe deleted the 64bit-references branch February 7, 2025 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant