Merged
@@ -33,7 +33,13 @@ abstract class LocalBuffer[T <: java.nio.Buffer] {
def getBuf(length: Long) = {
Assert.usage(length <= Int.MaxValue)
if (tempBuf.isEmpty || tempBuf.get.capacity < length) {
tempBuf = Maybe(allocate(length.toInt))
// allocate a buffer that can store the required length, but with a minimum size. The
// majority of LocalBuffers should be smaller than this minimum size and so should avoid
// costly reallocations, while still being small enough that the JVM should have no
// problem quickly allocating it
val minBufferSize = 1024
val allocationSize = math.max(length.toInt, minBufferSize)
tempBuf = Maybe(allocate(allocationSize))
Contributor:
I am not getting why this improves performance.

It sounds to me like the reuse mechanism in this code was not working since if it was, your improvement here would not have helped, and that suggests that there is more improvement on the table if we get the reuse of these buffers to work better, or work at all, since I suspect it's not working at all.

I suggest: first take out all the reuse - just allocate and let the GC reclaim exactly-sized buffers and see if that's faster/slower. (checkpoint on the reuse mechanism overhead vs. just allocate+GC)

Second, instrument the reuse of these buffers so we count exactly how many of these are allocated vs. reused from the stack/pool. My bet is the reuse isn't working.
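A minimal sketch of that instrumentation, assuming a simple counter pair around the allocate-vs-reuse branch (the class and field names here are hypothetical stand-ins, not Daffodil's actual LocalBuffer API):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumentation sketch: count allocations vs. reuses so we can
// verify whether the buffer-reuse mechanism is actually doing anything.
class InstrumentedLocalBuffer {
    static final AtomicLong allocations = new AtomicLong();
    static final AtomicLong reuses = new AtomicLong();

    private ByteBuffer tempBuf = null;

    ByteBuffer getBuf(int length) {
        if (tempBuf == null || tempBuf.capacity() < length) {
            allocations.incrementAndGet();
            tempBuf = ByteBuffer.allocate(length);
        } else {
            reuses.incrementAndGet();
        }
        tempBuf.clear();
        return tempBuf;
    }
}
```

After a representative parse run, comparing `allocations` against `reuses` would show directly whether the pool is effective.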

Member Author:
I think the reuse worked (I'll double check as you suggest), the issue is that the reuse is only within a single parse--it's only local to the PState. So every time we call parse() we will allocate a new 1MB buffer (if there is a specific length string in the format), and reuse that buffer for the lifetime of that single parse. When that parse ends, we let it get garbage collected along with the PState, and a new one will be allocated the next time parse() is called.

For files that are very small (e.g. <1KB) the 1 MB allocation makes a noticeable difference. For larger files this doesn't make as much of a difference because 1MB isn't very much compared to all the other allocations daffodil does.

An alternative would be to make these thread locals, so each thread shares one buffer across parses; then we could go back to the 1MB buffer size and only take a single hit per thread. But I don't think the 1MB buffer size is all that important. It's going to be very rare for a single specified length string to be very big, likely nowhere close to 1MB.
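The thread-local alternative could look roughly like this sketch (hypothetical names, not Daffodil code): each thread pays the allocation once and reuses its buffer across parse() calls.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the thread-local alternative: one buffer per thread,
// shared across parse() calls, so the allocation cost is paid once per thread
// rather than once per parse.
class ThreadLocalBuffer {
    private static final int INITIAL_SIZE = 1024;

    private static final ThreadLocal<ByteBuffer> tempBuf =
        ThreadLocal.withInitial(() -> ByteBuffer.allocate(INITIAL_SIZE));

    static ByteBuffer getBuf(int length) {
        ByteBuffer buf = tempBuf.get();
        if (buf.capacity() < length) {
            // grow for this thread only; later calls reuse the larger buffer
            buf = ByteBuffer.allocate(length);
            tempBuf.set(buf);
        }
        buf.clear();
        return buf;
    }
}
```

The trade-off is that the buffers then live for the lifetime of the thread rather than the lifetime of a parse, which matters for thread pools that stay around.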

Member Author:
I didn't check if with vs without reuse changed performance, but I did add a print statement to this getBuf function that printed whether we had to allocate a new buffer or not. Testing with NITF (which has lots of specified length strings and some relatively small files available), it allocated a 1024 byte buffer once and then reused it 100+ times with no additional allocations. We might need additional testing to make sure it's working correctly for other formats, or it might be worth investigating other reuse (e.g. make LocalBuffer reuse work among threads), but I think the LocalBuffer reuse logic is working as intended.

Member Author:
Note, the particular LocalBuffer that was related to the 1MB specified length string allocation is actually from the InputSourceDataInputStream, not the PState. So technically it's only a single 1MB allocation per ISDIS created. If you are streaming data and reusing the same ISDIS you'll only have one allocation. But if you create a new ISDIS per parse (which non-streaming use cases will do), then you will effectively get one allocation per parse() call.

So this change only improves performance for relatively small files that have specified length strings and are parsed non-streaming.

Member Author:
I tested allocating the exact size needed vs allocating one large buffer and reusing it. I really didn't see much of a difference. Allocate once + reuse was about the same, or maybe 3-5% faster, but that could be within JVM noise.

It's also possible it's dependent on the specific file and format. For very small formats with little reuse it's probably faster to just allocate exact sizes as needed. But if there are a lot of strings, maybe avoiding the allocations and GC overhead makes a difference. I can't really say without more testing, but I don't think there's a clear winner.
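For reference, a rough sketch of the kind of comparison described here (not a proper benchmark; real measurements should use a harness like JMH to control for JIT warmup and GC noise):

```java
import java.nio.ByteBuffer;

// Rough micro-benchmark sketch comparing allocate-per-call against
// allocate-once-and-reuse. Both variants touch the buffer so the
// allocation can't be trivially optimized away.
class BufferBench {
    static long allocateEach(int iters, int size) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            ByteBuffer buf = ByteBuffer.allocate(size);
            buf.put(0, (byte) i);
        }
        return System.nanoTime() - t0;
    }

    static long allocateOnce(int iters, int size) {
        long t0 = System.nanoTime();
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < iters; i++) {
            buf.clear();
            buf.put(0, (byte) i);
        }
        return System.nanoTime() - t0;
    }
}
```

Which side wins depends on buffer size and iteration count, consistent with the "no clear winner" observation above.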

Contributor:
Ok. Thanks for double checking that the reuse is at least working. When we call parse repeatedly in a loop, such as to parse a stream of messages, is that going to allocate this over and over, or are we reusing the PState in that case?

I answered my own question... we have to reuse the PState because the I/O position on the input has to be preserved.

However, other applications will call parse once per input record because they are carving off the input data record/message themselves based on UDP datagram size or other means. In that case those are brand new calls to parse with new PState allocated. This is also a fairly tight loop, so yes, eliminating this 1MB allocation and GC for each parse call should be an improvement.

We should consider pooling entire PState objects and resetting them so that we don't have to reallocate any of this stuff.
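The pooling idea could be sketched like this (StatePool and Resettable are hypothetical stand-ins; pooling Daffodil's real PState would require a reset method that clears all per-parse state):

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical sketch of pooling parse-state objects: borrow from the pool if
// one is available, otherwise construct; on release, reset the object so it
// can be reused by the next parse instead of being reallocated.
class StatePool<T extends StatePool.Resettable> {
    interface Resettable { void reset(); }

    private final ArrayDeque<T> pool = new ArrayDeque<>();
    private final Supplier<T> factory;

    StatePool(Supplier<T> factory) { this.factory = factory; }

    T borrow() {
        T s = pool.poll();
        return (s != null) ? s : factory.get();
    }

    void release(T s) {
        s.reset(); // clear per-parse state before the object goes back in the pool
        pool.push(s);
    }
}
```

The hard part is not the pool itself but guaranteeing that reset() really clears everything, since any leaked state would bleed between unrelated parse calls.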

}
val buf = tempBuf.get
buf.clear
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd: ElementRuntimeData)
def isFieldNilLit(field: String): Boolean

override def parse(start: PState): Unit = {

val field = parseString(start)

val isFieldEmpty = field.length() == 0

if (isFieldEmpty && isEmptyAllowed) {
// Valid! Success ParseResult indicates nilled
} else if (isFieldEmpty && !isEmptyAllowed) {
// Fail!
PE(start, "%s - Empty field found but not allowed!", eName)
} else if (isFieldNilLit(field)) {
// Contains a nilValue, Success ParseResult indicates nilled
if (erd.isComplexType) {
// nillable complex types must have a nilValue of %ES;. For a literal nil specified length
Contributor:
Code looks fine.

I just ask does logic about '%ES;' and complex nil values exist elsewhere as well?
I ask because there is a longstanding enhancement request for DFDL to allow more than just '%ES;' as nil values for complex types.

The DFDL Workgroup was uncertain where this restriction originated, but some members suspected there is a reason for it that has just been lost to history. Hence, if this feature were to be added it would be experimental until proven ok.

In any case this code is fine as it was already doing complex type nil logic, so it's not like you added another place to the code that reasons about this.

Member Author:
My guess is it does, but it might be hard to find (I couldn't find anything via a quick scan). For example, the previous code didn't really have anything specific to complex nils; it just happened to work because it would either find data or it wouldn't, and the nilValue for complex types was %ES;, which matches empty.

I think if we ever do add support for non-%ES; complex nils, we'll just have to be very thorough about testing.

// complex to be nilled, that means either there must be a specified length that is zero
// or there isn't a specified length and we have reached the end of the data. If neither
// of these conditions are true, then there is non-empty data for this complex element and
// it cannot be nilled.
val bitLimit0b = start.bitLimit0b
val hasSpecifiedLength = bitLimit0b.isDefined
if (
(hasSpecifiedLength && (bitLimit0b.get - start.bitPos0b) > 0) ||
(!hasSpecifiedLength && start.dataInputStream.hasData)
) {
// Fail!
PE(start, "%s - Does not contain a nil literal", eName)
} else {
// Valid! Success ParseResult indicates nilled
}
} else {
// Fail!
PE(start, "%s - Does not contain a nil literal!", eName)
// Simple element, read a string up to the bitLimit and see if it matches the nilValue
val field = parseString(start)

val isFieldEmpty = field.length() == 0

if (isFieldEmpty && isEmptyAllowed) {
// Valid! Success ParseResult indicates nilled
} else if (isFieldEmpty && !isEmptyAllowed) {
// Fail!
PE(start, "%s - Empty field found but not allowed", eName)
} else if (isFieldNilLit(field)) {
// Contains a nilValue, Success ParseResult indicates nilled
} else {
// Fail!
PE(start, "%s - Does not contain a nil literal", eName)
}
}
}

@@ -18,6 +18,7 @@
package org.apache.daffodil.runtime1.processors.parsers

import org.apache.daffodil.io.processors.charset.BitsCharsetDecoderUnalignedCharDecodeException
import org.apache.daffodil.lib.exceptions.Assert
import org.apache.daffodil.lib.util.MaybeChar
import org.apache.daffodil.lib.util.Misc
import org.apache.daffodil.runtime1.processors.CharsetEv
@@ -86,8 +87,32 @@ trait StringOfSpecifiedLengthMixin extends PaddingRuntimeMixin with CaptureParsi

protected final def parseString(start: PState): String = {
val dis = start.dataInputStream
val maxLen = start.tunable.maximumSimpleElementSizeInCharacters
val startBitPos0b = dis.bitPos0b
val bitLimit0b = dis.bitLimit0b

// We want to limit the maximum length passed into getSomeString since that function can
// pre-allocate a buffer that size even if it won't find that many characters. So we
// calculate the maximum number of characters that we could possibly decode from the
// available bits and the character set.
//
// For fixed-width encodings, that is just the number of available bits divided by the
// fixed width of the encoding.
//
// For variable length encodings (e.g. UTF-8), the maximum number of characters that the
// available bits could possibly decode to is if every decoded character was the smallest
// possible representation. That smallest representation for variable-width encodings is
// bitWidthOfACodeUnit. So we divide the available bits by bitWidthOfACodeUnit.
//
// Note that the bitLimit should always be defined, because the bitLimit is how
// specified length strings limit their length.
Assert.invariant(bitLimit0b.isDefined)
val availableBits = bitLimit0b.get - startBitPos0b
val charset = charsetEv.evaluate(start)
val optWidth = charset.maybeFixedWidth
val bitsPerChar = if (optWidth.isDefined) optWidth.get else charset.bitWidthOfACodeUnit
// add one to allow for partial bytes at the end that could parse to a replacement char
val maxPossibleChars = (availableBits / bitsPerChar) + 1
val maxLen = math.min(maxPossibleChars, start.tunable.maximumSimpleElementSizeInCharacters)

val strOpt =
try {
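The bound computed in the diff above can be sketched as a standalone helper (hypothetical names; `fixedWidthOrZero` stands in for the charset's `maybeFixedWidth`, with zero meaning a variable-width encoding):

```java
// Sketch of the max-characters bound: divide the available bits by the
// character width (the fixed width if the charset has one, otherwise the
// width of a single code unit, which is the smallest possible encoded
// character), then add 1 to allow for a trailing partial character that
// could decode to a replacement char.
class MaxCharsBound {
    static long maxPossibleChars(long availableBits, int fixedWidthOrZero, int bitsPerCodeUnit) {
        int bitsPerChar = (fixedWidthOrZero > 0) ? fixedWidthOrZero : bitsPerCodeUnit;
        return (availableBits / bitsPerChar) + 1;
    }
}
```

For example, with UTF-8 (variable width, 8-bit code units), 24 available bits bound the result to 24/8 + 1 = 4 characters, no matter how many multi-byte characters the data actually contains.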