Avoid large allocations related to specified length strings #1603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd: ElementRuntimeData)
   def isFieldNilLit(field: String): Boolean

   override def parse(start: PState): Unit = {
-    val field = parseString(start)
-
-    val isFieldEmpty = field.length() == 0
-
-    if (isFieldEmpty && isEmptyAllowed) {
-      // Valid! Success ParseResult indicates nilled
-    } else if (isFieldEmpty && !isEmptyAllowed) {
-      // Fail!
-      PE(start, "%s - Empty field found but not allowed!", eName)
-    } else if (isFieldNilLit(field)) {
-      // Contains a nilValue, Success ParseResult indicates nilled
+    if (erd.isComplexType) {
+      // nillable complex types must have a nilValue of %ES;. For a literal nil specified length
Contributor
Code looks fine. I just ask: does logic about '%ES;' and complex nil values exist elsewhere as well? The DFDL Workgroup was uncertain where this restriction originated, but some members suspected there is a reason for it that has just been lost to history. Hence, if this feature were to be added it would be experimental until proven ok. In any case this code is fine, since it was already doing complex type nil logic, so it's not like you added another place to the code that reasons about this.

Member, Author
My guess is it does, but it might be hard to find (I couldn't find anything via a quick scan). For example, the previous code didn't really have anything specific to complex nils; it just happened to work because it would either find data or it wouldn't, and the nilValue for complex was %ES;, which matches empty. I think if we ever do add support for non-%ES; complex nils, we'll just have to be very thorough about testing.
+      // complex to be nilled, that means either there must be a specified length that is zero
+      // or there isn't a specified length and we have reached the end of the data. If neither
+      // of these conditions are true, then there is non-empty data for this complex element and
+      // it cannot be nilled.
+      val bitLimit0b = start.bitLimit0b
+      val hasSpecifiedLength = bitLimit0b.isDefined
+      if (
+        (hasSpecifiedLength && (bitLimit0b.get - start.bitPos0b) > 0) ||
+        (!hasSpecifiedLength && start.dataInputStream.hasData)
+      ) {
+        // Fail!
+        PE(start, "%s - Does not contain a nil literal", eName)
+      } else {
+        // Valid! Success ParseResult indicates nilled
+      }
     } else {
-      // Fail!
-      PE(start, "%s - Does not contain a nil literal!", eName)
+      // Simple element, read a string up to the bitLimit and see if it matches the nilValue
+      val field = parseString(start)
+
+      val isFieldEmpty = field.length() == 0
+
+      if (isFieldEmpty && isEmptyAllowed) {
+        // Valid! Success ParseResult indicates nilled
+      } else if (isFieldEmpty && !isEmptyAllowed) {
+        // Fail!
+        PE(start, "%s - Empty field found but not allowed", eName)
+      } else if (isFieldNilLit(field)) {
+        // Contains a nilValue, Success ParseResult indicates nilled
+      } else {
+        // Fail!
+        PE(start, "%s - Does not contain a nil literal", eName)
+      }
     }
   }
I am not getting why this improves performance.
It sounds to me like the reuse mechanism in this code was not working. If it was, your improvement here would not have helped, and that suggests there is more improvement on the table if we get the reuse of these buffers to work better, or to work at all; I suspect it's not working at all.
I suggest: first, take out all the reuse. Just allocate, let the GC reclaim exactly-sized buffers, and see if that's faster or slower (a checkpoint on the reuse mechanism overhead vs. just allocate+GC).
Second, instrument the reuse of these buffers so we count exactly how many of these are allocated vs. reused from the stack/pool. My bet is the reuse isn't working.
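A minimal sketch of the kind of counting that second step could use, assuming a getBuf-style helper that hands out a grow-on-demand buffer (the BufStats counters and the ReusableCharBuffer class are illustrative, not Daffodil's actual LocalBuffer code):

import java.nio.CharBuffer
import java.util.concurrent.atomic.AtomicLong

// Illustrative instrumentation only; not Daffodil's actual LocalBuffer implementation.
object BufStats {
  val allocations = new AtomicLong(0) // buffer had to be (re)allocated
  val reuses = new AtomicLong(0)      // existing buffer was already big enough
}

final class ReusableCharBuffer {
  private var buf = CharBuffer.allocate(0)

  // Return a buffer with at least `minSize` chars, growing only when needed.
  def getBuf(minSize: Int): CharBuffer = {
    if (buf.capacity < minSize) {
      BufStats.allocations.incrementAndGet()
      buf = CharBuffer.allocate(minSize)
    } else {
      BufStats.reuses.incrementAndGet()
    }
    buf.clear()
    buf.limit(minSize)
    buf
  }
}

Dumping the two counters at the end of a test run would show directly whether the reuse path is ever taken.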
I think the reuse worked (I'll double check as you suggest); the issue is that the reuse is only within a single parse--it's local to the PState. So every time we call parse() we will allocate a new 1MB buffer (if there is a specified length string in the format) and reuse that buffer for the lifetime of that single parse. When that parse ends, we let it get garbage collected along with the PState, and a new one will be allocated the next time parse() is called.
For files that are very small (e.g. <1KB) the 1MB allocation makes a noticeable difference. For larger files this doesn't make as much of a difference because 1MB isn't very much compared to all the other allocations Daffodil does.
An alternative would be to make these thread locals, shared across parse() calls on the same thread; then we could go back to the 1MB buffer size and you would only take a single allocation hit per thread. But I don't think the 1MB buffer size is all that important. It's going to be very rare for a single specified length string to be very big, likely nowhere close to 1MB.
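A rough sketch of that thread-local idea, assuming a simple grow-on-demand char array (this is not existing Daffodil code, just an illustration of sharing one buffer per thread across parse() calls):

// Illustrative only: one lazily grown char buffer per thread, surviving across
// PState lifetimes so a large allocation happens at most once per thread.
object ThreadLocalCharBuf {
  private val tl = new ThreadLocal[Array[Char]] {
    override def initialValue(): Array[Char] = new Array[Char](1024) // start small
  }

  def get(minSize: Int): Array[Char] = {
    var buf = tl.get()
    if (buf.length < minSize) {
      // grow geometrically so repeated large requests don't keep reallocating
      buf = new Array[Char](math.max(minSize, buf.length * 2))
      tl.set(buf)
    }
    buf
  }
}

The trade-off is that any thread which ever parses a large specified length string keeps that buffer alive for the thread's lifetime.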
I didn't check whether with vs. without reuse changed performance, but I did add a print statement to this getBuf function reporting whether it had to allocate a new buffer or not. Testing with NITF (which has lots of specified length strings and some relatively small files available), it allocated a 1024 byte buffer once and then reused it about 100+ times with no additional allocations. We might need additional testing to make sure it's working correctly everywhere for other formats, or it might be worth investigating other reuse (e.g. making LocalBuffer reuse work across threads), but I think the LocalBuffer reuse logic is working as intended.
Note, the particular LocalBuffer related to the 1MB specified length string allocation is actually from the InputSourceDataInputStream, not the PState. So technically it's only a single 1MB allocation per ISDIS created. If you are streaming data and reusing the same ISDIS, you'll only have one allocation. But if you create a new ISDIS per parse (which non-streaming use cases will do), then you will effectively get one allocation per parse() call.
So this performance improvement only applies to relatively small files that have specified length strings, in non-streaming use cases.
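For illustration, here is roughly what those two usage patterns look like with the Daffodil Scala API as I understand it; the exact class and method names (e.g. InputSourceDataInputStream.hasData) are assumptions and may vary between Daffodil versions:

import java.io.{ File, FileInputStream }
import org.apache.daffodil.sapi.Daffodil
import org.apache.daffodil.sapi.infoset.ScalaXMLInfosetOutputter
import org.apache.daffodil.sapi.io.InputSourceDataInputStream

object StreamingVsPerRecord {
  def main(args: Array[String]): Unit = {
    val dp = Daffodil.compiler().compileFile(new File(args(0))).onPath("/")

    // Streaming: one ISDIS (and its internal buffers) is reused for every
    // message, so any large buffer is allocated only once.
    val input = new InputSourceDataInputStream(new FileInputStream(args(1)))
    var keepParsing = true
    while (keepParsing) {
      val res = dp.parse(input, new ScalaXMLInfosetOutputter())
      keepParsing = !res.isError() && input.hasData()
    }

    // Per-record: the application carves off each record itself (e.g. one UDP
    // datagram per parse) and creates a new ISDIS every time, which is where a
    // per-parse() allocation shows up.
    val records: Seq[Array[Byte]] = Seq.empty // records supplied by the application
    records.foreach { bytes =>
      dp.parse(new InputSourceDataInputStream(bytes), new ScalaXMLInfosetOutputter())
    }
  }
}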
I tested allocating the exact size needed vs. allocating one large buffer and reusing it. I really didn't see much of a difference. Allocate once + reuse was about the same or maybe 3-5% faster, but that could be within JVM noise.
It's also possible it's dependent on the specific file and format. For very small formats with little reuse it's probably faster to just allocate exact sizes as needed. But if there are a lot of strings, maybe avoiding the allocations and GC overhead makes a difference. I can't really say without more testing, but I don't think there's a clear winner.
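For reference, that comparison can be sketched in isolation like this (sizes and iteration counts are invented, and JVM warm-up alone can swamp a 3-5% difference; a proper measurement would use JMH and the real parser):

object AllocVsReuse {
  def main(args: Array[String]): Unit = {
    val fieldSizes = Array.fill(100000)(scala.util.Random.nextInt(1024) + 1)

    def time(label: String)(body: => Unit): Unit = {
      val t0 = System.nanoTime()
      body
      println(f"$label: ${(System.nanoTime() - t0) / 1e6}%.1f ms")
    }

    // Allocate an exact-size buffer per "field".
    time("allocate exact size per field") {
      fieldSizes.foreach { n => val buf = new Array[Char](n); buf(n - 1) = 'x' }
    }

    // Allocate once, grow on demand, and reuse the same buffer.
    time("allocate once and reuse") {
      var buf = new Array[Char](0)
      fieldSizes.foreach { n =>
        if (buf.length < n) buf = new Array[Char](n)
        buf(n - 1) = 'x'
      }
    }
  }
}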
Ok. Thanks for double checking that the reuse is at least working. When we call parse repeatedly in a loop, such as to parse a stream of messages, is that going to allocate this over and over, or are we reusing the PState in that case?
I answered my own question... we have to reuse the PState because the I/O position on the input has to be preserved.
However, other applications will call parse once per input record because they are carving off the input data record/message themselves based on UDP datagram size or other means. In that case those are brand new calls to parse, each with a new PState allocated. This is also a fairly tight loop, so yes, eliminating this 1MB allocation and GC for each parse call should be an improvement.
We should consider pooling entire PState objects and resetting them so that we don't have to reallocate any of this stuff.
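A generic sketch of what that pooling could look like (nothing like this exists in Daffodil today; the ObjectPool class and its reset hook are hypothetical):

import java.util.ArrayDeque

// Hypothetical pool: borrow a reusable object instead of allocating a fresh one
// per parse() call, and reset it on release. Not thread-safe as written.
final class ObjectPool[T](create: () => T, reset: T => Unit) {
  private val free = new ArrayDeque[T]()

  def borrow(): T = Option(free.poll()).getOrElse(create())

  def release(obj: T): Unit = {
    reset(obj) // return the object to a clean state before it is handed out again
    free.push(obj)
  }
}

The tight message-processing loop described above would then borrow a state object, run the parse, and release it, rather than letting each PState and the buffers hanging off it become garbage.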