-
Notifications
You must be signed in to change notification settings - Fork 28
RSDK-11991: Historical module data #524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
stuqdog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... maybe it's just that I don't have any real BSON experience but on reviewing this, it's not really clear to me how the semantics actually work. Do we expect that a typical user would know how to construct the appropriate BSONBytes for a query? If so, disregard! If not, we should maybe spend some time thinking about how to make this a bit more intuitive.
| static DataClient::BSONBytes default_query( | ||
| const std::string& part_id, | ||
| const std::string& resource, | ||
| std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds> time_point); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Milliseconds seems maybe overly precise to me. It's a bit unwieldy to work with and I imagine the cases where a user will care about sub-second precision in a query are probably not too high?
Also: it's a little confusing that the time_point argument for the default query sets the time since, and is distinct from the time_back field. I could easily imagine someone seeing that time_back is the name of the field and so, e.g., create a time point with an arg of one hour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also it would be nice to create an override of default_query that doesn't ask for a time_point but sets a default of 24 hours ago, akin to what we do in the go SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points! So default_query is kind of an implementation detail, maybe we could move it to a private header entirely. Users shouldn't actually call default_query, because the other method will do it for them, I just wanted to be able to unit test it.
I chose milliseconds just on the basis of the BSON spec for timestamp, which is in milliseconds, but you can do
using namespace std::chrono;
func_taking_ms(milliseconds{hours{5}});but maybe this could be a remark in the header.
The use of time point vs duration also comes down to testing, because I can't reliably do unit tests on something that calls now() - duration.
I could easily imagine someone seeing that time_back is the name of the field and so, e.g., create a time point with an arg of one hour.
This is a situation where the C++ type system would prevent someone from doing that, but also based on this discussion I think I should just hide this helper function away somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a situation where the C++ type system would prevent someone from doing that
So I agree that the type system won't let someone pass milliseconds as opposed to a time_point but I'm worried about the inversion logic of a time_point being "time since epoch", but the argument time_back being "time before now". So yeah, someone couldn't accidentally do
using namespace std:chrono;
auto hour = milliseconds(hours{1});
auto query = default_query("id", "resource", hour);but I think they could accidentally do
auto query = default_query("id", "resource", {hour});thinking that this will give them everything in the past hour but actually it's giving them everything since an hour past the epoch (if I'm mistaken and you can't curly brace construct a time_point that way, let me know!).
I'm certainly willing to be convinced that this is overly defensive thinking on my part, but also I think just hiding this helper away from users is probably the best solution.
| const std::string& resource, | ||
| std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds> time_point); | ||
|
|
||
| DataConsumer(DataClient& dc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... can we not use ViamClient's from_env method to get the DataClient within the constructor (similar to how we do in python) to avoid asking users to pass around the DataClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could also provide DataConsumer::from_env, the difference would just be that this implies DataConsumer owns its DataClient which is probably a moot point in a GC language like Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's probably fine if the DataConsumer owns/is responsible for the DataClient? Maybe there's a C++ memory reason to not like that but from a code design perspective I don't have a problem with it, I'd hope that in general users kind of just ignore the internals of the DataConsumer which becomes easier if there's a constructor that uses env vars to avoid asking for anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could give DataConsumer both an optional DataClient and a DataClient&. Then if you have DataConsumer::from_env the optional is engaged and the reference points into the internal field. If you call DataConsumer::DataConsumer(DataClient&) then the optional is disengaged and the caller is responsible for ensuring that the lifetime of the DataConsumer is a subset of the lifetime of the DataClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to implement the hybrid-ownership optional but I think we run into the same problem with DataClient, which has a relationship of strictly observing (but not owning) a ViamClient, see
https://github.com/viamrobotics/viam-cpp-sdk/blob/main/src/viam/sdk/app/viam_client.hpp
https://github.com/viamrobotics/viam-cpp-sdk/blob/main/src/viam/sdk/app/data_client.cpp
ViamClienthas aViamChannelDataClientobserves aViamChannelobtained from aViamClient
So for DataConsumer to potentially own its DataClient, it would need space for optionally storing an owned DataClient and an owned ViamClient.
I'm inclined to leave it as is, unless we want to go whole hog the other way and mandate that DataConsumer always owns its client(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'd vote to leave it as-is. Can you add though a doxygen comment explaining that the DataConsumer must not outlive the DataClient?
stuqdog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally this looks good to me, though I do think the default_query method should probably be made private.
.clang-tidy
Outdated
| # readability-implicit-bool-conversion: We have decided that !ptr-type is cleaner than ptr-type==nullptr | ||
| # readability-magic-numbers: This encourages useless variables and extra lint lines | ||
| # misc-include-cleaner: TODO(RSDK-5479) this is overly finnicky, add IWYU support and fix. | ||
| # misc-no-recursion: global ban on recursion seems absurd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a reason for this. Stack is a limited resource, stack overflows are often silent, and stack corruption is often exploitable. So, unbounded recursion on user controlled input is a potent vector for security issues. It is made worse by the fact that there isn't a portable way to ask "how much stack space do I have left", or "how much stack will I need to invoke this function", so there isn't a natural way to decide what the bounds should be. So there are often good arguments for abandoning recursion in favor of either 1) iteration, or 2) moving the state to an explicit stack held in the heap.
If there is just one place this is happening, maybe a NOLINT is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is good to know! i'll just nolint where needed
| const std::string& resource, | ||
| std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds> time_point); | ||
|
|
||
| DataConsumer(DataClient& dc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could give DataConsumer both an optional DataClient and a DataClient&. Then if you have DataConsumer::from_env the optional is engaged and the reference points into the internal field. If you call DataConsumer::DataConsumer(DataClient&) then the optional is disengaged and the caller is responsible for ensuring that the lifetime of the DataConsumer is a subset of the lifetime of the DataClient.
| const std::string org_id = get_env("VIAM_PRIMARY_ORG_ID").value_or(""); | ||
| const std::string part_id = get_env("VIAM_MACHINE_PART_ID").value_or(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling this has already been debated, but this makes me kind of sad. I really dislike using environment variables to communicate configuration between processes like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I feel there surely must be a better way but this is what we're doing in the other SDKs
| namespace sdk { | ||
| namespace impl { | ||
|
|
||
| struct writer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is a hassle, but, I'd appreciate it if we explicitly ensured we were writing as little endian, per the bson spec, from the start. I know you think: hey, we will never ever care about that. We thought the same once, in another codebase, only to need to spend more than a year fixing the mistake. I believe boost has an endian library you can use to make this portable.
| } | ||
|
|
||
| void write_entry(const std::string& key, const std::string& val) { | ||
| write_header(int8_t{2}, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating an int8_t enum rather than using numeric constants. You can include only the types you care about.
And if you wanted to get fancy, you could have a trait that mapped T to enum value so you could do write_header(type_to_typecode_v<decltype(val)>) or whatever.
|
|
||
| write_bytes(uint8_t{0}); // end of object | ||
|
|
||
| *reinterpret_cast<int32_t*>(&buf.front()) = static_cast<int32_t>(buf.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No guarantee of alignment here, you should use copy.
| namespace sdk { | ||
| namespace impl { | ||
|
|
||
| struct writer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The writer can be in the unnamed namespace.
acmorrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mod what looks like an oversight.
| const std::string& resource, | ||
| std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds> time_point); | ||
|
|
||
| DataConsumer(DataClient& dc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'd vote to leave it as-is. Can you add though a doxygen comment explaining that the DataConsumer must not outlive the DataClient?
| -readability-implicit-bool-conversion, | ||
| -readability-magic-numbers, | ||
| -misc-include-cleaner, | ||
| -misc-no-recursion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needed?
Implements historical module data for C++ modules, using the DataConsumer class
Most of this PR consists in implementing rudimentary BSON serialization, which is tested in the unit tests