-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-13955. Handle empty datanode.id file gracefully #9479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
3f842d5 to
ee916ad
Compare
Gargi-jais11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cchung100m for wokring on this.
Left some comments below.
| String dataNodeDir = conf.get(ScmConfigKeys.HDDS_DATANODE_DIR_KEY); | ||
| if (dataNodeDir == null || dataNodeDir.isEmpty()) { | ||
| throw new IOException("hdds.datanode.dir is not configured.", e); | ||
| } | ||
| File versionFile = new File(dataNodeDir, "hdds/VERSION"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hdds.datanode.dir can be a comma-separated list (e.g., /data/hdds1,/data/hdds2,/data/hdds3). conf.get() returns the entire string, creating an invalid path like /data/hdds1,/data/hdds2,/data/hdds3/hdds/VERSION.
Solution: Follow the pattern used in GeneratorDatanode.java
- use
HddsServerUtil.getDatanodeStorageDirs()to parse the comma-separated list and read from the first directory (all volumes VERSION file share the same datanode UUID).
Step 1: Get all storage directories (handles comma-separated list)
Step 2: Try each storage directory until we find a valid VERSION file with datanodeUUID.
Step 3: Create a new DaatnodeDetails with UUID found.
Step 4: Write the recovered Datanode ID to the datanode.id file
| if (datanodeDetailsYaml == null) { | ||
| throw new EmptyDatanodeIdFileException( | ||
| "Datanode ID file is empty: " + path.getAbsolutePath()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may happen that object exists but UUID might be null or empty. So we can have these checks as well.
| if (datanodeDetailsYaml == null) { | |
| throw new EmptyDatanodeIdFileException( | |
| "Datanode ID file is empty: " + path.getAbsolutePath()); | |
| } | |
| if (datanodeDetailsYaml == null | |
| || datanodeDetailsYaml.getUuid() == null | |
| || datanodeDetailsYaml.getUuid().isEmpty()) { | |
| throw new EmptyDatanodeIdFileException( | |
| "Datanode ID file is empty: " + path.getAbsolutePath()); | |
| } |
| /** | ||
| * Exception thrown when a Datanode ID file is empty. | ||
| */ | ||
| public class EmptyDatanodeIdFileException extends IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add a new exception class. Let's remove this and it also matches the existing codebase pattern.
Use IOException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt reply and suggestions. I've updated the part you mentioned.
| } | ||
|
|
||
| if (datanodeDetailsYaml == null) { | ||
| throw new EmptyDatanodeIdFileException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove EmptyDatanodeIdFileException and use
IOException("Datanode ID file is empty or has null UUID: " + path.getAbsolutePath());
|
@cchung100m Please add a unit test in
|
201536c to
bd9c536
Compare
bd9c536 to
89fec25
Compare
|
@ChenSammi and @peterxcli Could you please review the patch? |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cchung100m for the patch.
...r-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/ContainerUtils.java
Outdated
Show resolved
Hide resolved
| String dnUuid = null; | ||
| Collection<String> dataNodeDirs = | ||
| HddsServerUtil.getDatanodeStorageDirs(conf); | ||
| if (dataNodeDirs.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HddsServerUtil.getDatanodeStorageDirs checks whether return list is empty before return, so the double check here can be omitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ChenSammi
Thanks for the suggestions. I've updated the part you mentioned.
| throw new IOException("hdds.datanode.dir is not configured."); | ||
| } | ||
| for (String dataNodeDir : dataNodeDirs) { | ||
| File versionFile = new File(dataNodeDir, "hdds/VERSION"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use VERSION_FILE and HDDS_VOLUME_DIR to replace the plain string.
| } catch (IOException io) { | ||
| throw new IOException("Failed to parse DatanodeDetails from " | ||
| + path.getAbsolutePath(), io); | ||
| LOG.warn("Failed to read Datanode ID file as YAML. Reason: {}. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use "e" instead of e.getMessage() here. Stack trace will reveal more info than e.getMessage() during debug. e.getMessage() is usually used when constructing error message propagated from client to server. Same for L185.
|
Thanks @cchung100m for working on this task. Overall it looks good to me. Just a few minor comments. |
Gargi-jais11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cchung100m for updating the patch. Once the above few comments are resolved it looks good to go.
What changes were proposed in this pull request?
hdds.datanode.dirdirectoryWhat is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13955
How was this patch tested?
Existing unit and integration tests