Skip to content

Conversation

@cchung100m
Copy link
Contributor

@cchung100m cchung100m commented Dec 11, 2025

What changes were proposed in this pull request?

  1. Fetch the datanode UUID from the VERSION file under hdds.datanode.dir directory
  2. Write the UUID to the datanode.id file

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13955

How was this patch tested?

Existing unit and integration tests

@cchung100m cchung100m marked this pull request as ready for review December 12, 2025 12:56
Copy link
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cchung100m for wokring on this.
Left some comments below.

Comment on lines 181 to 185
String dataNodeDir = conf.get(ScmConfigKeys.HDDS_DATANODE_DIR_KEY);
if (dataNodeDir == null || dataNodeDir.isEmpty()) {
throw new IOException("hdds.datanode.dir is not configured.", e);
}
File versionFile = new File(dataNodeDir, "hdds/VERSION");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hdds.datanode.dir can be a comma-separated list (e.g., /data/hdds1,/data/hdds2,/data/hdds3). conf.get() returns the entire string, creating an invalid path like /data/hdds1,/data/hdds2,/data/hdds3/hdds/VERSION.

Solution: Follow the pattern used in GeneratorDatanode.java

  • use HddsServerUtil.getDatanodeStorageDirs() to parse the comma-separated list and read from the first directory (all volumes VERSION file share the same datanode UUID).

Step 1: Get all storage directories (handles comma-separated list)
Step 2: Try each storage directory until we find a valid VERSION file with datanodeUUID.
Step 3: Create a new DaatnodeDetails with UUID found.
Step 4: Write the recovered Datanode ID to the datanode.id file

Comment on lines 90 to 93
if (datanodeDetailsYaml == null) {
throw new EmptyDatanodeIdFileException(
"Datanode ID file is empty: " + path.getAbsolutePath());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may happen that object exists but UUID might be null or empty. So we can have these checks as well.

Suggested change
if (datanodeDetailsYaml == null) {
throw new EmptyDatanodeIdFileException(
"Datanode ID file is empty: " + path.getAbsolutePath());
}
if (datanodeDetailsYaml == null
|| datanodeDetailsYaml.getUuid() == null
|| datanodeDetailsYaml.getUuid().isEmpty()) {
throw new EmptyDatanodeIdFileException(
"Datanode ID file is empty: " + path.getAbsolutePath());
}

/**
* Exception thrown when a Datanode ID file is empty.
*/
public class EmptyDatanodeIdFileException extends IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to add a new exception class. Let's remove this and it also matches the existing codebase pattern.
Use IOException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Gargi-jais11

Thanks for the prompt reply and suggestions. I've updated the part you mentioned.

}

if (datanodeDetailsYaml == null) {
throw new EmptyDatanodeIdFileException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove EmptyDatanodeIdFileException and use

IOException("Datanode ID file is empty or has null UUID: " + path.getAbsolutePath());

@Gargi-jais11
Copy link
Contributor

@cchung100m Please add a unit test in TestContainerUtils for the following scenario:

  1. Setup storage directory structure
  2. Create VERSION file with datanode UUID
  3. Simulate corrupted/empty datanode.id file. Creates an empty datanode.id file (0 bytes). Verifies the file is empty
  4. Calls ContainerUtils.readDatanodeDetailsFrom(datanodeIdFile, testConf)
  5. Verify recovery results
  • Assertions:
  • Recovered UUID matches the UUID in the VERSION file
  • datanode.id is recreated with content (file size > 0)
  • The recreated file can be read normally and contains the correct UUID

@cchung100m cchung100m marked this pull request as draft December 14, 2025 05:07
@cchung100m cchung100m marked this pull request as ready for review December 14, 2025 08:20
@cchung100m cchung100m marked this pull request as draft December 14, 2025 08:25
@cchung100m cchung100m marked this pull request as ready for review December 14, 2025 09:52
@Gargi-jais11
Copy link
Contributor

@ChenSammi and @peterxcli Could you please review the patch?

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cchung100m for the patch.

@adoroszlai adoroszlai requested review from ChenSammi, Gargi-jais11 and errose28 and removed request for Gargi-jais11 December 21, 2025 07:50
String dnUuid = null;
Collection<String> dataNodeDirs =
HddsServerUtil.getDatanodeStorageDirs(conf);
if (dataNodeDirs.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HddsServerUtil.getDatanodeStorageDirs checks whether return list is empty before return, so the double check here can be omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ChenSammi
Thanks for the suggestions. I've updated the part you mentioned.

throw new IOException("hdds.datanode.dir is not configured.");
}
for (String dataNodeDir : dataNodeDirs) {
File versionFile = new File(dataNodeDir, "hdds/VERSION");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use VERSION_FILE and HDDS_VOLUME_DIR to replace the plain string.

} catch (IOException io) {
throw new IOException("Failed to parse DatanodeDetails from "
+ path.getAbsolutePath(), io);
LOG.warn("Failed to read Datanode ID file as YAML. Reason: {}. " +
Copy link
Contributor

@ChenSammi ChenSammi Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use "e" instead of e.getMessage() here. Stack trace will reveal more info than e.getMessage() during debug. e.getMessage() is usually used when constructing error message propagated from client to server. Same for L185.

@ChenSammi
Copy link
Contributor

Thanks @cchung100m for working on this task. Overall it looks good to me. Just a few minor comments.

Copy link
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cchung100m for updating the patch. Once the above few comments are resolved it looks good to go.

@cchung100m cchung100m marked this pull request as draft December 30, 2025 15:12
@cchung100m cchung100m marked this pull request as ready for review December 30, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants