Skip to content

Conversation

@rlmenge
Copy link
Contributor

@rlmenge rlmenge commented Dec 26, 2025

Add INET_DIAG_DESTROY Test Suite
Adds tests for CONFIG_INET_DIAG_DESTROY kernel feature - validates administrative TCP connection termination via ss -K.

INET_DIAG_DESTROY
"Provides a SOCK_DESTROY operation that allows privileged processes (e.g., a connection manager or a network administration tool such as ss) to close sockets opened by other processes. Closing a socket in this way interrupts any blocking read/write/connect operations on the socket and causes future socket calls to behave as if the socket had been disconnected. If unsure, say N."

Tests
verify_inet_diag_destroy (P2): Creates TCP connection, destroys with ss -K, verifies removal
verify_inet_diag_enabled (P3): Validates CONFIG_INET_DIAG and ss functionality

Validation
Tested on Fedora 43 and Ubuntu

@rlmenge rlmenge changed the title WIP: Initial inet_diag test suite Initial inet_diag test suite Jan 1, 2026
@rlmenge rlmenge marked this pull request as ready for review January 2, 2026 19:54
# Example output from ss -tn:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# ESTAB 0 0 127.0.0.1:34567 127.0.0.1:45678
_connection_pattern = re.compile(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused code

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new test suite for validating the Linux kernel's INET_DIAG_DESTROY feature, which enables administrative termination of TCP connections via netlink socket interface. The implementation includes a new ss tool wrapper and comprehensive test cases to verify socket diagnostic capabilities.

Key Changes:

  • New Ss tool class with methods for checking connection state, killing connections, and detecting kernel feature support
  • Two test cases: verify_inet_diag_destroy (P2) validates connection termination functionality, and verify_inet_diag_enabled (P3) verifies basic INET_DIAG support
  • Helper Python script that creates and maintains TCP connections for testing purposes

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
lisa/tools/ss.py Implements the Ss tool wrapper with methods for connection management, kill support detection, and BSD compatibility
lisa/tools/__init__.py Adds Ss import and exports it in the public API
lisa/microsoft/testsuites/network/inet_diag.py Comprehensive test suite with helper methods for port discovery, connection state verification, and cleanup
lisa/microsoft/testsuites/network/TestScripts/lisa_tcp_test.py Python script that creates persistent TCP server-client connections on specified ports for testing

Comment on lines 271 to 277
if connection_process:
node.execute(
f"pkill -f {script_path}",
sudo=True,
)
# Remove temporary script
node.execute(f"rm -f {script_path}", sudo=True)
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup in the finally block may fail silently if the process was never created or if pkill fails. The pkill command should be run with expected_exit_code_failure_message or at minimum log whether cleanup succeeded. Additionally, consider checking if connection_process is not None before attempting cleanup operations that depend on it having been created.

Suggested change
if connection_process:
node.execute(
f"pkill -f {script_path}",
sudo=True,
)
# Remove temporary script
node.execute(f"rm -f {script_path}", sudo=True)
if connection_process is not None:
node.execute(
f"pkill -f {script_path}",
sudo=True,
expected_exit_code=0,
expected_exit_code_failure_message=(
f"Failed to clean up TCP test process using pkill for script "
f"{script_path}"
),
)
# Remove temporary script
node.execute(
f"rm -f {script_path}",
sudo=True,
expected_exit_code=0,
expected_exit_code_failure_message=(
f"Failed to remove temporary script {script_path} during cleanup"
),
)

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As recommended by Copilot, do you want to check for cleanup error and mark node as dirty if cleanup fails?
or is it fine if the cleanup fails silently and node gets reused?

Comment on lines 258 to 262
if not connection_destroyed:
raise LisaException(
f"Connection on port {test_port} was not destroyed "
f"within timeout period"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it's not destroyed, is there any debug info that can be collected/logged that helps in debugging the test failure without requiring a repro?

Comment on lines 234 to 239
if not connection_ready:
raise LisaException(
f"TCP connection not established on port {test_port} "
f"within timeout period"
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the connection is not established, what does it mean? Does it only happen due to a kernel bug?
Can we collect any debug information?

sudo=True,
)
# Remove temporary script
node.execute(f"rm -f {script_path}", sudo=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the RM tool instead

lisa/tools/ss.py Outdated
return False

@classmethod
def _freebsd_tool(cls) -> Optional[type[Tool]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it tested in bsd? The PR description mentions "Validation Tested on Fedora 43 and Ubuntu"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering I made the inet_diag tests unsupported_os=[BSD, Windows], I think it is fair to remove the BSD support for now for SS. Will address.

- Change log.warning() to log.debug() for timeout messages
- Move ss tool initialization after kernel config checks
- Remove unused _verify_connection_exists() helper method
- Add null check for connection_process before cleanup
- Use expected_exit_code_failure_message in pkill cleanup
- Add logging for pkill cleanup success/failure
- Use Rm tool instead of raw execute for file removal
- Add detailed debug info collection when connection fails to establish
- Add detailed debug info collection when connection fails to destroy
- Add BSD notes to test suite description
- Import Rm tool for proper file cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants