Skip to content

Conversation

@calvinp0
Copy link
Member

Addition of CREST Adapter that complements the heuristic adapter.

This comment was marked as resolved.

This comment was marked as resolved.

return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'CREST_PATH' is not used.

Copilot Autofix

AI 2 days ago

In general, an unused global variable should either be removed, renamed to indicate it is intentionally unused, or explicitly marked as public API if it is meant to be used by other modules. For a settings/config module, globals are typically defined precisely so that other modules can import them, so the appropriate fix is usually to make that intention explicit.

The single best fix here, without changing existing functionality, is to add CREST_PATH (and, for consistency, CREST_ENV_PATH) to the module’s __all__ list. If __all__ already exists in arc/settings/settings.py, we would add these names to it; if it does not, we will introduce a new __all__ definition near the bottom of the file (after the assignment to CREST_PATH, CREST_ENV_PATH). This explicitly declares these globals as part of the public API and will generally cause CodeQL’s “unused global variable” check to accept them as intentional. No new imports or helper methods are needed; we only add a small __all__ declaration.

Concretely, in arc/settings/settings.py, just below line 553 where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, we will insert a __all__ list that includes these two names. We will show this as a replacement block that replaces that final assignment line with the same assignment followed by the __all__ declaration.

Suggested changeset 1
arc/settings/settings.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/settings/settings.py b/arc/settings/settings.py
--- a/arc/settings/settings.py
+++ b/arc/settings/settings.py
@@ -551,3 +551,8 @@
 
 
 CREST_PATH, CREST_ENV_PATH = find_crest_executable()
+
+__all__ = [
+    "CREST_PATH",
+    "CREST_ENV_PATH",
+]
EOF
@@ -551,3 +551,8 @@


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

__all__ = [
"CREST_PATH",
"CREST_ENV_PATH",
]
Copilot is powered by AI and may make mistakes. Always verify output.
return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'CREST_ENV_PATH' is not used.

Copilot Autofix

AI 2 days ago

In general, for an unused global that is intentionally kept (for documentation or for potential future use), rename it to follow an "unused" naming convention so static analysis tools and human readers understand it is not a mistake. If the value has side effects on the right-hand side, keep the right-hand side intact and only change or remove the left-hand side variable name.

Here, find_crest_executable() must still be called so that CREST_PATH continues to be initialized exactly as before; we only need to make it clear that the second return value is not used. The simplest non‑breaking change is to bind the environment component to a clearly unused name such as _unused_crest_env_path. This preserves all side effects and behavior of find_crest_executable(), keeps CREST_PATH available exactly as before, and satisfies the convention for unused variables described in the background.

Concretely, in arc/settings/settings.py at the bottom where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, change CREST_ENV_PATH to _unused_crest_env_path. No imports or additional methods are required.

Suggested changeset 1
arc/settings/settings.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/settings/settings.py b/arc/settings/settings.py
--- a/arc/settings/settings.py
+++ b/arc/settings/settings.py
@@ -550,4 +550,4 @@
     return None, None
 
 
-CREST_PATH, CREST_ENV_PATH = find_crest_executable()
+CREST_PATH, _unused_crest_env_path = find_crest_executable()
EOF
@@ -550,4 +550,4 @@
return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()
CREST_PATH, _unused_crest_env_path = find_crest_executable()
Copilot is powered by AI and may make mistakes. Always verify output.
@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 3e88c36 to 7674f5f Compare February 2, 2026 09:49
@calvinp0 calvinp0 requested a review from Copilot February 2, 2026 12:10

This comment was marked as resolved.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 713209f to 9c25f3d Compare February 2, 2026 13:21
Adds a CREST TS search adapter for finding transition state conformers,
leveraging heuristics-generated guesses and CREST's conformer search capabilities.

This allows ARC to explore TS geometries using CREST.

Adds method source to TS guesses from CREST

Ensures CREST is recorded as a method source when a TS guess from CREST is deemed unique and considered as a possible TS guess.
This change ensures proper provenance tracking for TS guesses.

Improves error logging in CREST adapter

Enhances error messages in the CREST adapter to provide more context when failing to determine H abstraction atoms.
The error message now includes the reaction label and crest seed iteration number, which aids in debugging.
Adds a script to install CREST using a dedicated conda environment.
Integrates CREST into the `install_all.sh` script and the help message.
Also adds a `--no-rmg` flag in `install_all.sh` to optionally skip RMG-Py installation, and ensures PyRDL is installed only when ARC is installed.

Corrects a problem with the AutoTST activation/deactivation hooks and ensures that RMG-Py's path is correctly removed from the PATH and PYTHONPATH.
Also ensures TS-GCN can be correctly installed into its environment.
Adds the CREST adapter as an option for transition state (TS) guess generation, particularly for H_Abstraction reactions, improving the exploration of potential TS geometries.

Includes CREST in the default list of in-core adapters and allows it to be used in heuristics.

The heuristic adapter now passes the local path of the files to the h_abstraction method so the geometries are saved to the appropriate folders.
Saves the method used to generate the TS guesses in the file name so the geometries generated with CREST are clearly labeled as such.
Adds a function to reorder XYZ strings between atom-first and coordinate-first formats, with optional unit conversion.
Also introduces a backward compatible wrapper with a deprecation warning.

Uses constants for unit conversion factors

Replaces hardcoded unit conversion factors with values from the `constants` module. Also, determines atom ordering more robustly by inspecting only the first line.

Corrects angstrom to bohr conversion factor

The conversion factor between angstrom and bohr was inverted, which led to incorrect geometry conversions.

This commit corrects the conversion factor to ensure proper unit conversion when dealing with xyz coordinates.
Removes input files and submit script related to a specific server time limit test case.
The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.
Enables the usage of CREST as a TS adapter.

The implementation includes:
- Adding 'crest' to the list of available TS adapters.
- Implementing a function to automatically detect the CREST executable in various locations (standalone builds, conda environments, PATH).
- Providing instructions for activating the CREST environment, if necessary.

This allows ARC to leverage CREST for enhanced transition state search capabilities.

Improves crest executable search logic

Refactors the crest executable search to more reliably locate the correct version by normalizing path joining.

Updates the documentation for clarity.
Adds a `method_sources` attribute to the `TSGuess` class to track all
methods/sources that generated a particular transition state guess, allowing
for more comprehensive provenance information. This improves the clustering
of similar TS guesses. Normalizes method sources to a unique, ordered,
lowercase list.
Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.
Adds the angstrom to bohr conversion factor.

This conversion factor is required for the CREST adapter.
Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +307 to +308
if method_label.lower() not in other_tsg.method.lower():
other_tsg.method += f' and {method_label}'
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a duplicate TS guess is found, the code appends to the method string (e.g., 'Heuristics and CREST'). However, it doesn't update the method_sources list, which is now the canonical way to track multiple contributing methods. Consider updating method_sources here using TSGuess._normalize_method_sources() to maintain consistency with the new approach used in cluster_tsgs() and the CREST adapter.

Suggested change
if method_label.lower() not in other_tsg.method.lower():
other_tsg.method += f' and {method_label}'
if method_label.lower() not in other_tsg.method.lower():
base_method = other_tsg.method
other_tsg.method += f' and {method_label}'
existing_sources = getattr(other_tsg, "method_sources", None)
if existing_sources:
combined_sources = list(existing_sources) + [method_label]
else:
combined_sources = [base_method, method_label]
other_tsg.method_sources = TSGuess._normalize_method_sources(combined_sources)

Copilot uses AI. Check for mistakes.
r2_stretch: float = 1.2,
a2: float = 180,
dihedral_increment: Optional[int] = None,
path: str = ""
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path parameter is defined but never used within the function body. This parameter should either be removed from the function signature or its intended usage should be implemented.

Copilot uses AI. Check for mistakes.
Comment on lines +501 to +502
# Priority 1: /Local/ce_dana standalone builds
crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CREST executable search prioritizes a hard-coded path '/Local/ce_dana' which appears to be environment-specific. Consider making this configurable via an environment variable or moving it to lower priority to avoid issues in environments where this path doesn't exist or isn't relevant.

Suggested change
# Priority 1: /Local/ce_dana standalone builds
crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")
# Priority 1: standalone builds in a configurable directory (default: /Local/ce_dana)
standalone_dir = os.getenv("ARC_CREST_STANDALONE_DIR", "/Local/ce_dana")
crest_path = find_highest_version_in_directory(standalone_dir, "crest")

Copilot uses AI. Check for mistakes.
int: The first integer in the string

"""
return int(re.sub(r"[^\d]", "", s))
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extract_digits function will raise a ValueError if the input string contains no digits, as int("") will fail. Consider adding error handling or validation to ensure the string contains at least one digit before attempting conversion.

Suggested change
return int(re.sub(r"[^\d]", "", s))
digit_str = re.sub(r"[^\d]", "", s)
if not digit_str:
raise ValueError(f"No digits found in string: {s!r}")
return int(digit_str)

Copilot uses AI. Check for mistakes.
Comment on lines +569 to +586
occurrence_distances = []
for occurrence in condition_occurrences:
# Calculate the sum of distances to the two heavy atoms
hydrogen_key = f"{occurrence['H']}"
heavy_atoms = [f"{occurrence['A']}", f"{occurrence['B']}"]
try:
distances = dataframe.loc[hydrogen_key, heavy_atoms].sum()
occurrence_distances.append((occurrence, distances))
except KeyError as e:
logger.error(f"Error accessing distances for occurrence {occurrence}: {e}")

# Select the occurrence with the smallest distance
best_occurrence = min(occurrence_distances, key=lambda x: x[1])[0]
return {
"H": extract_digits(best_occurrence["H"]),
"A": extract_digits(best_occurrence["A"]),
"B": extract_digits(best_occurrence["B"]),
}
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all occurrences fail to compute distances (all KeyError exceptions at line 577), the occurrence_distances list will be empty, causing min() at line 581 to raise a ValueError. Consider adding a check to handle the case where occurrence_distances is empty, or raising a more descriptive error.

Copilot uses AI. Check for mistakes.
@@ -51,15 +51,36 @@ write_hook () {
cat >"$act" <<EOF
# AutoTST hook – $(date +%F)
export AUTOTST_ROOT="$repo_path"
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The activation hook writes export AUTOTST_ROOT="$repo_path" into a shell script using the raw repo_path (derived from pwd), which will then be sourced on environment activation. If repo_path contains characters like $() or backticks (possible in directory names), those will be interpreted as command substitutions when the hook is executed, leading to command injection and arbitrary code execution in the user’s shell. To avoid this, ensure that the path is safely escaped or encoded before embedding it in a shell script (for example, avoid writing untrusted paths directly inside double quotes in sourced files).

Copilot uses AI. Check for mistakes.
Comment on lines +177 to +182
AUTOTST_ESCAPED_PATH="$(pwd | sed 's#/#\\\\/#g')"
if sed --version >/dev/null 2>&1; then
sed -i "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc
else
sed -i '' "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc
fi
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the path-mode setup, the script replaces __AUTOTST_PATH__ in ~/.bashrc with the raw output of pwd, which is then used in a line export AUTOTST_ROOT="__AUTOTST_PATH__" added to the shell init file. If the working directory name includes characters such as $() or backticks, they will become part of ~/.bashrc and be executed as command substitutions when a new shell is started, enabling arbitrary code execution. You should sanitize or safely escape the directory path before writing it into ~/.bashrc, rather than interpolating pwd directly into a sourced shell script.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants