-
Notifications
You must be signed in to change notification settings - Fork 24
CREST Adapter #807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CREST Adapter #807
Conversation
d6a83b4 to
c231a43
Compare
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, an unused global variable should either be removed, renamed to indicate it is intentionally unused, or explicitly marked as public API if it is meant to be used by other modules. For a settings/config module, globals are typically defined precisely so that other modules can import them, so the appropriate fix is usually to make that intention explicit.
The single best fix here, without changing existing functionality, is to add CREST_PATH (and, for consistency, CREST_ENV_PATH) to the module’s __all__ list. If __all__ already exists in arc/settings/settings.py, we would add these names to it; if it does not, we will introduce a new __all__ definition near the bottom of the file (after the assignment to CREST_PATH, CREST_ENV_PATH). This explicitly declares these globals as part of the public API and will generally cause CodeQL’s “unused global variable” check to accept them as intentional. No new imports or helper methods are needed; we only add a small __all__ declaration.
Concretely, in arc/settings/settings.py, just below line 553 where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, we will insert a __all__ list that includes these two names. We will show this as a replacement block that replaces that final assignment line with the same assignment followed by the __all__ declaration.
-
Copy modified lines R554-R558
| @@ -551,3 +551,8 @@ | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() | ||
|
|
||
| __all__ = [ | ||
| "CREST_PATH", | ||
| "CREST_ENV_PATH", | ||
| ] |
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, for an unused global that is intentionally kept (for documentation or for potential future use), rename it to follow an "unused" naming convention so static analysis tools and human readers understand it is not a mistake. If the value has side effects on the right-hand side, keep the right-hand side intact and only change or remove the left-hand side variable name.
Here, find_crest_executable() must still be called so that CREST_PATH continues to be initialized exactly as before; we only need to make it clear that the second return value is not used. The simplest non‑breaking change is to bind the environment component to a clearly unused name such as _unused_crest_env_path. This preserves all side effects and behavior of find_crest_executable(), keeps CREST_PATH available exactly as before, and satisfies the convention for unused variables described in the background.
Concretely, in arc/settings/settings.py at the bottom where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, change CREST_ENV_PATH to _unused_crest_env_path. No imports or additional methods are required.
-
Copy modified line R553
| @@ -550,4 +550,4 @@ | ||
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() | ||
| CREST_PATH, _unused_crest_env_path = find_crest_executable() |
3e88c36 to
7674f5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
713209f to
9c25f3d
Compare
Adds a CREST TS search adapter for finding transition state conformers, leveraging heuristics-generated guesses and CREST's conformer search capabilities. This allows ARC to explore TS geometries using CREST. Adds method source to TS guesses from CREST Ensures CREST is recorded as a method source when a TS guess from CREST is deemed unique and considered as a possible TS guess. This change ensures proper provenance tracking for TS guesses. Improves error logging in CREST adapter Enhances error messages in the CREST adapter to provide more context when failing to determine H abstraction atoms. The error message now includes the reaction label and crest seed iteration number, which aids in debugging.
Adds a script to install CREST using a dedicated conda environment. Integrates CREST into the `install_all.sh` script and the help message. Also adds a `--no-rmg` flag in `install_all.sh` to optionally skip RMG-Py installation, and ensures PyRDL is installed only when ARC is installed. Corrects a problem with the AutoTST activation/deactivation hooks and ensures that RMG-Py's path is correctly removed from the PATH and PYTHONPATH. Also ensures TS-GCN can be correctly installed into its environment.
Adds the CREST adapter as an option for transition state (TS) guess generation, particularly for H_Abstraction reactions, improving the exploration of potential TS geometries. Includes CREST in the default list of in-core adapters and allows it to be used in heuristics. The heuristic adapter now passes the local path of the files to the h_abstraction method so the geometries are saved to the appropriate folders. Saves the method used to generate the TS guesses in the file name so the geometries generated with CREST are clearly labeled as such.
9c25f3d to
51368be
Compare
Adds a function to reorder XYZ strings between atom-first and coordinate-first formats, with optional unit conversion. Also introduces a backward compatible wrapper with a deprecation warning. Uses constants for unit conversion factors Replaces hardcoded unit conversion factors with values from the `constants` module. Also, determines atom ordering more robustly by inspecting only the first line. Corrects angstrom to bohr conversion factor The conversion factor between angstrom and bohr was inverted, which led to incorrect geometry conversions. This commit corrects the conversion factor to ensure proper unit conversion when dealing with xyz coordinates.
Removes input files and submit script related to a specific server time limit test case. The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.
Enables the usage of CREST as a TS adapter. The implementation includes: - Adding 'crest' to the list of available TS adapters. - Implementing a function to automatically detect the CREST executable in various locations (standalone builds, conda environments, PATH). - Providing instructions for activating the CREST environment, if necessary. This allows ARC to leverage CREST for enhanced transition state search capabilities. Improves crest executable search logic Refactors the crest executable search to more reliably locate the correct version by normalizing path joining. Updates the documentation for clarity.
Adds a `method_sources` attribute to the `TSGuess` class to track all methods/sources that generated a particular transition state guess, allowing for more comprehensive provenance information. This improves the clustering of similar TS guesses. Normalizes method sources to a unique, ordered, lowercase list.
Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.
Adds the angstrom to bohr conversion factor. This conversion factor is required for the CREST adapter.
Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 25 out of 26 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if method_label.lower() not in other_tsg.method.lower(): | ||
| other_tsg.method += f' and {method_label}' |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a duplicate TS guess is found, the code appends to the method string (e.g., 'Heuristics and CREST'). However, it doesn't update the method_sources list, which is now the canonical way to track multiple contributing methods. Consider updating method_sources here using TSGuess._normalize_method_sources() to maintain consistency with the new approach used in cluster_tsgs() and the CREST adapter.
| if method_label.lower() not in other_tsg.method.lower(): | |
| other_tsg.method += f' and {method_label}' | |
| if method_label.lower() not in other_tsg.method.lower(): | |
| base_method = other_tsg.method | |
| other_tsg.method += f' and {method_label}' | |
| existing_sources = getattr(other_tsg, "method_sources", None) | |
| if existing_sources: | |
| combined_sources = list(existing_sources) + [method_label] | |
| else: | |
| combined_sources = [base_method, method_label] | |
| other_tsg.method_sources = TSGuess._normalize_method_sources(combined_sources) |
| r2_stretch: float = 1.2, | ||
| a2: float = 180, | ||
| dihedral_increment: Optional[int] = None, | ||
| path: str = "" |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path parameter is defined but never used within the function body. This parameter should either be removed from the function signature or its intended usage should be implemented.
| # Priority 1: /Local/ce_dana standalone builds | ||
| crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest") |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CREST executable search prioritizes a hard-coded path '/Local/ce_dana' which appears to be environment-specific. Consider making this configurable via an environment variable or moving it to lower priority to avoid issues in environments where this path doesn't exist or isn't relevant.
| # Priority 1: /Local/ce_dana standalone builds | |
| crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest") | |
| # Priority 1: standalone builds in a configurable directory (default: /Local/ce_dana) | |
| standalone_dir = os.getenv("ARC_CREST_STANDALONE_DIR", "/Local/ce_dana") | |
| crest_path = find_highest_version_in_directory(standalone_dir, "crest") |
| int: The first integer in the string | ||
|
|
||
| """ | ||
| return int(re.sub(r"[^\d]", "", s)) |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extract_digits function will raise a ValueError if the input string contains no digits, as int("") will fail. Consider adding error handling or validation to ensure the string contains at least one digit before attempting conversion.
| return int(re.sub(r"[^\d]", "", s)) | |
| digit_str = re.sub(r"[^\d]", "", s) | |
| if not digit_str: | |
| raise ValueError(f"No digits found in string: {s!r}") | |
| return int(digit_str) |
| occurrence_distances = [] | ||
| for occurrence in condition_occurrences: | ||
| # Calculate the sum of distances to the two heavy atoms | ||
| hydrogen_key = f"{occurrence['H']}" | ||
| heavy_atoms = [f"{occurrence['A']}", f"{occurrence['B']}"] | ||
| try: | ||
| distances = dataframe.loc[hydrogen_key, heavy_atoms].sum() | ||
| occurrence_distances.append((occurrence, distances)) | ||
| except KeyError as e: | ||
| logger.error(f"Error accessing distances for occurrence {occurrence}: {e}") | ||
|
|
||
| # Select the occurrence with the smallest distance | ||
| best_occurrence = min(occurrence_distances, key=lambda x: x[1])[0] | ||
| return { | ||
| "H": extract_digits(best_occurrence["H"]), | ||
| "A": extract_digits(best_occurrence["A"]), | ||
| "B": extract_digits(best_occurrence["B"]), | ||
| } |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all occurrences fail to compute distances (all KeyError exceptions at line 577), the occurrence_distances list will be empty, causing min() at line 581 to raise a ValueError. Consider adding a check to handle the case where occurrence_distances is empty, or raising a more descriptive error.
| @@ -51,15 +51,36 @@ write_hook () { | |||
| cat >"$act" <<EOF | |||
| # AutoTST hook – $(date +%F) | |||
| export AUTOTST_ROOT="$repo_path" | |||
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The activation hook writes export AUTOTST_ROOT="$repo_path" into a shell script using the raw repo_path (derived from pwd), which will then be sourced on environment activation. If repo_path contains characters like $(…) or backticks (possible in directory names), those will be interpreted as command substitutions when the hook is executed, leading to command injection and arbitrary code execution in the user’s shell. To avoid this, ensure that the path is safely escaped or encoded before embedding it in a shell script (for example, avoid writing untrusted paths directly inside double quotes in sourced files).
| AUTOTST_ESCAPED_PATH="$(pwd | sed 's#/#\\\\/#g')" | ||
| if sed --version >/dev/null 2>&1; then | ||
| sed -i "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc | ||
| else | ||
| sed -i '' "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc | ||
| fi |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the path-mode setup, the script replaces __AUTOTST_PATH__ in ~/.bashrc with the raw output of pwd, which is then used in a line export AUTOTST_ROOT="__AUTOTST_PATH__" added to the shell init file. If the working directory name includes characters such as $(…) or backticks, they will become part of ~/.bashrc and be executed as command substitutions when a new shell is started, enabling arbitrary code execution. You should sanitize or safely escape the directory path before writing it into ~/.bashrc, rather than interpolating pwd directly into a sourced shell script.
Addition of CREST Adapter that complements the heuristic adapter.