CREST Adapter #807

calvinp0 · 2025-11-27T22:16:29Z

Addition of CREST Adapter that complements the heuristic adapter.

arc/job/adapters/ts/crest.py

arc/settings/settings.py

arc/job/adapters/ts/autotst_ts.py

arc/job/adapters/ts/crest.py

arc/job/adapters/ts/heuristics.py

arc/job/adapters/ts/crest.py

arc/job/adapters/ts/heuristics.py

arc/settings/settings.py

+    return None, None
+
+
+CREST_PATH, CREST_ENV_PATH = find_crest_executable()


In general, an unused global variable should either be removed, renamed to indicate it is intentionally unused, or explicitly marked as public API if it is meant to be used by other modules. For a settings/config module, globals are typically defined precisely so that other modules can import them, so the appropriate fix is usually to make that intention explicit.

The single best fix here, without changing existing functionality, is to add CREST_PATH (and, for consistency, CREST_ENV_PATH) to the module’s __all__ list. If __all__ already exists in arc/settings/settings.py, we would add these names to it; if it does not, we will introduce a new __all__ definition near the bottom of the file (after the assignment to CREST_PATH, CREST_ENV_PATH). This explicitly declares these globals as part of the public API and will generally cause CodeQL’s “unused global variable” check to accept them as intentional. No new imports or helper methods are needed; we only add a small __all__ declaration.

Concretely, in arc/settings/settings.py, just below line 553 where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, we will insert a __all__ list that includes these two names. We will show this as a replacement block that replaces that final assignment line with the same assignment followed by the __all__ declaration.

arc/settings/settings.py

+    return None, None
+
+
+CREST_PATH, CREST_ENV_PATH = find_crest_executable()


In general, for an unused global that is intentionally kept (for documentation or for potential future use), rename it to follow an "unused" naming convention so static analysis tools and human readers understand it is not a mistake. If the value has side effects on the right-hand side, keep the right-hand side intact and only change or remove the left-hand side variable name.

Here, find_crest_executable() must still be called so that CREST_PATH continues to be initialized exactly as before; we only need to make it clear that the second return value is not used. The simplest non‑breaking change is to bind the environment component to a clearly unused name such as _unused_crest_env_path. This preserves all side effects and behavior of find_crest_executable(), keeps CREST_PATH available exactly as before, and satisfies the convention for unused variables described in the background.

Concretely, in arc/settings/settings.py at the bottom where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, change CREST_ENV_PATH to _unused_crest_env_path. No imports or additional methods are required.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Adds a CREST TS search adapter for finding transition state conformers, leveraging heuristics-generated guesses and CREST's conformer search capabilities. This allows ARC to explore TS geometries using CREST. Adds method source to TS guesses from CREST Ensures CREST is recorded as a method source when a TS guess from CREST is deemed unique and considered as a possible TS guess. This change ensures proper provenance tracking for TS guesses. Improves error logging in CREST adapter Enhances error messages in the CREST adapter to provide more context when failing to determine H abstraction atoms. The error message now includes the reaction label and crest seed iteration number, which aids in debugging.

Adds a script to install CREST using a dedicated conda environment. Integrates CREST into the `install_all.sh` script and the help message. Also adds a `--no-rmg` flag in `install_all.sh` to optionally skip RMG-Py installation, and ensures PyRDL is installed only when ARC is installed. Corrects a problem with the AutoTST activation/deactivation hooks and ensures that RMG-Py's path is correctly removed from the PATH and PYTHONPATH. Also ensures TS-GCN can be correctly installed into its environment.

Adds the CREST adapter as an option for transition state (TS) guess generation, particularly for H_Abstraction reactions, improving the exploration of potential TS geometries. Includes CREST in the default list of in-core adapters and allows it to be used in heuristics. The heuristic adapter now passes the local path of the files to the h_abstraction method so the geometries are saved to the appropriate folders. Saves the method used to generate the TS guesses in the file name so the geometries generated with CREST are clearly labeled as such.

Adds a function to reorder XYZ strings between atom-first and coordinate-first formats, with optional unit conversion. Also introduces a backward compatible wrapper with a deprecation warning. Uses constants for unit conversion factors Replaces hardcoded unit conversion factors with values from the `constants` module. Also, determines atom ordering more robustly by inspecting only the first line. Corrects angstrom to bohr conversion factor The conversion factor between angstrom and bohr was inverted, which led to incorrect geometry conversions. This commit corrects the conversion factor to ensure proper unit conversion when dealing with xyz coordinates.

Removes input files and submit script related to a specific server time limit test case. The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.

Enables the usage of CREST as a TS adapter. The implementation includes: - Adding 'crest' to the list of available TS adapters. - Implementing a function to automatically detect the CREST executable in various locations (standalone builds, conda environments, PATH). - Providing instructions for activating the CREST environment, if necessary. This allows ARC to leverage CREST for enhanced transition state search capabilities. Improves crest executable search logic Refactors the crest executable search to more reliably locate the correct version by normalizing path joining. Updates the documentation for clarity.

Adds a `method_sources` attribute to the `TSGuess` class to track all methods/sources that generated a particular transition state guess, allowing for more comprehensive provenance information. This improves the clustering of similar TS guesses. Normalizes method sources to a unique, ordered, lowercase list.

Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.

Adds the angstrom to bohr conversion factor. This conversion factor is required for the CREST adapter.

Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.

Copilot

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-04T09:02:28Z

arc/job/adapters/ts/heuristics.py

+                        if method_label.lower() not in other_tsg.method.lower():
+                            other_tsg.method += f' and {method_label}'


When a duplicate TS guess is found, the code appends to the method string (e.g., 'Heuristics and CREST'). However, it doesn't update the method_sources list, which is now the canonical way to track multiple contributing methods. Consider updating method_sources here using TSGuess._normalize_method_sources() to maintain consistency with the new approach used in cluster_tsgs() and the CREST adapter.

Suggested change

if method_label.lower() not in other_tsg.method.lower():

other_tsg.method += f' and {method_label}'

if method_label.lower() not in other_tsg.method.lower():

base_method = other_tsg.method

other_tsg.method += f' and {method_label}'

existing_sources = getattr(other_tsg, "method_sources", None)

if existing_sources:

combined_sources = list(existing_sources) + [method_label]

else:

combined_sources = [base_method, method_label]

other_tsg.method_sources = TSGuess._normalize_method_sources(combined_sources)

Copilot · 2026-02-04T09:02:28Z

arc/job/adapters/ts/heuristics.py

                  r2_stretch: float = 1.2,
                  a2: float = 180,
                  dihedral_increment: Optional[int] = None,
+                  path: str = ""


The path parameter is defined but never used within the function body. This parameter should either be removed from the function signature or its intended usage should be implemented.

Copilot · 2026-02-04T09:02:28Z

arc/settings/settings.py

+    # Priority 1: /Local/ce_dana standalone builds
+    crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")


The CREST executable search prioritizes a hard-coded path '/Local/ce_dana' which appears to be environment-specific. Consider making this configurable via an environment variable or moving it to lower priority to avoid issues in environments where this path doesn't exist or isn't relevant.

Suggested change

# Priority 1: /Local/ce_dana standalone builds

crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")

# Priority 1: standalone builds in a configurable directory (default: /Local/ce_dana)

standalone_dir = os.getenv("ARC_CREST_STANDALONE_DIR", "/Local/ce_dana")

crest_path = find_highest_version_in_directory(standalone_dir, "crest")

Copilot · 2026-02-04T09:02:28Z

arc/job/adapters/ts/crest.py

+        int: The first integer in the string
+
+    """
+    return int(re.sub(r"[^\d]", "", s))


The extract_digits function will raise a ValueError if the input string contains no digits, as int("") will fail. Consider adding error handling or validation to ensure the string contains at least one digit before attempting conversion.

Suggested change

return int(re.sub(r"[^\d]", "", s))

digit_str = re.sub(r"[^\d]", "", s)

if not digit_str:

raise ValueError(f"No digits found in string: {s!r}")

return int(digit_str)

Copilot · 2026-02-04T09:02:29Z

arc/job/adapters/ts/crest.py

+            occurrence_distances = []
+            for occurrence in condition_occurrences:
+                # Calculate the sum of distances to the two heavy atoms
+                hydrogen_key = f"{occurrence['H']}"
+                heavy_atoms = [f"{occurrence['A']}", f"{occurrence['B']}"]
+                try:
+                    distances = dataframe.loc[hydrogen_key, heavy_atoms].sum()
+                    occurrence_distances.append((occurrence, distances))
+                except KeyError as e:
+                    logger.error(f"Error accessing distances for occurrence {occurrence}: {e}")
+
+            # Select the occurrence with the smallest distance
+            best_occurrence = min(occurrence_distances, key=lambda x: x[1])[0]
+            return {
+                "H": extract_digits(best_occurrence["H"]),
+                "A": extract_digits(best_occurrence["A"]),
+                "B": extract_digits(best_occurrence["B"]),
+            }


If all occurrences fail to compute distances (all KeyError exceptions at line 577), the occurrence_distances list will be empty, causing min() at line 581 to raise a ValueError. Consider adding a check to handle the case where occurrence_distances is empty, or raising a more descriptive error.

Copilot · 2026-02-04T09:02:29Z

devtools/install_autotst.sh

@@ -51,15 +51,36 @@ write_hook () {
    cat >"$act" <<EOF
 # AutoTST hook – $(date +%F)
 export AUTOTST_ROOT="$repo_path"


The activation hook writes export AUTOTST_ROOT="$repo_path" into a shell script using the raw repo_path (derived from pwd), which will then be sourced on environment activation. If repo_path contains characters like $(…) or backticks (possible in directory names), those will be interpreted as command substitutions when the hook is executed, leading to command injection and arbitrary code execution in the user’s shell. To avoid this, ensure that the path is safely escaped or encoded before embedding it in a shell script (for example, avoid writing untrusted paths directly inside double quotes in sourced files).

Copilot · 2026-02-04T09:02:29Z

devtools/install_autotst.sh

+        AUTOTST_ESCAPED_PATH="$(pwd | sed 's#/#\\\\/#g')"
+        if sed --version >/dev/null 2>&1; then
+            sed -i "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc
+        else
+            sed -i '' "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc
+        fi


In the path-mode setup, the script replaces __AUTOTST_PATH__ in ~/.bashrc with the raw output of pwd, which is then used in a line export AUTOTST_ROOT="__AUTOTST_PATH__" added to the shell init file. If the working directory name includes characters such as $(…) or backticks, they will become part of ~/.bashrc and be executed as command substitutions when a new shell is started, enabling arbitrary code execution. You should sanitize or safely escape the directory path before writing it into ~/.bashrc, rather than interpolating pwd directly into a sourced shell script.

github-actions bot added Module: Converter Module: Species labels Nov 27, 2025

github-advanced-security bot found potential problems Nov 27, 2025

View reviewed changes

calvinp0 requested a review from Copilot November 28, 2025 12:56

Copilot started reviewing on behalf of calvinp0 November 28, 2025 12:56 View session

github-advanced-security bot found potential problems Nov 28, 2025

View reviewed changes

arc/job/adapters/ts/crest.py Fixed Show fixed Hide fixed

Copilot finished reviewing on behalf of calvinp0 November 28, 2025 13:00

This comment was marked as resolved.

Sign in to view

github-advanced-security bot found potential problems Nov 28, 2025

View reviewed changes

arc/job/adapters/ts/crest.py Fixed Show fixed Hide fixed

calvinp0 requested a review from Copilot January 15, 2026 16:08

Copilot started reviewing on behalf of calvinp0 January 15, 2026 16:08 View session

This comment was marked as resolved.

Sign in to view

calvinp0 force-pushed the crest_adapter branch from d6a83b4 to c231a43 Compare January 27, 2026 12:39

github-advanced-security bot found potential problems Jan 27, 2026

View reviewed changes

calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 3e88c36 to 7674f5f Compare February 2, 2026 09:49

calvinp0 requested a review from Copilot February 2, 2026 12:10

Copilot started reviewing on behalf of calvinp0 February 2, 2026 12:11 View session

This comment was marked as resolved.

Sign in to view

Copilot AI reviewed Feb 2, 2026

View reviewed changes

calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 713209f to 9c25f3d Compare February 2, 2026 13:21

calvinp0 added 3 commits February 2, 2026 15:21

calvinp0 force-pushed the crest_adapter branch from 9c25f3d to 51368be Compare February 2, 2026 14:31

calvinp0 added 4 commits February 2, 2026 16:31

Removes files for server time limit test

011050d

Removes input files and submit script related to a specific server time limit test case. The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.

calvinp0 added 3 commits February 2, 2026 16:31

Fixes AutoTST installation on BSD sed

fbd4938

Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.

Adds angstrom to bohr conversion factor

51368be

Adds the angstrom to bohr conversion factor. This conversion factor is required for the CREST adapter.

Avoids project name collisions in parallel tests

84030d0

Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.

github-actions bot added the Module Functional label Feb 3, 2026

calvinp0 requested a review from Copilot February 4, 2026 08:49

Copilot started reviewing on behalf of calvinp0 February 4, 2026 08:49 View session

Copilot AI reviewed Feb 4, 2026

View reviewed changes

@@ -551,3 +551,8 @@
             CREST_PATH, CREST_ENV_PATH = find_crest_executable()
+            __all__ = [
+                "CREST_PATH",
+                "CREST_ENV_PATH",
+            ]

		return None, None


		CREST_PATH, CREST_ENV_PATH = find_crest_executable()

		if method_label.lower() not in other_tsg.method.lower():
		other_tsg.method += f' and {method_label}'

-                        if method_label.lower() not in other_tsg.method.lower():
-                            other_tsg.method += f' and {method_label}'
+                        if method_label.lower() not in other_tsg.method.lower():
+                            base_method = other_tsg.method
+                            other_tsg.method += f' and {method_label}'
+                            existing_sources = getattr(other_tsg, "method_sources", None)
+                            if existing_sources:
+                                combined_sources = list(existing_sources) + [method_label]
+                            else:
+                                combined_sources = [base_method, method_label]
+                            other_tsg.method_sources = TSGuess._normalize_method_sources(combined_sources)

		# Priority 1: /Local/ce_dana standalone builds
		crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")

-    # Priority 1: /Local/ce_dana standalone builds
-    crest_path = find_highest_version_in_directory("/Local/ce_dana", "crest")
+    # Priority 1: standalone builds in a configurable directory (default: /Local/ce_dana)
+    standalone_dir = os.getenv("ARC_CREST_STANDALONE_DIR", "/Local/ce_dana")
+    crest_path = find_highest_version_in_directory(standalone_dir, "crest")

-    return int(re.sub(r"[^\d]", "", s))
+    digit_str = re.sub(r"[^\d]", "", s)
+    if not digit_str:
+        raise ValueError(f"No digits found in string: {s!r}")
+    return int(digit_str)

CREST Adapter #807

Are you sure you want to change the base?

CREST Adapter #807

Conversation

calvinp0 commented Nov 27, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Check notice

Copilot Autofix

Check notice

Copilot Autofix

This comment was marked as resolved.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants