Parallel task manager accelerated as a plugin #1318

Iximiel · 2025-09-10T11:44:04Z

Description

Note that this PR is built on top of #1092

@GiovanniBussi, @gtribello, as we discussed last week I started this new approach to the PTM: the accelerated version is now served separately as a plugin.
This has a few advantages:

I can test the compilation on the CI without risking NVHPC blocking itself on some omp pragmas
You can use Plumed on the CPU compiled with gnu/clang along with the accelerated one
I have somehow implemented a mixed precision version of plumed (we have to discuss a few things about it)
starting from the overridden PTM in the plugin we can try different new acceleration strategies

I tried to avoid propagating the template declarations the more I could, but I did not always succeed

Looking at the diff some extra changes have slipped in, like the ones in check and some iostream inclusions.

This is only a WIP, only the standard multicolvars are included in the plugin, but I am working on expanding them

Target release

I would like my code to appear in release v2.11

Type of contribution

changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
changes to a module not authored by you
new module contribution or edit of a module authored by you

Copyright

I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.

the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.

Tests

I added a new regtest or modified an existing regtest to validate my changes.
I verified that all regtests are passed successfully on GitHub Actions.

src/colvar/Plane.h

src/tools/Tensor.h

+  constexpr TensorTyped& operator +=(const TensorTyped& b);
 /// decrement
-  TensorGeneric& operator -=(const TensorGeneric<n,m>& b);
+  constexpr TensorTyped& operator -=(const TensorTyped& b);


src/tools/Tensor.h

+  constexpr TensorTyped<T,m,n> transpose()const;
 /// << operator.
 /// Allows printing tensor `t` with `std::cout<<t;`
-  template<unsigned n_,unsigned m_>
-  friend std::ostream & operator<<(std::ostream &os, const TensorGeneric<n_,m_>&);
-/// Diagonalize tensor.
-/// Syntax is the same as Matrix::diagMat.
-/// In addition, it is possible to call if with m_ smaller than n_. In this case,
-/// only the first (smaller) m_ eigenvalues and eigenvectors are retrieved.
-/// If case lapack fails (info!=0) it throws an exception.
-/// Notice that tensor is assumed to be symmetric!!!
-  template<unsigned n_,unsigned m_>
-  friend void diagMatSym(const TensorGeneric<n_,n_>&,VectorGeneric<m_>&evals,TensorGeneric<m_,n_>&evec);
-/// Compute lowest eigenvalue and eigenvector, using a branchless iterative implementation.
-/// This function is amenable for running on openACC.
-/// The accuracy could be controlled increasing the number of iterations. The default value (24)
-/// seems sufficient for most applications.
-/// In principle, it could be extended to compute m eigenvalues by:
-/// - computing the lowest
-/// - compute the proper projector and reduce the matrix to <n-1,n-1>
-/// - proceed iteratively
-/// The interface should then be likely the same as diagMatSym
-  template<unsigned n_>
-  double lowestEigenpairSym(const TensorGeneric<n_, n_>& K, VectorGeneric<n_>& eigenvector, unsigned niter);
+  friend std::ostream & operator<< <>(std::ostream &os, const TensorTyped&);
 };

-template <unsigned n,unsigned m>
-void TensorGeneric<n,m>::auxiliaryConstructor()
+template<typename T, unsigned n, unsigned m>
+constexpr void TensorTyped<T,n,m>::auxiliaryConstructor()
 {}

-template <unsigned n,unsigned m>
+template<typename T, unsigned n, unsigned m>
 template<typename... Args>
-void TensorGeneric<n,m>::auxiliaryConstructor(double first,Args... arg) {
+constexpr void TensorTyped<T,n,m>::auxiliaryConstructor(T first,Args... arg) {
  d[n*m-(sizeof...(Args))-1]=first;
  auxiliaryConstructor(arg...);
 }

-template <unsigned n,unsigned m>
+template<typename T, unsigned n, unsigned m>
 template<typename... Args>
-TensorGeneric<n,m>::TensorGeneric(double first,Args... arg) {
+constexpr TensorTyped<T,n,m>::TensorTyped(T first,Args... arg) {


src/core/ActionShortcut.cpp

src/secondarystructure/SecondaryStructureBase.h

  keys.addFlag("VERBOSE",false,"write a more detailed output");
  keys.reset_style("VERBOSE","hidden");
-  if( keys.getDisplayName()!="SECONDARY_STRUCTURE_DRMSD" && keys.getDisplayName()!="SECONDARY_STRUCTURE_RMSD" ) {
+  //if( keys.getDisplayName()!="SECONDARY_STRUCTURE_DRMSD" && keys.getDisplayName()!="SECONDARY_STRUCTURE_RMSD" ) {


src/secondarystructure/SecondaryStructureShortcut.h

src/secondarystructure/SecondaryStructureDRMSD.h

+    const auto targetSize=target.size();
+
+    for (unsigned i=0; i<targetSize; ++i) {
+      const auto& [reference,k,j] = target[i];


src/secondarystructure/SecondaryStructureDRMSD.h

src/secondarystructure/SecondaryStructureRMSD.cpp

src/secondarystructure/SecondaryStructureBase.h

+    //Vector origin_old, origin_new;
+    //origin_old=pos[21];
+    //origin_new=pos[6]+distance;


src/secondarystructure/SecondaryStructureBase.h

+    //origin_new=pos[6]+distance;
    for(unsigned i=15; i<30; ++i) {
-      pos[i]+=( origin_new - origin_old );
+      //pos[i]+=( origin_new - origin_old );


src/function/FunctionShortcut.h

  keys.addActionNameSuffix("_GRID");
-  T tfunc;
-  tfunc.registerKeywords( keys );
+  //keys.addActionNameSuffix("_GRIDACC");


src/matrixtools/MatrixTimesMatrix.h

+  //if( keys.getDisplayName()=="MATRIX_PRODUCT" ) {
+  //  keys.addFlag("ELEMENTS_ON_DIAGONAL_ARE_ZERO",false,"set all diagonal elements to zero");
+  //}


src/matrixtools/MatrixTimesVectorBase.h

    nvectors = getNumberOfArguments()-1;
  }
-  if( getName()=="MATRIX_VECTOR_PRODUCT_ROWSUMS" ) {
+  if constexpr ( helpers::isRowSum<CV>) { //getName()=="MATRIX_VECTOR_PRODUCT_ROWSUMS" ) {


src/secondarystructure/SecondaryStructureRMSD.cpp

+//_REGISTER_ACTION(colv,"SECONDARY_STRUCTURE_RMSD_CPU");
+//typedef AccelerableShortcut<colv> shortcut;
+//_REGISTER_ACTION(shortcut,"SECONDARY_STRUCTURE_RMSD");


Iximiel · 2025-09-18T08:41:56Z

src/core/PlumedMain.h

+/// Returns the images from the dlloader
+  const std::vector <void*>& getDLHandles() const noexcept;


@GiovanniBussi I did this to use the action register's check with the extra handles within FunctionShortcut.
Is this a good idea?

In principle I do not see problems however:

I do not know how you use the result. This is supposed to be an internal detail, perhaps what you are doing can be directly encapsulated in an additional method of class PlumedMain

It is very difficult to see what this PR does because it is massive

* I do not know how you use the result. This is supposed to be an internal detail, perhaps what you are doing can be directly encapsulated in an additional method of class PlumedMain

I use it in FunctionShortcut.h . By "encapsulated it as a method of PlumedMain", do you mean to create something like plumed->check instead of calling the action register singleton while exposing the void* vector?

* It is very difficult to see what this PR does because it is massive

I know, I should have moved the tests as the very last thing, because those are eclipsing the other files in the gh UI.
Apart from that, this branch is based on #1092, for the mixed precision things, so part of the changes can be discussed there

I use it in FunctionShortcut.h . By "encapsulated it as a method of PlumedMain", do you mean to create something like plumed->check instead of calling the action register singleton while exposing the void* vector?

Yes exactly. The implementation based on void* is quite tricky, so I would rather add an ad hoc method to do (what you need to do)

It is ok if I am calling this new method checkAction, like the API command that does exactly what I want?

Sounds perfect

Iximiel · 2025-09-18T08:45:41Z

@gtribello @GiovanniBussi this is nearly done
I have to adjust a little the tests (that as now can only be run locally) but we can check the at least the compilation of the accelerated CV on the CI without disturbing the compilation of the main plumed library

src/core/ActionShortcut.cpp

@@ -147,9 +147,15 @@
        error("shortcut is using suffix but action created is not ActionWithValue");
      }
      Keywords thiskeys;
+      //TODO: check if it is possible to do this: plumed.getKeywordsForAction(av->getName(),thiskeys);


Iximiel · 2025-12-04T15:54:50Z

Now the CI can test the compilation of the PTM with openacc.
It might need some extra massages because setting up the compiler in the plugin folder is everything but straighforward

src/matrixtools/MatrixTimesMatrix.h

        error("cannot calculate dissimilarities for sparse matrices");
      }
    }
  }
  updateBookeepingArrays( taskmanager.getActionInput().outmat );
-  taskmanager.setupParallelTaskManager( 2*getPntrToArgument(0)->getNumberOfColumns(), getPntrToArgument(1)->getNumberOfStoredValues() );
+  //unsigned nvals = getPntrToComponent(0)->getNumberOfColumns();


Iximiel · 2025-12-05T07:53:43Z

@gtribello can you help me in getting right the documentation for the new shorcuts?

The Intel error is already corrected, I will push it with the documentation fix

gtribello · 2025-12-15T09:16:41Z

Hi @Iximiel

I had a look at this. Can't get the code to run on my laptop any more because of an update, but I have a suggestion on how to solve the issue with the manual. To be clear, the idea is that the DISSIMILARITY_CPU command would never be used in an input file. The input would always be:

DISSIMILARITY ...

If that is the case, then you just need to add the command:

keys.setDisplayName("DISSIMILARITY");

in the registerKeywords method of the actions that have a _CPU version. Adding this command hides the shortcut version from the manual builder. I hope this works.

Iximiel · 2025-12-15T16:06:06Z

Hi @Iximiel

I had a look at this. Can't get the code to run on my laptop any more because of an update, but I have a suggestion on how to solve the issue with the manual. To be clear, the idea is that the DISSIMILARITY_CPU command would never be used in an input file. The input would always be:
DISSIMILARITY ...
If that is the case, then you just need to add the command:
keys.setDisplayName("DISSIMILARITY");
in the registerKeywords method of the actions that have a _CPU version. Adding this command hides the shortcut version from the manual builder. I hope this works.

Thanks! I'll try that with DISSIMILARITY and MATRIX_PRODUCT
I am testing with curl disabled in the manual to speed up things

On my PC it DISSILMILARITY works, but MATRIX_PRODUCT(_CPU) not, it may be a question of getting the correct suffix?

Iximiel · 2025-12-16T09:08:43Z

@gtribello
I think it is something with the suffix and CV with underscores in the non suffixed names.
I'll try to do some massages to the code to make it work

edit: I may be wrong

gtribello · 2025-12-16T10:42:45Z

Hi @Iximiel

I was looking at this again. I think the problem might be that DISSIMILARITIES and MATRIX_PRODUCT use the same template Action class. You can see this from how they are registered:

typedef MatrixTimesMatrix<MatrixProduct> mtimes;
PLUMED_REGISTER_ACTION(mtimes,"MATRIX_PRODUCT")
typedef MatrixTimesMatrix<Dissimilarities> dissims;
PLUMED_REGISTER_ACTION(dissims,"DISSIMILARITIES")

The registerKeywords method for these two actions is the same. Perhaps a solution is to include the following code in MatrixTimesMatrix<T>::registerKeywords.

if( keys.getDisplayName().find("MATRIX_PRODUCT")!=std::string::npos ) {
     keys.setDisplayName("MATRIX_PRODUCT");
     keys.addFlag("ELEMENTS_ON_DIAGONAL_ARE_ZERO",false,"set all diagonal elements to zero");
} else {
    keys.setDisplayName("DISSIMILARITIES");
}

I hope this helps

Iximiel · 2025-12-16T11:05:49Z

Hi @gtribello

On my pc the manual problem looks solved using your first suggestion.

gtribello · 2025-12-16T11:11:30Z

Oh jolly good @Iximiel. It looks like it is solved here, too, as the doc-mpi is now building.

Iximiel · 2025-12-16T12:43:41Z

mac 13 is clinically dead as a runner, and the codeQL is not linked to this PR (but may be addressed)
I think it is done

preparing distance for the new plugin Setting up the new ptm in the new plugin deactivating a check in ActionShortcut::readInputLine setting up USEGPU for distance_vector Templatization of Angle and Torsion Mixed precision in Pbcs Mixed precision for GetInputData moved the multicolvar with openacc in a plugin Starting the migration of multicolvars to a template adding some templatization to the PTM updating the PTM accelerated now multicolvar is ready to be compiled in the openacc plugin adding the plugins to the modulemap setting up secondaryStructure for the plugin setting up secondary structure to target the plugin addressing some codecheck problems moving some of the volume kw in the headers moved Volumes to the plugin added the spherical harmonic to the plugin adding the matrixtimes family fixing a warning added Combine Between and <= and >= setting up contact matrix separating the GPU from the PTM moved the openACC tests updating some tests adding a way to get the dl handles in the actions addressed some codechek problems now checkin if an action is registered do note expose the vector of LOADed handles now the tests for existing GPU implementation pass, added fccubic, and better test compler guard to fccubic removing an unecessary shadowed variable now also the gpu-torsion test is checked against the CPU deactivated the quaternion tests small changes after the rebase deleted a support file

check if linuxWF can be run with nvhpc

github-advanced-security bot found potential problems Sep 10, 2025

View reviewed changes

Iximiel changed the title ~~Parallel tasck manager accelerated as a plugin~~ Parallel task manager accelerated as a plugin Sep 10, 2025

Iximiel force-pushed the ptmAccPlug-template branch from 3ab2517 to 5e86861 Compare September 11, 2025 13:05

github-advanced-security bot found potential problems Sep 11, 2025

View reviewed changes

Iximiel force-pushed the ptmAccPlug-template branch from 5e86861 to d9c3b6a Compare September 11, 2025 15:18

github-advanced-security bot found potential problems Sep 11, 2025

View reviewed changes

Iximiel force-pushed the ptmAccPlug-template branch from 3450686 to ab62229 Compare September 15, 2025 13:34

github-advanced-security bot found potential problems Sep 15, 2025

View reviewed changes

Iximiel force-pushed the ptmAccPlug-template branch 3 times, most recently from 621c9a5 to 56fb8fe Compare September 17, 2025 15:22

github-advanced-security bot found potential problems Sep 17, 2025

View reviewed changes

Iximiel force-pushed the ptmAccPlug-template branch from 56fb8fe to ef358f5 Compare September 18, 2025 08:22

Iximiel commented Sep 18, 2025

View reviewed changes

github-advanced-security bot found potential problems Sep 18, 2025

View reviewed changes

src/core/ActionShortcut.cpp Fixed Show fixed Hide fixed

github-advanced-security bot found potential problems Sep 18, 2025

View reviewed changes

Iximiel force-pushed the ptmAccPlug-template branch from 0bbeb75 to f921207 Compare December 4, 2025 15:52

Iximiel marked this pull request as ready for review December 4, 2025 15:53

github-advanced-security bot found potential problems Dec 4, 2025

View reviewed changes

Iximiel mentioned this pull request Dec 5, 2025

Now the CI will try to use nvhpc for real #1268

Closed

7 tasks

Iximiel added 2 commits December 16, 2025 13:47

Updating the workflow for compiling the PTM plugin in the CI

7d6e556

check if linuxWF can be run with nvhpc

Iximiel force-pushed the ptmAccPlug-template branch from 8338b3d to 7d6e556 Compare December 16, 2025 12:47

		/// Returns the images from the dlloader
		const std::vector <void*>& getDLHandles() const noexcept;

Parallel task manager accelerated as a plugin #1318

Are you sure you want to change the base?

Parallel task manager accelerated as a plugin #1318

Uh oh!

Conversation

Iximiel commented Sep 10, 2025

Description

Target release

Type of contribution

Copyright

Tests

Uh oh!

Uh oh!

Check notice

Check notice

Uh oh!

Check notice

Uh oh!

Uh oh!

Check notice

Uh oh!

Uh oh!

Uh oh!

Check notice

Check notice

Check notice

Check notice

Check notice

Check notice

Iximiel Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

GiovanniBussi Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

Iximiel Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

GiovanniBussi Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

Iximiel Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

GiovanniBussi Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

Iximiel commented Sep 18, 2025

Uh oh!

Uh oh!

Check notice

Iximiel commented Dec 4, 2025

Uh oh!

Check notice

Iximiel commented Dec 5, 2025

Uh oh!

gtribello commented Dec 15, 2025

Uh oh!

Iximiel commented Dec 15, 2025

Uh oh!

Iximiel commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gtribello commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Iximiel commented Dec 16, 2025

Uh oh!

gtribello commented Dec 16, 2025

Uh oh!

Iximiel commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Iximiel commented Dec 16, 2025 •

edited

Loading

gtribello commented Dec 16, 2025 •

edited

Loading