Conversation

@kshyatt kshyatt commented Dec 1, 2025

#299 will be closed; it was easier to just create a new branch and cherry-pick.

@kshyatt kshyatt requested a review from lkdvos December 1, 2025 10:29
@kshyatt kshyatt force-pushed the ksh/gpu3 branch 2 times, most recently from 2c9b2d3 to 45ba5dc on December 1, 2025 10:40
Comment on lines +29 to +37
function ROCDiagonalTensorMap{T}(::UndefInitializer, V::TensorMapSpace) where {T}
    (numin(V) == numout(V) == 1 && domain(V) == codomain(V)) ||
        throw(ArgumentError("DiagonalTensorMap requires a space with equal domain and codomain and 2 indices"))
    return ROCDiagonalTensorMap{T}(undef, domain(V))
end
function ROCDiagonalTensorMap{T}(::UndefInitializer, V::ProductSpace) where {T}
    length(V) == 1 ||
        throw(ArgumentError("DiagonalTensorMap requires `numin(d) == numout(d) == 1`"))
    return ROCDiagonalTensorMap{T}(undef, only(V))
end
Member

We don't have these generalizations of the constructor for the normal (i.e. non-GPU) DiagonalTensorMap, right? Is this useful?

-maxS = maximum(first, values(S))
-minS = minimum(last, values(S))
+maxS = mapreduce(maximum, max, values(S))
+minS = mapreduce(minimum, min, values(S))
Member

Do `first` and `last` not work for the GPU arrays?

Member Author

Unfortunately, no, as they require scalar indexing

Member

Would it make sense to use @allowscalar here instead? I don't know anything about the performance here, but it seems hard to imagine that computing the max/min is faster than just taking the element.
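
For reference, the scalar-indexing variant being discussed would look roughly like this (a sketch only; it assumes values(S) iterates the per-sector vectors of sorted singular values, and takes @allowscalar from GPUArraysCore):

using GPUArraysCore: @allowscalar
maxS = @allowscalar maximum(first, values(S))
minS = @allowscalar minimum(last, values(S))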

Member Author

Unsure, we could test it. Generally, usage of `@allowscalar` is strongly discouraged.

Member Author

A possibly better solution would be to rewrite this entire for-loop to use broadcasting

Member

If we don't mind touching the internals of the DiagonalTensorMap we could also just take the minimum and maximum of the entire vector in one call as well
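
As a sketch of that suggestion (assuming S is the DiagonalTensorMap of singular values and its diagonal entries live in the single storage vector S.data, as in TensorKit's current layout), the per-block loop collapses to two plain reductions that run fine on GPU arrays:

maxS = maximum(S.data)
minS = minimum(S.data)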

Comment on lines +190 to +174
function TensorKit.scalar(t::ROCTensorMap)
    # TODO: should scalar only work if N₁ == N₂ == 0?
    return @allowscalar dim(codomain(t)) == dim(domain(t)) == 1 ?
        first(blocks(t))[2][1, 1] : throw(DimensionMismatch())
end
Member

This function is being somewhat rewritten in #291, so it might need to be revised if that PR is merged.

Jutho commented Dec 2, 2025

OK, I finished going through the AMD extension; I guess several of the comments will also apply to the CUDA extension, so I'll wait a bit before reviewing that as well.

kshyatt commented Dec 4, 2025

The AMD code is mostly preliminary at this point, as TensorOperations.jl doesn't support AMD (yet!), but I agree that segment should be improved. I'll respond to comments soon :)

kshyatt commented Dec 5, 2025

Thanks for the reviews so far! I've incorporated the feedback and am running the CUDA tests right now.

@kshyatt kshyatt force-pushed the ksh/gpu3 branch 7 times, most recently from 95f91a7 to 8bbc3ff on December 5, 2025 13:10

@lkdvos lkdvos left a comment

I went through the constructors for now, mostly with an eye towards minimizing the amount of code duplication. Definitely feel free to discard some of these comments if they don't make too much sense, but I think there are some viable candidate ideas in there that would improve the overall robustness of the TensorMap constructors/converters to begin with, and would simultaneously reduce the amount of boilerplate in the CUDA and AMD extensions.

Comment on lines +6 to +12
function TensorKit.tensormaptype(S::Type{<:IndexSpace}, N₁, N₂, TorA::Type{<:StridedROCArray})
    if TorA <: ROCArray
        return TensorMap{eltype(TorA), S, N₁, N₂, ROCVector{eltype(TorA), AMDGPU.Mem.HIPBuffer}}
    else
        throw(ArgumentError("argument $TorA should specify a scalar type (`<:Number`) or a storage type `<:ROCVector{<:Number}`"))
    end
end
Member

Could you maybe explain where this definition comes from? I would have guessed that if we ask for the tensormaptype with a given A, this function should really return TensorMap{..., A}, which would be the default implementation but is overwritten here. I'm guessing I'm missing something, but in that case it might be good to add a comment explaining why this is necessary.
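
For illustration, a hypothetical version of the default-style behaviour being described might look like this (a sketch, not code from the PR):

function TensorKit.tensormaptype(S::Type{<:IndexSpace}, N₁, N₂, A::Type{<:ROCVector{<:Number}})
    return TensorMap{eltype(A), S, N₁, N₂, A}
end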

@@ -0,0 +1,257 @@
const ROCTensorMap{T, S, N₁, N₂} = TensorMap{T, S, N₁, N₂, ROCVector{T, AMDGPU.Mem.HIPBuffer}}
Member

I don't know much about the AMD ecosystem, but would there ever be a point to be less restrictive about this, and define this as follows, possibly with a const to determine what the allowed array types are?

Suggested change
const ROCTensorMap{T, S, N₁, N₂} = TensorMap{T, S, N₁, N₂, ROCVector{T, AMDGPU.Mem.HIPBuffer}}
const ROCTensorMap{T, S, N₁, N₂, A <: DenseROCVector{T}} = TensorMap{T, S, N₁, N₂, A}

Comment on lines +14 to +16
function ROCTensorMap{T}(::UndefInitializer, V::TensorMapSpace{S, N₁, N₂}) where {T, S, N₁, N₂}
    return ROCTensorMap{T, S, N₁, N₂}(undef, V)
end
Member

Thinking out loud here:
Would it also work if we rewrote the undef constructor logic in tensors.jl in a pattern like the following, using an additional intermediate step that selects the storage type, rather than doing it all in one go?

const _TensorMap{T, A <: DenseVector{T}, S, N1, N2} = TensorMap{T, S, N1, N2, A}
const DefaultTensorData{T} = Vector{T}

TensorMap{T}(::UndefInitializer, V::TensorMapSpace) where {T} =
    _TensorMap{T, DefaultTensorData{T}}(undef, V)
_TensorMap{T, A}(::UndefInitializer, V::TensorMapSpace) where {T, A} =
    TensorMap{T, spacetype(V), numout(V), numin(V), A}(undef, V)

and similar constructions for the other ones? This might avoid having to duplicate these constructors for AMD and CUDA, and could also pave the way for experiments with using TensorMap{T, S, N1, N2, Memory{T}}, which I'm guessing has very similar kinds of demands (and could be used to test the constructors etc without having to rely on buildkite?)
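
Under that pattern, the AMD extension's undef constructor would roughly collapse to a single overload of the intermediate alias, along the lines of this sketch (which reuses the hypothetical _TensorMap alias from the proposal above):

const ROCTensorData{T} = ROCVector{T, AMDGPU.Mem.HIPBuffer}
ROCTensorMap{T}(::UndefInitializer, V::TensorMapSpace) where {T} =
    _TensorMap{T, ROCTensorData{T}}(undef, V)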

Alternatively, the domain and codomain can be specified by passing a [`HomSpace`](@ref)
using the syntax `codomain ← domain` or `domain → codomain`.
"""
function ROCTensorMap(
Member

Double comment here:

  1. I think we probably also want TensorMap(::SectorDict{I,<:ROCMatrix}, ...) to return a ROCTensorMap, which I don't think is currently happening.
  2. We might also want ROCTensorMap(::SectorDict{I,<:AbstractMatrix}) to automatically promote to ROCVector?

Given this, and my previous comment, would it make sense to alter this constructor to first attempt to figure out a type A, and then call TensorMap{T,...,A}(data) which then enforces the data type?
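
One possible shape for that, as a rough sketch (it assumes point 1 is also addressed so that the dict-based TensorMap constructor preserves GPU storage, and that ROCArray can be constructed from the given blocks; none of this is code from the PR):

function ROCTensorMap(data::AbstractDict{<:Sector, <:AbstractMatrix}, V::TensorMapSpace)
    # promote every block to GPU storage, then delegate to the generic dict-based constructor
    gpudata = Dict(c => ROCArray(b) for (c, b) in data)
    return TensorMap(gpudata, V)
end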

using the syntax `codomain ← domain` or `domain → codomain`.
"""
function ROCTensorMap(
data::AbstractDict{<:Sector, <:ROCArray},
Member

Suggested change
data::AbstractDict{<:Sector, <:ROCArray},
data::AbstractDict{<:Sector, <:ROCMatrix},

Member

I agree, but I think we want to alter the TensorMap(data::AbstractDict{<:Sector, <:AbstractMatrix}, ...) for that, in such a way that we indeed first determine a suitable A<:DenseMatrix associated with the types in the values(data) list.

Member

with the caveat that A <: DenseVector 😉

Member

Potato potato

Member

True, although it can be slightly tricky to find a generic implementation of a function that takes in M <: AbstractMatrix and spits out the A <: AbstractVector that corresponds to it, which is more or less why I brought it up.
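
One hedged sketch of such a mapping (an illustrative helper, not an existing TensorKit function) is to ask similar for a one-dimensional container, so the storage family is preserved:

# maps e.g. Matrix{Float64} -> Vector{Float64} and ROCMatrix{Float64} -> ROCVector{Float64}
vectorstoragetype(m::AbstractMatrix) = typeof(similar(m, 0))

This works on instances; doing the same purely at the type level is indeed the tricky part.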

Member

Absolutely, I have struggled with that before. This was just my funny Friday evening comment.

return copy!(tnew, t)
end
end

Member

Is this not covered by the generic

function Base.convert(
        TT::Type{TensorMap{T, S, N₁, N₂, A}}, t::AbstractTensorMap{<:Any, S, N₁, N₂}
    ) where {T, S, N₁, N₂, A}
    if typeof(t) === TT
        return t
    else
        tnew = TT(undef, space(t))
        return copy!(tnew, t)
    end
end

Comment on lines +196 to +201
function Base.copy!(tdst::ROCTensorMap{T, S, N₁, N₂}, tsrc::ROCTensorMap{T, S, N₁, N₂}) where {T, S, N₁, N₂}
    space(tdst) == space(tsrc) || throw(SpaceMismatch("$(space(tdst)) ≠ $(space(tsrc))"))
    for ((c, bdst), (_, bsrc)) in zip(blocks(tdst), blocks(tsrc))
        copy!(bdst, bsrc)
    end
    return tdst
Member

I'm guessing this is here to avoid going through the StridedView path which doesn't work on GPU? It might be reasonable to just remove the StridedView method, since this historically was there before everything was in a single large vector in TensorMap, and I would say this is a bit too opinionated for a generic AbstractTensorMap implementation.

Additionally, does this not coincide with the Base.copy!(::TensorMap, ::TensorMap) definition?

return tdst
end

function Base.copy!(tdst::ROCTensorMap, tsrc::TensorKit.AdjointTensorMap)
Member

same comment about possibly just having this implementation as the copy!(::AbstractTensorMap, ::AbstractTensorMap) implementation

# Conversion to ROCArray:
#----------------------
# probably not optimized for speed, only for checking purposes
function Base.convert(::Type{ROCArray}, t::AbstractTensorMap)
Member

Can we rewrite the base version to take Type{A} where {A <: Array}, rather than hardcoding Array there?
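
For instance, a hedged sketch of such a generalization (it assumes the existing Array method stays in place as the workhorse, and that the target array type can be built from a plain Array, as ROCArray can):

function Base.convert(::Type{A}, t::AbstractTensorMap) where {A <: AbstractArray}
    return A(convert(Array, t))
end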

Member Author

Can try!


function TensorKit.Factorizations.MAK.initialize_output(::typeof(svd_vals!), t::ROCTensorMap, alg::AbstractAlgorithm)
    V_cod = infimum(fuse(codomain(t)), fuse(domain(t)))
    return ROCDiagonalTensorMap{real(scalartype(t))}(undef, V_cod)
end
Member

We could consider introducing a similar_diagonal(t, T) function/hook in the base implementations, such that we could overload just that function and avoid having to copy the boilerplate here
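
A sketch of what such a hook could look like (similar_diagonal is a hypothetical name, not an existing TensorKit function):

# base implementation: allocate the diagonal output with default (CPU) storage
similar_diagonal(t::AbstractTensorMap, ::Type{T}, V) where {T} = DiagonalTensorMap{T}(undef, V)

# the AMD extension would then only need this one-line overload
similar_diagonal(t::ROCTensorMap, ::Type{T}, V) where {T} = ROCDiagonalTensorMap{T}(undef, V)

With that hook in place, the initialize_output methods themselves would no longer need to be duplicated per backend.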

Member

[edit] I hadn't seen that you already more or less did this for the base implementation, in which case these functions might not be necessary anymore?
