Skip to content

Adapt to latest EQM to remove complex couplings#300

Merged
cortner merged 3 commits intomainfrom
FixComplexCoupling
Oct 1, 2025
Merged

Adapt to latest EQM to remove complex couplings#300
cortner merged 3 commits intomainfrom
FixComplexCoupling

Conversation

@zhanglw0521
Copy link
Collaborator

This is to fix #299, by using the latest version of EQM and RepLie, where the complex couplings are avoided.

One thing worth noting is that the fix seems to require Julia1.11.

@zhanglw0521
Copy link
Collaborator Author

zhanglw0521 commented Sep 26, 2025

Even if the tests pass, it remains to be checked whether we still have inexact error constructing ace1model with high correlation order and polynomial degrees, i.e.

julia> using ACEpotentials
julia> md = ace1_model(elements = [:Si],
                               order = 4,
                               totaldegree = 16,
                               rcut = 6.0)

I will paste the output below.

@cortner
Copy link
Member

cortner commented Sep 26, 2025

I'm curious why the tests are failing.

@cortner
Copy link
Member

cortner commented Sep 26, 2025

EquivariantModels: git object 0e3f3b6c2b858aed4d6442b54f0bae674e6bb432 could not be found

I forgot to push EM, will now rerun tests.

@zhanglw0521
Copy link
Collaborator Author

The above command returns a model that has A2Bmap

julia> md.model.tensor.A2Bmap
1445×8638 SparseArrays.SparseMatrixCSC{Float64, Int64} with 9734 stored entries:
⎡⠙⠲⠤⢤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠀⠀⠈⠉⠙⠒⠶⢤⣄⡦⣴⠤⣄⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠉⠙⠒⠒⠒⠒⠲⠦⢤⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠓⠚⠒⠓⠲⠤⠤⠤⣄⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠛⠋⠓⠒⠲⠤⢤⣤⣤⣀⣀⠠⣤⣄⣀⠠⣤⣀⡀⠀⠀⠀⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠀⠀⠉⠉⠀⠈⠉⠉⠛⠒⠒⎦

@zhanglw0521
Copy link
Collaborator Author

zhanglw0521 commented Sep 26, 2025

One of the tests fails because the rmse slightly goes beyond the preset tolerance...

@cortner
Copy link
Member

cortner commented Sep 26, 2025

I wonder what changed.

@cortner
Copy link
Member

cortner commented Sep 26, 2025

But on 1.10 there are several more accuracy problems. very strange.

@cortner
Copy link
Member

cortner commented Sep 26, 2025

Is there a chance that the coupling coefficients are no longer orthogonal?

@zhanglw0521
Copy link
Collaborator Author

It is indeed very strange. I think the orthogonality still hold, but the normality may not. I will go back to RepLie and try both 1.10 and 1.11 there to see if they consist.

@zhanglw0521
Copy link
Collaborator Author

I am so very confused...

I tested four cases: ACEpot in the main branch, with Julia 1.10 and 1.11, ACEpot in this PR, with Julia 1.10 and 1.11.

For the same version of ACEpot, 1.10 and 1.11 return exactly the same couplings, meaning that calling old EM or new EM for coupling coeffs is not affected by Julia version.

I also tested the coupling coefficients generated by the two versions of ACEpot (the A2Bmap's) - they are full rank and they span the same space. Both of them are orthogonal but not orthonormal, with comparable condition numbers.

The weirdest part lies in the evaluation of the energy, ie, the function energy_forces_virial performs differently in 1.10 and 1.11 even when keeping the same at (an eight-atom system with 5 Si and 3 O, fixed bulk), ps, st. This can be checked by running the following script:

using ACEbase 
using ACEpotentials
M = ACEpotentials.Models

using AtomsBase, AtomsBuilder
# using EmpiricalPotentials
AB = AtomsBuilder

import AtomsCalculators

using Random, LuxCore, StaticArrays, LinearAlgebra, JLD2, Unitful
rng = Random.MersenneTwister(1234)

elements = (:Si, :O)
level = M.TotalDegree()
max_level = 15
order = 3
E0s = Dict( :Si => -158.54496821u"eV", 
            :O => -2042.0330099956639u"eV")
NZ = length(elements)

function make_rin0cut(zi, zj) 
   r0 = ACEpotentials.DefaultHypers.bond_len(zi, zj)
   return (rin = 0.0, r0 = r0, rcut = 6.5)
end

rin0cuts = SMatrix{NZ, NZ}([make_rin0cut(zi, zj) for zi in elements, zj in elements])


model = M.ace_model(; elements = elements, order = order, Ytype = :solid, 
                      level = level, max_level = max_level, maxl = 8, 
                      pair_maxn = 15, 
                      init_WB = :glorot_normal, init_Wpair = :glorot_normal,
                      E0s = E0s,
                      rin0cuts = rin0cuts
                      )

ps, st = LuxCore.setup(rng, model)

# read saved ps
WB = load("psWB.jld2")["WB"]
ps.WB[:] .= WB[:]
Wpair = load("psWpair.jld2")["Wpair"]
ps.Wpair[:] .= Wpair[:]

calc = M.ACEPotential(model, ps, st)   

# compare the A2Bmap - test RepLie for different Julia versions
A2B110 = load("A2Bmap_110.jld2")["A2B"]
A2B111 = load("A2Bmap_111.jld2")["A2B"]
calc.model.tensor.A2Bmap == A2B110 == A2B111

# splinification
lin_calc = M.splinify(calc, ps)
ps_lin, st_lin = LuxCore.setup(rng, lin_calc)
ps_lin.WB[:] .= ps.WB[:] 
ps_lin.Wpair[:] .= ps.Wpair[:]

at = AB.rattle!(AB.bulk(:Si, cubic=true), 0.0) # a fixed bulk

E = M.energy_forces_virial(at, calc, ps, st).energy
E_lin = M.energy_forces_virial(at, lin_calc, ps_lin, st_lin).energy
difference = abs(E - E_lin)

with the attached parameters/coefficients I saved.

Even for the main branch, the above difference between Julia versions remains. This difference caused the extra failing cases in this PR with Julia 1.10, and I think it is something beyond the changes I made.

Or have I missed anything else that can make the calc's different?

saved_ps.zip

@cortner
Copy link
Member

cortner commented Sep 30, 2025

I'm re-running all tests locally on different branches to see things for myself.

@zhanglw0521
Copy link
Collaborator Author

Thanks. And just to share information: locally I've gotten the same results as the CI results here.

@cortner
Copy link
Member

cortner commented Sep 30, 2025

yes, I would expect that but I want to run the tests on different branches to see where things break down.

@cortner
Copy link
Member

cortner commented Sep 30, 2025

These are the updates when I go from main (where all tests pass) to the current branch.

  [73ee3e68] ↑ EquivariantModels v0.0.5 ⇒ v0.0.6
  [f07d36f2] ↑ RepLieGroups v0.0.3 ⇒ v0.1.1
    Updating `~/gits/ACEpotentials.jl/Manifest.toml`
  [023d0394] - DecoratedParticles v0.0.7
  [73ee3e68] ↑ EquivariantModels v0.0.5 ⇒ v0.0.6
  [0e77f7df] + LazilyInitializedFields v1.3.0
  [89398ba2] + LocalRegistry v0.5.7
  [793d2195] + PartialWaveFunctions v0.2.0
  [2792f1a3] + RegistryInstances v0.1.0
  [d1eb7eb1] + RegistryTools v2.3.0
  [f07d36f2] ↑ RepLieGroups v0.0.3 ⇒ v0.1.1

Same failure as in CI:

Test silicon: Test Failed at /Users/ortner/gits/ACEpotentials.jl/test/test_silicon.jl:41
  Expression: (rmse[config])[obs] <= (expected[config])[obs]
   Evaluated: 0.0003602567072731953 <= 0.00035

I'm not so worried about Julia versions right now but about different results within the same Julia version. Here, the only thing that can be different are the coupling coefficients.

@zhanglw0521
Copy link
Collaborator Author

If we checked that the coupling coefficients from the two branches (main and this PR) have the same size, generate the same space, and keep the energy invariant, then can we say the different accuracy is just because of some tricky differences in fitting itself?

@cortner
Copy link
Member

cortner commented Sep 30, 2025

I can't quite decide whether to just accept this, tag a new version but then file an issue with a detailed summary and let the tests fail to make sure we come back to it?

@zhanglw0521
Copy link
Collaborator Author

zhanglw0521 commented Sep 30, 2025

I really can't say for certain, but the very last test I ran, which was specified for this failed test, seems to give a reason to accept it.

I stored the coupling coeffs generated by the two branches (attached below), from which we can see that each of their corresponding column differs only by a scaling. I then compared the fitting errors. The results actually indicate that the overall errors they have are comparable, while new couplings, somehow, perform better for force and virial, but slightly worse for energy, and hence the failed example.

Error in main

err_blr["rmse"]
Dict{String, Any} with 5 entries:
  "isolated_atom" => Dict("V"=>0.0, "E"=>0.0, "F"=>0.0)
  "dia"           => Dict("V"=>0.030166, "E"=>0.00158392, "F"=>0.0294047)
  "liq"           => Dict("V"=>0.0344579, "E"=>0.000335032, "F"=>0.178791)
  "set"           => Dict("V"=>0.0652927, "E"=>0.00262894, "F"=>0.138453)
  "bt"            => Dict("V"=>0.0886698, "E"=>0.00348342, "F"=>0.0713924)

Error in this PR

err_blr["rmse"]
Dict{String, Any} with 5 entries:
  "isolated_atom" => Dict("V"=>0.0, "E"=>0.0, "F"=>0.0)
  "dia"           => Dict("V"=>0.0299322, "E"=>0.00158943, "F"=>0.0288154)
  "liq"           => Dict("V"=>0.0339324, "E"=>0.000360257, "F"=>0.178246)
  "set"           => Dict("V"=>0.0649321, "E"=>0.00259179, "F"=>0.138032)
  "bt"            => Dict("V"=>0.0882134, "E"=>0.00342114, "F"=>0.071385)

I think this is something acceptable, though I have no idea why it is the case (perhaps it is that the scaling happened to behave like another weighting or a preconditioner or so...).

A2Bmaps.zip

@cortner
Copy link
Member

cortner commented Sep 30, 2025

Thank you for the thorough test. Then let's merge and tag but keeping the failing test for future discussion.

@cortner
Copy link
Member

cortner commented Sep 30, 2025

Actually I might change the tolerance a bit but still open an issue

@zhanglw0521
Copy link
Collaborator Author

Sounds good to me. Let's see if the new version resolves the complex couplings issue #299 and can hence enable larger basis.

@cortner cortner merged commit d5c7816 into main Oct 1, 2025
2 of 3 checks passed
@cortner cortner deleted the FixComplexCoupling branch October 1, 2025 19:10
@cortner
Copy link
Member

cortner commented Oct 1, 2025

this is now registered as 0.9.0. It feels like a bugfix but it is technically breaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Using higher correlation orders causes error

2 participants