Skip to content

Add support for musl libc which returns aligned memory#6255

Closed
rhuitl wants to merge 1 commit intoPointCloudLibrary:masterfrom
rhuitl:musl
Closed

Add support for musl libc which returns aligned memory#6255
rhuitl wants to merge 1 commit intoPointCloudLibrary:masterfrom
rhuitl:musl

Conversation

@rhuitl
Copy link
Contributor

@rhuitl rhuitl commented Mar 18, 2025

This patch adds a parameter WITH_MUSL that, when enabled, makes PCL go the std::malloc route in pcl_macros.h. This is valid because musl libc returns aligned memory from malloc() (https://www.openwall.com/lists/musl/2019/07/07/1).

Unfortunately, musl intentionally does not want to be detected, so it looks like we're forced to detect it outside the build (e.g. check for /etc/alpine-release as a proxy for musl libc) and pass the info into the build, like this patch is doing.

See also https://catfox.life/2022/04/16/the-musl-preprocessor-debate/ and https://stackoverflow.com/questions/58177815/how-to-actually-detect-musl-libc for some background on why there is no __MUSL__ or similar preprocessor macro.

With this patch, PCL can be built on Alpine Linux, which uses musl libc.

@mvieth
Copy link
Member

mvieth commented Mar 19, 2025

Hi, first of all, thanks for the pull request. I have a few questions:

With this patch, PCL can be built on Alpine Linux, which uses musl libc.

To my knowledge, PCL 1.14.1 is already available on Alpine Linux. Am I mistaken? Or is it PCL 1.15.0 that can only be successfully built with your proposed changes? Please describe any build error in more detail, so that I can understand the reasons behind this pull request better.

Why is it necessary to test defined (HAVE_MUSL) here:
https://github.com/PointCloudLibrary/pcl/pull/6255/files#diff-a629c14bc779490e345d4493a29a5460a2c950f6eeb5063d94152c2457466b6dR351 ? Is __GNUC__ not already defined, going to #define PCL_ALIGN(alignment) __attribute__((aligned(alignment))) anyway?

Why did you set PCL_SILENCE_MALLOC_WARNING=1 when using musl? PCL_SILENCE_MALLOC_WARNING is related to Eigen's aligned allocator, and is, as far as I can tell, not related to PCL_ALIGN or pcl::aligned_malloc or pcl::aligned_free?

@rhuitl
Copy link
Contributor Author

rhuitl commented Mar 19, 2025

Hi Markus, you are absolutely right, we don't need the first hunk. __GNUC__ is defined, so it's not needed.

But we need one for the block that defines MALLOC_ALIGNED. If I don't add it, I'll get

/root/.conan2/p/b/pcl6482d4b7926ca/b/src/common/include/pcl/pcl_macros.h:402:2: error: #error aligned_malloc not supported on your platform
  402 | #error aligned_malloc not supported on your platform
      |  ^~~~~

Not only for 1.15.0, but also for 1.13.x.

Once I got MALLOC_ALIGNED defined, I get this warning, which is silenced by setting PCL_SILENCE_MALLOC_WARNING:

/root/.conan2/p/b/pcl47be7c97fbc32/b/src/common/include/pcl/memory.h:71:2: warning: #warning "Potential runtime error due to aligned malloc mismatch! PCL was likely compiled without AVX support but you enabled AVX for your code (to silence this message at your own risk, define PCL_SILENCE_MALLOC_WARNING=1)" [-Wcpp]
   71 | #warning "Potential runtime error due to aligned malloc mismatch! PCL was likely compiled without AVX support but you enabled AVX for your code (to silence this message at your own risk, define PCL_SILENCE_MALLOC_WARNING=1)"
      |  ^~~~~~~

Here are all the built-in preprocessor macros: https://pastebin.com/38Y0iKMH

Note that I'm building for Alpine, but with Conan, so I'm not using any prebuilt packages. I took a look at the package you mentioned: https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/testing/pcl/APKBUILD. They apply two patches, and there is a discussion with the same error message I was facing here: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/36503

I'll take a closer look what they did, but it doesn't look like something that can be applied to PCL, in general.

@rhuitl
Copy link
Contributor Author

rhuitl commented Mar 19, 2025

Okay, so it turns out that no code changes are needed to get the desired behavior. Passing -DHAVE_POSIX_MEMALIGN=ON to CMake is all that's needed. It's already exported in pcl_config.h.in. I'll close this PR soon.

However, I still get the second warning, but I think it's a false alarm. EIGEN_MALLOC_ALREADY_ALIGNED is false (https://eigen.tuxfamily.org/dox/Memory_8h_source.html) because the test is glibc specific. It would be nice if there was a way to silence it, either by exporting PCL_SILENCE_MALLOC_WARNING or EIGEN_MALLOC_ALREADY_ALIGNED.

…that use musl libc (e.g. Alpine)

This addresses the warning:
src/common/include/pcl/memory.h:71:2: warning: #warning "Potential runtime error due to aligned malloc mismatch! PCL was likely compiled without AVX support but you enabled AVX for your code (to silence this message at your own risk, define PCL_SILENCE_MALLOC_WARNING=1)" [-Wcpp]
@rhuitl
Copy link
Contributor Author

rhuitl commented Mar 20, 2025

Summary: PCL can be compiled against musl libc by configuring PCL with -DHAVE_POSIX_MEMALIGN.

The remaining warning is coming from Eigen. I made sure the Eigen package exports EIGEN_MALLOC_ALREADY_ALIGNED to downstream, which is easy with Conan by adding

def package_info(self):
    # ...
    if self.settings.get_safe("os.distro") == "alpine":
        self.cpp_info.components["eigen3"].defines = ["EIGEN_MALLOC_ALREADY_ALIGNED"]

@rhuitl rhuitl closed this Mar 20, 2025
@mvieth
Copy link
Member

mvieth commented Mar 20, 2025

Summary: PCL can be compiled against musl libc by configuring PCL with -DHAVE_POSIX_MEMALIGN.

HAVE_POSIX_MEMALIGN should actually be set automatically by CMake: https://github.com/PointCloudLibrary/pcl/blob/master/cmake/pcl_find_sse.cmake#L47 However, the function that detects it may or may not be called, depending on PCL_ENABLE_SSE and CMAKE_CXX_FLAGS: https://github.com/PointCloudLibrary/pcl/blob/master/CMakeLists.txt#L110

Alternatively, you could also add -DMALLOC_ALIGNED=1 to the CMAKE_CXX_FLAGS, then pcl::aligned_malloc chooses std::malloc.

The remaining warning is coming from Eigen. I made sure the Eigen package exports EIGEN_MALLOC_ALREADY_ALIGNED to downstream, which is easy with Conan by adding

def package_info(self):
    # ...
    if self.settings.get_safe("os.distro") == "alpine":
        self.cpp_info.components["eigen3"].defines = ["EIGEN_MALLOC_ALREADY_ALIGNED"]

Seems like a good solution, at least as far as my knowledge of musl and Conan goes. Be aware though that EIGEN_MALLOC_ALREADY_ALIGNED must be true with reference to EIGEN_DEFAULT_ALIGN_BYTES, which in turn is chosen based on architecture options (SSE, AVX). For example, lets say you enable AVX512, then Eigen sets EIGEN_DEFAULT_ALIGN_BYTES=64. If you then set EIGEN_MALLOC_ALREADY_ALIGNED=1, std::malloc must return 64 byte aligned memory.

@rhuitl
Copy link
Contributor Author

rhuitl commented Mar 20, 2025

HAVE_POSIX_MEMALIGN should actually be set automatically by CMake: https://github.com/PointCloudLibrary/pcl/blob/master/cmake/pcl_find_sse.cmake#L47 However, the function that detects it may or may not be called, depending on PCL_ENABLE_SSE and CMAKE_CXX_FLAGS: https://github.com/PointCloudLibrary/pcl/blob/master/CMakeLists.txt#L110

I missed that. It's because Conan automatically adds a compiler flag: https://github.com/conan-io/conan/blob/develop2/conan/tools/build/flags.py#L44
So i get

-- Conan toolchain: Defining architecture flag: -m64
-- CMAKE_CXX_FLAGS=-m64
-- CMAKE_CXX_FLAGS_DEFAULT=

and PCL_CHECK_FOR_SSE is not run. I have a few remarks to this.

  • It looks like even though it's named PCL_CHECK_FOR_SSE, this CMake function would also set HAVE_POSIX_MEMALIGN on ARM platforms, right? Because for Alpine/ARM, we have the same alignment detection issue.
  • As an extra complication, the conanfile.py from https://github.com/conan-io/conan-center-index/blob/master/recipes/pcl/all/conanfile.py#L339 disables SSE on non-Intel platforms by setting PCL_ENABLE_SSE=OFF. I didn't try what happens if I force it. But I feel like it would be better to check for the memory alignment functions outside the SSE detection code. After all, pcl_macros.h does not care if SSE was enabled or not, it always errors out if the alignment is unknown.
  • I already opted out from -march=native (PCL_ENABLE_MARCHNATIVE) because the build happens on a build server, not on the machines where PCL will run.
  • It looks like PCL_CHECK_FOR_SSE figures out what SSE (and AVX) versions are available on the build agent.
  • Macros like __SSE4_1__ are needed to actually activate the optimized code paths (these are automatic, except on MSVC).

So is my assumption correct that I'd have to go fully custom with the flags in order to configure an SSE+AVX-enabled PCL build that will run on my target architecture? In particular the PCL-specific macros.

So a combination of -msse and -D__SSE__ etc. For AVX, it seems there are no macros to define.

Seems like a good solution, at least as far as my knowledge of musl and Conan goes. Be aware though that EIGEN_MALLOC_ALREADY_ALIGNED must be true with reference to EIGEN_DEFAULT_ALIGN_BYTES, which in turn is chosen based on architecture options (SSE, AVX). For example, lets say you enable AVX512, then Eigen sets EIGEN_DEFAULT_ALIGN_BYTES=64. If you then set EIGEN_MALLOC_ALREADY_ALIGNED=1, std::malloc must return 64 byte aligned memory.

Yes... correct, and it would only return 32 byte aligned memory. Not good. I changed my approach slightly, adding a patch for Eigen's Memory.h that adds __MUSL__.

Eigen patch

--- a/Eigen/src/Core/util/Memory.h	2021-08-18 22:41:58.000000000 +0200
+++ b/Eigen/src/Core/util/Memory.h	2025-03-20 20:20:06.748421180 +0100
@@ -38,6 +38,13 @@
   #define EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED 0
 #endif
 
+#if defined(__MUSL__) \
+ && defined(__LP64__) && ! defined( __SANITIZE_ADDRESS__ ) && (EIGEN_DEFAULT_ALIGN_BYTES == 16)
+  #define EIGEN_MUSL_MALLOC_ALREADY_ALIGNED 1
+#else
+  #define EIGEN_MUSL_MALLOC_ALREADY_ALIGNED 0
+#endif
+
 // FreeBSD 6 seems to have 16-byte aligned malloc
 //   See http://svn.freebsd.org/viewvc/base/stable/6/lib/libc/stdlib/malloc.c?view=markup
 // FreeBSD 7 seems to have 16-byte aligned malloc except on ARM and MIPS architectures
@@ -50,7 +57,8 @@
 
 #if (EIGEN_OS_MAC && (EIGEN_DEFAULT_ALIGN_BYTES == 16))     \
  || (EIGEN_OS_WIN64 && (EIGEN_DEFAULT_ALIGN_BYTES == 16))   \
- || EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED              \
+ || EIGEN_GLIBC_MALLOC_ALREADY_ALIGNED                      \
+ || EIGEN_MUSL_MALLOC_ALREADY_ALIGNED                       \
  || EIGEN_FREEBSD_MALLOC_ALREADY_ALIGNED
   #define EIGEN_MALLOC_ALREADY_ALIGNED 1
 #else

And conanfile.py now reads

if self.settings.get_safe("os.distro") == "alpine":
    self.cpp_info.components["eigen3"].defines = ["__MUSL__"]

@mvieth
Copy link
Member

mvieth commented Mar 22, 2025

But I feel like it would be better to check for the memory alignment functions outside the SSE detection code

I agree, HAVE_MM_MALLOC and HAVE_POSIX_MEMALIGN should be checked in CMakeLists.txt, independently from the SSE stuff.

It looks like PCL_CHECK_FOR_SSE figures out what SSE (and AVX) versions are available on the build agent.

I found this on the topic: https://cmake.org/cmake/help/book/mastering-cmake/chapter/Cross%20Compiling%20With%20CMake.html#using-compile-checks So if I understand correctly, check_cxx_source_runs does the check for the target architecture, but obviously cannot run the code.
I think it could make sense to change all the check_cxx_source_runs to check_cxx_source_compiles because running the code does not really provide any additional information anyway. The check for HAVE_SSE4_2_EXTENSIONS is the only one that does a run-time check, but I don't see how that would ever fail...
Additionally, PCL could set the default of PCL_ENABLE_SSE and PCL_ENABLE_AVX to false if CMAKE_SYSTEM_PROCESSOR contains arm, e.g.

if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm")
option(PCL_ENABLE_SSE "Enable or Disable SSE optimizations." OFF)
else()
option(PCL_ENABLE_SSE "Enable or Disable SSE optimizations." ON)
endif()
mark_as_advanced(PCL_ENABLE_SSE)

Same for AVX.
If you like, fell free to start a pull request for these changes. Otherwise I will start one when I have time.

Related issues:
#5843
#6151

Macros like __SSE4_1__ are needed to actually activate the optimized code paths (these are automatic, except on MSVC).

The compiler flags and preprocessor defines are handled correctly by PCL's CMake scripts, no? Note that gcc and clang automatically define __SSE4_1__ and similar: https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile

So is my assumption correct that I'd have to go fully custom with the flags in order to configure an SSE+AVX-enabled PCL build that will run on my target architecture?

If CMAKE_CXX_FLAGS are not empty, then yes, PCL_CHECK_FOR_SSE and PCL_CHECK_FOR_AVX will currently not be run. I actually started a branch to change that, but never finished it:
mvieth@33cee56
Feel free to take a look and tell me what you think, maybe I can continue with it.

@rhuitl
Copy link
Contributor Author

rhuitl commented Mar 24, 2025

I think I'm going to change this PR to run the alignment detection routines outside the SSE detection. That would fully resolve the issue I initially had.

I'm now setting -march=haswell for x86_64 builds and -march=armv8.2-a for ARM from outside via CXXFLAGS and CFLAGS. Setting -march will implicitly set -mtune and all the SSE/AVX flags like -msse, as well as the preprocessor macros __SSE__ etc, so this is really all that needs to be done for GCC/Clang, and it can be done without modifying PCL.

So the question is what do we need PCL_CHECK_FOR_SSE for? I think we can differentiate three cases:

  • Make an optimized build to run on the local dev machine only, for PCL devs & researchers, who want the best performance on their local machine and don't need to run the binaries elsewhere.
  • Make an optimized build that can run on a wider range of machines. As PCL does not do runtime detection of cpu features, a compromise between compatibility and available extensions has to be made (e.g. Haswell, released in 2013).
  • Cross-compile for a different architecture

What I don't get is why we do compiler checks (or even runtime checks as you noticed), either with -march=native or without (PCL_ENABLE_MARCHNATIVE=0 or cross-compiling), and then extract the supported SSE level, and then add these -msse flags manually. Why not just use -march=native or no -march then (and nothing else), i.e., leave it up to the compiler to automatically set -msse flags?
I agree that checking if it actually runs seems to be redundant or counter-productive. But I don't know what the intention was when the first check has been introduced...

There are a few other things the SSE detection file does:

  • -fpmath=sse and -ffloat-store for x86_32 to avoid different calculation results as with SSE
  • Define the macros for MSVC which doesn't set them.

In summary, I feel like the SSE and AVX cmake files could be removed more or less completely. People should set -march depending on whether they want to optimize for the local machine (native) or for production (e.g. haswell for good SSE coverage) or without particular optimizations (no -march). When cross-compiling, you'll probably have a lot of CXXFLAGS anyway. And for MSVC I don't know, might be that for that one, we actually do need the checks so the macros can be defined.

@mvieth
Copy link
Member

mvieth commented Mar 30, 2025

I think I'm going to change this PR to run the alignment detection routines outside the SSE detection. That would fully resolve the issue I initially had.

Great! Feel free to do that.

I'm now setting -march=haswell for x86_64 builds and -march=armv8.2-a for ARM from outside via CXXFLAGS and CFLAGS.

Sounds like a good solution.

So the question is what do we need PCL_CHECK_FOR_SSE for? I think we can differentiate three cases:

* Make an optimized build to run on the local dev machine only, for PCL devs & researchers, who want the best performance on their local machine and don't need to run the binaries elsewhere.

* Make an optimized build that can run on a wider range of machines. As PCL does not do runtime detection of cpu features, a compromise between compatibility and available extensions has to be made (e.g. Haswell, released in 2013).

* Cross-compile for a different architecture

What I don't get is why we do compiler checks (or even runtime checks as you noticed), either with -march=native or without (PCL_ENABLE_MARCHNATIVE=0 or cross-compiling), and then extract the supported SSE level, and then add these -msse flags manually. Why not just use -march=native or no -march then (and nothing else), i.e., leave it up to the compiler to automatically set -msse flags? I agree that checking if it actually runs seems to be redundant or counter-productive. But I don't know what the intention was when the first check has been introduced...

There are a few other things the SSE detection file does:

* `-fpmath=sse` and `-ffloat-store` for x86_32 to avoid different calculation results as with SSE

* Define the macros for MSVC which doesn't set them.

In summary, I feel like the SSE and AVX cmake files could be removed more or less completely. People should set -march depending on whether they want to optimize for the local machine (native) or for production (e.g. haswell for good SSE coverage) or without particular optimizations (no -march). When cross-compiling, you'll probably have a lot of CXXFLAGS anyway. And for MSVC I don't know, might be that for that one, we actually do need the checks so the macros can be defined.

I think a big advantage of the sse/avx checks is that they enable an optimized/fast PCL installation by default, without requiring the user to do or specify anything. This is especially convenient for new users or users who are not so familiar with sse/avx/-march flags. Also, I am not sure if there is a (easy) way to detect the target architecture and pass it to PCL via -march in all package managers (vcpkg, apt, conan, homebrew, macports, ...).
Users who want to set their own architecture flags (like you) can still do that, so I see no disadvantage in having PCL_CHECK_FOR_SSE and PCL_CHECK_FOR_AVX.

@rhuitl
Copy link
Contributor Author

rhuitl commented Apr 3, 2025

I opened another PR for the extraction of the memory alignment detection code.

I think a big advantage of the sse/avx checks is that they enable an optimized/fast PCL installation by default, without requiring the user to do or specify anything. This is especially convenient for new users or users who are not so familiar with sse/avx/-march flags. Also, I am not sure if there is a (easy) way to detect the target architecture and pass it to PCL via -march in all package managers (vcpkg, apt, conan, homebrew, macports, ...).

Absolutely! But you can achieve this by setting -march=native, no other -msse* flags are required, they follow automatically. This will also set -mtune so you get a build that makes even better use of the local CPU. Right now, PCL would by default not take advantage of the instruction set (beyond what SSE/AVX enable) or tune code generation for the local CPU.

Anyhow, I don't have a strong opinion here, also because it seems that it won't affect too many people. Ubuntu, for example, compiles PCL with the SSE detection disabled (https://git.launchpad.net/ubuntu/+source/pcl/tree/debian/rules#n35), so we get SSE2 on x86_64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants