Skip to content

CI: add ASAN, UBSAN#19

Open
thesamesam wants to merge 3 commits intogentoo:masterfrom
thesamesam:ci
Open

CI: add ASAN, UBSAN#19
thesamesam wants to merge 3 commits intogentoo:masterfrom
thesamesam:ci

Conversation

@thesamesam
Copy link
Member

Signed-off-by: Sam James sam@gentoo.org

@thesamesam thesamesam marked this pull request as draft January 6, 2023 07:23
@thesamesam thesamesam force-pushed the ci branch 4 times, most recently from ffaa9b8 to 1950984 Compare January 6, 2023 07:33
@thesamesam
Copy link
Member Author

meh. I can't hit the ASAN failure locally yet. :(

@thesamesam thesamesam requested a review from grobian January 6, 2023 07:44
@grobian
Copy link
Member

grobian commented Jan 6, 2023

You mean the one in qmanifest? From a quick look it seems to happen with openmp only. I need to look at it later when I have some time to see what it complains about. With some luck I can fix it.

Unrelated: I believe there's also a valgrind run, I'm assuming that won't also run with *SAN, right? Wouldn't make much sense I guess :)

@bstaletic
Copy link
Contributor

Looking at one of the UBSAN errors, the call stack looks like the thing I have come across in #26 !

I have also seen ASAN report a leak, but that does not quite look like the ASAN error in this pull request. I'll be looking into it further.

bstaletic added a commit to bstaletic/portage-utils that referenced this pull request Mar 28, 2024
Previously, `values_set(merge_averages, avgs)` would allocate `avgs`,
then it would be used in `array_for_each(atoms, i, atom)`, but a call to
`xarrayfree_int(avgs)` was missing after the loop.

Hopefully, this, along with gentoo#26, will solve the issues from gentoo#19.
@bstaletic
Copy link
Contributor

bstaletic commented Mar 28, 2024

Unfortunately, #26 and #27 are still not enough to satisfy ASAN. One thing remains:

char *endp = STR + strlen(STR) - 1;\

That line does not consider buf could have an embedded \0 character. I don't know much about gpg, but on my machine, the third line read from here is "\n\x000sh: SHA256\x000\x000 SIGNED MESSAGE-----". Even the previous (2nd) line has some embedded null character.
That means strlen(buf) is 1, while qmanifest has clearly read more than that.

Again, I don't know much about gpg, so I don't even know if the above line is ok-ish.
Here's a godbolt link, reproducing the problem reported by ASAN.
https://godbolt.org/z/qTnKdYGbf

EDIT:

This patch satisfies ASAN, but I don't know if it is correct:

diff --git a/qmanifest.c b/qmanifest.c
index 2bb0f11..d04f743 100644
--- a/qmanifest.c
+++ b/qmanifest.c
@@ -1421,7 +1421,7 @@ verify_manifest(
 #define append_list(STR) \
        if (strncmp(STR, "TIMESTAMP ", 10) != 0 || strncmp(STR, "DIST ", 5) != 0) {\
                char *endp = STR + strlen(STR) - 1;\
-               while (isspace(*endp))\
+               while (isspace(*endp) && endp > STR)\
                        *endp-- = '\0';\
                if (elemslen == elemssize) {\
                        elemssize += LISTSZ;\

EDIT 2:

The above analysis is wrong, but there's still a problem in that append_list(STR) macro. It does not handle lines that contain only whitespace characters, such as "\n". In such cases, *endp-- = '\0' will decrement beyond the start of STR.

I have a few proposals:

  1. The patch above, that will leave a single whitespace character, in case the line only has whitespaces.
  2. If we want to actually place a \0 in the first character place, we could change the patch above to end >= STR. However, that will still decrement endp beyond begin. In C++ even that is UB, though not sure about C.
  3. If we need to avoid that last decrement, the patch becomes uglier:
diff --git a/qmanifest.c b/qmanifest.c
index 2bb0f11..00dfea3 100644
--- a/qmanifest.c
+++ b/qmanifest.c
@@ -1421,8 +1421,11 @@ verify_manifest(
 #define append_list(STR) \
        if (strncmp(STR, "TIMESTAMP ", 10) != 0 || strncmp(STR, "DIST ", 5) != 0) {\
                char *endp = STR + strlen(STR) - 1;\
-               while (isspace(*endp))\
-                       *endp-- = '\0';\
+               while (isspace(*endp) && endp >= STR) {\
+                       *endp = '\0';\
+                       if(endp > STR) --endp;\
+                       if(endp == STR) break;\
+               }\
                if (elemslen == elemssize) {\
                        elemssize += LISTSZ;\
                        elems = xrealloc(elems, elemssize * sizeof(elems[0]));\

EDIT 3:

Maybe strcspn would be useful? OpenBSD man page for fgets recommends strcspn for trimming line feeds.

gentoo-bot pushed a commit that referenced this pull request Mar 29, 2024
Previously, `values_set(merge_averages, avgs)` would allocate `avgs`,
then it would be used in `array_for_each(atoms, i, atom)`, but a call to
`xarrayfree_int(avgs)` was missing after the loop.

Hopefully, this, along with #26, will solve the issues from #19.

Signed-off-by: Fabian Groffen <grobian@gentoo.org>
@grobian
Copy link
Member

grobian commented Mar 29, 2024

hey, thanks for your analysis, that's cutting a few corners too much

@grobian
Copy link
Member

grobian commented Mar 29, 2024

As far as I understand, your simplest patch is close to the correct one. Instead of endp > STR can just use endp >= STR to make it never consider before the string, but terminate the string into a 0-length one should everything be whitespace.

Doing that is eventually not really relevant because all strings that are compared to are > 1 char, so while at it, I threw in a small optimisation to not bother with anything less than 5 characters. (least: "AUX" + SPACE + 1-char).

gentoo-bot pushed a commit that referenced this pull request Mar 29, 2024
Empty strings, or those being just whitespace were not handled
correctly.  Thanks bstaletic in PR #19 for pointing this out.  Avoid
running under the original string pointer and skip any checks for
strings that are too short to match anything in particular.  This sweeps
an edgecase of just a single whitespace char under the carpet -- which
is just about fine, for it needs not to be handled for any legitimate
case.

Signed-off-by: Fabian Groffen <grobian@gentoo.org>
@thesamesam
Copy link
Member Author

Once the last PR is in, I'll rebase to get both of your patches in and hopefully it's green then. Thanks a bunch both.

@grobian
Copy link
Member

grobian commented Apr 8, 2024

ok, should be pushed now

Signed-off-by: Sam James <sam@gentoo.org>
This is much easier to test with and covers e.g. updated PRs.

Signed-off-by: Sam James <sam@gentoo.org>
It's useful to be able to see failures in various configurations immediately
and all of our jobs take roughly the same amount of time anyway.

(It's not like we have some very slow jobs which run after.)

Signed-off-by: Sam James <sam@gentoo.org>
@thesamesam thesamesam marked this pull request as ready for review April 8, 2024 19:33
@bstaletic
Copy link
Contributor

All green! Nice!

none)
;;
asan)
export CFLAGS="${CFLAGS} -ggdb3 -fsanitize=address"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to use -O3 here? do we get useful stacks and analysis this way? For valgrind we also run with -g -pipe to get understandable complaints

Copy link
Contributor

@bstaletic bstaletic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few ideas, since I'm here.

  1. ASAN and UBSSAN can be combined, as in -fsanitize=address,undefined. Those are the only two that can be combined, so it might be worth it to reduce the number of CI jobs getting dispatched.
  2. Consider enabling MSAN as well, as in -fsanitize=memory. It is a clang-only sanitizer meant to catch reads of uninitialized memory. There are things that are only caught by MSAN: https://godbolt.org/z/TKzxYaEhW

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants