-
-
Notifications
You must be signed in to change notification settings - Fork 8
Open
Labels
questionFurther information is requestedFurther information is requested
Description
Describe the bug
When parsing text from a YAML file in ruamel.YAML it it returns a SingleQuotedScalarString object, which does inherit from the str type. Sending this string to the pure-python lark parser seems to work fine, but when sending it to the cython variant it throws a TypeError.
To Reproduce
The following is a MWE that reproduces the issue:
"""
Requirements:
pip install ruamel.yaml lark-cython lark
"""
import io
import ruamel.yaml
NEW_RUAMEL = 1
class _YamlRepresenter:
@staticmethod
def str_presenter(dumper, data):
# https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data
if len(data.splitlines()) > 1 or '\n' in data:
text_list = [line.rstrip() for line in data.splitlines()]
fixed_data = '\n'.join(text_list)
return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style='|')
return dumper.represent_scalar('tag:yaml.org,2002:str', data)
def _custom_new_ruaml_yaml_obj():
"""
References:
https://stackoverflow.com/questions/59635900/ruamel-yaml-custom-commentedmapping-for-custom-tags
https://stackoverflow.com/questions/528281/how-can-i-include-a-yaml-file-inside-another
https://stackoverflow.com/questions/76870413/using-a-custom-loader-with-ruamel-yaml-0-15-0
"""
# make a new instance, although you could get the YAML
# instance from the constructor argument
class CustomConstructor(ruamel.yaml.constructor.RoundTripConstructor):
...
class CustomRepresenter(ruamel.yaml.representer.RoundTripRepresenter):
...
CustomRepresenter.add_representer(str, _YamlRepresenter.str_presenter)
yaml_obj = ruamel.yaml.YAML()
yaml_obj.Constructor = CustomConstructor
yaml_obj.Representer = CustomRepresenter
yaml_obj.preserve_quotes = True
yaml_obj.width = float('inf')
return yaml_obj
def codeblock(text):
"""
Create a block of text that preserves all newlines and relative indentation
"""
import textwrap
return textwrap.dedent(text).strip('\n')
# For common constructs see:
# https://github.com/lark-parser/lark/blob/master/lark/grammars/common.lark
RESOLUTION_GRAMMAR_PARTS = codeblock(
'''
// Resolution parts of the grammar.
magnitude: NUMBER
unit: WORD
numeric_unit: (magnitude WS* unit)
implicit_unit: unit
resolved_unit: numeric_unit | implicit_unit
%import common.NUMBER
%import common.WS
%import common.WORD
''')
RESOLVED_UNIT_GRAMMAR = codeblock(
r'''
// RESOLVED WINDOW GRAMMAR. Eg. 2GSD
?start: resolved_unit
''') + '\n' + RESOLUTION_GRAMMAR_PARTS
def main():
yaml_obj = _custom_new_ruaml_yaml_obj()
file = io.StringIO("{key: '1mGSD'}")
data = yaml_obj.load(file)
text = data['key']
# https://github.com/lark-parser/lark/blob/master/docs/_static/lark_cheatsheet.pdf
import lark
try:
import lark_cython
parser = lark.Lark(RESOLVED_UNIT_GRAMMAR, start='start', parser='lalr', _plugins=lark_cython.plugins)
except ImportError:
parser = lark.Lark(RESOLVED_UNIT_GRAMMAR, start='start', parser='lalr')
print(f'{type(text)=}')
print(f'{text.__class__.__mro__=}')
parser.parse(text)
if __name__ == '__main__':
"""
CommandLine:
python ~/code/lark_cython/tests/test_yaml.py
"""
main()The type information it prints before it fails is:
type(text)=<class 'ruamel.yaml.scalarstring.SingleQuotedScalarString'>
text.__class__.__mro__=(<class 'ruamel.yaml.scalarstring.SingleQuotedScalarString'>, <class 'ruamel.yaml.scalarstring.ScalarString'>, <class 'str'>, <class 'object'>)
I've tested with versions:
3.11.9 (main, May 14 2024, 08:04:54) [GCC 12.2.0]
ruamel.yaml.__version__ = 0.18.6
lark.__version__ = 1.1.9
lark_cython.__version__ = 0.0.15
And
3.11.9 (main, May 13 2024, 14:03:39) [GCC 11.4.0]
ruamel.yaml.__version__ = 0.17.22
lark.__version__ = 1.1.7
lark_cython.__version__ = 0.0.15
My thought is that cython would handle a class that inherits from a str, but perhaps it doesn't I'm not sure if this can be fixed on the lark-cython side, but I figured it was worth reporting.
My current workarond is to do something like this:
try:
tree = parser.parse(text)
except TypeError:
if isinstance(text, str) and type(text) is not str:
# We could be in a case where cython is failing to handle
# overloaded string types. Try casting to a regular str.
tree = parser.parse(str(text))
else:
raiseReactions are currently unavailable
Metadata
Metadata
Assignees
Labels
questionFurther information is requestedFurther information is requested