Entries for words containing semicolons have bad form entries, apparently because their form/headword is split at the semicolon.
pv wiktionary.jsonl \
| jq 'select(.word == "too long; didn'\''t read").forms'
Gives
{
"form": "too long",
"tags": [
"canonical"
]
},
{
"form": "didn't read",
"tags": [
"canonical"
]
},
{
"form": "tl;dr",
"tags": [
"alternative"
]
},
{
"form": "tl/dr",
"tags": [
"alternative"
]
},
And a bunch more correct forms which I won't include because they aren't relevant.
tl;dr has similar problems:
{
"form": "tl",
"tags": [
"canonical"
]
},
{
"form": "dr",
"tags": [
"canonical"
]
},
{
"form": "tldr",
"tags": [
"alternative"
]
},
Other entries with semicolons are affected as well.
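To list the other affected entries, something like the following rough sketch might help. It assumes only the fields visible in the output above (`word`, `forms`, `form`, `tags`); the function name `affected_entries` and the "canonical form is a fragment of the word" heuristic are my own and may miss or over-report cases.

```python
import json
from typing import Iterable, Iterator


def affected_entries(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (word, fragments) for entries that look like they were split on ';'.

    Heuristic (an assumption, not project logic): the word contains a
    semicolon and at least one "canonical" form is a strict substring of it.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        entry = json.loads(line)
        word = entry.get("word", "")
        if ";" not in word:
            continue
        canonical = [
            f["form"]
            for f in entry.get("forms", [])
            if "canonical" in f.get("tags", [])
        ]
        # Canonical forms that are only a piece of the word suggest a bad split.
        fragments = [c for c in canonical if c != word and c in word]
        if fragments:
            yield word, fragments
```

Passing an open file handle for `wiktionary.jsonl` as `lines` and printing each yielded pair should give one line per suspicious entry.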
Seems like the issue might be related to logic splitting on semicolons here:
+ ")?( or |[,;]+)"
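Here is a minimal reproduction of what I suspect is happening. Note that the pattern below is a simplified stand-in for the tail of the expression quoted above (the real one has a prefix I haven't reproduced), and `split_forms_naive`, `split_forms_guarded`, and the page-title check are hypothetical names and ideas of mine, not the project's actual code or a proposed patch.

```python
import re

# Simplified stand-in for the separator part of the quoted expression.
# Assumption: the project's full pattern has additional prefix logic.
SEPARATOR_RE = re.compile(r" or |[,;]+")


def split_forms_naive(headword: str) -> list[str]:
    """Split a headword line on ' or ', commas, and semicolons."""
    return [part.strip() for part in SEPARATOR_RE.split(headword) if part.strip()]


def split_forms_guarded(headword: str, page_title: str) -> list[str]:
    """Hypothetical guard: do not split when the headword is the page title."""
    if headword.strip() == page_title:
        return [headword.strip()]
    return split_forms_naive(headword)


print(split_forms_naive("too long; didn't read"))  # ['too long', "didn't read"]
print(split_forms_naive("tl;dr"))                  # ['tl', 'dr']
print(split_forms_guarded("tl;dr", "tl;dr"))       # ['tl;dr']
```

The naive split reproduces the two bogus "canonical" fragments shown above, while the guarded version keeps the headword whole whenever it equals the page title and still splits genuine lists such as "colour, color".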