Implement author names parser #5

@andrew2net

CIE publications list authors' names in one string. I've looked through 10% of the documents and found the following examples:

CIE X048-OP62 "Hemphälä, Hillevi, Glimne Susanne, Heiden, Marina, Lindén, Johannes, Lindberg, Per, Nylén, Per"
CIE X048-PO34 "Chernyak, A.Sh., Fedorishchev, M.A., Kuznetsova, A.B."
CIE X048-OP61 "Baradaran-Razaz, N., Merschbrock, C., Jägerbrand, A.K., Nilsson Tengelin, M."
CIE X048-PO57 "Dotreppe, Guillaume Mario, Coosemans, Jan, Jacobs, Valéry Ann"
CIE X048-OP53 "Ulrika Wänström Lindh, Annika K. Jägerbrand"
CIE X048-PO65 "AbouElhamd, A.R., Saraiji, R."
CIE X048-PO64 "Becak, P., Novak, T. et al."
CIE X048-PO49 "Georgy Boos, Vladimir Budak, Ekaterina Ilyina, Tatyana Meshkova"
CIE X048-PO32 "Lee, H.T.B, Chau, Y.C., Lam, H.S.B."
CIE X048-PO30 "Corell, D.D., Dam- Hansen, C., Thorseth, A."
CIE X048-PO15 "Pan Jiangen, Li Qian, Li Xiaoni"
CIE X048-OP14 "Ruggaber, B., Vollrath, T., Krüger, U., Blattner, P. and Gerloff, T."

The name lists are formatted inconsistently. I tried the regex names.scan(/(?:and\s)?((?:[\s-]*\p{Lu}\p{L}*){1,2}),?\s((?:\p{Lu}\p{L}*\.?)+)/), but it does not parse all of these cases correctly. We could get better results by implementing a name parser with the Parslet gem.
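To make the inconsistency concrete, here is a minimal plain-Ruby sketch (stdlib only, not the Parslet grammar proposed above) that handles just two of the styles in the list: "Surname, Initials" pairs and comma-separated full names. The names `split_authors`, `initials?` and `INITIALS` are hypothetical, and the heuristic is an assumption of mine, not the project's implementation. It does not resolve "Surname, Given names" lists such as CIE X048-PO57 or the missing comma in CIE X048-OP62.

```ruby
# Matches a token made only of initials, e.g. "B.", "A.Sh.", "H.T.B".
INITIALS = /\A(?:\p{Lu}\p{Ll}?\.?)+\z/

# A token counts as initials only if it contains a period,
# so short names such as "Li" are not mistaken for initials.
def initials?(token)
  token.include?(".") && token.match?(INITIALS)
end

# Splits one author string into an array of hashes.
# Assumption: if any token looks like initials, the whole list is
# "Surname, Initials" pairs; otherwise each token is one full name.
def split_authors(names)
  cleaned = names
            .sub(/\s*\bet al\.?\s*\z/, "") # drop a trailing "et al."
            .gsub(/-\s+/, "-")             # repair "Dam- Hansen"
  tokens = cleaned.split(/\s*,\s*|\s+and\s+/).map(&:strip).reject(&:empty?)

  if tokens.any? { |t| initials?(t) }
    tokens.each_slice(2).map { |surname, ini| { surname: surname, initials: ini } }
  else
    tokens.map { |t| { name: t } }
  end
end
```

With this sketch, "Becak, P., Novak, T. et al." yields two surname/initials pairs and "Pan Jiangen, Li Qian, Li Xiaoni" yields three full names. Pairing tokens with `each_slice(2)` is fragile, which is one reason a grammar-based parser looks more promising.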

Labels

enhancement (New feature or request)
