Snowballing with Semantic Scholar#3
Conversation
blochberger
left a comment
There was a problem hiding this comment.
I finally found the time to review your pull request. Thanks for your submission! It looks good in general.
I have addressed some concerns, mainly potential database inconsistency that should be addressed with django.db.transaction.atomic and provided additional albeit minor suggestions for further improvements.
|
|
||
| # Assign authors | ||
| for position, author in enumerate(authors): | ||
| publication_author, created = PublicationAuthor.objects.get_or_create( |
There was a problem hiding this comment.
If one of these operations fail, the database will be inconsistent: only some authors get assigned to the publication and the operations from here on will not be executed. Hence, the complete import of a single publication should happen inside a transaction.atomic() context:
with transaction.atomic():
…| title = "Reference" if is_reference else "Citation" | ||
| if 0 < len(objs): | ||
| self.echo(f"--- {title}s ---") | ||
| publications: List[Publication] = [] |
There was a problem hiding this comment.
This is assigned but never used. Only publications are added to the list.
|
|
||
| # Add to Semantic Scholar and link publications | ||
| if doi: | ||
| publication = Publication.objects.get(doi=doi) |
There was a problem hiding this comment.
No need to query the database to fetch the publication, as the publication variable already holds it.
| if title.index(' ') > -1: | ||
| cite_key += title[:title.index(' ')] | ||
| else: | ||
| cite_key += title |
There was a problem hiding this comment.
| if title.index(' ') > -1: | |
| cite_key += title[:title.index(' ')] | |
| else: | |
| cite_key += title | |
| cite_key += title.split(' ')[0] |
| first = True | ||
| cite_key = '' | ||
| for author in data.get('authors', []): | ||
| name = author.get('name', '') | ||
| author, created = Author.objects.get_or_create(name=name) | ||
| if created: | ||
| self.echo(f"Added author: {author}") | ||
| else: | ||
| self.echo(f"Author '{author}' alreay known") | ||
| authors.append(author) | ||
| if first: | ||
| first = False | ||
| if name.rindex(' ') > -1: | ||
| cite_key = name[name.rindex(' '):] | ||
| else: | ||
| cite_key = name |
There was a problem hiding this comment.
The code seems a bit complex. The variable first is not really required, and a check whether cite_key is still empty would suffice. Since authors are collected anyway, one could simply extract the last name of the first author afterwards, e. g.:
| first = True | |
| cite_key = '' | |
| for author in data.get('authors', []): | |
| name = author.get('name', '') | |
| author, created = Author.objects.get_or_create(name=name) | |
| if created: | |
| self.echo(f"Added author: {author}") | |
| else: | |
| self.echo(f"Author '{author}' alreay known") | |
| authors.append(author) | |
| if first: | |
| first = False | |
| if name.rindex(' ') > -1: | |
| cite_key = name[name.rindex(' '):] | |
| else: | |
| cite_key = name | |
| for author in data.get('authors', []): | |
| name = author.get('name', '') | |
| author, created = Author.objects.get_or_create(name=name) | |
| if created: | |
| self.echo(f"Added author: {author}") | |
| else: | |
| self.echo(f"Author '{author}' alreay known") | |
| authors.append(author) | |
| cite_key = '' | |
| if authors: | |
| cite_key = authors[0].name.split(' ')[-1] |
| first = True | ||
| cite_key = '' | ||
| for author in data.get('authors', []): | ||
| name = author.get('name', '') |
There was a problem hiding this comment.
It makes no sense adding authors with empty names. A better approach might be:
| name = author.get('name', '') | |
| if name := author.get('name', None): | |
| … |
Addressing #2