-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathReadMe.Rmd
More file actions
30 lines (25 loc) · 1.31 KB
/
ReadMe.Rmd
File metadata and controls
30 lines (25 loc) · 1.31 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
---
title: "readMe"
author: "Elizabeth Permina"
date: "1/22/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## RISmed
RISmed is a package to extract data from PUBMED and parse the output. It has functions to process authors, title, abstract, etc. I'm looking for ways of extracting the reference list (which is a part of extracted xml record). Issue - I didn't find a way to parse it yet. The xml record of the reference list is highly nested and has a lot of repeating tags, so xml to dataframe functions don't work (columns with the same name error)
XML package effectively translates full pubmed xml (with references) into a list with 1) non-unique names of variables 2) nested names like
```{r, eval=FALSE}
$PubmedArticle$PubmedData$ReferenceList$Reference$ArticleIdList
$PubmedArticle$PubmedData$ReferenceList$Reference$ArticleIdList$ArticleId
$PubmedArticle$PubmedData$ReferenceList$Reference$ArticleIdList$ArticleId$text
$PubmedArticle$PubmedData$ReferenceList$Reference$ArticleIdList$ArticleId$.attrs
IdType
"pubmed"
$PubmedArticle$PubmedData$ReferenceList$Reference$PubmedArticle$PubmedData$ReferenceList$Reference$Citation
```
corresponding xml looks like that
88dc43de-71be-11ec-8542-d21848e7cc81.xml
to get an xml file out of PMID id (one by one)
https://pubmed2xl.com/xml/