Trouble reading Harvard Open Metadata MARC files (UTF-8 related?)

I am trying to use pymarc to read the [Harvard Open Metadata MARC files](http://library.harvard.edu/open-metadata#Harvard-Library-Bibliographic-Dataset-Use-Terms). 

Most of the files process fine, but some (for example, `ab.bib.14.20160401.full.mrc`) fail partway through. The error I am getting is:

```
Traceback (most recent call last):
  File "domark.py", line 21, in <module>
    for record in reader:
  File "/Library/Python/2.7/site-packages/six.py", line 535, in next
    return type(self).__next__(self)
  File "/Users/markwatkins/Sites/pharvard/pymarc/reader.py", line 97, in __next__
    utf8_handling=self.utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 74, in __init__
    utf8_handling=utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 307, in decode_marc
    code = subfield[0:1].decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
```

The driver code I am using is:

```
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys
from pymarc import MARCReader

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

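# Take the first command-line argument, if any. Slicing gives an empty
# list (rather than an undefined name) when no file is passed.
files = sys.argv[1:2]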

for file in files:
    with open(file, 'rb') as fh:
        reader = MARCReader(fh, utf8_handling='ignore')
        for record in reader:
#            print "%s by %s" % (record.title(), record.author())
            print(record.as_json())
```

Other MARC processing tools (e.g. MarcEdit) seem to process the file with no issues, so I think the file is legitimate.

Am I doing something wrong? Is there an issue with pymarc, possibly UTF-8 processing related?
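
To help narrow this down, here is a rough sketch of how the offending records could be located without going through pymarc at all. It relies on the MARC 21 / ISO 2709 control bytes (`0x1D` ends a record, `0x1F` introduces a subfield code) and flags every subfield delimiter that is followed by a non-ASCII byte, which is the condition that makes `subfield[0:1].decode('ascii')` fail in the traceback above. The function name `find_bad_subfield_codes` is made up for illustration, and splitting on `0x1D` assumes that byte never appears inside field data.

```python
SUBFIELD_DELIMITER = 0x1F  # precedes each one-byte subfield code
RECORD_TERMINATOR = b'\x1d'  # ends each MARC record


def find_bad_subfield_codes(data):
    """Return (record_index, offset_in_record, code_byte) for every
    subfield delimiter that is followed by a byte outside ASCII.

    `data` is the raw content of a .mrc file. Records are split on the
    record terminator instead of the leader length, so a wrong length
    in a leader does not throw the scan off.
    """
    hits = []
    # bytearray yields ints when iterated on both Python 2 and Python 3.
    records = bytearray(data).split(RECORD_TERMINATOR)
    for rec_index, raw in enumerate(records):
        # Stop one byte early so raw[pos + 1] is always in range.
        for pos, byte in enumerate(raw[:-1]):
            if byte == SUBFIELD_DELIMITER and raw[pos + 1] >= 0x80:
                hits.append((rec_index, pos + 1, raw[pos + 1]))
    return hits
```

Reading the failing file with `open(path, 'rb').read()` and passing the result to this function should list which records contain a subfield code such as `0xc4`, which would make it easier to compare those records against what MarcEdit shows.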
