This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Trouble reading Harvard Open Metadata MARC files (UTF-8 related?) #89

@viking2917

I am trying to use pymarc to read the Harvard Open Metadata MARC files.

Most of the files process fine, but some (for example, ab.bib.14.20160401.full.mrc) raise an error during processing. The error I am getting is:

Traceback (most recent call last):
  File "domark.py", line 21, in <module>
    for record in reader:
  File "/Library/Python/2.7/site-packages/six.py", line 535, in next
    return type(self).__next__(self)
  File "/Users/markwatkins/Sites/pharvard/pymarc/reader.py", line 97, in __next__
    utf8_handling=self.utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 74, in __init__
    utf8_handling=utf8_handling)
  File "/Users/markwatkins/Sites/pharvard/pymarc/record.py", line 307, in decode_marc
    code = subfield[0:1].decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
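The failing line can be reproduced in isolation. The following is a minimal sketch, assuming the subfield starts with the non-ASCII byte 0xc4 where a one-byte ASCII subfield code is expected. The sample bytes and the `decode_code` helper are hypothetical; only the slicing and `decode('ascii')` call are taken from the traceback. Since that call passes no error handler, `utf8_handling='ignore'` would presumably have no effect at this point, though that is an inference from the traceback rather than from the rest of the pymarc source.

```python
# Hypothetical subfield bytes: 0xc4 0x81 is a two-byte UTF-8 sequence
# sitting where a single ASCII subfield code (such as b"a") is expected.
subfield = b"\xc4\x81 example"


def decode_code(subfield):
    """Mirror the line from the traceback (record.py, line 307)."""
    return subfield[0:1].decode("ascii")


try:
    decode_code(subfield)
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    # Strict ASCII decoding rejects any byte above 0x7f.
    failed = True
    message = str(exc)

print(failed)
print(message)
```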

The driver code I am using is:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys
from pymarc import MARCReader

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

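# Use the first command-line argument as the input file, if one was given.
# Slicing avoids a NameError on `files` when no argument is passed.
files = sys.argv[1:2]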

for file in files:
    with open(file, 'rb') as fh:
        reader = MARCReader(fh, utf8_handling='ignore')
        for record in reader:
#            print "%s by %s" % (record.title(), record.author())
            print(record.as_json())

Other MARC processing tools (e.g. MarcEdit) seem to process the file with no issues, so I think the file is legitimate.

Am I doing something wrong? Or is there an issue with pymarc, possibly related to UTF-8 processing?
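One way to narrow this down without pymarc is to scan the raw bytes for subfield codes that are not ASCII. The sketch below assumes the standard MARC 21 control bytes (0x1d record terminator, 0x1f subfield delimiter) and treats the byte after each subfield delimiter as the subfield code. The function name and the sample data are hypothetical, for illustration only.

```python
RECORD_TERMINATOR = b"\x1d"   # ends each MARC record
SUBFIELD_DELIMITER = b"\x1f"  # precedes each one-byte subfield code


def find_non_ascii_subfield_codes(data):
    """Return (record_index, offset_in_record, code_byte) for each
    subfield code byte above 0x7f. Hypothetical diagnostic helper."""
    hits = []
    for rec_index, record in enumerate(data.split(RECORD_TERMINATOR)):
        pos = record.find(SUBFIELD_DELIMITER)
        while pos != -1:
            code = record[pos + 1:pos + 2]
            if code and ord(code) > 0x7f:
                hits.append((rec_index, pos + 1, code))
            pos = record.find(SUBFIELD_DELIMITER, pos + 1)
    return hits


# Made-up bytes that only imitate the delimiters of a MARC file:
# the first "record" has a subfield whose code byte is 0xc4.
sample = (
    b"00000nam  \x1faTitle\x1f\xc4\x81bad\x1e\x1d"
    b"xx\x1fbok\x1e\x1d"
)
hits = find_non_ascii_subfield_codes(sample)
print(hits)
```

To try this on a real file, read it in binary mode (for example `open('ab.bib.14.20160401.full.mrc', 'rb').read()`) and pass the bytes to the function. Any hits would point at the records worth inspecting by hand.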
