
Commit e83cabd

Author: Felix Igelbrink
Commit message: initial commit
1 parent c3af5c4 commit e83cabd


41 files changed: +759 −1590 lines

.coveragerc

Lines changed: 0 additions & 3 deletions
This file was deleted.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: Python package build and publish

on:
  release:
    types: [created]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v1
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install twine build
    - name: Lint with flake8 for syntax errors
      run: |
        pip install flake8
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Build Python source distribution and wheel
      run: |
        python -m build --sdist --wheel --outdir dist/ .
    - name: Publish wheels to PyPI
      env:
        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
      run: |
        twine upload dist/*.tar.gz
        twine upload dist/*.whl

.travis.yml

Lines changed: 0 additions & 24 deletions
This file was deleted.

MANIFEST.in

Lines changed: 2 additions & 1 deletion
@@ -1 +1,2 @@
-recursive-include tests *.json *.py
+include LICENSE
+recursive-include hdfpath LICENSE README.rst

Makefile

Lines changed: 0 additions & 56 deletions
This file was deleted.

README.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# HDFPath

A Python package to enable quick searches of Hierarchical Data Format (HDF) files based on JSONPath.
These files include the HDF5 format provided by [`h5py`](https://github.com/h5py/h5py) as well as more recent libraries like
[`zarr`](https://github.com/zarr-developers/zarr-python). As with `jsonpath-ng`, any data structure
consisting of lists and dict-like objects conforming to the `collections.abc.Mapping` interface is also supported.

This package is derived from the [`jsonpath-ng`](https://github.com/h2non/jsonpath-ng) library.
For the query syntax and capabilities, please refer to the original documentation at
[https://github.com/h2non/jsonpath-ng](https://github.com/h2non/jsonpath-ng).

## Installation

### Using pip
```
pip install hdfpath
```

### From source
```
git clone https://github.com/mortacious/hdfpath.git
cd hdfpath
python setup.py install
```

## Usage

HDF files are organized into groups that contain datasets, and both groups and datasets can carry optional
metadata attributes. This package adds support for using these attributes directly inside queries. The optional
`metadata_attribute` parameter of the `parse` function selects the attribute from which the metadata is retrieved.

```python
from hdfpath import parse
import h5py as h5

with h5.File("<HDF5-File>") as f:
    # query for all groups/datasets of type "scan" whose num_points attribute is larger than 40_000_000
    expr = parse('$..*[?(@._type == "scan" & @._num_points > 40000000)]', metadata_attribute='attrs')
    val = [match.value for match in expr.find(f)]
    print(val)
```
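For intuition, the filter in the query above can be hand-rolled over a plain dict of attribute mappings. This is only an illustrative sketch with made-up group names and values; the actual package walks the HDF5 tree recursively and evaluates the JSONPath expression:

```python
# Illustrative only: a hand-rolled version of the filter in the query above,
# applied to made-up group metadata instead of a real HDF5 file.
groups = {
    "scan_a": {"type": "scan", "num_points": 50_000_000},
    "scan_b": {"type": "scan", "num_points": 1_000},
    "image_c": {"type": "image", "num_points": 90_000_000},
}

# keep groups of type "scan" whose num_points exceeds 40_000_000
matches = [name for name, attrs in groups.items()
           if attrs["type"] == "scan" and attrs["num_points"] > 40_000_000]
print(matches)
```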

The metadata attributes are accessible inside a query using the `_` prefix.
Additionally, regular expressions can be used to match fields through the `` `regex()` `` function.
For example, `` `regex(\\\d+)` `` will only match groups/datasets whose names can be parsed as an integer.

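To illustrate the semantics with Python's standard `re` module rather than hdfpath itself: the `` `regex()` `` operator effectively keeps only the field names that fully match the given pattern, so `\d+` retains purely numeric names. The field names below are made up for illustration:

```python
import re

# What `regex(\d+)` does conceptually: filter candidate field names
# by a full regular-expression match.
pattern = re.compile(r"\d+")
fields = ["0001", "42", "scan_a", "attrs"]
numeric_fields = [f for f in fields if pattern.fullmatch(f)]
print(numeric_fields)  # only the purely numeric names remain
```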
## TODOs

- code examples

## Copyright and License

Copyright 2021 - Felix Igelbrink

Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

hdfpath/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
from .hdfpath import parse

hdfpath/_version.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
__version__ = "0.1.0"

hdfpath/hdfpath.py

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
from .jsonpath_ng import Fields, DatumInContext, auto_id_field, AutoIdForDatum, NOT_SET, Child
from .jsonpath_ng.ext.parser import ExtentedJsonPathParser
from .jsonpath_ng.ext.string import DefinitionInvalid
import re
from itertools import chain


class AttributedFields(Fields):
    """
    Support for Fields with additional (metadata) attributes in a dict-like structure.

    Parameters
    ----------
    fields: list(str) or str
        The fields to consider. If "*" is passed, all fields at this specific level are used.
    attribute: str
        Additional attribute of the objects at this level to also consider (default: 'attrs').
        HDF files usually have metadata attached to each group or dataset, which can be used
        for queries this way.
    """
    def __init__(self, *fields, attribute='attrs'):
        super().__init__(*fields)
        self.attribute = attribute

    def get_field_datum(self, datum, field, create):
        if field == auto_id_field:
            return AutoIdForDatum(datum)
        try:
            if field.startswith("_"):
                try:
                    field = field[1:]
                    attr = getattr(datum.value, self.attribute)
                    field_value = attr.get(field, NOT_SET)
                except AttributeError:
                    field_value = NOT_SET
            else:
                field_value = datum.value.get(field, NOT_SET)

            if field_value is NOT_SET:
                if create:
                    datum.value[field] = field_value = {}
                else:
                    return None
            return DatumInContext(field_value, path=Fields(field), context=datum)
        except (TypeError, AttributeError):
            return None

    def reified_fields(self, datum):
        if '*' not in self.fields:
            return self.fields
        else:
            try:
                iterables = [datum.value.keys()]
                try:
                    attr = getattr(datum.value, self.attribute)
                    iterables.append(("_" + k for k in attr.keys()))
                except AttributeError:
                    pass

                fields = tuple(chain(*iterables))
                return fields if auto_id_field is None else fields + (auto_id_field,)
            except AttributeError:
                return ()


REGEX = re.compile(r"regex\((.*)\)")


class Regex(AttributedFields):
    """
    Only consider fields that match the given regular expression. Unlike the
    Fields class, only one expression is allowed here.

    Parameters
    ----------
    method: str
        String containing a regular expression in the form 'regex(<regex>)'.
        Backslashes and other regex-specific characters ('\\', etc.) have to be escaped properly.
    """
    def __init__(self, method=None):
        m = REGEX.match(method)
        if m is None:
            raise DefinitionInvalid("%s is not valid" % method)
        expr = m.group(1).strip()
        self.regex = re.compile(expr)
        super().__init__("*")

    def reified_fields(self, datum):
        fields = [field for field in super().reified_fields(datum) if self.regex.fullmatch(field)]
        return tuple(fields)

    def __str__(self):
        return f'regex({self.regex.pattern})'

    def __repr__(self):
        return f'{self.__class__.__name__}({self.regex.pattern})'

    def __eq__(self, other):
        return isinstance(other, Regex) and self.regex == other.regex


class HDFPathParser(ExtentedJsonPathParser):
    """Custom LALR parser for HDF5 files based on JSONPath."""
    def __init__(self, metadata_attribute='attrs', debug=False, lexer_class=None):
        super().__init__(debug=debug, lexer_class=lexer_class)
        self.metadata_attribute = metadata_attribute

    def p_jsonpath_named_operator(self, p):
        "jsonpath : NAMED_OPERATOR"
        if p[1].startswith("regex("):
            p[0] = Regex(p[1])
        else:
            super().p_jsonpath_named_operator(p)

    def p_jsonpath_fields(self, p):
        "jsonpath : fields_or_any"
        p[0] = AttributedFields(*p[1], attribute=self.metadata_attribute)

    def p_jsonpath_fieldbrackets(self, p):
        "jsonpath : '[' fields ']'"
        p[0] = AttributedFields(*p[2], attribute=self.metadata_attribute)

    def p_jsonpath_child_fieldbrackets(self, p):
        "jsonpath : jsonpath '[' fields ']'"
        p[0] = Child(p[1], AttributedFields(*p[3], attribute=self.metadata_attribute))


def parse(path, metadata_attribute='attrs', debug=False):
    return HDFPathParser(metadata_attribute=metadata_attribute, debug=debug).parse(path)
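
The `_`-prefix dispatch in `get_field_datum` can be sketched in isolation. The `FakeGroup` class below is a purely illustrative stand-in for an `h5py` group (real groups expose an `attrs` mapping); `lookup` mirrors the redirect from the mapping itself to its metadata attribute:

```python
# Sketch of AttributedFields' metadata lookup: a leading "_" redirects the
# lookup from the mapping itself to its metadata attribute (default: attrs).
NOT_SET = object()  # sentinel mirroring the one imported from jsonpath_ng

class FakeGroup(dict):
    """Illustrative stand-in for an h5py group: a mapping with an `attrs` dict."""
    def __init__(self, data, attrs):
        super().__init__(data)
        self.attrs = attrs

def lookup(obj, field, attribute="attrs"):
    if field.startswith("_"):
        try:
            # "_type" -> look up "type" in obj.attrs instead of obj itself
            return getattr(obj, attribute).get(field[1:], NOT_SET)
        except AttributeError:
            return NOT_SET
    return obj.get(field, NOT_SET)

group = FakeGroup({"dataset1": [1, 2, 3]}, attrs={"type": "scan"})
print(lookup(group, "_type"))     # resolved from the metadata attribute
print(lookup(group, "dataset1"))  # resolved from the mapping itself
```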
File renamed without changes.
