Open
Description
Example:
import math
from typing import NewType

import pandas as pd
import sciline as sl

_fake_filesytem = {
    'file102.txt': [1, 2, float('nan'), 3],
    'file103.txt': [4, 5, 6, 7],
    'file104.txt': [8, 9, 10, 11, 12],
    'file105.txt': [13, 14, 15],
}

# 1. Define domain types
Filename = NewType('Filename', str)
RawData = NewType('RawData', dict)
CleanedData = NewType('CleanedData', list)
ScaleFactor = NewType('ScaleFactor', float)
Result = NewType('Result', float)

# 2. Define providers
def load(filename: Filename) -> RawData:
    """Load the data from the filename."""
    data = _fake_filesytem[filename]
    return RawData({'data': data, 'meta': {'filename': filename}})

def clean(raw_data: RawData) -> CleanedData:
    """Clean the data, removing NaNs."""
    return CleanedData([x for x in raw_data['data'] if not math.isnan(x)])

def process(data: CleanedData, param: ScaleFactor) -> Result:
    """Process the data, multiplying the sum by the scale factor."""
    return Result(sum(data) * param)

# 3. Create pipeline
providers = [load, clean, process]
params = {ScaleFactor: 2.0}
base = sl.Pipeline(providers, params=params)
# 4. Create mapped workflow
run_ids = [102, 103, 104, 105]
filenames = [f'file{i}.txt' for i in run_ids]
param_table = pd.DataFrame(
    {Filename: filenames}, index=run_ids
).rename_axis(index='run_id')
mapped = base.map(param_table)
mapped.visualize(sl.get_mapped_node_names(mapped, Result))
# Get result mapped node names
names = sl.get_mapped_node_names(mapped, Result)
# Compute result for first file
mapped.compute(names[102]) # yields 12.0
# Overwrite result for first file
mapped[names[102]] = 488
mapped.compute(names[102])  # yields 488

So far so good.
But if I try to overwrite the value for a source node (Filename), it has no effect:
mapped = base.map(param_table)
names = sl.get_mapped_node_names(mapped, Filename)
mapped.compute(names[102]) # yields 'file102.txt'
mapped[names[102]] = "NEWFILE.txt"
mapped.compute(names[102])  # yields 'file102.txt' instead of "NEWFILE.txt"

It appears that the setitem creates a new node in the graph, but compute continues to use the value stored in the Pandas DataFrame.
If we instead modify the DataFrame in place with param_table.iloc[0, 0] = "NEWFILE.txt" and then call mapped.compute(names[102]), we get the expected result.
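For anyone who wants to avoid mutating the original table, here is a minimal sketch of building an updated copy and mapping again. `with_updated_source` is a hypothetical helper, not part of sciline or pandas, and I am assuming that re-mapping from the base pipeline is acceptable for the use case. It indexes by position because the column label is a `NewType`, which is callable, and pandas label-based indexers treat callable keys specially.

```python
from typing import NewType

import pandas as pd

Filename = NewType('Filename', str)


def with_updated_source(table, run_id, column, value):
    """Return a copy of `table` with one cell replaced (hypothetical helper)."""
    updated = table.copy()
    # Translate labels to positions so the callable NewType label
    # never reaches a label-based indexer.
    row = updated.index.get_loc(run_id)
    col = updated.columns.get_loc(column)
    updated.iloc[row, col] = value
    return updated


run_ids = [102, 103, 104, 105]
param_table = pd.DataFrame(
    {Filename: [f'file{i}.txt' for i in run_ids]}, index=run_ids
).rename_axis(index='run_id')

new_table = with_updated_source(param_table, 102, Filename, 'NEWFILE.txt')
# Assumed next step, using the pipeline from the example above:
# mapped = base.map(new_table)
```

This leaves `param_table` untouched, at the cost of rebuilding the mapped pipeline for every change.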
Maybe we are not meant to touch the mapped nodes, but since we can overwrite other nodes in the graph, the behaviour seems inconsistent.
We should either prevent editing mapped nodes (probably difficult and/or not what we want?), or allow changing the source node values.