
Using __setitem__ on source node in mapped pipeline has no effect #224

@nvaytet


Example:

from typing import NewType
import sciline as sl
import pandas as pd

_fake_filesytem = {
    'file102.txt': [1, 2, float('nan'), 3],
    'file103.txt': [4, 5, 6, 7],
    'file104.txt': [8, 9, 10, 11, 12],
    'file105.txt': [13, 14, 15],
}

# 1. Define domain types

Filename = NewType('Filename', str)
RawData = NewType('RawData', dict)
CleanedData = NewType('CleanedData', list)
ScaleFactor = NewType('ScaleFactor', float)
Result = NewType('Result', float)


# 2. Define providers


def load(filename: Filename) -> RawData:
    """Load the data from the filename."""

    data = _fake_filesytem[filename]
    return RawData({'data': data, 'meta': {'filename': filename}})


def clean(raw_data: RawData) -> CleanedData:
    """Clean the data, removing NaNs."""
    import math

    return CleanedData([x for x in raw_data['data'] if not math.isnan(x)])

def process(data: CleanedData, param: ScaleFactor) -> Result:
    """Process the data, multiplying the sum by the scale factor."""
    return Result(sum(data) * param)



# 3. Create pipeline

providers = [load, clean, process]
params = {ScaleFactor: 2.0}
base = sl.Pipeline(providers, params=params)

# 4. Create mapped workflow

run_ids = [102, 103, 104, 105]
filenames = [f'file{i}.txt' for i in run_ids]
param_table = pd.DataFrame(
    {Filename: filenames}, index=run_ids
).rename_axis(index='run_id')

mapped = base.map(param_table)
mapped.visualize(sl.get_mapped_node_names(mapped, Result))
[Image: visualization of the mapped graph for Result]
# Get result mapped node names
names = sl.get_mapped_node_names(mapped, Result)

# Compute result for first file
mapped.compute(names[102])  # yields 12.0

# Overwrite result for first file
mapped[names[102]] = 488
mapped.compute(names[102])  # yields 488

So far so good.
But if I try to overwrite the value for a source node (Filename), it has no effect:

mapped = base.map(param_table)
names = sl.get_mapped_node_names(mapped, Filename)

mapped.compute(names[102])  # yields 'file102.txt'
mapped[names[102]] = "NEWFILE.txt"
mapped.compute(names[102])  # yields 'file102.txt' instead of "NEWFILE.txt"

It appears that __setitem__ adds a new node to the graph, but compute continues to use the value stored in the pandas DataFrame.
If we instead modify the DataFrame in place with param_table.iloc[0, 0] = "NEWFILE.txt" and then call mapped.compute(names[102]), we get the expected result.
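
To make the suspected mechanism concrete, here is a toy model that does not use sciline at all. The class ToyMappedGraph and its attributes are hypothetical and only mimic the behaviour described above; this is a guess at the shape of the problem, not how sciline is actually implemented.

```python
class ToyMappedGraph:
    """Hypothetical stand-in for a mapped pipeline; NOT sciline's implementation."""

    def __init__(self, table):
        # Source values live in the table (a plain dict standing in for the
        # DataFrame). The table is held by reference, not copied.
        self._table = table
        # Values assigned via __setitem__ end up in a separate store.
        self._overrides = {}

    def __setitem__(self, key, value):
        self._overrides[key] = value

    def compute(self, key):
        # Behaviour being modelled: source nodes are always read from the
        # table, so an override for such a key is silently ignored.
        if key in self._table:
            return self._table[key]
        return self._overrides[key]


table = {102: 'file102.txt', 103: 'file103.txt'}
graph = ToyMappedGraph(table)

graph[102] = 'NEWFILE.txt'
before = graph.compute(102)  # the override is ignored

table[102] = 'NEWFILE.txt'  # analogous to editing the DataFrame in place
after = graph.compute(102)  # the table edit is picked up
```

In this sketch the assignment succeeds without error but never influences compute, which matches the symptom reported above.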

Maybe we are not meant to be touching the mapped nodes, but because we are able to overwrite other nodes in the graph, it seems the behaviour is inconsistent.
We should either prevent editing mapped nodes (probably difficult and/or not what we want?), or allow changing the source node values.
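
The two options can be sketched with a small toy class, again without sciline. All names here (ToyMappedGraph, allow_source_override) are hypothetical and chosen only for illustration; the real fix would depend on how sciline stores mapped source values.

```python
class ToyMappedGraph:
    """Hypothetical model of the two proposed behaviours; NOT sciline code."""

    def __init__(self, table, allow_source_override=True):
        self._table = dict(table)
        self._overrides = {}
        self._allow_source_override = allow_source_override

    def __setitem__(self, key, value):
        # Option 1: refuse to overwrite a mapped source node.
        if key in self._table and not self._allow_source_override:
            raise ValueError(f'Cannot overwrite mapped source node {key!r}')
        self._overrides[key] = value

    def compute(self, key):
        # Option 2: an explicit override takes precedence over the table.
        if key in self._overrides:
            return self._overrides[key]
        return self._table[key]


# Option 2: overriding a source node takes effect.
permissive = ToyMappedGraph({102: 'file102.txt'})
permissive[102] = 'NEWFILE.txt'
overridden = permissive.compute(102)

# Option 1: overriding a source node is rejected loudly.
strict = ToyMappedGraph({102: 'file102.txt'}, allow_source_override=False)
try:
    strict[102] = 'NEWFILE.txt'
    rejected = False
except ValueError:
    rejected = True
```

Either behaviour would remove the inconsistency: the assignment either changes the computed value or fails visibly, rather than being silently ignored.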
