Write tests for/add NaN handling of dataframe input case of `metapool.format_pooling_echo_pick_list()` #340

This entire new branch was added and, according to comments, is now the preferred functionality (the old functionality is labelled "legacy"), but there are no tests for it. When fixing this, **also note** that this branch does NOT sanitize NaNs (unlike the "legacy" case).

https://github.com/biocore/kl-metapool/blob/6ebb09426ccd10e6f5785edeac6455f65b05b211/metapool/metapool.py#L1084-L1136
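Below is a minimal sketch of what NaN handling plus a test for it could look like. It assumes a NaN volume should be treated as 0, meaning nothing is transferred for that well. Raising a `ValueError` instead would be an equally reasonable choice, and this sketch does not try to reproduce the legacy branch's exact behavior. `sanitize_pooling_volumes` is a hypothetical helper, not part of metapool. The plate names and volumes in the test are made up.

```python
import numpy as np
import pandas as pd


def sanitize_pooling_volumes(df: pd.DataFrame,
                             pooling_vol_column: str) -> pd.DataFrame:
    """Return a copy of ``df`` with NaN pooling volumes replaced by 0.0.

    Hypothetical helper: treating a missing volume as "transfer nothing"
    is an assumption, not confirmed metapool behavior.
    """
    out = df.copy()
    out[pooling_vol_column] = out[pooling_vol_column].fillna(0.0)
    return out


def test_sanitize_pooling_volumes_replaces_nan():
    # Minimal input containing only the columns the dataframe branch requires.
    df = pd.DataFrame({
        "Compressed Plate Name": ["plate_1", "plate_1", "plate_1"],
        "Library Well": ["A1", "A3", "A5"],
        "iSeq normpool volume": [100.0, np.nan, 250.0],
    })
    clean = sanitize_pooling_volumes(df, "iSeq normpool volume")
    assert not clean["iSeq normpool volume"].isna().any()
    assert clean["iSeq normpool volume"].tolist() == [100.0, 0.0, 250.0]
    # The caller's dataframe is left untouched.
    assert df["iSeq normpool volume"].isna().sum() == 1


if __name__ == "__main__":
    test_sanitize_pooling_volumes_replaces_nan()
```

In the dataframe branch, a call like this would go right after `formatted_df` is built and before the well-cycling loop.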

AFAICT this new case is only used for the very last pooling (i.e. the iseqnormed pooling) in the metatranscriptomics notebook: 

https://github.com/biocore/kl-metapool/blob/6ebb09426ccd10e6f5785edeac6455f65b05b211/notebooks/metatranscriptomics_matrix_pipeline_seqcount_norm.ipynb#L1787-L1789

I found it because the dataframe merge upstream of this apparently produces rows in a different order on different platforms (local vs. GitHub CI). As a result, the picklist created by this new case comes out in a different order on each platform, whereas the legacy case's order is stable.
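If a deterministic order is the goal, one option is to sort the dataframe by plate and well before the well-cycling loop. That way the picklist no longer depends on whatever row order the upstream merge produces. This is a minimal sketch that assumes (plate, well) pairs are unique and that wells look like `A1`..`P24`. `order_by_plate_and_well` is a hypothetical helper, not part of metapool. Destination wells are assigned in iteration order, so sorting would also change which samples get pooled together compared with the current output. That may not be what the notebook intends.

```python
import pandas as pd


def order_by_plate_and_well(df: pd.DataFrame) -> pd.DataFrame:
    """Return ``df`` sorted by plate name, well row letter, then well column.

    Hypothetical helper. Assumes every 'Library Well' looks like 'A1'..'P24'
    and that (plate, well) pairs are unique, so there are no ties whose
    relative order would still depend on the input order.
    """
    # Split 'A10' into row 'A' and column 10 so columns sort numerically
    # (avoids the lexicographic 'A1', 'A10', 'A2' ordering).
    parts = df["Library Well"].str.extract(r"^([A-Za-z]+)(\d+)$")
    keyed = df.assign(_well_row=parts[0].str.upper(),
                      _well_col=parts[1].astype(int))
    keyed = keyed.sort_values(
        ["Compressed Plate Name", "_well_row", "_well_col"], kind="mergesort")
    # Drop the temporary sort keys; original index labels are kept, so
    # .loc lookups like the ones in the loop still work.
    return keyed.drop(columns=["_well_row", "_well_col"])
```

With this in place, two inputs containing the same rows in different orders yield the same sorted dataframe, and therefore the same picklist.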


The dataframe branch in question (from the first permalink above, truncated):

```python
if isinstance(main_input, pd.DataFrame):
    required_columns = ['Compressed Plate Name',
                        'Library Well',
                        pooling_vol_column]
    if not all(column in main_input.columns for
               column in required_columns):
        raise ValueError(
            "Your input dataframe does not have the "
            "required columns ['Compressed Plate Name'",
            "'Library Well','%s']. Perhaps you are running "
            "this module out of sequential order."
            % pooling_vol_column
        )
    formatted_df = main_input[['Compressed Plate Name',
                               'Library Well',
                               pooling_vol_column,
                               ]]

    # Writing picklist headers
    contents = [
        "Source Plate Name,Source Plate Type,Source Well,"
        "Concentration,Transfer Volume,Destination Plate Name,"
        "Destination Well"
    ]

    # Destination well cycling logic
    running_tot = 0
    d = 1

    if dest_plate_shape is None:
        dest_plate_shape = (16, 24)

    for i, pool_row in formatted_df[[pooling_vol_column]].iterrows():
        pool_vol = pool_row[pooling_vol_column]
        # test to see if we will exceed total vol per well
        if running_tot + pool_vol > max_vol_per_well:
            d += 1
            running_tot = pool_vol
        else:
            running_tot += pool_vol

        dest = "%s%d" % (
            chr(ord("A") + int(np.floor(d / dest_plate_shape[0]))),
            (d % dest_plate_shape[1]))

        # writing picklist from row iterations
        contents.append(",".join([formatted_df.loc[i, 'Compressed' +
                                                   ' Plate Name'],
                                  "384LDV_AQ_B2",
                                  formatted_df.loc[i, 'Library Well'],
                                  "",
                                  "%.2f" % pool_vol,
                                  "NormalizedDNA",
        # ... (snippet truncated; see the permalink above for the rest)
```
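To see why unsanitized NaNs matter here, the sketch below replays the destination-well cycling loop from the snippet on a plain list of volumes. `assign_dest_wells` is a hypothetical stand-alone copy written for illustration. Its arithmetic is copied as-is from the quoted branch, not corrected. `NaN > x` is always `False`, so the first NaN volume turns `running_tot` into NaN and silently disables the `max_vol_per_well` check for every later sample.

```python
import math


def assign_dest_wells(volumes, max_vol_per_well, dest_plate_shape=(16, 24)):
    """Replay the destination-well cycling logic from the snippet above.

    Hypothetical stand-alone copy for illustration only; the well-name
    arithmetic mirrors the quoted branch as-is.
    """
    running_tot = 0
    d = 1
    dests = []
    for pool_vol in volumes:
        # Same test as the snippet. With a NaN volume this comparison is
        # False, so running_tot becomes NaN and stays NaN from then on.
        if running_tot + pool_vol > max_vol_per_well:
            d += 1
            running_tot = pool_vol
        else:
            running_tot += pool_vol
        dests.append("%s%d" % (
            chr(ord("A") + math.floor(d / dest_plate_shape[0])),
            d % dest_plate_shape[1]))
    return dests


if __name__ == "__main__":
    print(assign_dest_wells([10, 10, 10, 10], 25))            # ['A1', 'A1', 'A2', 'A2']
    print(assign_dest_wells([10, float("nan"), 10, 10], 25))  # ['A1', 'A1', 'A1', 'A1']
```

After a NaN, every remaining sample is sent to the same destination well regardless of `max_vol_per_well`. The transfer-volume field is also written as `nan`, because `"%.2f" % float("nan")` gives `'nan'`. Either symptom would make a useful regression test for the dataframe branch.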

For reference, the call in the linked notebook cell is:

```python
picklist = format_pooling_echo_pick_list(plate_df_normalized,
                                         pooling_vol_column='iSeq normpool volume',
                                         max_vol_per_well=30000)
```
