Commit d4188d6

Add docs
1 parent e358ce0 commit d4188d6

3 files changed

+46
-11
lines changed

python/CHANGELOG.rst

Lines changed: 8 additions & 0 deletions
@@ -12,12 +12,20 @@
   associated with each individual as a numpy array.
   (:user:`benjeffery`, :pr:`3153`)
 
+- Add ``TreeSequence.map_to_vcf_model`` method to return a mapping of
+  the tree sequence to the VCF model.
+  (:user:`benjeffery`, :pr:`3163`)
 
 **Fixes**
 
 - Correct assertion message when tables are compared with metadata ignored.
   (:user:`benjeffery`, :pr:`3162`, :issue:`3161`)
 
+**Changes**
+
+- ``TreeSequence.write_vcf`` now warns instead of errors if an individual has a mix
+  of sample and non-sample nodes when no specific individuals are provided.
+
 --------------------
 [0.6.3] - 2025-04-28
 --------------------
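
The behaviour recorded under **Changes** can be pictured with a small, library-free sketch. The helper `select_default_individuals` and its plain-Python inputs are hypothetical stand-ins, not tskit's implementation; they only illustrate the rule stated in the changelog: when no individuals are requested, individuals whose nodes are all samples are kept, and a mixed individual triggers a warning rather than an exception.

```python
import warnings


def select_default_individuals(individual_nodes, sample_nodes):
    """Hypothetical helper: pick individuals whose nodes are all samples.

    individual_nodes: dict mapping individual ID -> list of node IDs
    sample_nodes: set of node IDs that are flagged as samples
    """
    selected = []
    for ind_id in sorted(individual_nodes):
        nodes = individual_nodes[ind_id]
        n_samples = sum(1 for n in nodes if n in sample_nodes)
        if nodes and n_samples == len(nodes):
            # Every node of this individual is a sample: include it.
            selected.append(ind_id)
        elif 0 < n_samples < len(nodes):
            # Mixed individual: warn and skip instead of raising.
            warnings.warn(
                f"Individual {ind_id} has a mix of sample and non-sample nodes"
            )
    return selected


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = select_default_individuals(
        {0: [0, 1], 1: [2, 3], 2: [4, 5]},
        sample_nodes={0, 1, 2, 4, 5},  # node 3 is not a sample
    )

print(chosen, len(caught))
```

Here individual 1 owns one sample node and one non-sample node, so it is left out of `chosen` and a single warning is recorded.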

python/tskit/trees.py

Lines changed: 38 additions & 9 deletions
@@ -6435,16 +6435,15 @@ def write_vcf(
         data model (see :ref:`sec_individual_table_definition`), the genotypes
         for each of the individual's samples are combined into phased
         multiploid values at each site. By default, all individuals associated
-        with sample nodes are included in increasing order of individual ID.
+        with only sample nodes are included in increasing order of individual ID.
 
         Subsets or permutations of the sample individuals may be specified
         using the ``individuals`` argument. It is an error to specify any
-        individuals that are not associated with any nodes, or whose
-        nodes are not all samples.
+        individuals that are not associated with any nodes.
 
         Mixed-sample individuals (e.g., those associated with one node
         that is a sample and another that is not) in the data model will
-        result in an error by default. However, such individuals can be
+        be ignored by default. However, such individuals can be
         excluded using the ``individuals`` argument.
 
         If there are no individuals in the tree sequence,
@@ -10536,11 +10535,41 @@ def map_to_vcf_model(
         individual_names=None,
     ):
         """
-        Returns a list of lists of node IDs, where each sublist contains the
-        sample nodes associated with the same individual.
-        If `individuals` is not specified (the default) nodes for all individuals where
-        all of their nodes are samples are returned. If `individuals` is specified,
-        only the nodes for the specified individuals are returned.
+        Maps the sample nodes in this tree sequence to a representation suitable for
+        VCF output, using the individuals if present.
+
+        Creates a VcfModelMapping object that contains both the nodes-to-individual
+        mapping as a 2D array of (individuals, nodes) and the individual names. The
+        mapping is created by first checking if the tree sequence contains individuals.
+        If it does, the mapping is created using the individuals in the tree sequence.
+        If it does not, the mapping is created using the sample nodes and the
+        specified ploidy.
+
+        If neither `name_metadata_key` nor `individual_names` is specified, the
+        individual names are set to "tsk_{individual_id}" for each individual.
+
+        Warnings are emitted if any sample nodes do not have an individual ID, or if
+        individuals are not specified and the tree sequence contains individuals
+        that have no nodes associated with them, or individuals have a mix of sample
+        and non-sample nodes.
+
+        :param list individuals: Specific individual IDs to include in the VCF. If not
+            specified and the tree sequence contains individuals, all individuals whose
+            nodes all have the flag NODE_IS_SAMPLE set are included.
+        :param int ploidy: The ploidy, or number of nodes per individual. Only used when
+            the tree sequence does not contain individuals. Cannot be used if the tree
+            sequence contains individuals. Defaults to 1 if not specified.
+        :param str name_metadata_key: The key in the individual metadata to use
+            for individual names. Cannot be specified simultaneously with
+            individual_names.
+        :param list individual_names: The names to use for each individual. Cannot
+            be specified simultaneously with name_metadata_key.
+        :return: A VcfModelMapping containing the node-to-individual mapping and
+            individual names.
+        :raises ValueError: If both name_metadata_key and individual_names are specified,
+            if ploidy is specified when individuals are present, if an invalid individual
+            ID is specified, if a specified individual has no nodes, or if the number of
+            individuals doesn't match the number of names.
         """
 
         if name_metadata_key is not None and individual_names is not None:
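
The mapping described in the new `map_to_vcf_model` docstring can be approximated without tskit. The sketch below is a simplified model under stated assumptions, not the library's code: `sketch_vcf_mapping`, its arguments, the `-1` padding value, and the "samples must divide evenly by ploidy" check are all hypothetical choices made for illustration. It covers only the two branches the docstring names (individuals present, or sample nodes grouped by ploidy), the default `tsk_{individual_id}` names, and two of the listed `ValueError` cases.

```python
NULL = -1  # assumed padding value for individuals with fewer nodes


def sketch_vcf_mapping(samples, individual_nodes=None, ploidy=None,
                       individual_names=None):
    """Hypothetical, simplified model of the node-to-individual mapping."""
    if individual_nodes:
        # Individuals are present: ploidy must not be supplied.
        if ploidy is not None:
            raise ValueError("ploidy cannot be given when individuals are present")
        ids = sorted(individual_nodes)
        rows = [list(individual_nodes[i]) for i in ids]
    else:
        # No individuals: group sample nodes into chunks of `ploidy`.
        ploidy = 1 if ploidy is None else ploidy
        if len(samples) % ploidy != 0:  # assumption, not stated in the docstring
            raise ValueError("number of samples is not a multiple of ploidy")
        rows = [list(samples[j:j + ploidy]) for j in range(0, len(samples), ploidy)]
        ids = list(range(len(rows)))
    if individual_names is None:
        individual_names = [f"tsk_{i}" for i in ids]
    elif len(individual_names) != len(rows):
        raise ValueError("number of names must match number of individuals")
    # Pad rows to a rectangular (individuals, nodes) layout.
    width = max((len(r) for r in rows), default=0)
    padded = [r + [NULL] * (width - len(r)) for r in rows]
    return padded, list(individual_names)


# No individuals: four sample nodes grouped as two diploids.
nodes, names = sketch_vcf_mapping(samples=[0, 1, 2, 3], ploidy=2)

# Individuals present: one diploid and one haploid individual.
nodes2, names2 = sketch_vcf_mapping(
    samples=[0, 1, 2], individual_nodes={0: [0, 1], 1: [2]}
)
```

The first call groups nodes as `[[0, 1], [2, 3]]`; the second pads the haploid individual's row so both rows have the same width.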

python/tskit/vcf.py

Lines changed: 0 additions & 2 deletions
@@ -135,8 +135,6 @@ def __init__(
                 '"position_transform = lambda x: np.fmax(1, x)"'
             )
 
-
-
     def __write_header(self, output):
         print("##fileformat=VCFv4.2", file=output)
         print(f"##source=tskit {provenance.__version__}", file=output)