✨ Construct

Tree Construct Methods

Construct tree from string, list, dictionary, and pandas DataFrame.

To decide which method to use, consider your data type and data values.

Construct tree from	Using full path	Using parent-child relation	Using notation	Add node attributes
String	`str_to_tree`	NA	`newick_to_tree`	No (for `str_to_tree`); Yes (for `newick_to_tree`)
List	`list_to_tree`	`list_to_tree_by_relation`	NA	No
Dictionary	`dict_to_tree`	`nested_dict_to_tree`	NA	Yes
DataFrame	`dataframe_to_tree`	`dataframe_to_tree_by_relation`	NA	Yes

Tree Add Attributes Methods

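To add attributes to an existing tree, choose a method based on your input type and on whether nodes are identified by full path or by node name.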

Add attributes from	Using full path	Using node name
String	`add_path_to_tree`	NA
Dictionary	`add_dict_to_tree_by_path`	`add_dict_to_tree_by_name`
DataFrame	`add_dataframe_to_tree_by_path`	`add_dataframe_to_tree_by_name`

Note

If attributes are added to an existing tree using full path, paths that did not previously exist will be created.
If attributes are added to an existing tree using node name, names that did not previously exist will not be created.

These functions are not standalone. Under the hood, they depend on one another.

bigtree.tree.construct

add_path_to_tree

add_path_to_tree(
    tree,
    path,
    sep="/",
    duplicate_name_allowed=True,
    node_attrs={},
)

Add nodes and attributes to existing tree in-place, return node of path added. Adds to existing tree from a single path string.

Path should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").
Path separator sep is for the input path and can differ from existing tree.

Path can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
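The path rules above can be illustrated with a small standard-library sketch. The helpers `split_path` and `check_same_root` are hypothetical names used only for illustration; they are not part of bigtree and do not reflect how the library parses paths internally.

```python
from typing import List


def split_path(path: str, sep: str = "/") -> List[str]:
    """Split a path string into node names, allowing one leading separator."""
    if path.startswith(sep):
        path = path[len(sep):]
    names = path.split(sep)
    if any(name == "" for name in names):
        raise ValueError(f"Path {path!r} contains an empty node name")
    return names


def check_same_root(paths: List[str], sep: str = "/") -> str:
    """Return the shared root name, or raise if the paths disagree on the root."""
    roots = {split_path(path, sep)[0] for path in paths}
    if len(roots) != 1:
        raise ValueError(f"Paths start from different root nodes: {sorted(roots)}")
    return roots.pop()


print(split_path("/a/b"))                         # ['a', 'b']
print(split_path("a-b-d", sep="-"))               # ['a', 'b', 'd']
print(check_same_root(["a/b", "/a/c", "a/b/d"]))  # a
```

The separator only describes the input string, which is why the same node names come out of "a/b/d" with `sep="/"` and "a-b-d" with `sep="-"`.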

All attributes in node_attrs will be added to the tree, including attributes with null values.

Examples:

>>> from bigtree import add_path_to_tree, Node
>>> root = Node("a")
>>> add_path_to_tree(root, "a/b/c")
Node(/a/b/c, )
>>> root.show()
a
└── b
    └── c

Parameters:

Name	Type	Description	Default
`tree`	`Node`	existing tree	required
`path`	`str`	path to be added to tree	required
`sep`	`str`	path separator for input `path`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`
`node_attrs`	`Dict[str, Any]`	attributes to add to node, key: attribute name, value: attribute value, optional	`{}`

Returns:

Type	Description
`Node`	node of the path added

add_dict_to_tree_by_path

add_dict_to_tree_by_path(
    tree, path_attrs, sep="/", duplicate_name_allowed=True
)

Add nodes and attributes to tree in-place, return root of tree. Adds to existing tree from nested dictionary, key: path, value: dict of attribute name and attribute value.

All attributes in path_attrs will be added to the tree, including attributes with null values.

Path should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").
Path separator sep is for the input path and can differ from existing tree.

Path can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> from bigtree import Node, add_dict_to_tree_by_path
>>> root = Node("a")
>>> path_dict = {
...     "a": {"age": 90},
...     "a/b": {"age": 65},
...     "a/c": {"age": 60},
...     "a/b/d": {"age": 40},
...     "a/b/e": {"age": 35},
...     "a/c/f": {"age": 38},
...     "a/b/e/g": {"age": 10},
...     "a/b/e/h": {"age": 6},
... }
>>> root = add_dict_to_tree_by_path(root, path_dict)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name	Type	Description	Default
`tree`	`Node`	existing tree	required
`path_attrs`	`Dict[str, Dict[str, Any]]`	dictionary containing node path and attribute information, key: node path, value: dict of node attribute name and attribute value	required
`sep`	`str`	path separator for input `path_attrs`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`

Returns:

Type	Description
`Node`	root of tree

add_dict_to_tree_by_name

add_dict_to_tree_by_name(tree, name_attrs)

Add attributes to existing tree in-place. Adds to existing tree from nested dictionary, key: name, value: dict of attribute name and attribute value.

All attributes in name_attrs will be added to the tree, including attributes with null values.

Input dictionary keys that are not existing node names will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.
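The name-matching behaviour described above can be modelled without the library. In this sketch the tree is represented only by a list of existing path strings, and `apply_attrs_by_name` is a hypothetical helper, not a bigtree function; it is meant to show the two rules (unknown names are ignored, duplicate names all receive the attributes), not the library's implementation.

```python
from typing import Any, Dict, List


def apply_attrs_by_name(
    paths: List[str], name_attrs: Dict[str, Dict[str, Any]], sep: str = "/"
) -> Dict[str, Dict[str, Any]]:
    """Map each existing path to the attributes matched by its node name."""
    result: Dict[str, Dict[str, Any]] = {}
    for path in paths:
        node_name = path.split(sep)[-1]
        # Copy so that nodes sharing a name do not share one attribute dict
        result[path] = dict(name_attrs.get(node_name, {}))
    return result


existing_paths = ["a", "a/b", "a/c", "a/c/b"]  # two nodes are named "b"
attrs = apply_attrs_by_name(existing_paths, {"b": {"age": 65}, "z": {"age": 1}})
print(attrs["a/b"])    # {'age': 65}
print(attrs["a/c/b"])  # {'age': 65}
print(sorted(attrs))   # ['a', 'a/b', 'a/c', 'a/c/b'], no node "z" is created
```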

Examples:

>>> from bigtree import Node, add_dict_to_tree_by_name
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_dict = {
...     "a": {"age": 90},
...     "b": {"age": 65},
... }
>>> root = add_dict_to_tree_by_name(root, name_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]

Parameters:

Name	Type	Description	Default
`tree`	`Node`	existing tree	required
`name_attrs`	`Dict[str, Dict[str, Any]]`	dictionary containing node name and attribute information, key: node name, value: dict of node attribute name and attribute value	required

Returns:

Type	Description
`Node`	(Node)

add_dataframe_to_tree_by_path

add_dataframe_to_tree_by_path(
    tree,
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
)

Add nodes and attributes to tree in-place, return root of tree. Adds to existing tree from pandas DataFrame.

Only attributes in attribute_cols with non-null values will be added to the tree.

path_col and attribute_cols specify columns for node path and attributes to add to existing tree. If columns are not specified, path_col takes first column and all other columns are attribute_cols.
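The column defaults and the non-null rule can be sketched with plain Python data, here a list of column names and one row as a dictionary. The helpers `resolve_columns`, `is_null`, and `non_null_attrs` are hypothetical, and treating only `None` and float NaN as null is an assumption of this sketch rather than a statement about the library.

```python
import math
from typing import Any, Dict, List, Optional, Tuple


def resolve_columns(
    columns: List[str], path_col: str = "", attribute_cols: Optional[List[str]] = None
) -> Tuple[str, List[str]]:
    """Apply the defaults: first column is the path, the rest are attributes."""
    if not path_col:
        path_col = columns[0]
    if not attribute_cols:
        attribute_cols = [col for col in columns if col != path_col]
    return path_col, list(attribute_cols)


def is_null(value: Any) -> bool:
    """Treat None and float NaN as null (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def non_null_attrs(row: Dict[str, Any], attribute_cols: List[str]) -> Dict[str, Any]:
    """Keep only the attribute columns whose value in this row is not null."""
    return {col: row[col] for col in attribute_cols if not is_null(row[col])}


columns = ["PATH", "age", "city"]
row = {"PATH": "a/b", "age": 65, "city": float("nan")}
path_col, attribute_cols = resolve_columns(columns)
print(path_col, attribute_cols)             # PATH ['age', 'city']
print(non_null_attrs(row, attribute_cols))  # {'age': 65}
```

This contrasts with the dictionary-based functions, where attributes with null values are kept.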

Path in path column should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").
Path separator sep is for the input path and can differ from existing tree.

Path in path column can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_path, Node
>>> root = Node("a")
>>> path_data = pd.DataFrame([
...     ["a", 90],
...     ["a/b", 65],
...     ["a/c", 60],
...     ["a/b/d", 40],
...     ["a/b/e", 35],
...     ["a/c/f", 38],
...     ["a/b/e/g", 10],
...     ["a/b/e/h", 6],
... ],
...     columns=["PATH", "age"]
... )
>>> root = add_dataframe_to_tree_by_path(root, path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name	Type	Description	Default
`tree`	`Node`	existing tree	required
`data`	`DataFrame`	data containing node path and attribute information	required
`path_col`	`str`	column of data containing `path_name` information, if not set, it will take the first column of data	`''`
`attribute_cols`	`List[str]`	columns of data containing node attribute information, if not set, it will take all columns of data except `path_col`	`[]`
`sep`	`str`	path separator for input `path_col`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`

Returns:

Type	Description
`Node`	root of tree

add_dataframe_to_tree_by_name

add_dataframe_to_tree_by_name(
    tree, data, name_col="", attribute_cols=[]
)

Add attributes to existing tree in-place. Adds to existing tree from pandas DataFrame.

Only attributes in attribute_cols with non-null values will be added to the tree.

name_col and attribute_cols specify columns for node name and attributes to add to existing tree. If columns are not specified, the first column will be taken as name column and all other columns as attributes.

Input data node names that are not existing node names will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.

Examples:

>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_name, Node
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_data = pd.DataFrame([
...     ["a", 90],
...     ["b", 65],
... ],
...     columns=["NAME", "age"]
... )
>>> root = add_dataframe_to_tree_by_name(root, name_data)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]

Parameters:

Name	Type	Description	Default
`tree`	`Node`	existing tree	required
`data`	`DataFrame`	data containing node name and attribute information	required
`name_col`	`str`	column of data containing `name` information, if not set, it will take the first column of data	`''`
`attribute_cols`	`List[str]`	column(s) of data containing node attribute information, if not set, it will take all columns of data except `name_col`	`[]`

Returns:

Type	Description
`Node`	(Node)

str_to_tree

str_to_tree(
    tree_string, tree_prefix_list=[], node_type=Node
)

Construct tree from tree string.

Examples:

>>> from bigtree import str_to_tree
>>> tree_str = 'a\n├── b\n│   ├── d\n│   └── e\n│       ├── g\n│       └── h\n└── c\n    └── f'
>>> root = str_to_tree(tree_str, tree_prefix_list=["├──", "└──"])
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f
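To make the role of the prefixes concrete, the following standard-library sketch turns a tree string into path strings. It assumes each level is indented by exactly four characters, as in the example above; `tree_string_to_paths` is a hypothetical helper, and the library's own parsing may work differently.

```python
from typing import List, Sequence


def tree_string_to_paths(
    tree_string: str,
    prefixes: Sequence[str] = ("├── ", "└── "),
    indent: int = 4,
    sep: str = "/",
) -> List[str]:
    """Convert a printed tree into full paths, one per line of the input."""
    paths: List[str] = []
    stack: List[str] = []  # names of the ancestors of the current line, root first
    for line in tree_string.split("\n"):
        if not line.strip():
            continue
        depth, name = 0, line.strip()  # a line without a prefix is the root
        for prefix in prefixes:
            index = line.find(prefix)
            if index != -1:
                # The prefix position gives the depth; the name follows the prefix
                depth = index // indent + 1
                name = line[index + len(prefix):].strip()
                break
        del stack[depth:]
        stack.append(name)
        paths.append(sep.join(stack))
    return paths


tree_str = "a\n├── b\n│   ├── d\n│   └── e\n└── c\n    └── f"
print(tree_string_to_paths(tree_str))
# ['a', 'a/b', 'a/b/d', 'a/b/e', 'a/c', 'a/c/f']
```

The resulting list has the same shape as the input expected by `list_to_tree`.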

Parameters:

Name	Type	Description	Default
`tree_string`	`str`	String to construct tree	required
`tree_prefix_list`	`List[str]`	list of prefixes that mark the end of a tree branch/stem and the start of a node name, optional. If not specified, unicode characters and whitespace are inferred as the prefix.	`[]`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	(Node)

list_to_tree

list_to_tree(
    paths,
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from list of path strings.

Path should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").

Path can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> from bigtree import list_to_tree
>>> path_list = ["a/b", "a/c", "a/b/d", "a/b/e", "a/c/f", "a/b/e/g", "a/b/e/h"]
>>> root = list_to_tree(path_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name	Type	Description	Default
`paths`	`List[str]`	list containing path strings	required
`sep`	`str`	path separator for input `paths` and created tree, defaults to `/`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	(Node)

list_to_tree_by_relation

list_to_tree_by_relation(
    relations, allow_duplicates=False, node_type=Node
)

Construct tree from list of tuples containing parent-child names.

Root node is inferred when parent is empty, or when name appears as parent but not as child.

Since the tree is created from parent-child names, only names of leaf nodes may be repeated. An error will be raised if names of intermediate nodes are repeated, because it would be ambiguous which node their children belong to. This check can be bypassed by setting allow_duplicates to True.
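The root inference rule and the restriction on repeated intermediate names can be sketched with set operations. `infer_roots` and `repeated_intermediate_names` are hypothetical helpers written for illustration; they approximate the rules stated above and are not taken from the library.

```python
from typing import Dict, List, Optional, Set, Tuple

Relation = Tuple[Optional[str], str]  # (parent name, child name)


def infer_roots(relations: List[Relation]) -> List[str]:
    """Names with an empty parent, or names seen as parent but never as child."""
    parents = {parent for parent, _ in relations if parent}
    children_with_parent = {child for parent, child in relations if parent}
    children_without_parent = {child for parent, child in relations if not parent}
    return sorted(children_without_parent | (parents - children_with_parent))


def repeated_intermediate_names(relations: List[Relation]) -> List[str]:
    """Names that have children of their own and appear under several parents."""
    parents = {parent for parent, _ in relations if parent}
    parents_of: Dict[str, Set[str]] = {}
    for parent, child in relations:
        if parent:
            parents_of.setdefault(child, set()).add(parent)
    return sorted(
        name for name, found in parents_of.items() if len(found) > 1 and name in parents
    )


relations = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(infer_roots(relations))                  # ['a']
print(repeated_intermediate_names(relations))  # [] since "d" is a repeated leaf

relations += [("d", "g")]                      # "d" now has a child of its own
print(repeated_intermediate_names(relations))  # ['d']
```

In the last case it is unclear whether "g" belongs under "b/d" or "c/d", which is the ambiguity the error guards against.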

Examples:

>>> from bigtree import list_to_tree_by_relation
>>> relations_list = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f"), ("e", "g"), ("e", "h")]
>>> root = list_to_tree_by_relation(relations_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name	Type	Description	Default
`relations`	`List[Tuple[str, str]]`	list of tuples containing parent-child names	required
`allow_duplicates`	`bool`	allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False	`False`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	(Node)

dict_to_tree

dict_to_tree(
    path_attrs,
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from nested dictionary using path, key: path, value: dict of attribute name and attribute value.

Path should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").

Path can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

All attributes in path_attrs will be added to the tree, including attributes with null values.

Examples:

>>> from bigtree import dict_to_tree
>>> path_dict = {
...     "a": {"age": 90},
...     "a/b": {"age": 65},
...     "a/c": {"age": 60},
...     "a/b/d": {"age": 40},
...     "a/b/e": {"age": 35},
...     "a/c/f": {"age": 38},
...     "a/b/e/g": {"age": 10},
...     "a/b/e/h": {"age": 6},
... }
>>> root = dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name	Type	Description	Default
`path_attrs`	`Dict[str, Any]`	dictionary containing path and node attribute information, key: path, value: dict of tree attribute and attribute value	required
`sep`	`str`	path separator of input `path_attrs` and created tree, defaults to `/`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	(Node)

nested_dict_to_tree

nested_dict_to_tree(
    node_attrs,
    name_key="name",
    child_key="children",
    node_type=Node,
)

Construct tree from nested recursive dictionary.

key: name_key, child_key, or any attributes key.
value of name_key (str): node name.
value of child_key (List[Dict[str, Any]]): list of dict containing name_key and child_key (recursive).
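To show how this recursive format relates to the path-based format used by `dict_to_tree`, the sketch below flattens a nested dictionary into a path dictionary. `nested_to_path_dict` is a hypothetical helper using only the standard library; it illustrates the structure of the input and is not the library's implementation.

```python
from typing import Any, Dict


def nested_to_path_dict(
    node_attrs: Dict[str, Any],
    name_key: str = "name",
    child_key: str = "children",
    sep: str = "/",
    parent_path: str = "",
) -> Dict[str, Dict[str, Any]]:
    """Flatten a nested recursive dictionary into {path: attributes}."""
    name = node_attrs[name_key]
    path = f"{parent_path}{sep}{name}" if parent_path else name
    # Every key other than name_key and child_key is treated as a node attribute
    attrs = {k: v for k, v in node_attrs.items() if k not in (name_key, child_key)}
    result = {path: attrs}
    for child in node_attrs.get(child_key, []):
        result.update(nested_to_path_dict(child, name_key, child_key, sep, path))
    return result


nested = {
    "name": "a",
    "age": 90,
    "children": [
        {"name": "b", "age": 65, "children": [{"name": "d", "age": 40}]},
    ],
}
print(nested_to_path_dict(nested))
# {'a': {'age': 90}, 'a/b': {'age': 65}, 'a/b/d': {'age': 40}}
```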

Examples:

>>> from bigtree import nested_dict_to_tree
>>> path_dict = {
...     "name": "a",
...     "age": 90,
...     "children": [
...         {"name": "b",
...          "age": 65,
...          "children": [
...              {"name": "d", "age": 40},
...              {"name": "e", "age": 35, "children": [
...                  {"name": "g", "age": 10},
...              ]},
...          ]},
...     ],
... }
>>> root = nested_dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]
    ├── d [age=40]
    └── e [age=35]
        └── g [age=10]

Parameters:

Name	Type	Description	Default
`node_attrs`	`Dict[str, Any]`	dictionary containing node, children, and node attribute information; key: `name_key` and `child_key`; value of `name_key` (str): node name; value of `child_key` (List[Dict[str, Any]]): list of dict containing `name_key` and `child_key` (recursive)	required
`name_key`	`str`	key of node name, value is type str	`'name'`
`child_key`	`str`	key of child list, value is type list	`'children'`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	(Node)

dataframe_to_tree

dataframe_to_tree(
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from pandas DataFrame using path, return root of tree.

path_col and attribute_cols specify columns for node path and attributes to construct tree. If columns are not specified, path_col takes first column and all other columns are attribute_cols.

Only attributes in attribute_cols with non-null values will be added to the tree.

Path in path column can start from root node name, or start with sep.

For example: Path string can be "/a/b" or "a/b", if sep is "/".

Path in path column should contain Node name, separated by sep.

For example: Path string "a/b" refers to Node("b") with parent Node("a").

All paths should start from the same root node.

For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> import pandas as pd
>>> from bigtree import dataframe_to_tree
>>> path_data = pd.DataFrame([
...     ["a", 90],
...     ["a/b", 65],
...     ["a/c", 60],
...     ["a/b/d", 40],
...     ["a/b/e", 35],
...     ["a/c/f", 38],
...     ["a/b/e/g", 10],
...     ["a/b/e/h", 6],
... ],
...     columns=["PATH", "age"]
... )
>>> root = dataframe_to_tree(path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	data containing path and node attribute information	required
`path_col`	`str`	column of data containing `path_name` information, if not set, it will take the first column of data	`''`
`attribute_cols`	`List[str]`	columns of data containing node attribute information, if not set, it will take all columns of data except `path_col`	`[]`
`sep`	`str`	path separator of input `path_col` and created tree, defaults to `/`	`'/'`
`duplicate_name_allowed`	`bool`	indicator if nodes with duplicate `Node` names are allowed, defaults to True	`True`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	root of tree

dataframe_to_tree_by_relation

dataframe_to_tree_by_relation(
    data,
    child_col="",
    parent_col="",
    attribute_cols=[],
    allow_duplicates=False,
    node_type=Node,
)

Construct tree from pandas DataFrame using parent and child names, return root of tree.

Root node is inferred when parent name is empty, or when name appears in parent column but not in child column.

Since the tree is created from parent-child names, only names of leaf nodes may be repeated. An error will be raised if names of intermediate nodes are repeated, because it would be ambiguous which node their children belong to. This check can be bypassed by setting allow_duplicates to True.

child_col and parent_col specify columns for child name and parent name to construct tree. attribute_cols specify columns for node attribute for child name. If columns are not specified, child_col takes first column, parent_col takes second column, and all other columns are attribute_cols.

Only attributes in attribute_cols with non-null values will be added to the tree.

Examples:

>>> import pandas as pd
>>> from bigtree import dataframe_to_tree_by_relation
>>> relation_data = pd.DataFrame([
...     ["a", None, 90],
...     ["b", "a", 65],
...     ["c", "a", 60],
...     ["d", "b", 40],
...     ["e", "b", 35],
...     ["f", "c", 38],
...     ["g", "e", 10],
...     ["h", "e", 6],
... ],
...     columns=["child", "parent", "age"]
... )
>>> root = dataframe_to_tree_by_relation(relation_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	data containing parent-child and node attribute information	required
`child_col`	`str`	column of data containing child name information, if not set, it will take the first column of data	`''`
`parent_col`	`str`	column of data containing parent name information, if not set, it will take the second column of data	`''`
`attribute_cols`	`List[str]`	columns of data containing node attribute information, if not set, it will take all columns of data except `child_col` and `parent_col`	`[]`
`allow_duplicates`	`bool`	allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False	`False`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	root of tree

newick_to_tree

newick_to_tree(
    tree_string,
    length_attr="length",
    attr_prefix="&&NHX:",
    node_type=Node,
)

Construct tree from Newick notation, return root of tree.

In the Newick Notation (or New Hampshire Notation):

Tree is represented in round brackets, i.e., (child1,child2,child3)parent.
If there are nested trees, they will be in nested round brackets, i.e., ((grandchild1)child1,(grandchild2,grandchild3)child2)parent.
If there is a length attribute, it will be beside the name, i.e., (child1:0.5,child2:0.1)parent.
If there are other attributes, they are represented in square brackets, i.e., (child1:0.5[S:human],child2:0.1[S:human])parent[S:parent].

Variations supported:

Special characters ([, ], (, ), :, ,) are supported in node names, attribute names, and attribute values if they are enclosed in single quotes, e.g., '(name:!)'.
If there are no node names, they will be auto-filled with the convention nodeN, with N representing a number.
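The bracket and length rules above can be made concrete by generating a Newick string from a nested dictionary. `to_newick` is a hypothetical standard-library helper; the keys "name", "length", and "children" are assumptions of this sketch, and square-bracket attributes are left out for brevity.

```python
from typing import Any, Dict

SPECIAL_CHARS = "[](),:"


def to_newick(node: Dict[str, Any]) -> str:
    """Serialise {"name", optional "length", optional "children"} to Newick."""
    label = node["name"]
    # Names containing special characters are enclosed in single quotes
    if any(char in label for char in SPECIAL_CHARS):
        label = f"'{label}'"
    if "length" in node:
        label += f":{node['length']}"
    children = node.get("children", [])
    if not children:
        return label
    # Children go inside round brackets, followed by the parent label
    inner = ",".join(to_newick(child) for child in children)
    return f"({inner}){label}"


tree = {
    "name": "a",
    "children": [
        {
            "name": "b",
            "length": 65,
            "children": [{"name": "d", "length": 40}, {"name": "e", "length": 35}],
        },
        {"name": "c", "length": 60},
    ],
}
print(to_newick(tree))  # ((d:40,e:35)b:65,c:60)a
```

The printed string has the same form as the second input passed to `newick_to_tree` in the examples that follow.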

Examples:

>>> from bigtree import newick_to_tree
>>> root = newick_to_tree("((d,e)b,c)a")
>>> root.show()
a
├── b
│   ├── d
│   └── e
└── c

>>> root = newick_to_tree("((d:40,e:35)b:65,c:60)a", length_attr="age")
>>> root.show(attr_list=["age"])
a
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
└── c [age=60]

>>> root = newick_to_tree(
...     "((d:40[&&NHX:species=human],e:35[&&NHX:species=human])b:65[&&NHX:species=human],c:60[&&NHX:species=human])a[&&NHX:species=human]",
...     length_attr="age",
... )
>>> root.show(all_attrs=True)
a [species=human]
├── b [age=65, species=human]
│   ├── d [age=40, species=human]
│   └── e [age=35, species=human]
└── c [age=60, species=human]

Parameters:

Name	Type	Description	Default
`tree_string`	`str`	Newick notation to construct tree	required
`length_attr`	`str`	attribute name to store node length, optional, defaults to 'length'	`'length'`
`attr_prefix`	`str`	prefix before all attributes, within square bracket, used to detect attributes, defaults to "&&NHX:"	`'&&NHX:'`
`node_type`	`Type[Node]`	node type of tree to be created, defaults to `Node`	`Node`

Returns:

Type	Description
`Node`	root of tree