✨ Construct

Tree Construct Methods

Construct a tree from a string, list, dictionary, or pandas DataFrame.

To decide which method to use, consider your data type and data values.

Construct tree from | Using full path | Using parent-child relation | Using notation | Add node attributes
String | str_to_tree | NA | newick_to_tree | No (for str_to_tree); Yes (for newick_to_tree)
List | list_to_tree | list_to_tree_by_relation | NA | No
Dictionary | dict_to_tree | nested_dict_to_tree | NA | Yes
DataFrame | dataframe_to_tree | dataframe_to_tree_by_relation | NA | Yes

Tree Add Attributes Methods

To add attributes to an existing tree, choose a method based on your input type and on whether nodes are identified by full path or by node name.

Add attributes from | Using full path | Using node name
String | add_path_to_tree | NA
Dictionary | add_dict_to_tree_by_path | add_dict_to_tree_by_name
DataFrame | add_dataframe_to_tree_by_path | add_dataframe_to_tree_by_name

Note

If attributes are added to existing tree using full path, paths that previously did not exist will be added.
If attributes are added to existing tree using node name, names that previously did not exist will not be created.
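
The difference between the two modes can be sketched in plain Python, independent of bigtree. This is an illustration only, not bigtree's implementation: the tree is a hypothetical mapping from full path to an attribute dict, and `add_by_path` and `add_by_name` are made-up helper names.

```python
def add_by_path(tree, path_attrs, sep="/"):
    """Set attributes by full path, creating any missing nodes on the way."""
    for path, attrs in path_attrs.items():
        names = path.strip(sep).split(sep)
        # Make sure every ancestor path exists before touching the node itself.
        for depth in range(1, len(names) + 1):
            tree.setdefault(sep.join(names[:depth]), {})
        tree[sep.join(names)].update(attrs)
    return tree


def add_by_name(tree, name_attrs, sep="/"):
    """Set attributes by node name; names not in the tree are ignored."""
    for path, attrs in tree.items():
        name = path.split(sep)[-1]
        if name in name_attrs:
            attrs.update(name_attrs[name])
    return tree


tree = {"a": {}, "a/b": {}}
add_by_path(tree, {"a/c/d": {"age": 40}})  # creates "a/c" and "a/c/d"
add_by_name(tree, {"b": {"age": 65}, "z": {"age": 1}})  # "z" is ignored
print(sorted(tree))  # ['a', 'a/b', 'a/c', 'a/c/d']
```

Adding by path grows the tree, while adding by name only decorates nodes that are already there.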

These functions are not standalone. Under the hood, they depend on one another as follows:

[Figure: Tree Constructor Dependency Diagram]


bigtree.tree.construct

add_path_to_tree

add_path_to_tree(
    tree,
    path,
    sep="/",
    duplicate_name_allowed=True,
    node_attrs={},
)

Add nodes and attributes to existing tree in-place, return node of path added. Adds to existing tree from a single path string.

Path should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").
  • Path separator sep is for the input path and can differ from existing tree.

Path can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
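
The path rules above can be sketched as a small stand-alone function. This is not bigtree's code: `split_path` and its `root_name` argument are hypothetical names used only to show how a path string maps to node names under these rules.

```python
def split_path(path, root_name, sep="/"):
    """Split a path string into node names and check that it starts at the root."""
    # A leading separator is optional: "/a/b" and "a/b" mean the same thing.
    if path.startswith(sep):
        path = path[len(sep):]
    names = path.split(sep)
    if names[0] != root_name:
        raise ValueError(
            f"Path {path!r} starts at {names[0]!r}, expected root {root_name!r}"
        )
    return names


print(split_path("a/b/c", root_name="a"))  # ['a', 'b', 'c']
print(split_path("/a/b", root_name="a"))  # ['a', 'b']
print(split_path("a\\b\\d", root_name="a", sep="\\"))  # ['a', 'b', 'd']
```

The last call uses a different separator for the input path, which only affects how the string is split and not how the resulting names are stored.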

All attributes in node_attrs will be added to the tree, including attributes with null values.

Examples:

>>> from bigtree import add_path_to_tree, Node
>>> root = Node("a")
>>> add_path_to_tree(root, "a/b/c")
Node(/a/b/c, )
>>> root.show()
a
└── b
    └── c

Parameters:

Name | Type | Description | Default
tree | Node | existing tree | required
path | str | path to be added to tree | required
sep | str | path separator for input path | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True
node_attrs | Dict[str, Any] | attributes to add to node, key: attribute name, value: attribute value, optional | {}

Returns:

Type | Description
Node | (Node)

add_dict_to_tree_by_path

add_dict_to_tree_by_path(
    tree, path_attrs, sep="/", duplicate_name_allowed=True
)

Add nodes and attributes to tree in-place, return root of tree. Adds to existing tree from nested dictionary, key: path, value: dict of attribute name and attribute value.

All attributes in path_attrs will be added to the tree, including attributes with null values.

Path should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").
  • Path separator sep is for the input path and can differ from existing tree.

Path can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> from bigtree import Node, add_dict_to_tree_by_path
>>> root = Node("a")
>>> path_dict = {
...     "a": {"age": 90},
...     "a/b": {"age": 65},
...     "a/c": {"age": 60},
...     "a/b/d": {"age": 40},
...     "a/b/e": {"age": 35},
...     "a/c/f": {"age": 38},
...     "a/b/e/g": {"age": 10},
...     "a/b/e/h": {"age": 6},
... }
>>> root = add_dict_to_tree_by_path(root, path_dict)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name | Type | Description | Default
tree | Node | existing tree | required
path_attrs | Dict[str, Dict[str, Any]] | dictionary containing node path and attribute information, key: node path, value: dict of node attribute name and attribute value | required
sep | str | path separator for input path_attrs | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True

Returns:

Type | Description
Node | (Node)

add_dict_to_tree_by_name

add_dict_to_tree_by_name(tree, name_attrs)

Add attributes to existing tree in-place. Adds to existing tree from nested dictionary, key: name, value: dict of attribute name and attribute value.

All attributes in name_attrs will be added to the tree, including attributes with null values.

Input dictionary keys that are not existing node names will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.

Examples:

>>> from bigtree import Node, add_dict_to_tree_by_name
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_dict = {
...     "a": {"age": 90},
...     "b": {"age": 65},
... }
>>> root = add_dict_to_tree_by_name(root, name_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]

Parameters:

Name | Type | Description | Default
tree | Node | existing tree | required
name_attrs | Dict[str, Dict[str, Any]] | dictionary containing node name and attribute information, key: node name, value: dict of node attribute name and attribute value | required

Returns:

Type | Description
Node | (Node)

add_dataframe_to_tree_by_path

add_dataframe_to_tree_by_path(
    tree,
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
)

Add nodes and attributes to tree in-place, return root of tree. Adds to existing tree from pandas DataFrame.

Only attributes in attribute_cols with non-null values will be added to the tree.
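
As a rough sketch of what dropping null attributes means, the snippet below filters each row before its attributes would be attached to a node. It uses plain dicts as stand-ins for DataFrame rows, and it assumes that "null" means `None` or a float NaN; `is_null` and `non_null_attrs` are hypothetical helpers, not part of bigtree.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def non_null_attrs(row, attribute_cols):
    """Keep only the attribute columns of a row that hold non-null values."""
    return {col: row[col] for col in attribute_cols if not is_null(row[col])}


rows = [
    {"PATH": "a", "age": 90, "city": None},
    {"PATH": "a/b", "age": float("nan"), "city": "Paris"},
]
for row in rows:
    print(row["PATH"], non_null_attrs(row, ["age", "city"]))
# a {'age': 90}
# a/b {'city': 'Paris'}
```

This contrasts with the dictionary-based functions, where attributes with null values are kept.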

path_col and attribute_cols specify columns for node path and attributes to add to existing tree. If columns are not specified, path_col takes the first column and all other columns are attribute_cols.

Path in path column should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").
  • Path separator sep is for the input path and can differ from existing tree.

Path in path column can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_path, Node
>>> root = Node("a")
>>> path_data = pd.DataFrame([
...     ["a", 90],
...     ["a/b", 65],
...     ["a/c", 60],
...     ["a/b/d", 40],
...     ["a/b/e", 35],
...     ["a/c/f", 38],
...     ["a/b/e/g", 10],
...     ["a/b/e/h", 6],
... ],
...     columns=["PATH", "age"]
... )
>>> root = add_dataframe_to_tree_by_path(root, path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name | Type | Description | Default
tree | Node | existing tree | required
data | DataFrame | data containing node path and attribute information | required
path_col | str | column of data containing path information, if not set, it will take the first column of data | ''
attribute_cols | List[str] | columns of data containing node attribute information, if not set, it will take all columns of data except path_col | []
sep | str | path separator for input path_col | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True

Returns:

Type | Description
Node | (Node)

add_dataframe_to_tree_by_name

add_dataframe_to_tree_by_name(
    tree, data, name_col="", attribute_cols=[]
)

Add attributes to existing tree in-place. Adds to existing tree from pandas DataFrame.

Only attributes in attribute_cols with non-null values will be added to the tree.

name_col and attribute_cols specify columns for node name and attributes to add to existing tree. If columns are not specified, the first column will be taken as name column and all other columns as attributes.

Node names in the input data that do not match an existing node name will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.

Examples:

>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_name, Node
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_data = pd.DataFrame([
...     ["a", 90],
...     ["b", 65],
... ],
...     columns=["NAME", "age"]
... )
>>> root = add_dataframe_to_tree_by_name(root, name_data)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]

Parameters:

Name | Type | Description | Default
tree | Node | existing tree | required
data | DataFrame | data containing node name and attribute information | required
name_col | str | column of data containing name information, if not set, it will take the first column of data | ''
attribute_cols | List[str] | column(s) of data containing node attribute information, if not set, it will take all columns of data except name_col | []

Returns:

Type | Description
Node | (Node)

str_to_tree

str_to_tree(
    tree_string, tree_prefix_list=[], node_type=Node
)

Construct tree from tree string
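
A tree string encodes depth through the width of the prefix before each node name. The sketch below shows one way such a string could be turned into full paths. It is not how str_to_tree is implemented: `tree_string_to_paths` is a hypothetical helper, and it assumes a fixed indent of 4 characters per level and node names that start with a letter or digit.

```python
def tree_string_to_paths(tree_string, indent_width=4):
    """Convert an indented tree string into a list of full paths."""
    paths = []
    stack = []  # stack[i] holds the node name currently open at depth i
    for line in tree_string.split("\n"):
        if not line.strip():
            continue
        # The prefix ends at the first letter or digit on the line.
        start = next(i for i, ch in enumerate(line) if ch.isalnum())
        depth = start // indent_width
        name = line[start:].strip()
        del stack[depth:]  # close any nodes at this depth or deeper
        stack.append(name)
        paths.append("/".join(stack))
    return paths


tree_str = "a\n├── b\n│   ├── d\n│   └── e\n└── c\n    └── f"
print(tree_string_to_paths(tree_str))
# ['a', 'a/b', 'a/b/d', 'a/b/e', 'a/c', 'a/c/f']
```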

Examples:

>>> from bigtree import str_to_tree
>>> tree_str = 'a\n├── b\n│   ├── d\n│   └── e\n│       ├── g\n│       └── h\n└── c\n    └── f'
>>> root = str_to_tree(tree_str, tree_prefix_list=["├──", "└──"])
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name | Type | Description | Default
tree_string | str | string to construct tree | required
tree_prefix_list | List[str] | list of prefixes that mark the end of the tree branch/stem and the start of the node name, optional. If not specified, unicode characters and whitespace are inferred as the prefix. | []
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

list_to_tree

list_to_tree(
    paths,
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from list of path strings.

Path should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").

Path can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.
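
To make the shared-root requirement concrete, here is a minimal sketch that folds a list of path strings into a nested dictionary and rejects a path that starts from a different root. It only illustrates the rules above; `paths_to_nested` is a hypothetical helper and the nested-dict representation is an assumption, not what list_to_tree returns.

```python
def paths_to_nested(paths, sep="/"):
    """Build a nested dict {name: {child_name: {...}}} from full path strings."""
    tree = {}
    for path in paths:
        names = path.strip(sep).split(sep)
        # All paths must share the first name, otherwise there would be two roots.
        if tree and names[0] not in tree:
            root = next(iter(tree))
            raise ValueError(f"Path {path!r} does not start from root {root!r}")
        node = tree
        for name in names:
            node = node.setdefault(name, {})
    return tree


nested = paths_to_nested(["a/b", "a/c", "a/b/d", "/a/b/e"])
print(nested)  # {'a': {'b': {'d': {}, 'e': {}}, 'c': {}}}
```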

Examples:

>>> from bigtree import list_to_tree
>>> path_list = ["a/b", "a/c", "a/b/d", "a/b/e", "a/c/f", "a/b/e/g", "a/b/e/h"]
>>> root = list_to_tree(path_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name | Type | Description | Default
paths | List[str] | list containing path strings | required
sep | str | path separator for input paths and created tree, defaults to / | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

list_to_tree_by_relation

list_to_tree_by_relation(
    relations, allow_duplicates=False, node_type=Node
)

Construct tree from list of tuple containing parent-child names.

Root node is inferred when parent is empty, or when name appears as parent but not as child.
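
The root-inference rule can be sketched as follows. This is an illustration under stated assumptions, not bigtree's code: `infer_root` is a hypothetical helper, and an "empty" parent is taken to mean `None` or an empty string.

```python
def infer_root(relations):
    """Infer the root name from (parent, child) tuples."""
    # Case 1: a child whose parent is empty (None or "") is the root.
    roots = [child for parent, child in relations if not parent]
    if not roots:
        # Case 2: a name that appears as a parent but never as a child.
        children = {child for _, child in relations}
        roots = [parent for parent, _ in relations if parent not in children]
    unique_roots = list(dict.fromkeys(roots))  # drop repeats, keep order
    if len(unique_roots) != 1:
        raise ValueError(f"Expected exactly one root, found {unique_roots}")
    return unique_roots[0]


relations = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e")]
print(infer_root(relations))  # a
print(infer_root([(None, "a"), ("a", "b")]))  # a
```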

Since the tree is created from parent-child names, only names of leaf nodes may be repeated. An error is raised if names of intermediate nodes are repeated, because it would be ambiguous which parent their children belong to. Setting allow_duplicates to True suppresses this error, and the child nodes are then tagged to multiple parent nodes.

Examples:

>>> from bigtree import list_to_tree_by_relation
>>> relations_list = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f"), ("e", "g"), ("e", "h")]
>>> root = list_to_tree_by_relation(relations_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f

Parameters:

Name | Type | Description | Default
relations | List[Tuple[str, str]] | list of tuples containing parent-child names | required
allow_duplicates | bool | allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False | False
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

dict_to_tree

dict_to_tree(
    path_attrs,
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from nested dictionary using path, key: path, value: dict of attribute name and attribute value.

Path should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").

Path can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

All attributes in path_attrs will be added to the tree, including attributes with null values.

Examples:

>>> from bigtree import dict_to_tree
>>> path_dict = {
...     "a": {"age": 90},
...     "a/b": {"age": 65},
...     "a/c": {"age": 60},
...     "a/b/d": {"age": 40},
...     "a/b/e": {"age": 35},
...     "a/c/f": {"age": 38},
...     "a/b/e/g": {"age": 10},
...     "a/b/e/h": {"age": 6},
... }
>>> root = dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name | Type | Description | Default
path_attrs | Dict[str, Any] | dictionary containing path and node attribute information, key: path, value: dict of tree attribute and attribute value | required
sep | str | path separator of input path_attrs and created tree, defaults to / | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

nested_dict_to_tree

nested_dict_to_tree(
    node_attrs,
    name_key="name",
    child_key="children",
    node_type=Node,
)

Construct tree from nested recursive dictionary.

  • key: name_key, child_key, or any attributes key.
  • value of name_key (str): node name.
  • value of child_key (List[Dict[str, Any]]): list of dict containing name_key and child_key (recursive).
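
The recursive structure described above can be walked with a short generator. This is a sketch, not bigtree's implementation: `iter_nested` is a hypothetical helper that treats every key other than name_key and child_key as a node attribute.

```python
def iter_nested(node_attrs, name_key="name", child_key="children", parent_path=""):
    """Yield (full_path, attributes) for every node in a nested recursive dict."""
    path = f"{parent_path}/{node_attrs[name_key]}"
    # Everything that is not the name or the child list counts as an attribute.
    attrs = {k: v for k, v in node_attrs.items() if k not in (name_key, child_key)}
    yield path, attrs
    for child in node_attrs.get(child_key, []):
        yield from iter_nested(child, name_key, child_key, path)


data = {
    "name": "a",
    "age": 90,
    "children": [
        {"name": "b", "age": 65, "children": [{"name": "d", "age": 40}]},
        {"name": "c", "age": 60},
    ],
}
for path, attrs in iter_nested(data):
    print(path, attrs)
# /a {'age': 90}
# /a/b {'age': 65}
# /a/b/d {'age': 40}
# /a/c {'age': 60}
```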

Examples:

>>> from bigtree import nested_dict_to_tree
>>> path_dict = {
...     "name": "a",
...     "age": 90,
...     "children": [
...         {"name": "b",
...          "age": 65,
...          "children": [
...              {"name": "d", "age": 40},
...              {"name": "e", "age": 35, "children": [
...                  {"name": "g", "age": 10},
...              ]},
...          ]},
...     ],
... }
>>> root = nested_dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]
    ├── d [age=40]
    └── e [age=35]
        └── g [age=10]

Parameters:

Name | Type | Description | Default
node_attrs | Dict[str, Any] | dictionary containing node, children, and node attribute information. key: name_key and child_key; value of name_key (str): node name; value of child_key (List[Dict[str, Any]]): list of dict containing name_key and child_key (recursive) | required
name_key | str | key of node name, value is type str | 'name'
child_key | str | key of child list, value is type list | 'children'
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

dataframe_to_tree

dataframe_to_tree(
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)

Construct tree from pandas DataFrame using path, return root of tree.

path_col and attribute_cols specify columns for node path and attributes to construct tree. If columns are not specified, path_col takes first column and all other columns are attribute_cols.

Only attributes in attribute_cols with non-null values will be added to the tree.

Path in path column can start from root node name, or start with sep.

  • For example: Path string can be "/a/b" or "a/b", if sep is "/".

Path in path column should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a").

All paths should start from the same root node.

  • For example: Path strings should be "a/b", "a/c", "a/b/d" etc. and should not start with another root node.

Examples:

>>> import pandas as pd
>>> from bigtree import dataframe_to_tree
>>> path_data = pd.DataFrame([
...     ["a", 90],
...     ["a/b", 65],
...     ["a/c", 60],
...     ["a/b/d", 40],
...     ["a/b/e", 35],
...     ["a/c/f", 38],
...     ["a/b/e/g", 10],
...     ["a/b/e/h", 6],
... ],
...     columns=["PATH", "age"]
... )
>>> root = dataframe_to_tree(path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name | Type | Description | Default
data | DataFrame | data containing path and node attribute information | required
path_col | str | column of data containing path information, if not set, it will take the first column of data | ''
attribute_cols | List[str] | columns of data containing node attribute information, if not set, it will take all columns of data except path_col | []
sep | str | path separator of input path_col and created tree, defaults to / | '/'
duplicate_name_allowed | bool | whether nodes with duplicate names are allowed, defaults to True | True
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

dataframe_to_tree_by_relation

dataframe_to_tree_by_relation(
    data,
    child_col="",
    parent_col="",
    attribute_cols=[],
    allow_duplicates=False,
    node_type=Node,
)

Construct tree from pandas DataFrame using parent and child names, return root of tree.

Root node is inferred when parent name is empty, or when name appears in parent column but not in child column.

Since the tree is created from parent-child names, only names of leaf nodes may be repeated. An error is raised if names of intermediate nodes are repeated, because it would be ambiguous which parent their children belong to. Setting allow_duplicates to True suppresses this error, and the child nodes are then tagged to multiple parent nodes.

child_col and parent_col specify columns for child name and parent name to construct tree. attribute_cols specify columns for node attribute for child name. If columns are not specified, child_col takes first column, parent_col takes second column, and all other columns are attribute_cols.

Only attributes in attribute_cols with non-null values will be added to the tree.

Examples:

>>> import pandas as pd
>>> from bigtree import dataframe_to_tree_by_relation
>>> relation_data = pd.DataFrame([
...     ["a", None, 90],
...     ["b", "a", 65],
...     ["c", "a", 60],
...     ["d", "b", 40],
...     ["e", "b", 35],
...     ["f", "c", 38],
...     ["g", "e", 10],
...     ["h", "e", 6],
... ],
...     columns=["child", "parent", "age"]
... )
>>> root = dataframe_to_tree_by_relation(relation_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]

Parameters:

Name | Type | Description | Default
data | DataFrame | data containing child name, parent name, and node attribute information | required
child_col | str | column of data containing child name information, if not set, it will take the first column of data | ''
parent_col | str | column of data containing parent name information, if not set, it will take the second column of data | ''
attribute_cols | List[str] | columns of data containing node attribute information, if not set, it will take all columns of data except child_col and parent_col | []
allow_duplicates | bool | allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False | False
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)

newick_to_tree

newick_to_tree(
    tree_string,
    length_attr="length",
    attr_prefix="&&NHX:",
    node_type=Node,
)

Construct tree from Newick notation, return root of tree.

In Newick notation (or New Hampshire notation):

  • A tree is represented in round brackets, e.g., (child1,child2,child3)parent.
  • Nested trees are written in nested round brackets, e.g., ((grandchild1)child1,(grandchild2,grandchild3)child2)parent.
  • A length attribute, if present, follows the name, e.g., (child1:0.5,child2:0.1)parent.
  • Other attributes are represented in square brackets, e.g., (child1:0.5[S:human],child2:0.1[S:human])parent[S:parent].

Variations supported

  • Support special characters ([, ], (, ), :, ,) in node name, attribute name, and attribute values if they are enclosed in single quotes i.e., '(name:!)'.
  • If there are no node names, it will be auto-filled with convention nodeN with N representing a number.
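
To see how the bracket rules compose, the sketch below goes in the opposite direction and serialises a small nested dictionary into a Newick-style string. It is only an illustration of the notation: `to_newick` is a hypothetical helper, the `name`/`children`/`length` keys are assumptions, and quoting of special characters and square-bracket attributes are left out.

```python
def to_newick(node, length_key="length"):
    """Serialise a nested dict node into a Newick-style string."""
    children = node.get("children", [])
    text = ""
    if children:
        # Children go inside round brackets, before the parent's own name.
        text = "(" + ",".join(to_newick(child, length_key) for child in children) + ")"
    text += node["name"]
    if length_key in node:
        text += f":{node[length_key]}"
    return text


tree = {
    "name": "a",
    "children": [
        {"name": "b", "length": 65, "children": [
            {"name": "d", "length": 40},
            {"name": "e", "length": 35},
        ]},
        {"name": "c", "length": 60},
    ],
}
print(to_newick(tree))  # ((d:40,e:35)b:65,c:60)a
```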

Examples:

>>> from bigtree import newick_to_tree
>>> root = newick_to_tree("((d,e)b,c)a")
>>> root.show()
a
├── b
│   ├── d
│   └── e
└── c
>>> root = newick_to_tree("((d:40,e:35)b:65,c:60)a", length_attr="age")
>>> root.show(attr_list=["age"])
a
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
└── c [age=60]
>>> root = newick_to_tree(
...     "((d:40[&&NHX:species=human],e:35[&&NHX:species=human])b:65[&&NHX:species=human],c:60[&&NHX:species=human])a[&&NHX:species=human]",
...     length_attr="age",
... )
>>> root.show(all_attrs=True)
a [species=human]
├── b [age=65, species=human]
│   ├── d [age=40, species=human]
│   └── e [age=35, species=human]
└── c [age=60, species=human]

Parameters:

Name | Type | Description | Default
tree_string | str | Newick notation to construct tree | required
length_attr | str | attribute name to store node length, optional, defaults to 'length' | 'length'
attr_prefix | str | prefix before all attributes, within square bracket, used to detect attributes, defaults to "&&NHX:" | '&&NHX:'
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node

Returns:

Type | Description
Node | (Node)