✨ Construct
Tree Construct Methods
Construct a tree from a string, list, dictionary, or pandas DataFrame.
To decide which method to use, consider your data type and data values.
Construct tree from | Using full path | Using parent-child relation | Using notation | Add node attributes |
---|---|---|---|---|
String | str_to_tree | NA | newick_to_tree | No (for str_to_tree), Yes (for newick_to_tree) |
List | list_to_tree | list_to_tree_by_relation | NA | No |
Dictionary | dict_to_tree | nested_dict_to_tree | NA | Yes |
DataFrame | dataframe_to_tree | dataframe_to_tree_by_relation | NA | Yes |
Tree Add Attributes Methods
To add attributes to an existing tree, use one of the following methods.
Add attributes from | Using full path | Using node name |
---|---|---|
String | add_path_to_tree | NA |
Dictionary | add_dict_to_tree_by_path | add_dict_to_tree_by_name |
DataFrame | add_dataframe_to_tree_by_path | add_dataframe_to_tree_by_name |
Note
If attributes are added to an existing tree using the full path, paths that did not previously exist will be added.
If attributes are added to an existing tree using the node name, names that did not previously exist will not be created.
These functions are not standalone functions. Under the hood, they have the following dependency,
bigtree.tree.construct
add_path_to_tree
Add nodes and attributes to an existing tree in-place, and return the node of the path added. Adds to the existing tree from a path string.
Path should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
- Path separator sep is for the input path and can differ from that of the existing tree.
Path can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
All attributes in node_attrs will be added to the tree, including attributes with null values.
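The path rules above can be illustrated with a small standalone sketch. It does not call bigtree; `split_path` is a hypothetical helper that only mirrors the documented rules (optional leading separator, shared root), and bigtree's own validation may differ.

```python
from typing import List


def split_path(path: str, root_name: str, sep: str = "/") -> List[str]:
    """Split a path string into node names, following the rules described above.

    Hypothetical helper for illustration only; not part of bigtree.
    """
    # A path may start with the separator ("/a/b") or with the root name ("a/b").
    if path.startswith(sep):
        path = path[len(sep):]
    names = path.split(sep)
    # All paths must start from the same root node.
    if names[0] != root_name:
        raise ValueError(f"Path {path!r} does not start from root {root_name!r}")
    return names


print(split_path("/a/b/c", root_name="a"))        # ['a', 'b', 'c']
print(split_path("a|b", root_name="a", sep="|"))  # ['a', 'b']
```

The separator is a property of the input string only, which is why the second call can use "|" regardless of how an existing tree prints its paths.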
Examples:
>>> from bigtree import add_path_to_tree, Node
>>> root = Node("a")
>>> add_path_to_tree(root, "a/b/c")
Node(/a/b/c, )
>>> root.show()
a
└── b
    └── c
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | existing tree | required |
path | str | path to be added to tree | required |
sep | str | path separator for input path | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
node_attrs | Dict[str, Any] | attributes to add to node, key: attribute name, value: attribute value, optional | {} |
Returns:
Type | Description |
---|---|
Node | (Node) |
add_dict_to_tree_by_path
Add nodes and attributes to tree in-place, return root of tree.
Adds to existing tree from nested dictionary, key: path, value: dict of attribute name and attribute value.
All attributes in path_attrs will be added to the tree, including attributes with null values.
Path should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
- Path separator sep is for the input path and can differ from that of the existing tree.
Path can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
Examples:
>>> from bigtree import Node, add_dict_to_tree_by_path
>>> root = Node("a")
>>> path_dict = {
... "a": {"age": 90},
... "a/b": {"age": 65},
... "a/c": {"age": 60},
... "a/b/d": {"age": 40},
... "a/b/e": {"age": 35},
... "a/c/f": {"age": 38},
... "a/b/e/g": {"age": 10},
... "a/b/e/h": {"age": 6},
... }
>>> root = add_dict_to_tree_by_path(root, path_dict)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | existing tree | required |
path_attrs | Dict[str, Dict[str, Any]] | dictionary containing node path and attribute information, key: node path, value: dict of node attribute name and attribute value | required |
sep | str | path separator for input paths | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
Returns:
Type | Description |
---|---|
Node | (Node) |
add_dict_to_tree_by_name
Add attributes to existing tree in-place.
Adds to existing tree from nested dictionary, key: name, value: dict of attribute name and attribute value.
All attributes in name_attrs will be added to the tree, including attributes with null values.
Input dictionary keys that are not existing node names will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.
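The name-based behaviour described above (unknown names ignored, every node sharing a name updated) can be sketched without bigtree. `SimpleNode` and `add_attrs_by_name` are hypothetical stand-ins written for this illustration and are not bigtree's implementation.

```python
from typing import Any, Dict, List, Optional


class SimpleNode:
    """Minimal stand-in for a tree node (illustration only, not bigtree's Node)."""

    def __init__(self, name: str, parent: Optional["SimpleNode"] = None) -> None:
        self.name = name
        self.attrs: Dict[str, Any] = {}
        self.children: List["SimpleNode"] = []
        if parent is not None:
            parent.children.append(self)


def add_attrs_by_name(root: SimpleNode, name_attrs: Dict[str, Dict[str, Any]]) -> None:
    """Attach attributes to every node whose name appears in name_attrs."""
    stack = [root]
    while stack:
        node = stack.pop()
        # Keys that match no node are never reached here, so they are ignored.
        if node.name in name_attrs:
            node.attrs.update(name_attrs[node.name])  # null values are kept too
        stack.extend(node.children)


root = SimpleNode("a")
b1 = SimpleNode("b", parent=root)
c = SimpleNode("c", parent=root)
b2 = SimpleNode("b", parent=c)  # a second node named "b"

add_attrs_by_name(root, {"b": {"age": 65}, "zzz": {"age": 1}})
print(b1.attrs, b2.attrs, c.attrs)  # {'age': 65} {'age': 65} {}
```

Because matching is by name only, no new node is created for "zzz", and both "b" nodes receive the same attributes.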
Examples:
>>> from bigtree import Node, add_dict_to_tree_by_name
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_dict = {
... "a": {"age": 90},
... "b": {"age": 65},
... }
>>> root = add_dict_to_tree_by_name(root, name_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | existing tree | required |
name_attrs | Dict[str, Dict[str, Any]] | dictionary containing node name and attribute information, key: node name, value: dict of node attribute name and attribute value | required |
Returns:
Type | Description |
---|---|
Node | (Node) |
add_dataframe_to_tree_by_path
add_dataframe_to_tree_by_path(
    tree,
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
)
Add nodes and attributes to tree in-place, return root of tree. Adds to existing tree from pandas DataFrame.
Only attributes in attribute_cols with non-null values will be added to the tree.
path_col and attribute_cols specify columns for node path and attributes to add to existing tree.
If columns are not specified, path_col takes the first column and all other columns are attribute_cols.
Path in path column should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
- Path separator sep is for the input path and can differ from that of the existing tree.
Path in path column can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
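Unlike the dictionary-based functions, the DataFrame-based functions skip null attribute values. The sketch below shows that filtering idea on plain dictionaries standing in for DataFrame rows; `is_null` and `non_null_attrs` are hypothetical helpers, and treating only None and NaN as null is a simplifying assumption rather than bigtree's or pandas' exact rule.

```python
import math
from typing import Any, Dict


def is_null(value: Any) -> bool:
    """Treat None and float NaN as null (a simplification of pandas' null handling)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def non_null_attrs(row: Dict[str, Any], path_col: str) -> Dict[str, Any]:
    """Return the attribute columns of a row, dropping the path column and null values."""
    return {
        col: value
        for col, value in row.items()
        if col != path_col and not is_null(value)
    }


rows = [
    {"PATH": "a", "age": 90, "city": None},
    {"PATH": "a/b", "age": float("nan"), "city": "Oslo"},
]
for row in rows:
    print(row["PATH"], non_null_attrs(row, path_col="PATH"))
# a {'age': 90}
# a/b {'city': 'Oslo'}
```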
Examples:
>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_path, Node
>>> root = Node("a")
>>> path_data = pd.DataFrame([
... ["a", 90],
... ["a/b", 65],
... ["a/c", 60],
... ["a/b/d", 40],
... ["a/b/e", 35],
... ["a/c/f", 38],
... ["a/b/e/g", 10],
... ["a/b/e/h", 6],
... ],
... columns=["PATH", "age"]
... )
>>> root = add_dataframe_to_tree_by_path(root, path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | existing tree | required |
data | DataFrame | data containing node path and attribute information | required |
path_col | str | column of data containing node path information; if not set, it will take the first column of data | '' |
attribute_cols | List[str] | columns of data containing node attribute information; if not set, it will take all columns of data except path_col | [] |
sep | str | path separator for input paths | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
Returns:
Type | Description |
---|---|
Node | (Node) |
add_dataframe_to_tree_by_name
Add attributes to existing tree in-place. Adds to existing tree from pandas DataFrame.
Only attributes in attribute_cols with non-null values will be added to the tree.
name_col and attribute_cols specify columns for node name and attributes to add to existing tree.
If columns are not specified, the first column will be taken as the name column and all other columns as attributes.
Input data node names that are not existing node names will be ignored. Note that if multiple nodes have the same name, attributes will be added to all nodes sharing the same name.
Examples:
>>> import pandas as pd
>>> from bigtree import add_dataframe_to_tree_by_name, Node
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> name_data = pd.DataFrame([
... ["a", 90],
... ["b", 65],
... ],
... columns=["NAME", "age"]
... )
>>> root = add_dataframe_to_tree_by_name(root, name_data)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | existing tree | required |
data | DataFrame | data containing node name and attribute information | required |
name_col | str | column of data containing node name information; if not set, it will take the first column of data | '' |
attribute_cols | List[str] | column(s) of data containing node attribute information; if not set, it will take all columns of data except name_col | [] |
Returns:
Type | Description |
---|---|
Node | (Node) |
str_to_tree
Construct tree from a tree string.
Examples:
>>> from bigtree import str_to_tree
>>> tree_str = 'a\n├── b\n│   ├── d\n│   └── e\n│       ├── g\n│       └── h\n└── c\n    └── f'
>>> root = str_to_tree(tree_str, tree_prefix_list=["├──", "└──"])
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_string | str | String to construct tree | required |
tree_prefix_list | List[str] | List of prefixes that mark the end of tree branch/stem and start of node name, optional. If not specified, it will infer unicode characters and whitespace as prefix. | [] |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
list_to_tree
Construct tree from list of path strings.
Path should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
Path can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
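To show how a flat list of path strings determines a hierarchy, here is a minimal sketch that folds paths into a nested dictionary. `paths_to_nested` is a hypothetical helper written for this illustration; it is not how bigtree builds Node objects, and a nested dictionary cannot represent duplicate sibling names.

```python
from typing import Dict, List


def paths_to_nested(paths: List[str], sep: str = "/") -> Dict[str, dict]:
    """Fold path strings into a nested dict of {name: {child_name: {...}}}."""
    tree: Dict[str, dict] = {}
    for path in paths:
        # Ignore an optional leading separator ("/a/b" is the same as "a/b").
        if path.startswith(sep):
            path = path[len(sep):]
        level = tree
        # Walk down one name at a time, creating missing levels on the way.
        for name in path.split(sep):
            level = level.setdefault(name, {})
    if len(tree) != 1:
        raise ValueError("All paths should start from the same root node")
    return tree


nested = paths_to_nested(["a/b", "a/c", "a/b/d", "/a/b/e"])
print(nested)  # {'a': {'b': {'d': {}, 'e': {}}, 'c': {}}}
```

Intermediate nodes do not need their own entry: "a/b/d" alone is enough to create "a", "b", and "d".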
Examples:
>>> from bigtree import list_to_tree
>>> path_list = ["a/b", "a/c", "a/b/d", "a/b/e", "a/c/f", "a/b/e/g", "a/b/e/h"]
>>> root = list_to_tree(path_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f
Parameters:
Name | Type | Description | Default |
---|---|---|---|
paths | List[str] | list containing path strings | required |
sep | str | path separator for input paths | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
list_to_tree_by_relation
Construct tree from a list of tuples containing parent-child names.
Root node is inferred when parent is empty, or when name appears as parent but not as child.
Since the tree is created from parent-child names, only names of leaf nodes may be repeated.
An error will be thrown if names of intermediate nodes are repeated, because the parent-child relationships would be ambiguous.
This error can be ignored by setting allow_duplicates to True.
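The root-inference rule described above can be sketched in a few lines. `infer_root` is a hypothetical helper written for this illustration, not a bigtree function, and its handling of zero or multiple candidates is an assumption.

```python
from typing import List, Optional, Tuple


def infer_root(relations: List[Tuple[Optional[str], str]]) -> str:
    """Infer the root name from (parent, child) pairs."""
    parents = {parent for parent, _ in relations if parent}
    children = {child for _, child in relations}
    # A child whose parent is empty is a root candidate ...
    candidates = {child for parent, child in relations if not parent}
    # ... and so is any name used as a parent but never as a child.
    candidates |= parents - children
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one root, found {sorted(candidates)}")
    return candidates.pop()


relations = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f")]
print(infer_root(relations))                   # a
print(infer_root([(None, "a"), ("a", "b")]))   # a
```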
Examples:
>>> from bigtree import list_to_tree_by_relation
>>> relations_list = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f"), ("e", "g"), ("e", "h")]
>>> root = list_to_tree_by_relation(relations_list)
>>> root.show()
a
├── b
│   ├── d
│   └── e
│       ├── g
│       └── h
└── c
    └── f
Parameters:
Name | Type | Description | Default |
---|---|---|---|
relations | List[Tuple[str, str]] | list of tuples containing parent-child names | required |
allow_duplicates | bool | allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False | False |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
dict_to_tree
Construct tree from nested dictionary using path, key: path, value: dict of attribute name and attribute value.
Path should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
Path can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
All attributes in path_attrs will be added to the tree, including attributes with null values.
Examples:
>>> from bigtree import dict_to_tree
>>> path_dict = {
... "a": {"age": 90},
... "a/b": {"age": 65},
... "a/c": {"age": 60},
... "a/b/d": {"age": 40},
... "a/b/e": {"age": 35},
... "a/c/f": {"age": 38},
... "a/b/e/g": {"age": 10},
... "a/b/e/h": {"age": 6},
... }
>>> root = dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path_attrs | Dict[str, Any] | dictionary containing path and node attribute information, key: path, value: dict of tree attribute and attribute value | required |
sep | str | path separator of input paths | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
nested_dict_to_tree
Construct tree from nested recursive dictionary.
- key: name_key, child_key, or any attributes key.
- value of name_key (str): node name.
- value of child_key (List[Dict[str, Any]]): list of dict containing name_key and child_key (recursive).
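The recursive layout described above can be made concrete with a small traversal that lists every node's path and attributes. `walk_nested` is a hypothetical helper for illustration and is not part of bigtree; the only assumption is that every key other than name_key and child_key counts as a node attribute, as stated above.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_nested(
    node: Dict[str, Any],
    name_key: str = "name",
    child_key: str = "children",
    parent_path: str = "",
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, attributes) for every node in a nested recursive dictionary."""
    path = f"{parent_path}/{node[name_key]}"
    # Every key other than name_key and child_key is treated as a node attribute.
    attrs = {k: v for k, v in node.items() if k not in (name_key, child_key)}
    yield path, attrs
    # Leaf nodes may omit child_key entirely.
    for child in node.get(child_key, []):
        yield from walk_nested(child, name_key, child_key, path)


nested = {
    "name": "a",
    "age": 90,
    "children": [
        {"name": "b", "age": 65, "children": [{"name": "d", "age": 40}]},
    ],
}
for path, attrs in walk_nested(nested):
    print(path, attrs)
# /a {'age': 90}
# /a/b {'age': 65}
# /a/b/d {'age': 40}
```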
Examples:
>>> from bigtree import nested_dict_to_tree
>>> path_dict = {
... "name": "a",
... "age": 90,
... "children": [
... {"name": "b",
... "age": 65,
... "children": [
... {"name": "d", "age": 40},
... {"name": "e", "age": 35, "children": [
... {"name": "g", "age": 10},
... ]},
... ]},
... ],
... }
>>> root = nested_dict_to_tree(path_dict)
>>> root.show(attr_list=["age"])
a [age=90]
└── b [age=65]
    ├── d [age=40]
    └── e [age=35]
        └── g [age=10]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_attrs | Dict[str, Any] | dictionary containing node, children, and node attribute information, key: name_key, child_key, or any attributes key | required |
name_key | str | key of node name, value is type str | 'name' |
child_key | str | key of child list, value is type list | 'children' |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
dataframe_to_tree
dataframe_to_tree(
    data,
    path_col="",
    attribute_cols=[],
    sep="/",
    duplicate_name_allowed=True,
    node_type=Node,
)
Construct tree from pandas DataFrame using path, return root of tree.
path_col and attribute_cols specify columns for node path and attributes to construct tree.
If columns are not specified, path_col takes the first column and all other columns are attribute_cols.
Only attributes in attribute_cols with non-null values will be added to the tree.
Path in path column can start from the root node name, or start with sep.
- For example: Path string can be "/a/b" or "a/b", if sep is "/".
Path in path column should contain Node names, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a").
All paths should start from the same root node.
- For example: Path strings should be "a/b", "a/c", "a/b/d" etc., and should not start with another root node.
Examples:
>>> import pandas as pd
>>> from bigtree import dataframe_to_tree
>>> path_data = pd.DataFrame([
... ["a", 90],
... ["a/b", 65],
... ["a/c", 60],
... ["a/b/d", 40],
... ["a/b/e", 35],
... ["a/c/f", 38],
... ["a/b/e/g", 10],
... ["a/b/e/h", 6],
... ],
... columns=["PATH", "age"]
... )
>>> root = dataframe_to_tree(path_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | data containing path and node attribute information | required |
path_col | str | column of data containing node path information; if not set, it will take the first column of data | '' |
attribute_cols | List[str] | columns of data containing node attribute information; if not set, it will take all columns of data except path_col | [] |
sep | str | path separator of input paths | '/' |
duplicate_name_allowed | bool | indicator if nodes with duplicate names are allowed | True |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
dataframe_to_tree_by_relation
dataframe_to_tree_by_relation(
    data,
    child_col="",
    parent_col="",
    attribute_cols=[],
    allow_duplicates=False,
    node_type=Node,
)
Construct tree from pandas DataFrame using parent and child names, return root of tree.
Root node is inferred when parent name is empty, or when name appears in parent column but not in child column.
Since the tree is created from parent-child names, only names of leaf nodes may be repeated.
An error will be thrown if names of intermediate nodes are repeated, because the parent-child relationships would be ambiguous.
This error can be ignored by setting allow_duplicates to True.
child_col and parent_col specify columns for child name and parent name to construct tree.
attribute_cols specifies columns for node attributes of the child name.
If columns are not specified, child_col takes the first column, parent_col takes the second column, and all other columns are attribute_cols.
Only attributes in attribute_cols with non-null values will be added to the tree.
Examples:
>>> import pandas as pd
>>> from bigtree import dataframe_to_tree_by_relation
>>> relation_data = pd.DataFrame([
... ["a", None, 90],
... ["b", "a", 65],
... ["c", "a", 60],
... ["d", "b", 40],
... ["e", "b", 35],
... ["f", "c", 38],
... ["g", "e", 10],
... ["h", "e", 6],
... ],
... columns=["child", "parent", "age"]
... )
>>> root = dataframe_to_tree_by_relation(relation_data)
>>> root.show(attr_list=["age"])
a [age=90]
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
│       ├── g [age=10]
│       └── h [age=6]
└── c [age=60]
    └── f [age=38]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | data containing parent-child names and node attribute information | required |
child_col | str | column of data containing child name information; if not set, it will take the first column of data | '' |
parent_col | str | column of data containing parent name information; if not set, it will take the second column of data | '' |
attribute_cols | List[str] | columns of data containing node attribute information; if not set, it will take all columns of data except child_col and parent_col | [] |
allow_duplicates | bool | allow duplicate intermediate nodes such that child node will be tagged to multiple parent nodes, defaults to False | False |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |
newick_to_tree
Construct tree from Newick notation, return root of tree.
In the Newick Notation (or New Hampshire Notation):
- Tree is represented in round brackets, i.e., (child1,child2,child3)parent.
- If there are nested trees, they will be in nested round brackets, i.e., ((grandchild1)child1,(grandchild2,grandchild3)child2)parent.
- If there is a length attribute, it will be beside the name, i.e., (child1:0.5,child2:0.1)parent.
- If there are other attributes, they are represented in square brackets, i.e., (child1:0.5[S:human],child2:0.1[S:human])parent[S:parent].
Variations supported:
- Special characters ([, ], (, ), :, ,) are supported in node names, attribute names, and attribute values if they are enclosed in single quotes, i.e., '(name:!)'.
- If there are no node names, they will be auto-filled with the convention nodeN, with N representing a number.
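To make the bracket rules above concrete, the sketch below goes in the opposite direction of newick_to_tree: it serialises a small nested dictionary into a Newick string. `to_newick` and the {"name", "length", "children"} layout are assumptions made for this illustration only; square-bracket attributes and quoting of special characters are left out.

```python
from typing import Any, Dict


def to_newick(node: Dict[str, Any]) -> str:
    """Serialise a nested {"name", "length", "children"} dict into Newick notation."""
    label = node["name"]
    if "length" in node:
        label += f":{node['length']}"  # the length sits beside the name
    children = node.get("children", [])
    if not children:
        return label
    # Children go inside round brackets, directly before their parent's label.
    return "(" + ",".join(to_newick(child) for child in children) + ")" + label


tree = {
    "name": "a",
    "children": [
        {"name": "b", "length": 65, "children": [
            {"name": "d", "length": 40},
            {"name": "e", "length": 35},
        ]},
        {"name": "c", "length": 60},
    ],
}
print(to_newick(tree))  # ((d:40,e:35)b:65,c:60)a
```

Reading the output from the inside out shows the nesting: "(d:40,e:35)" are the children of "b", and the outer brackets hold the children of "a".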
Examples:
>>> from bigtree import newick_to_tree
>>> root = newick_to_tree("((d,e)b,c)a")
>>> root.show()
a
├── b
│   ├── d
│   └── e
└── c
>>> root = newick_to_tree("((d:40,e:35)b:65,c:60)a", length_attr="age")
>>> root.show(attr_list=["age"])
a
├── b [age=65]
│   ├── d [age=40]
│   └── e [age=35]
└── c [age=60]
>>> root = newick_to_tree(
... "((d:40[&&NHX:species=human],e:35[&&NHX:species=human])b:65[&&NHX:species=human],c:60[&&NHX:species=human])a[&&NHX:species=human]",
... length_attr="age",
... )
>>> root.show(all_attrs=True)
a [species=human]
├── b [age=65, species=human]
│   ├── d [age=40, species=human]
│   └── e [age=35, species=human]
└── c [age=60, species=human]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_string | str | Newick notation to construct tree | required |
length_attr | str | attribute name to store node length, optional, defaults to 'length' | 'length' |
attr_prefix | str | prefix before all attributes, within square bracket, used to detect attributes, defaults to "&&NHX:" | '&&NHX:' |
node_type | Type[Node] | node type of tree to be created, defaults to Node | Node |
Returns:
Type | Description |
---|---|
Node | (Node) |