Helper
Helper functions that can come in handy.
bigtree.tree.helper
clone_tree
Clone tree to another Node type. If the same type is needed, simply do a tree.copy().
Examples:
>>> from bigtree import BaseNode, Node, clone_tree
>>> root = BaseNode(name="a")
>>> b = BaseNode(name="b", parent=root)
>>> clone_tree(root, Node)
Node(/a, )
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | BaseNode | tree to be cloned, must inherit from BaseNode | required |
node_type | Type[BaseNodeT] | type of cloned tree | required |
Returns:
Type | Description |
---|---|
BaseNodeT | Cloned tree of another Node type |
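As a rough illustration of what cloning into another node type involves, the sketch below walks a tree and rebuilds it with a different class. PlainNode, LabelledNode, and clone_as are hypothetical stand-ins written for this example; they are not bigtree classes, and this is not necessarily how clone_tree is implemented.

```python
class PlainNode:
    """Hypothetical minimal node: a name, a parent link, and a child list."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class LabelledNode(PlainNode):
    """Hypothetical richer node type that can also report its path."""

    @property
    def path(self):
        prefix = self.parent.path if self.parent is not None else ""
        return f"{prefix}/{self.name}"


def clone_as(node, node_type, parent=None):
    """Recursively rebuild `node` and its descendants as `node_type` instances."""
    copy = node_type(node.name, parent=parent)
    for child in node.children:
        clone_as(child, node_type, parent=copy)
    return copy


root = PlainNode("a")
b = PlainNode("b", parent=root)

cloned = clone_as(root, LabelledNode)
print(type(cloned).__name__)    # LabelledNode
print(cloned.children[0].path)  # /a/b
```

The original tree is left untouched; only names and structure are carried over in this sketch.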
get_subtree
Get subtree based on node name or node path, and/or maximum depth of tree. A subtree is a smaller tree with a different root. Returns a copy of the tree; the original tree is not affected.
Examples:
>>> from bigtree import Node, get_subtree
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> c = Node("c", parent=b)
>>> d = Node("d", parent=b)
>>> e = Node("e", parent=root)
>>> root.show()
a
├── b
│   ├── c
│   └── d
└── e
Get subtree
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | NodeT | existing tree | required |
node_name_or_path | Optional[str] | node name or path to get subtree | None |
max_depth | int | maximum depth of subtree | 0 |
Returns:
Type | Description |
---|---|
NodeT | Subtree |
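The idea of "a copy rooted at another node, optionally cut off at a depth" can be sketched in plain Python. The Node class and the find, copy_to_depth, and names helpers below are hypothetical and exist only for this illustration; treating max_depth=0 as "no limit" is an assumption of the sketch.

```python
class Node:
    """Hypothetical minimal node used only for this sketch."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)


def find(node, name):
    """Depth-first search for the first node with the given name."""
    if node.name == name:
        return node
    for child in node.children:
        found = find(child, name)
        if found is not None:
            return found
    return None


def copy_to_depth(node, max_depth=0, depth=1, parent=None):
    """Copy `node` and its descendants; max_depth=0 means no limit (assumption)."""
    copy = Node(node.name, parent=parent)
    if max_depth == 0 or depth < max_depth:
        for child in node.children:
            copy_to_depth(child, max_depth, depth + 1, parent=copy)
    return copy


def names(node):
    """Flatten a tree into a pre-order list of names."""
    return [node.name] + [n for child in node.children for n in names(child)]


root = Node("a")
b = Node("b", parent=root)
c = Node("c", parent=b)
d = Node("d", parent=b)
e = Node("e", parent=root)

subtree = copy_to_depth(find(root, "b"))
print(names(subtree))                           # ['b', 'c', 'd']
print(names(copy_to_depth(root, max_depth=2)))  # ['a', 'b', 'e']
print(names(root))                              # ['a', 'b', 'c', 'd', 'e']
```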
prune_tree
Prune tree by path or depth. A pruned tree is a smaller tree with the same root. Returns a copy of the tree; the original tree is not affected.
For pruning by prune_path,
- All siblings along the prune path will be removed. All descendants will be kept by default
- If exact=True, all descendants of prune path will be removed
- Prune path can be string (only one path) or a list of strings (multiple paths)
- Prune path name should be unique, can be full path, partial path (trailing part of path), or node name
For pruning by max_depth,
- All nodes that are beyond max_depth will be removed
Path should contain Node name, separated by sep.
- For example: Path string "a/b" refers to Node("b") with parent Node("a")
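The pruning rules above can be sketched on a flat list of node paths. prune_paths is a hypothetical helper written for this illustration; it handles full paths only (not partial paths or bare node names), treats max_depth=0 as "no limit", and is not bigtree's implementation.

```python
def prune_paths(all_paths, prune_path, exact=False, max_depth=0, sep="/"):
    """Return the node paths that survive pruning.

    A node is kept if it lies on the prune path (the target or one of its
    ancestors) or, unless exact=True, if it is a descendant of the target.
    An empty prune_path disables path-based pruning.
    """
    kept = []
    for path in all_paths:
        on_prune_path = prune_path == path or prune_path.startswith(path + sep)
        is_descendant = path.startswith(prune_path + sep)
        if prune_path and not (on_prune_path or (is_descendant and not exact)):
            continue  # sibling branch, or descendant removed by exact=True
        if max_depth and len(path.split(sep)) > max_depth:
            continue  # node lies beyond max_depth
        kept.append(path)
    return kept


paths = ["a", "a/b", "a/b/c", "a/b/d", "a/e"]
print(prune_paths(paths, "a/b"))              # ['a', 'a/b', 'a/b/c', 'a/b/d']
print(prune_paths(paths, "a/b", exact=True))  # ['a', 'a/b']
print(prune_paths(paths, "", max_depth=2))    # ['a', 'a/b', 'a/e']

# Multiple prune paths: keep a node if any single prune path keeps it
targets = ["a/b/d", "a/e"]
multi = [p for p in paths if any(p in prune_paths(paths, t) for t in targets)]
print(multi)                                  # ['a', 'a/b', 'a/b/d', 'a/e']
```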
Examples:
>>> from bigtree import Node, prune_tree
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> c = Node("c", parent=b)
>>> d = Node("d", parent=b)
>>> e = Node("e", parent=root)
>>> root.show()
a
├── b
│   ├── c
│   └── d
└── e
Prune tree
>>> root_pruned = prune_tree(root, "a/b")
>>> root_pruned.show()
a
└── b
    ├── c
    └── d
Prune by exact path
Prune by multiple paths
>>> root_pruned = prune_tree(root, ["a/b/d", "a/e"])
>>> root_pruned.show()
a
├── b
│   └── d
└── e
Prune by depth
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Union[BinaryNodeT, NodeT] | existing tree | required |
prune_path | Optional[Union[Iterable[str], str]] | prune path(s), all siblings along the prune path(s) will be removed | None |
exact | bool | prune path(s) to be exactly the path and remove descendants | False |
sep | str | path separator of prune_path | '/' |
max_depth | int | maximum depth of pruned tree | 0 |
Returns:
Type | Description |
---|---|
Union[BinaryNodeT, NodeT] | Pruned tree |
get_tree_diff_dataframe
get_tree_diff_dataframe(
    tree,
    other_tree,
    only_diff=True,
    detail=False,
    aggregate=False,
    attr_list=None,
    fallback_sep="/",
    name_col="name",
    path_col="path",
    parent_col="parent",
    indicator_col="Exists",
    old_suffix="_old",
    new_suffix="_new",
    suffix_col="suffix",
)
Get difference of tree to other_tree, changes are relative to tree. This function exports both trees to pandas DataFrames, merges them, and adds a suffix column to indicate the type of difference between the two trees.
Comparing tree structure
By default, suffix will be '+' and '-' for the tree differences, and np.nan for the others.
- If detail=True, 'added' and 'moved to' will be used instead of '+', and 'removed' and 'moved from' will be used instead of '-'
- If aggregate=True, suffix will only be indicated at the parent-level. This is useful when a subtree is shifted and we want the differences shown only at the top node
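The structural comparison can be pictured as a set comparison over node paths. The sketch below is a plain-Python illustration, not the pandas-based implementation: diff_rows is a hypothetical helper, and the rule used for "moved" (a node name that appears among both the left-only and the right-only paths) is an assumption chosen because it is consistent with the documented example output.

```python
def diff_rows(paths, other_paths, detail=False, sep="/"):
    """Label each path as 'both', 'left_only', or 'right_only', plus a suffix."""

    def node_name(path):
        return path.rsplit(sep, 1)[-1]

    left, right = set(paths), set(other_paths)
    left_only_names = {node_name(p) for p in left - right}
    right_only_names = {node_name(p) for p in right - left}
    rows = []
    for path in sorted(left | right):
        if path in left and path in right:
            rows.append((path, "both", None))
        elif path in right:
            # Assumed rule: same name also vanished from the left tree -> moved
            moved = node_name(path) in left_only_names
            suffix = ("moved to" if moved else "added") if detail else "+"
            rows.append((path, "right_only", suffix))
        else:
            moved = node_name(path) in right_only_names
            suffix = ("moved from" if moved else "removed") if detail else "-"
            rows.append((path, "left_only", suffix))
    return rows


tree_paths = [
    "/Downloads",
    "/Downloads/Pictures",
    "/Downloads/Pictures/photo1.jpg",
    "/Downloads/file1.doc",
    "/Downloads/Trip",
    "/Downloads/Trip/photo2.jpg",
]
other_paths = [
    "/Downloads",
    "/Downloads/Pictures",
    "/Downloads/Pictures/photo1.jpg",
    "/Downloads/Pictures/Trip",
    "/Downloads/Pictures/Trip/photo2.jpg",
    "/Downloads/file1.doc",
    "/Downloads/file2.doc",
]

for path, exists, suffix in diff_rows(tree_paths, other_paths, detail=True):
    print(path, exists, suffix)
```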
Compare tree attribute
Attributes indicated in attr_list will be exported in the pandas DataFrame with suffixes representing attributes from tree and other_tree respectively.
Examples:
>>> # Create original tree
>>> from bigtree import Node, get_tree_diff_dataframe, list_to_tree
>>> root = list_to_tree(["Downloads/Pictures/photo1.jpg", "Downloads/file1.doc", "Downloads/Trip/photo2.jpg"])
>>> root.show()
Downloads
├── Pictures
│   └── photo1.jpg
├── file1.doc
└── Trip
    └── photo2.jpg
>>> # Create other tree
>>> root_other = list_to_tree(
... ["Downloads/Pictures/photo1.jpg", "Downloads/Pictures/Trip/photo2.jpg", "Downloads/file1.doc", "Downloads/file2.doc"]
... )
>>> root_other.show()
Downloads
├── Pictures
│   ├── photo1.jpg
│   └── Trip
│       └── photo2.jpg
├── file1.doc
└── file2.doc
Comparing tree structure
>>> get_tree_diff_dataframe(root, root_other, detail=True)
                                  path        name     parent      Exists      suffix
0                           /Downloads   Downloads       None        both         NaN
1                  /Downloads/Pictures    Pictures  Downloads        both         NaN
2             /Downloads/Pictures/Trip        Trip   Pictures  right_only    moved to
3  /Downloads/Pictures/Trip/photo2.jpg  photo2.jpg       Trip  right_only    moved to
4       /Downloads/Pictures/photo1.jpg  photo1.jpg   Pictures        both         NaN
5                      /Downloads/Trip        Trip  Downloads   left_only  moved from
6           /Downloads/Trip/photo2.jpg  photo2.jpg       Trip   left_only  moved from
7                 /Downloads/file1.doc   file1.doc  Downloads        both         NaN
8                 /Downloads/file2.doc   file2.doc  Downloads  right_only       added
Note
- tree and other_tree must have the same sep symbol, otherwise this will raise ValueError
- If the sep symbol contains one of the characters + / - / ~, a fallback sep will be used
- Node names in tree and other_tree must not contain the sep (or fallback sep) symbol
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | tree to be compared against | required |
other_tree | Node | tree to be compared with | required |
only_diff | bool | if aggregate and only_diff are True, child nodes that are moved from tree will be removed | True |
detail | bool | by default, suffix column will display "+" and "-". If detail is True, suffix column will be more detailed, displaying "moved from" / "moved to" / "added" / "removed" instead | False |
aggregate | bool | by default, all nodes that are different will have suffix specified. If aggregate is True, only parent-level node have suffixes and nodes that have different paths but same parent will not have suffix | False |
attr_list | Optional[List[str]] | tree attributes to retrieve from tree and other_tree | None |
fallback_sep | str | sep to fall back to if tree and other_tree has sep that clashes with symbols "+" / "-" / "~". All node names in tree and other_tree should not contain this fallback_sep | '/' |
name_col | str | name column of return dataframe, indicates the name of node | 'name' |
path_col | str | path column of return dataframe, indicates the full path of node | 'path' |
parent_col | str | parent column of return dataframe, indicates the parent name of node | 'parent' |
indicator_col | str | indicator column of return dataframe, indicates whether node appears in left_only, right_only or both tree | 'Exists' |
old_suffix | str | suffix given to attributes from tree of return dataframe, relevant if attr_list is specified | '_old' |
new_suffix | str | suffix given to attributes from other_tree of return dataframe, relevant if attr_list is specified | '_new' |
suffix_col | str | suffix column of return dataframe, indicates the type of diff whether it is added, removed, or moved | 'suffix' |
Returns:
Type | Description |
---|---|
DataFrame | Dataframe of tree differences |
get_tree_diff
get_tree_diff(
    tree,
    other_tree,
    only_diff=True,
    detail=False,
    aggregate=False,
    attr_list=None,
    fallback_sep="/",
)
Get difference of tree to other_tree, changes are relative to tree.
Compares the difference in tree structure (default), but can also compare tree attributes using attr_list.
Function can return only the differences (default), or all original tree nodes and differences.
Comparing tree structure
- (+) and (-) will be added to node name relative to tree
  - For example: (+) refers to nodes that are in other_tree but not tree
  - For example: (-) refers to nodes that are in tree but not other_tree
If detail=True, (added) and (moved to) will be used instead of (+), (removed) and (moved from) will be used instead of (-).
If aggregate=True, differences (+)/(added)/(moved to) and (-)/(removed)/(moved from) will only be indicated at the parent-level. This is useful when a subtree is shifted, and we want the differences shown only at the top node.
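To make the aggregate behaviour concrete, here is one simple rule that reproduces the aggregated labels for a shifted subtree: keep a label only where the parent path does not already carry one. aggregate_suffixes is a hypothetical helper for this sketch, and the rule is an assumption; the library's own logic may differ.

```python
def aggregate_suffixes(suffix_by_path, sep="/"):
    """Keep a suffix only where the parent path does not already carry one."""
    aggregated = {}
    for path, suffix in suffix_by_path.items():
        parent = path.rsplit(sep, 1)[0]
        parent_has_suffix = suffix_by_path.get(parent) is not None
        aggregated[path] = None if parent_has_suffix else suffix
    return aggregated


suffixes = {
    "/Downloads": None,
    "/Downloads/Pictures": None,
    "/Downloads/Pictures/Trip": "moved to",
    "/Downloads/Pictures/Trip/photo2.jpg": "moved to",
    "/Downloads/Trip": "moved from",
    "/Downloads/Trip/photo2.jpg": "moved from",
    "/Downloads/file2.doc": "added",
}

result = aggregate_suffixes(suffixes)
for path, suffix in result.items():
    print(path, suffix)
```

Only the top node of each shifted subtree (the two Trip folders) keeps its label, while file2.doc keeps "added" because its parent is unchanged.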
Examples:
>>> # Create original tree
>>> from bigtree import Node, get_tree_diff, list_to_tree
>>> root = list_to_tree(["Downloads/Pictures/photo1.jpg", "Downloads/file1.doc", "Downloads/Trip/photo2.jpg"])
>>> root.show()
Downloads
├── Pictures
│   └── photo1.jpg
├── file1.doc
└── Trip
    └── photo2.jpg
>>> # Create other tree
>>> root_other = list_to_tree(
... ["Downloads/Pictures/photo1.jpg", "Downloads/Pictures/Trip/photo2.jpg", "Downloads/file1.doc", "Downloads/file2.doc"]
... )
>>> root_other.show()
Downloads
├── Pictures
│   ├── photo1.jpg
│   └── Trip
│       └── photo2.jpg
├── file1.doc
└── file2.doc
Comparing tree structure
>>> tree_diff = get_tree_diff(root, root_other)
>>> tree_diff.show()
Downloads
├── Pictures
│   └── Trip (+)
│       └── photo2.jpg (+)
├── Trip (-)
│   └── photo2.jpg (-)
└── file2.doc (+)
All differences
>>> tree_diff = get_tree_diff(root, root_other, only_diff=False)
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (+)
│   │   └── photo2.jpg (+)
│   └── photo1.jpg
├── Trip (-)
│   └── photo2.jpg (-)
├── file1.doc
└── file2.doc (+)
All differences with details
>>> tree_diff = get_tree_diff(
... root, root_other, only_diff=False, detail=True
... )
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (moved to)
│   │   └── photo2.jpg (moved to)
│   └── photo1.jpg
├── Trip (moved from)
│   └── photo2.jpg (moved from)
├── file1.doc
└── file2.doc (added)
All differences with details on aggregated level
>>> tree_diff = get_tree_diff(
... root, root_other, only_diff=False, detail=True, aggregate=True
... )
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (moved to)
│   │   └── photo2.jpg
│   └── photo1.jpg
├── Trip (moved from)
│   └── photo2.jpg
├── file1.doc
└── file2.doc (added)
Only differences with details on aggregated level
>>> tree_diff = get_tree_diff(root, root_other, detail=True, aggregate=True)
>>> tree_diff.show()
Downloads
├── Pictures
│   └── Trip (moved to)
│       └── photo2.jpg
├── Trip (moved from)
└── file2.doc (added)
Comparing tree attribute
- (~) will be added to node name if there are differences in tree attributes defined in attr_list
- The node's attributes will be a pair of (value in tree, value in other_tree)
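The attribute comparison can be sketched with plain dictionaries keyed by node path. attribute_diff is a hypothetical helper for this illustration; it compares only nodes present in both trees and reports each changed attribute as an (old, new) pair.

```python
def attribute_diff(attrs, other_attrs, attr_list):
    """Return {node path: {attr: (old, new)}} for attributes whose values differ.

    Only nodes present in both mappings are compared.
    """
    changes = {}
    for path in attrs.keys() & other_attrs.keys():
        changed = {
            attr: (attrs[path].get(attr), other_attrs[path].get(attr))
            for attr in attr_list
            if attrs[path].get(attr) != other_attrs[path].get(attr)
        }
        if changed:
            changes[path] = changed
    return changes


old = {
    "/Downloads/Pictures/photo1.jpg": {"tags": "photo1"},
    "/Downloads/file1.doc": {"tags": "file1"},
}
new = {
    "/Downloads/Pictures/photo1.jpg": {"tags": "photo1-edited"},
    "/Downloads/Pictures/photo2.jpg": {"tags": "photo2-new"},
    "/Downloads/file1.doc": {"tags": "file1"},
}

print(attribute_diff(old, new, ["tags"]))
# {'/Downloads/Pictures/photo1.jpg': {'tags': ('photo1', 'photo1-edited')}}
```

A node that exists in only one tree (photo2.jpg here) is a structural difference, so it is left out of the attribute comparison in this sketch.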
>>> # Create original tree
>>> root = Node("Downloads")
>>> picture_folder = Node("Pictures", parent=root)
>>> photo1 = Node("photo1.jpg", tags="photo1", parent=picture_folder)
>>> file1 = Node("file1.doc", tags="file1", parent=root)
>>> root.show(attr_list=["tags"])
Downloads
├── Pictures
│   └── photo1.jpg [tags=photo1]
└── file1.doc [tags=file1]
>>> # Create other tree
>>> root_other = Node("Downloads")
>>> picture_folder = Node("Pictures", parent=root_other)
>>> photo1 = Node("photo1.jpg", tags="photo1-edited", parent=picture_folder)
>>> photo2 = Node("photo2.jpg", tags="photo2-new", parent=picture_folder)
>>> file1 = Node("file1.doc", tags="file1", parent=root_other)
>>> root_other.show(attr_list=["tags"])
Downloads
├── Pictures
│   ├── photo1.jpg [tags=photo1-edited]
│   └── photo2.jpg [tags=photo2-new]
└── file1.doc [tags=file1]
>>> # Get tree attribute differences
>>> tree_diff = get_tree_diff(root, root_other, attr_list=["tags"])
>>> tree_diff.show(attr_list=["tags"])
Downloads
└── Pictures
    ├── photo1.jpg (~) [tags=('photo1', 'photo1-edited')]
    └── photo2.jpg (+)
Note
- tree and other_tree must have the same sep symbol, otherwise this will raise ValueError
- If the sep symbol contains one of the characters + / - / ~, a fallback sep will be used
- Node names in tree and other_tree must not contain the sep (or fallback sep) symbol
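The separator rules in this note can be sketched as two small checks. resolve_sep and check_names are hypothetical helpers written for this illustration; raising ValueError for a bad node name is an assumption of the sketch, since the note only states that such names are not allowed.

```python
def resolve_sep(sep, other_sep, fallback_sep="/"):
    """Pick the separator to use when comparing paths from two trees."""
    if sep != other_sep:
        raise ValueError(f"Trees use different separators: {sep!r} vs {other_sep!r}")
    # "+", "-" and "~" are used as diff markers, so a sep containing them clashes
    if any(symbol in sep for symbol in "+-~"):
        return fallback_sep
    return sep


def check_names(names, sep):
    """Reject node names that contain the separator in use (assumed behaviour)."""
    bad = [name for name in names if sep in name]
    if bad:
        raise ValueError(f"Node names must not contain {sep!r}: {bad}")


print(resolve_sep("/", "/"))  # /
print(resolve_sep("-", "-"))  # /
check_names(["Downloads", "file1.doc"], "/")  # passes silently
```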
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Node | tree to be compared against | required |
other_tree | Node | tree to be compared with | required |
only_diff | bool | indicator to show all nodes or only nodes that are different (+/-) | True |
detail | bool | indicator to differentiate between different types of diff e.g., added or removed or moved | False |
aggregate | bool | indicator to only add difference indicator to parent-level e.g., when shifting subtrees | False |
attr_list | Optional[Iterable[str]] | tree attributes to check for difference | None |
fallback_sep | str | sep to fall back to if tree and other_tree has sep that clashes with symbols "+" / "-" / "~". All node names in tree and other_tree should not contain this fallback_sep | '/' |
Returns:
Type | Description |
---|---|
Node | Tree highlighting the difference between tree and other_tree |