🔧 Helper

Helper functions that can come in handy.


bigtree.tree.helper

clone_tree

clone_tree(tree, node_type)

Clone tree to another Node type. If the same type is needed, simply do a tree.copy().

Examples:

>>> from bigtree import BaseNode, Node, clone_tree
>>> root = BaseNode(name="a")
>>> b = BaseNode(name="b", parent=root)
>>> clone_tree(root, Node)
Node(/a, )

Parameters:

Name Type Description Default
tree BaseNode

tree to be cloned, must inherit from BaseNode

required
node_type Type[BaseNodeT]

type of cloned tree

required

Returns:

Type Description
BaseNodeT

Cloned tree of another Node type

get_subtree

get_subtree(tree, node_name_or_path=None, max_depth=0)

Get subtree based on node name or node path, and/or maximum depth of tree. Subtrees are smaller trees with a different root. Returns a copy of the tree; does not affect the original tree.

Examples:

>>> from bigtree import Node, get_subtree
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> c = Node("c", parent=b)
>>> d = Node("d", parent=b)
>>> e = Node("e", parent=root)
>>> root.show()
a
├── b
│   ├── c
│   └── d
└── e

Get subtree

>>> root_subtree = get_subtree(root, "b")
>>> root_subtree.show()
b
β”œβ”€β”€ c
└── d

Parameters:

Name Type Description Default
tree NodeT

existing tree

required
node_name_or_path Optional[str]

node name or path to get subtree

None
max_depth int

maximum depth of subtree, based on depth attribute

0

Returns:

Type Description
NodeT

Subtree

prune_tree

prune_tree(
    tree, prune_path=None, exact=False, sep="/", max_depth=0
)

Prune tree by path or depth. Pruned trees are smaller trees with the same root. Returns a copy of the tree; does not affect the original tree.

For pruning by prune_path,

  • All siblings along the prune path will be removed. All descendants will be kept by default
  • If exact=True, all descendants of prune path will be removed
  • Prune path can be string (only one path) or a list of strings (multiple paths)
  • Prune path should be unique; it can be a full path, a partial path (trailing part of the path), or a node name

For pruning by max_depth,

  • All nodes that are beyond max_depth will be removed

Path should contain Node name, separated by sep.

  • For example: Path string "a/b" refers to Node("b") with parent Node("a")
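
The pruning rules above can be sketched on plain path strings. This is only an illustration, not bigtree's implementation: the helper names prune_paths and _keep are hypothetical, only full paths are handled (no partial paths or bare node names), and max_depth=0 is assumed to mean "no depth limit", with the root at depth 1.

```python
from typing import Iterable, List, Union


def prune_paths(
    paths: Iterable[str],
    prune_path: Union[str, Iterable[str], None] = None,
    exact: bool = False,
    sep: str = "/",
    max_depth: int = 0,
) -> List[str]:
    """Filter full node paths according to the pruning rules (illustrative only)."""
    targets = [prune_path] if isinstance(prune_path, str) else list(prune_path or [])
    split_targets = [t.strip(sep).split(sep) for t in targets]
    kept = []
    for path in paths:
        parts = path.strip(sep).split(sep)
        # Depth rule: the root has depth 1; 0 is assumed to mean "no limit".
        if max_depth and len(parts) > max_depth:
            continue
        # Path rule: keep a node only if at least one prune path allows it.
        if split_targets and not any(_keep(parts, t, exact) for t in split_targets):
            continue
        kept.append(path)
    return kept


def _keep(parts: List[str], target: List[str], exact: bool) -> bool:
    # The target node and its ancestors are always kept.
    if parts == target[: len(parts)]:
        return True
    # Descendants of the target are kept unless exact=True.
    return not exact and parts[: len(target)] == target


paths = ["a", "a/b", "a/b/c", "a/b/d", "a/e"]
print(prune_paths(paths, "a/b"))               # ['a', 'a/b', 'a/b/c', 'a/b/d']
print(prune_paths(paths, "a/b", exact=True))   # ['a', 'a/b']
print(prune_paths(paths, ["a/b/d", "a/e"]))    # ['a', 'a/b', 'a/b/d', 'a/e']
print(prune_paths(paths, max_depth=2))         # ['a', 'a/b', 'a/e']
```

Siblings along the prune path (here "a/e" for the path "a/b") are dropped because they are neither ancestors nor descendants of the target.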

Examples:

>>> from bigtree import Node, prune_tree
>>> root = Node("a")
>>> b = Node("b", parent=root)
>>> c = Node("c", parent=b)
>>> d = Node("d", parent=b)
>>> e = Node("e", parent=root)
>>> root.show()
a
├── b
│   ├── c
│   └── d
└── e

Prune tree

>>> root_pruned = prune_tree(root, "a/b")
>>> root_pruned.show()
a
└── b
    ├── c
    └── d

Prune by exact path

>>> root_pruned = prune_tree(root, "a/b", exact=True)
>>> root_pruned.show()
a
└── b

Prune by multiple paths

>>> root_pruned = prune_tree(root, ["a/b/d", "a/e"])
>>> root_pruned.show()
a
├── b
│   └── d
└── e

Prune by depth

>>> root_pruned = prune_tree(root, max_depth=2)
>>> root_pruned.show()
a
├── b
└── e

Parameters:

Name Type Description Default
tree Union[BinaryNodeT, NodeT]

existing tree

required
prune_path Optional[Union[Iterable[str], str]]

prune path(s), all siblings along the prune path(s) will be removed

None
exact bool

prune path(s) to be exactly the path and remove descendants

False
sep str

path separator of prune_path

'/'
max_depth int

maximum depth of pruned tree, based on depth attribute

0

Returns:

Type Description
Union[BinaryNodeT, NodeT]

Pruned tree

get_tree_diff_dataframe

get_tree_diff_dataframe(
    tree,
    other_tree,
    only_diff=True,
    detail=False,
    aggregate=False,
    attr_list=None,
    fallback_sep="/",
    name_col="name",
    path_col="path",
    parent_col="parent",
    indicator_col="Exists",
    old_suffix="_old",
    new_suffix="_new",
    suffix_col="suffix",
)

Get difference of tree to other_tree; changes are relative to tree. This function exports both trees to pandas DataFrames, merges them, and adds a new suffix column to indicate the type of difference between the two trees.

Comparing tree structure

By default, suffix will be '+' and '-' for the tree differences, and np.nan for the others.

  • If detail=True, 'added' and 'moved to' will be used instead of '+', and 'removed' and 'moved from' will be used instead of '-'
  • If aggregate=True, suffix will only be indicated at the parent-level. This is useful when a subtree is shifted and we want the differences shown only at the top node
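
The structural comparison can be pictured as an outer join on node paths. Below is a minimal stdlib sketch of that idea, not the library's code (which merges pandas DataFrames). The function name diff_paths is hypothetical, and treating a node as "moved" when its name is both removed and added is an assumption made for this illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def diff_paths(
    paths: Iterable[str],
    other_paths: Iterable[str],
    detail: bool = False,
    sep: str = "/",
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each path to (indicator, suffix), like an outer merge on path."""
    left, right = set(paths), set(other_paths)
    # Assumption: a name that is both removed and added elsewhere counts as moved.
    removed_names = {p.split(sep)[-1] for p in left - right}
    added_names = {p.split(sep)[-1] for p in right - left}
    moved = removed_names & added_names

    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for path in sorted(left | right):
        name = path.split(sep)[-1]
        if path in left and path in right:
            result[path] = ("both", None)  # unchanged nodes carry no suffix
        elif path in left:
            suffix = ("moved from" if name in moved else "removed") if detail else "-"
            result[path] = ("left_only", suffix)
        else:
            suffix = ("moved to" if name in moved else "added") if detail else "+"
            result[path] = ("right_only", suffix)
    return result


tree_paths = ["/Downloads", "/Downloads/Trip", "/Downloads/Trip/photo2.jpg"]
other_paths = ["/Downloads", "/Downloads/Pictures", "/Downloads/Pictures/Trip"]
for path, (indicator, suffix) in diff_paths(tree_paths, other_paths, detail=True).items():
    print(path, indicator, suffix)
```

The indicator values mirror the left_only / right_only / both labels that appear in the Exists column of the returned DataFrame.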

Comparing tree attribute

Attributes indicated in attr_list will be exported in the pandas DataFrame with suffixes representing attributes from tree and other_tree respectively.

Examples:

>>> # Create original tree
>>> from bigtree import Node, get_tree_diff_dataframe, list_to_tree
>>> root = list_to_tree(["Downloads/Pictures/photo1.jpg", "Downloads/file1.doc", "Downloads/Trip/photo2.jpg"])
>>> root.show()
Downloads
├── Pictures
│   └── photo1.jpg
├── file1.doc
└── Trip
    └── photo2.jpg
>>> # Create other tree
>>> root_other = list_to_tree(
...     ["Downloads/Pictures/photo1.jpg", "Downloads/Pictures/Trip/photo2.jpg", "Downloads/file1.doc", "Downloads/file2.doc"]
... )
>>> root_other.show()
Downloads
├── Pictures
│   ├── photo1.jpg
│   └── Trip
│       └── photo2.jpg
├── file1.doc
└── file2.doc

Comparing tree structure

>>> get_tree_diff_dataframe(root, root_other, detail=True)
                                  path        name     parent      Exists      suffix
0                           /Downloads   Downloads       None        both         NaN
1                  /Downloads/Pictures    Pictures  Downloads        both         NaN
2             /Downloads/Pictures/Trip        Trip   Pictures  right_only    moved to
3  /Downloads/Pictures/Trip/photo2.jpg  photo2.jpg       Trip  right_only    moved to
4       /Downloads/Pictures/photo1.jpg  photo1.jpg   Pictures        both         NaN
5                      /Downloads/Trip        Trip  Downloads   left_only  moved from
6           /Downloads/Trip/photo2.jpg  photo2.jpg       Trip   left_only  moved from
7                 /Downloads/file1.doc   file1.doc  Downloads        both         NaN
8                 /Downloads/file2.doc   file2.doc  Downloads  right_only       added

Note

  • tree and other_tree must have the same sep symbol, otherwise this will raise ValueError
  • If the sep symbol contains any of the characters +, -, or ~, a fallback sep will be used
  • Node names in tree and other_tree must not contain the sep (or fallback sep) symbol
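
The first two notes can be read as a small decision rule. The sketch below is a hypothetical helper (resolve_sep is not a bigtree function) that only illustrates the stated behaviour under the assumption that the check is a simple character test.

```python
def resolve_sep(sep: str, other_sep: str, fallback_sep: str = "/") -> str:
    """Pick the separator to use when diffing two trees (illustrative only)."""
    # Both trees must agree on the separator.
    if sep != other_sep:
        raise ValueError(f"Mismatched sep symbols: {sep!r} and {other_sep!r}")
    # "+", "-" and "~" are reserved for diff markers, so fall back if they clash.
    if any(symbol in sep for symbol in "+-~"):
        return fallback_sep
    return sep


print(resolve_sep("/", "/"))   # /
print(resolve_sep("-", "-"))   # /
print(resolve_sep(".", "."))   # .
```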

Parameters:

Name Type Description Default
tree Node

tree to be compared against

required
other_tree Node

tree to be compared with

required
only_diff bool

if aggregate and only_diff are True, child nodes that are moved from tree will be removed

True
detail bool

by default, suffix column will display "+" and "-". If detail is True, suffix column will be more detailed, displaying "moved from" / "moved to" / "added" / "removed" instead

False
aggregate bool

by default, all nodes that are different will have suffix specified. If aggregate is True, only parent-level nodes have suffixes, and nodes that have different paths but the same parent will not have a suffix

False
attr_list Optional[List[str]]

tree attributes to retrieve from tree and other_tree

None
fallback_sep str

sep to fall back to if tree and other_tree have a sep that clashes with the symbols "+" / "-" / "~". Node names in tree and other_tree should not contain this fallback_sep

'/'
name_col str

name column of return dataframe, indicates the name of node

'name'
path_col str

path column of return dataframe, indicates the full path of node

'path'
parent_col str

parent column of return dataframe, indicates the parent name of node

'parent'
indicator_col str

indicator column of return dataframe, indicates whether node appears in left_only, right_only or both tree

'Exists'
old_suffix str

suffix given to attributes from tree of return dataframe, relevant if attr_list is specified

'_old'
new_suffix str

suffix given to attributes from other_tree of return dataframe, relevant if attr_list is specified

'_new'
suffix_col str

suffix column of return dataframe, indicates the type of diff whether it is added, removed, or moved

'suffix'

Returns:

Type Description
DataFrame

Dataframe of tree differences

get_tree_diff

get_tree_diff(
    tree,
    other_tree,
    only_diff=True,
    detail=False,
    aggregate=False,
    attr_list=None,
    fallback_sep="/",
)

Get difference of tree to other_tree; changes are relative to tree.

Compares the difference in tree structure (default), but can also compare tree attributes using attr_list. Function can return only the differences (default), or all original tree nodes and differences.

Comparing tree structure

  • (+) and (-) will be added to node name relative to tree
  • For example: (+) refers to nodes that are in other_tree but not tree
  • For example: (-) refers to nodes that are in tree but not other_tree

If detail=True, (added) and (moved to) will be used instead of (+), (removed) and (moved from) will be used instead of (-).

If aggregate=True, differences (+)/(added)/(moved to) and (-)/(removed)/(moved from) will only be indicated at the parent-level. This is useful when a subtree is shifted, and we want the differences shown only at the top node.
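
One way to picture aggregate=True is that a node drops its marker when its parent already carries the same one. The sketch below works on a mapping of path to marker; the function name aggregate_suffixes and the "same marker as parent" rule are assumptions for illustration, not the library's actual logic.

```python
from typing import Dict, Optional


def aggregate_suffixes(
    suffixes: Dict[str, Optional[str]], sep: str = "/"
) -> Dict[str, Optional[str]]:
    """Keep a marker only on the top-most node of each changed subtree."""
    aggregated: Dict[str, Optional[str]] = {}
    for path, suffix in suffixes.items():
        parent = path.rsplit(sep, 1)[0]
        # Compare against the original markers so whole chains collapse to the top node.
        if suffix is not None and suffixes.get(parent) == suffix:
            aggregated[path] = None
        else:
            aggregated[path] = suffix
    return aggregated


detailed = {
    "/Downloads": None,
    "/Downloads/Pictures": None,
    "/Downloads/Pictures/Trip": "moved to",
    "/Downloads/Pictures/Trip/photo2.jpg": "moved to",
    "/Downloads/Trip": "moved from",
    "/Downloads/Trip/photo2.jpg": "moved from",
    "/Downloads/file2.doc": "added",
}
for path, suffix in aggregate_suffixes(detailed).items():
    print(path, suffix)
```

With this input only the two Trip nodes and file2.doc keep their markers; both photo2.jpg entries are blanked.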

Examples:

>>> # Create original tree
>>> from bigtree import Node, get_tree_diff, list_to_tree
>>> root = list_to_tree(["Downloads/Pictures/photo1.jpg", "Downloads/file1.doc", "Downloads/Trip/photo2.jpg"])
>>> root.show()
Downloads
├── Pictures
│   └── photo1.jpg
├── file1.doc
└── Trip
    └── photo2.jpg
>>> # Create other tree
>>> root_other = list_to_tree(
...     ["Downloads/Pictures/photo1.jpg", "Downloads/Pictures/Trip/photo2.jpg", "Downloads/file1.doc", "Downloads/file2.doc"]
... )
>>> root_other.show()
Downloads
├── Pictures
│   ├── photo1.jpg
│   └── Trip
│       └── photo2.jpg
├── file1.doc
└── file2.doc

Comparing tree structure

>>> tree_diff = get_tree_diff(root, root_other)
>>> tree_diff.show()
Downloads
├── Pictures
│   └── Trip (+)
│       └── photo2.jpg (+)
├── Trip (-)
│   └── photo2.jpg (-)
└── file2.doc (+)

All differences

>>> tree_diff = get_tree_diff(root, root_other, only_diff=False)
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (+)
│   │   └── photo2.jpg (+)
│   └── photo1.jpg
├── Trip (-)
│   └── photo2.jpg (-)
├── file1.doc
└── file2.doc (+)

All differences with details

>>> tree_diff = get_tree_diff(
...     root, root_other, only_diff=False, detail=True
... )
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (moved to)
│   │   └── photo2.jpg (moved to)
│   └── photo1.jpg
├── Trip (moved from)
│   └── photo2.jpg (moved from)
├── file1.doc
└── file2.doc (added)

All differences with details on aggregated level

>>> tree_diff = get_tree_diff(
...     root, root_other, only_diff=False, detail=True, aggregate=True
... )
>>> tree_diff.show()
Downloads
├── Pictures
│   ├── Trip (moved to)
│   │   └── photo2.jpg
│   └── photo1.jpg
├── Trip (moved from)
│   └── photo2.jpg
├── file1.doc
└── file2.doc (added)

Only differences with details on aggregated level

>>> tree_diff = get_tree_diff(root, root_other, detail=True, aggregate=True)
>>> tree_diff.show()
Downloads
├── Pictures
│   └── Trip (moved to)
│       └── photo2.jpg
├── Trip (moved from)
└── file2.doc (added)

Comparing tree attribute

  • (~) will be added to node name if there are differences in tree attributes defined in attr_list
  • The node's attribute will hold a pair of (value in tree, value in other_tree)

>>> # Create original tree
>>> root = Node("Downloads")
>>> picture_folder = Node("Pictures", parent=root)
>>> photo1 = Node("photo1.jpg", tags="photo1", parent=picture_folder)
>>> file1 = Node("file1.doc", tags="file1", parent=root)
>>> root.show(attr_list=["tags"])
Downloads
├── Pictures
│   └── photo1.jpg [tags=photo1]
└── file1.doc [tags=file1]
>>> # Create other tree
>>> root_other = Node("Downloads")
>>> picture_folder = Node("Pictures", parent=root_other)
>>> photo1 = Node("photo1.jpg", tags="photo1-edited", parent=picture_folder)
>>> photo2 = Node("photo2.jpg", tags="photo2-new", parent=picture_folder)
>>> file1 = Node("file1.doc", tags="file1", parent=root_other)
>>> root_other.show(attr_list=["tags"])
Downloads
├── Pictures
│   ├── photo1.jpg [tags=photo1-edited]
│   └── photo2.jpg [tags=photo2-new]
└── file1.doc [tags=file1]
>>> # Get tree attribute differences
>>> tree_diff = get_tree_diff(root, root_other, attr_list=["tags"])
>>> tree_diff.show(attr_list=["tags"])
Downloads
└── Pictures
    ├── photo1.jpg (~) [tags=('photo1', 'photo1-edited')]
    └── photo2.jpg (+)
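
The attribute comparison shown above can be sketched with plain dictionaries keyed by node path. This is an illustrative stand-in rather than the library's implementation; diff_attributes is a hypothetical name, and only nodes present in both trees are compared here.

```python
from typing import Any, Dict, Iterable, Tuple


def diff_attributes(
    attrs: Dict[str, Dict[str, Any]],
    other_attrs: Dict[str, Dict[str, Any]],
    attr_list: Iterable[str],
) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Return {path: {attr: (value in tree, value in other_tree)}} for changed attributes."""
    attr_names = list(attr_list)
    changes: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    # Only nodes that exist in both trees can have a (~) attribute difference.
    for path in sorted(attrs.keys() & other_attrs.keys()):
        for attr in attr_names:
            old = attrs[path].get(attr)
            new = other_attrs[path].get(attr)
            if old != new:
                changes.setdefault(path, {})[attr] = (old, new)
    return changes


tree_attrs = {
    "/Downloads/Pictures/photo1.jpg": {"tags": "photo1"},
    "/Downloads/file1.doc": {"tags": "file1"},
}
other_tree_attrs = {
    "/Downloads/Pictures/photo1.jpg": {"tags": "photo1-edited"},
    "/Downloads/Pictures/photo2.jpg": {"tags": "photo2-new"},
    "/Downloads/file1.doc": {"tags": "file1"},
}
print(diff_attributes(tree_attrs, other_tree_attrs, ["tags"]))
# {'/Downloads/Pictures/photo1.jpg': {'tags': ('photo1', 'photo1-edited')}}
```

photo2.jpg does not appear in the result because it exists only in other_tree, so it is a structural (+) difference rather than an attribute (~) difference.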

Note

  • tree and other_tree must have the same sep symbol, otherwise this will raise ValueError
  • If the sep symbol contains any of the characters +, -, or ~, a fallback sep will be used
  • Node names in tree and other_tree must not contain the sep (or fallback sep) symbol

Parameters:

Name Type Description Default
tree Node

tree to be compared against

required
other_tree Node

tree to be compared with

required
only_diff bool

indicator to show all nodes or only nodes that are different (+/-)

True
detail bool

indicator to differentiate between different types of diff e.g., added or removed or moved

False
aggregate bool

indicator to only add difference indicator to parent-level e.g., when shifting subtrees

False
attr_list Optional[Iterable[str]]

tree attributes to check for difference

None
fallback_sep str

sep to fall back to if tree and other_tree have a sep that clashes with the symbols "+" / "-" / "~". Node names in tree and other_tree should not contain this fallback_sep

'/'

Returns:

Type Description
Node

Tree highlighting the difference between tree and other_tree