NAME

Data::Reach - Walk down or iterate through a nested Perl datastructure

SYNOPSIS

# reach a subtree or a leaf under a nested datastructure
use Data::Reach;
my $node = reach $data_tree, @path; # @path may contain a mix of hash keys and array indices

# do something with all paths through the datastructure ..
my @result = map_paths {do_something_with(\@_, $_)} $data_tree;

# .. or loop through all paths
my $next_path = each_path $data_tree;
while (my ($path, $val) = $next_path->()) {
  do_something_with($path, $val);
}

# import under a different name
use Data::Reach reach => as => 'walk_down';
my $node = walk_down $data_tree, @path;

# optional changes of algorithm, lexically scoped
{ no Data::Reach  qw/peek_blessed use_overloads/;
  use Data::Reach reach_method => [qw/foo bar/];
  my $node = reach $object_tree, @path;
}
# after end of scope, back to the regular algorithm

DESCRIPTION

Perl supports nested datastructures : a hash may contain references to other hashes or to arrays, which in turn may contain further references to deeper structures -- see perldsc. Walking down through such structures usually involves nested loops, and possibly some tests on ref $subtree to find out whether the next level is an arrayref or a hashref.

The present module offers some utilities for easier handling of nested datastructures :

  • the reach function finds a subtree or a leaf according to a given @path -- a list of hash keys or array indices. If there is no data corresponding to that path, undef is returned, without any autovivification within the tree.

  • the map_paths function applies a given code reference to all paths within the nested datastructure.

  • the each_path function returns an iterator over the nested datastructure; it can be used in the same spirit as a regular each statement over a simple hash, except that it walks down multiple levels until it reaches leaf nodes.

The "SEE ALSO" section below discusses some alternative implementations.

FUNCTIONS

reach

my $node = reach $data_tree, @path;

Tries to find a node under root $data_tree, walking down the tree and choosing subnodes according to values given in @path (which should be a list of scalar values). At each step :

  • if the root is undef, then undef is returned (even if there are remaining items in @path).

  • if @path is empty, then the root $data_tree is returned.

  • if the first item in @path is undef, then undef is returned (even if there are remaining items in @path).

  • if $data_tree is a hashref or can behave as a hashref, then $data_tree->{$path[0]} becomes the new root, and the first item is removed from @path. No distinction is made between a missing and an undefined $data_tree->{$path[0]} : in both cases the result will be undef.

  • if $data_tree is an arrayref or can behave as an arrayref, then $data_tree->[$path[0]] becomes the new root, and the first item is removed from @path. The value in $path[0] must be an integer; otherwise it is not a valid array index and an error is generated. No distinction is made between a missing and an undefined $data_tree->[$path[0]] : in both cases the result will be undef.

  • if $data_tree is any other kind of data (scalar, reference to a scalar, reference to a reference, etc.), an error is generated.

Neither autovivification nor any other writing into the datastructure is ever performed. Missing data merely returns undef, while wrong use of data (for example looking into an arrayref with a non-numerical index) generates an exception.
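The walking rules above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's own code : the function name reach_sketch is hypothetical, blessed references are treated like the raw hash or array they are built on, and the reach_method and use_overloads options are ignored.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype);

# Hypothetical re-implementation of the walking rules listed above.
# Handles hashrefs and arrayrefs (blessed ones are peeked into);
# reach_method and overloads are NOT handled by this sketch.
sub reach_sketch {
  my ($root, @path) = @_;
  while (1) {
    return undef unless defined $root;    # undef root: stop with undef
    return $root unless @path;            # path exhausted: current node
    my $step = shift @path;
    return undef unless defined $step;    # undef path item: stop with undef
    my $type = reftype($root) // '';
    if ($type eq 'HASH') {
      $root = $root->{$step};             # one-level read: no autovivification
    }
    elsif ($type eq 'ARRAY') {
      croak "'$step' is not a valid array index" unless $step =~ /\A-?\d+\z/;
      $root = $root->[$step];             # reading past the end does not extend the array
    }
    else {
      croak "cannot reach '$step' within a " . ($type || 'plain scalar');
    }
  }
}
```

Because each step reads a single level ($root->{$step} or $root->[$step]), a missing intermediate key never creates anything inside the tree.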

By default, blessed objects are treated just like raw, unblessed datastructures; however that behaviour can be changed through pragma options, as described below.

map_paths

my @result = map_paths { ... } $data_tree [, $max_depth];

Applies the given block to each path within $data_tree, returning the list of collected results. Within the block, @_ contains the sequence of hash keys or array indices that were traversed, and $_ is aliased to the leaf node. Hence, for a $data_tree of shape :

{ foo => [ undef,
           'abc',
           {bar => {buz => 987}},
           1234,
          ],
  empty_slot  => undef,
  qux         => 'qux',  }

the block will be called six times, with the following values :

# value of @_                 value of $_
# ===========                 ===========
 ('empty_slot')               undef
 ('foo', 0)                   undef
 ('foo', 1)                   'abc'
 ('foo', 2, 'bar', 'buz')     987
 ('foo', 3)                   1234
 ('qux')                      'qux'

The optional $max_depth argument limits the depth of tree traversal : subtrees below that depth will be treated as leaves.
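A recursive walk is one plausible way to obtain the behaviour described above. In this sketch the names map_paths_sketch and _walk are hypothetical, hash keys are visited in sorted order (an assumption about ordering), and only references are accepted as $data_tree. The block receives the path in @_, and $_ is aliased to the leaf, so assigning to $_ inside the block modifies the tree.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical sketch of map_paths; sorted key order is an assumption,
# and blessed objects are simply peeked into.
sub map_paths_sketch (&$;$) {
  my ($code, $tree, $max_depth) = @_;
  return _walk($code, $tree, [], $max_depth);
}

sub _walk {
  my ($code, $node, $path, $max_depth) = @_;
  my $type        = reftype($node) // '';
  my $may_descend = !defined $max_depth || @$path < $max_depth;
  if ($may_descend && $type eq 'HASH') {
    return map { _walk($code, $node->{$_}, [@$path, $_], $max_depth) } sort keys %$node;
  }
  if ($may_descend && $type eq 'ARRAY') {
    return map { _walk($code, $node->[$_], [@$path, $_], $max_depth) } 0 .. $#$node;
  }
  # leaf, or depth limit reached : @_ receives the path and $_ aliases
  # $_[1], which is itself an alias to the element in the parent container
  for ($_[1]) {
    return $code->(@$path);
  }
}
```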

The $data_tree argument is usually a reference to a hash or to an array; but it can also be supplied directly as a hash or array -- this will be automatically converted into a reference.

By default, blessed objects are treated just like raw, unblessed datastructures; however that behaviour can be changed through pragma options, as described below.

each_path

my $next_path_iterator = each_path $data_tree [, $max_depth];
while (my ($path, $val) = $next_path_iterator->()) {
  do_something_with($path, $val);
}

Returns an iterator function that will walk through the datastructure. Each call to the iterator will return a pair ($path, $val), where $path is an arrayref that contains the sequence of hash keys or array indices that were traversed, and $val is the leaf node.
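An iterator of this kind can be built as a closure over an explicit stack of pending [path, node] pairs, so that each call does just enough work to reach the next leaf. The sketch below is hypothetical (each_path_sketch is not part of the module); visiting hash keys in sorted order and the $max_depth rule are assumptions of the sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical iterator : a closure over a stack of [path, node] pairs,
# advancing lazily by one leaf per call.
sub each_path_sketch {
  my ($tree, $max_depth) = @_;
  my @stack = ([ [], $tree ]);
  return sub {
    while (@stack) {
      my ($path, $node) = @{ pop @stack };
      my $type        = reftype($node) // '';
      my $may_descend = !defined $max_depth || @$path < $max_depth;
      if ($may_descend && $type eq 'HASH') {
        # push children in reverse so that they are popped in sorted order
        push @stack, map { [ [@$path, $_], $node->{$_} ] } reverse sort keys %$node;
      }
      elsif ($may_descend && $type eq 'ARRAY') {
        push @stack, map { [ [@$path, $_], $node->[$_] ] } reverse 0 .. $#$node;
      }
      else {
        return ($path, $node);    # a leaf : hand it to the caller
      }
    }
    return;                       # empty list : the walk is finished
  };
}
```

The while (my ($path, $val) = ...) idiom keeps looping even when $val is undef, because a list assignment in boolean context counts the returned elements; only the final empty list stops it.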

By default, blessed objects are treated just like raw, unblessed datastructures; however that behaviour can be changed through pragma options, as described below.

IMPORT INTERFACE

Exporting the 'reach', 'map_paths' and 'each_path' functions

The 'reach', 'map_paths' and 'each_path' functions are exported by default when loading this module with use :

use Data::Reach;
use Data::Reach qw/reach map_paths each_path/; # equivalent to the line above

However, the exported names can be changed through the as option :

use Data::Reach reach => as => 'walk_down', map_paths => as => 'find_subtrees';
my $node = walk_down $data, @path;
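The renaming through as can be pictured as a hand-written import method that installs each requested function into the caller's namespace under the chosen alias. In this sketch the package My::Export::Sketch and its dummy function bodies are hypothetical, and unlike the real module it does not parse any pragma options.

```perl
package My::Export::Sketch;    # hypothetical stand-in for the module
use strict;
use warnings;
use Carp qw(croak);

my %exportable = map { $_ => 1 } qw(reach map_paths each_path);

# dummy bodies : only the export mechanism matters here
sub reach     { return "reach(@_)" }
sub map_paths { return "map_paths(@_)" }
sub each_path { return "each_path(@_)" }

# Accepts "name" or "name => as => 'alias'"; with no arguments, exports all.
sub import {
  my ($class, @args) = @_;
  @args = sort keys %exportable unless @args;
  my $caller = caller;
  while (@args) {
    my $name = shift @args;
    croak "'$name' is not exportable" unless $exportable{$name};
    my $alias = $name;
    if (@args >= 2 && $args[0] eq 'as') {
      (undef, $alias) = splice @args, 0, 2;
    }
    no strict 'refs';
    *{"${caller}::$alias"} = \&{"${class}::$name"};    # install under the alias
  }
}

package main;
My::Export::Sketch->import(reach => as => 'walk_down', 'each_path');
print walk_down('data', 'a', 0), "\n";    # prints "reach(data a 0)"
```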

Pragma options for reaching within objects

Arguments to the import method may also change the algorithm used to deal with objects met while traversing the datastructure. These options can be turned on or off as lexical pragmata; this means that a change of algorithm remains in effect until the end of the current lexical scope (see "use" in perlfunc, "no" in perlfunc and perlpragma).
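Lexically scoped options of this kind are typically built on Perl's %^H hints hash, as explained in perlpragma. The minimal sketch below uses a hypothetical package My::Pragma::Sketch rather than Data::Reach's actual internals; it shows how a no statement can switch an option off only until the end of the enclosing block. Treating every option as "on" by default is an assumption mirroring the defaults described below.

```perl
use strict;
use warnings;

BEGIN {
  package My::Pragma::Sketch;    # hypothetical stand-in for the pragma handling

  sub _key { return "My::Pragma::Sketch/$_[0]" }    # namespaced %^H key

  # `use ... qw(opt)` and `no ... qw(opt)` write into %^H,
  # which Perl scopes to the block currently being compiled
  sub import   { shift; $^H{ _key($_) } = 1 for @_ }
  sub unimport { shift; $^H{ _key($_) } = 0 for @_ }

  # reads the hints that were in force where the caller was compiled
  sub option_is_on {
    my ($name) = @_;
    my $hints = (caller(0))[10] || {};
    my $val   = $hints->{ _key($name) };
    return defined $val ? $val : 1;    # options default to "on"
  }

  $INC{'My/Pragma/Sketch.pm'} = __FILE__;    # lets `use`/`no` skip loading a file
}

my @seen = (My::Pragma::Sketch::option_is_on('peek_blessed'));    # default : on
{
  no My::Pragma::Sketch qw(peek_blessed);
  push @seen, My::Pragma::Sketch::option_is_on('peek_blessed');  # off in this block
}
push @seen, My::Pragma::Sketch::option_is_on('peek_blessed');    # on again
print "@seen\n";    # prints "1 0 1"
```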

reach_method
use Data::Reach reach_method => $method_name;

If the target object possesses a method corresponding to the name specified, that method will be called with the current item of @path as its single argument. The method is supposed to reach down one step into the datastructure and return the next data subtree or leaf.

The presence of this method is the first choice for reaching within an object. If this cannot be applied, either because no reach_method was specified, or because the target object has no such method, then the second choice is to use overloads, as described below.

use_overloads
use Data::Reach qw/use_overloads/; # turn the option on
no  Data::Reach qw/use_overloads/; # turn the option off

This option is true by default; it means that if the object has an overloaded hash or array dereferencing function, that function will be called (see overload). This feature distinguishes Data::Reach from other similar modules listed in the "SEE ALSO" section.

peek_blessed
use Data::Reach qw/peek_blessed/; # turn the option on
no  Data::Reach qw/peek_blessed/; # turn the option off

This option is true by default; it means that the reach functions will go down into object implementations (i.e. reach internal attributes within the object's hashref). Turn it off if you want objects to stay opaque, with public methods as the only way to reach internal information.
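The order of preference between the options above can be summarized as a single dispatch step. This is a sketch only : reach_step and the $opts hashref are hypothetical stand-ins for the lexical options, and placing peek_blessed last, after reach_method and use_overloads, is an assumption drawn from the descriptions above. overload::Method comes from Perl's core overload module.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed reftype);
use overload ();    # only needed for overload::Method

# Hypothetical one-step dispatcher; $opts stands in for the lexical
# pragma options (reach_method, use_overloads, peek_blessed).
sub reach_step {
  my ($node, $step, $opts) = @_;
  if (blessed $node) {
    # 1st choice : the method named by reach_method, if the object has it
    my $meth = $opts->{reach_method};
    return $node->$meth($step) if defined $meth && $node->can($meth);

    # 2nd choice : overloaded hash or array dereferencing
    if ($opts->{use_overloads}) {
      return $node->{$step} if overload::Method($node, '%{}');
      return $node->[$step] if overload::Method($node, '@{}');
    }

    # last resort (assumed) : peek into the underlying hash or array
    croak "object of class " . ref($node) . " is opaque" unless $opts->{peek_blessed};
  }
  my $type = reftype($node) // '';
  return $node->{$step} if $type eq 'HASH';
  return $node->[$step] if $type eq 'ARRAY';
  croak "cannot reach '$step' within a non-container node";
}
```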

paths_method
use Data::Reach paths_method => $method_name;

If the target object possesses a method corresponding to the name specified, that method will be called to find the list of path items under the current tree (like the list of keys for a hash, or the list of indices for an array).
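The paths_method option affects how the children of a node are enumerated. The hypothetical helper child_paths below shows one way such enumeration could fall back from a user-supplied method to hash keys or array indices; sorting the keys is an assumption of this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper : list the path items directly under $node.
# Objects providing the configured paths method are asked first;
# otherwise fall back to (sorted) hash keys or array indices.
sub child_paths {
  my ($node, $paths_method) = @_;
  if (blessed($node) && defined $paths_method && $node->can($paths_method)) {
    return $node->$paths_method();
  }
  my $type = reftype($node) // '';
  return sort keys %$node if $type eq 'HASH';
  return 0 .. $#$node     if $type eq 'ARRAY';
  return;                 # not a container : no path items below this node
}
```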

Note that several options can be tuned in a single statement :

no  Data::Reach qw/use_overloads peek_blessed/; # turn both options off

SEE ALSO

For reaching data subtrees, there are many similar modules on CPAN, each of them having some variations in the set of features. Here are a few pointers, and the reasons why I didn't use them :

Data::Diver

Does quite a similar job, with a richer API (can also write into the datastructure or use it as an lvalue). Return values may be complex to decode (distinctions between an empty list, an undef, and a singleton containing an undef). It uses eval internally, without taking care of eval pitfalls (see "BACKGROUND" in Try::Tiny for explanations).

Data::DRef

An old module (last update was in 1999), still relevant for modern Perl, except that it does not handle overloads, which were not available at that time. The API is a bit too rich for my taste (many different ways to get or set data).

Data::DPath or Data::Path

Two competing modules for accessing nested data through expressions similar to XPath. Very interesting, but a bit overkill for the needs I wanted to cover here.

Data::Focus

Creates a "focus" object that walks through the data using various "lenses". An interesting approach, inspired by Haskell, but also a bit overkill.

Data::PathSimple

Very concise. The path is expressed as a '/'-separated string instead of an array of values. Does not handle overloads.

AUTHOR

Laurent Dami, <dami at cpan.org>

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Data::Reach

You can also look for information at https://metacpan.org/pod/Data::Reach

The source code is at https://github.com/damil/Data-Reach. Bug reports or feature requests can be addressed at https://github.com/damil/Data-Reach/issues.

LICENSE AND COPYRIGHT

Copyright 2015, 2022 Laurent Dami.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0