hycusampling — Uniform sampling of the unit hypercube

This module provides functions for super-uniform sampling of the unit hypercube. ‘Super-uniform’ in this context means that the obtained point sample is more uniform than a random uniform sample, which is a desirable property in many applications. After creation, the samples can be transformed from the unit hypercube to arbitrary cuboids.

Sampling algorithms

diversipy.hycusampling.maximin_reconstruction(num_points, dimension, num_steps=None, initial_points=None, existing_points=None, use_reflection_edge_correction=False, dist_matrix_function=None, callback=None)

Maximize the minimal distance in the unit hypercube with extensions.

This algorithm carries out a user-specified number of iterations to maximize the minimal distance of a point in the set to 1) other points in the set, 2) existing (fixed) points, and 3) the boundary of the hypercube. Details can be found in [Wessing2015].

Parameters:
  • num_points (int) – The number of points to generate.
  • dimension (int) – The dimension of the space.
  • num_steps (int, optional) – The number of iterations to carry out. Default is 100 * num_points.
  • initial_points (array_like, optional) – The point set to improve (if None, a sample is drawn with stratified_sampling()).
  • existing_points (array_like, optional) – Points that cannot be modified anymore, but should be considered in the distance computations.
  • use_reflection_edge_correction (bool, optional) – If True, selection pressure in boundary regions will be increased by considering additional distances to virtual points, which are created by mirroring the real points at the boundary.
  • dist_matrix_function (callable, optional) – The function to compute the distances. Default is Manhattan distance on a torus.
  • callback (callable, optional) – If provided, it is called in each iteration with the current point set as argument for monitoring progress.
Returns:

points

Return type:

(num_points, dimension) numpy array

References

[Wessing2015](1, 2) Wessing, Simon (2015). Two-stage Methods for Multimodal Optimization. PhD Thesis, Technische Universität Dortmund. http://hdl.handle.net/2003/34148
diversipy.hycusampling.random_k_means(num_points, dimension, num_steps=None, initial_points=None, dist_matrix_function=None, callback=None)

MacQueen’s method.

In its default setup, this algorithm converges to a centroidal Voronoi tesselation of the unit hypercube. Further information is given in [MacQueen1967].

Parameters:
  • num_points (int) – The number of points to generate.
  • dimension (int) – The dimension of the space.
  • num_steps (int, optional) – The number of iterations to carry out. Default is 100 * num_points.
  • initial_points (array_like, optional) – The point set to improve (if None, a sample is drawn with stratified_sampling()).
  • dist_matrix_function (callable, optional) – The function to compute the distances. Default is Euclidean distance.
  • callback (callable, optional) – If provided, it is called in each iteration with the current point set as argument for monitoring progress.
Returns:

cluster_centers

Return type:

(num_points, dimension) numpy array

References

[MacQueen1967]MacQueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif., 1967. http://projecteuclid.org/euclid.bsmsp/1200512992.
diversipy.hycusampling.conventional_stratified_sampling(num_points, dimension, full_output=False, bates_param=1)

Stratified sampling in the unit hypercube.

This algorithm divides the hypercube into num_points subcells and draws a random uniform point from each cell. Thus, the result is stochastic, but more uniform than a random uniform sample. For further information see [McKay1979].

Parameters:
  • num_points (int) – The number of points to generate. num_points ** (1/dimension) must be integer.
  • dimension (int) – The dimension of the space.
  • full_output (bool) – If True, also the strata are returned as second argument. Default is False.
  • bates_param (int, optional) – Each coordinate of a point sampled in a stratum is determined as the mean of this number of independent random uniform variables. Thus, the coordinates follow the Bates distribution.
Returns:

points

Return type:

(num_points, dimension) numpy array

diversipy.hycusampling.stratified_sampling(num_points, dimension, cuboid=None, full_output=False, detect_special_case=True, bates_param=1, avoid_odd_numbers=True)

Generalized stratified sampling.

Parameters:
  • num_points (int) – The number of points to be sampled. An arbitrary number of points is possible for this algorithm.
  • dimension (int) – The dimension of the search space.
  • cuboid (tuple of list, optional) – Optionally specify the hypercube to be sampled in. If None, the unit hypercube is chosen.
  • full_output (bool, optional) – If True, also the strata are returned as second argument.
  • detect_special_case (bool, optional) – If True, num_points ** (1/dimension) is integer, and we are sampling the unit cube, use the original stratified sampling.
  • bates_param (int, optional) – Each coordinate of a point sampled in a stratum is determined as the mean of this number of independent random uniform variables. Thus, the coordinates follow the Bates distribution.
  • avoid_odd_numbers (bool, optional) – If this value is True, splits are chosen so that the resulting numbers are even, whenever possible. E.g., if a stratum with six points is splitted, it is not split into three and three, but two and four points.
Returns:

points

Return type:

(num_points, dimension) numpy array

diversipy.hycusampling.halton(num_points, dimension, skip=0)

Generate a Halton point set.

Quasirandom sequence using the default initialization with the first dimension prime numbers.

Parameters:
  • num_points (int) – The number of points to generate.
  • dimension (int) – The dimension of the space.
  • skip (int, optional) – The first skip points of the sequence will be left out.
Returns:

points

Return type:

(num_points, dimension) numpy array

diversipy.hycusampling.random_uniform(num_points, dimension)

Syntactic sugar for numpy.random.rand().

diversipy.hycusampling.grid(num_points, dimension)

Create conventional grid in unit hypercube.

Also related to full factorial designs.

Parameters:
  • num_points (int) – The number of points to generate. num_points ** (1/dimension) must be integer.
  • dimension (int) – The dimension of the space.
Returns:

points

Return type:

(num_points, dimension) numpy array

diversipy.hycusampling.sukharev_grid(num_points, dimension)

Create Sukharev grid in unit hypercube.

Special property of this grid is that points are not placed on the boundaries of the hypercube, but at centroids of the num_points subcells. This design offers optimal results for the covering radius regarding distances based on the max-norm.

Parameters:
  • num_points (int) – The number of points to generate. num_points ** (1/dimension) must be integer.
  • dimension (int) – The dimension of the space.
Returns:

points

Return type:

(num_points, dimension) numpy array

Latin hypercube designs

diversipy.hycusampling.lhd_matrix(num_points, dimension)

Generate a random latin hypercube design matrix.

Latin hypercube designs sometimes give an advantage over random uniform samples due to their perfect uniformity of one-dimensional projections. For further information see [McKay1979]. This algorithm has linear run time.

Parameters:
  • num_points (int) – The number of points to generate.
  • dimension (int) – The dimension of the space.
Returns:

design – Contains integers corresponding to the bins of a virtual grid.

Return type:

(num_points, dimension) numpy array

References

[McKay1979](1, 2, 3) McKay, M.D.; Beckman, R.J.; Conover, W.J. (May 1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics 21(2): 239-245.
diversipy.hycusampling.improved_lhd_matrix(num_points, dimension, num_candidates=100, target_value=None, dist_matrix_function=None)

Generate an ‘improved’ latin hypercube design matrix.

This implementation uses an algorithm with quadratic run time. It is a greedy construction heuristic starting with a randomly chosen point. In each iteration, a number of random candidates is evaluated by a criterion that considers a candidate’s distance to the previously chosen points. The best point according to this criterion is included in the LHD. The concept originally stems from [Beachkofski2002]. The algorithm implemented here was proposed in [Wessing2015].

Parameters:
  • num_points (int) – The number of points to generate.
  • dimension (int) – The dimension of the space.
  • num_candidates (int, optional) – The number of random candidates considered for every point to be added to the LHD.
  • target_value (float, optional) – The distance a candidate should ideally have to the already chosen points of the LHD.
  • dist_matrix_function (callable, optional) – Defines the distance used. Default is Manhattan distance on a torus (maximum distance is num_points - 1, well-suited for edge_lhs()).
Returns:

design – Contains integers corresponding to the bins of a virtual grid.

Return type:

(num_points, dimension) numpy array

References

[Beachkofski2002]Beachkofski, B.; Grandhi, R. (2002). Improved Distributed Hypercube Sampling. American Institute of Aeronautics and Astronautics Paper 1274.
diversipy.hycusampling.has_lhd_property(design_matrix)

Check if design matrix has the LHD property.

It is assumed that counting starts from zero and points are arranged row-wise. So, each column must consist of a permutation of range(num_points).

Parameters:design_matrix (array_like) – 2-D array of integer coordinates.
Returns:is_lhd
Return type:bool
diversipy.hycusampling.perturbed_lhs(design_matrix)

Transform a design matrix into a sample in the unit hypercube.

Applies random perturbations so that each point is distributed randomly uniform in its grid cell. This is the variant proposed by [McKay1979]. It is not checked if design_matrix actually is a LHD.

Parameters:design_matrix (array_like) – Array containing integers to indicate the bins occupied by each point.
Returns:points
Return type:(num_points, dimension) numpy array
diversipy.hycusampling.centered_lhs(design_matrix)

Transform a design matrix into a sample in the unit hypercube.

Each point is placed at the centroid of a subcell in the assumed grid over the cube. It is not checked if design_matrix actually is a LHD.

Parameters:design_matrix (array_like) – Array containing integers to indicate the bins occupied by each point.
Returns:points
Return type:(num_points, dimension) numpy array
diversipy.hycusampling.edge_lhs(design_matrix)

Transform a design matrix into a sample in the unit hypercube.

The transformation is so that each face of the hypercube is sampled by exactly one point. Use this transformation if you want to maximize the minimal distance between points in the design. It is not checked if design_matrix actually is a LHD.

Parameters:design_matrix (array_like) – Array containing integers to indicate the bins occupied by each point.
Returns:points
Return type:(num_points, dimension) numpy array

Helper functions

diversipy.hycusampling.unitcube(dimension)

Shortcut to generate a tuple of bounds of the unit hypercube.

diversipy.hycusampling.scaled(points, from_cuboid, to_cuboid)

Linear transformation between arbitrary cuboids.

This function does not check if the points are actually inside from_cuboid.

Parameters:
  • points (array_like) – 2-D array of points.
  • from_cuboid (tuple) – Contains lower and upper boundaries.
  • to_cuboid (tuple) – Contains lower and upper boundaries.
Returns:

scaled_points – A new array containing the scaled points.

Return type:

numpy array