DBHT, OWA Weights, Gerber Statistic, CPP and Auxiliary Functions

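The functions in the DBHT module allow us to use the Directed Bubble Hierarchical Tree (DBHT) [D2], a new linkage method, and the j-LoGo [D3] covariance estimation method.

The OwaWeights module has functions that help us build the weights of some special cases of the OWA portfolio optimization model [D4].

The functions in the GerberStatistic module allow us to use the Gerber statistic [D5].

The cppfunctions module has functions that help us build some special matrices defined in [D6] and [D7].

The AuxFunctions module has auxiliary functions that are used in other modules.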

DBHT Functions

DBHT.DBHTs(D, S, leaf_order=True)

Perform Directed Bubble Hierarchical Tree (DBHT) clustering, a deterministic technique that only requires a similarity matrix S and the related dissimilarity matrix D. For more information see "Hierarchical information clustering by means of topologically embedded graphs" [D2]. This version makes extensive use of the graph-theoretic filtering technique called Triangulated Maximally Filtered Graph (TMFG).

Parameters:
  • D (nd-array) – N x N dissimilarity matrix, e.g. a distance: D = pdist(data, 'euclidean') and then D = squareform(D).

  • S (nd-array) – N x N non-negative similarity matrix, e.g. the correlation coefficient + 1: S = 2 - D**2/2. Another possible choice is S = exp(-D).

Returns:

  • T8 (DataFrame) – N x 1 cluster membership vector.

  • Rpm (nd-array) – N x N adjacency matrix of the Planar Maximally Filtered Graph (PMFG).

  • Adjv (nd-array) – Bubble cluster membership matrix from BubbleCluster8s.

  • Dpm (nd-array) – N x N shortest path length matrix of the PMFG.

  • Mv (nd-array) – N x Nb bubble membership matrix. Mv(n,bi)=1 indicates that vertex n is a vertex of bubble bi.

  • Z (nd-array) – Linkage matrix using the DBHT hierarchy.

Computes the sparse inverse covariance J (the j-LoGo estimate) from a clique tree made of cliques and separators. For more information see [D3].

Parameters:
  • S (ndarray) – The complete covariance matrix.

  • separators (nd-array) – The list of separators.

  • clique (nd-array) – The list of cliques.

Returns:

JLogo – Sparse inverse covariance matrix.

Return type:

nd-array

Notes

The separators and cliques can be the outputs of the TMFG function, PMFG_T2s.

DBHT.PMFG_T2s(W, nargout=3)

Computes a Triangulated Maximally Filtered Graph (TMFG) [D8], starting from a tetrahedron and recursively inserting vertices inside existing triangles (T2 move), in order to approximate a maximal planar graph with the largest total weight. All weights must be non-negative.

Parameters:
  • W (nd-array) – An N x N matrix of non-negative weights.

  • nargout (int, optional) – Number of results. Possible values are 3, 4 and 5: 3 returns A, tri and separators; 4 also returns clique4; 5 also returns cliqueTree.

Returns:

  • A (nd-array) – Weighted adjacency matrix of the PMFG.

  • tri (nd-array) – Matrix of triangles (triangular faces) of size (2N - 4) x 3.

  • separators (nd-array) – Matrix of 3-cliques that are not triangular faces (all 3-cliques are given by [tri; separators]).

  • clique4 (nd-array, optional) – List of all 4-cliques.

  • cliqueTree (nd-array, optional) – 4-cliques tree structure (adjacency matrix).

DBHT.distance_wei(L)

Computes the distance matrix, which contains the lengths of the shortest paths between all pairs of nodes. An entry (u,v) is the length of the shortest path from node u to node v. The average shortest path length is the characteristic path length of the network.

Parameters:

L (nd-array) – Directed/undirected connection-length matrix.

Returns:

  • D (nd-array) – Distance (shortest weighted path) matrix.

  • B (nd-array) – Matrix with the number of edges in each shortest weighted path.

Notes

The input matrix must be a connection-length matrix, typically obtained via a mapping from weight to length. For instance, in a weighted correlation network higher correlations are more naturally interpreted as shorter distances and the input matrix should consequently be some inverse of the connectivity matrix. The number of edges in shortest weighted paths may in general exceed the number of edges in shortest binary paths (i.e. shortest paths computed on the binarized connectivity matrix), because shortest weighted paths have the minimal weighted distance, but not necessarily the minimal number of edges.

Lengths between disconnected nodes are set to Inf. Lengths on the main diagonal are set to 0.

Algorithm: Dijkstra's algorithm.

Mika Rubinov, UNSW/U Cambridge, 2007-2012. Rick Betzel and Andrea Avena, IU, 2012.

Modification history: 2007: original (MR); 2009-08-04: min() function vectorized (MR); 2012: added number of edges in shortest path as additional output (RB/AA); 2013: variable names changed for consistency with other functions (MR).
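The all-pairs computation described above can be sketched in plain Python. It runs Dijkstra's algorithm once per source node and records how many edges each chosen shortest weighted path uses. This is an illustrative sketch, not the library's implementation. The name `all_pairs_dijkstra` is hypothetical, and the sketch assumes a dense list-of-lists length matrix in which 0 means "no edge".

```python
import heapq
import math


def all_pairs_dijkstra(L):
    """Shortest weighted path lengths D and edge counts B for a length matrix L.

    L[u][v] > 0 is the length of edge u -> v; 0 means there is no edge.
    Unreachable pairs get math.inf in D and 0 in B; the diagonal of D is 0.
    """
    n = len(L)
    D = [[math.inf] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for s in range(n):
        D[s][s] = 0.0
        heap = [(0.0, 0, s)]          # (distance so far, edges so far, node)
        done = [False] * n
        while heap:
            d, e, u = heapq.heappop(heap)
            if done[u]:
                continue              # stale entry: u was already settled
            done[u] = True
            for v in range(n):
                w = L[u][v]
                # relax edge u -> v if it gives a strictly shorter path
                if w > 0 and not done[v] and d + w < D[s][v]:
                    D[s][v] = d + w
                    B[s][v] = e + 1
                    heapq.heappush(heap, (d + w, e + 1, v))
    return D, B


D, B = all_pairs_dijkstra([[0, 1, 4], [1, 0, 2], [4, 2, 0]])
```

In this example the direct edge 0-2 has length 4, but the path 0-1-2 has length 3, so B[0][2] is 2. This is the case described in the notes, where a shortest weighted path uses more edges than the shortest binary path.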

DBHT.CliqHierarchyTree2s(Apm, method1)

CliqHierarchyTree2s looks for the 3-cliques of a maximal planar graph and then constructs a hierarchy of the cliques. The 'inside' of a clique is defined as the smaller subgraph that remains when the entire graph is made disjoint by removing the clique [D9].

Parameters:
  • Apm (nd-array) – N x N adjacency matrix of a maximal planar graph.

  • method1 (str) – Either 'uniqueroot' or 'equalroot'. Assigns connections between the final root cliques. Uses a Voronoi tessellation between tiling triangles.

Returns:

  • H1 (nd-array) – Nc x Nc adjacency matrix of the 3-clique hierarchical tree, where Nc is the number of 3-cliques.

  • H2 (nd-array) – Nb x Nb adjacency matrix of the bubble hierarchical tree, where Nb is the number of bubbles.

  • Mb (nd-array) – Nc x Nb bubble membership matrix. Mb(n,bi)=1 indicates that 3-clique n belongs to bubble bi.

  • CliqList (nd-array) – Nc x 3 matrix. Each row lists the three vertices that form a 3-clique in the maximal planar graph.

  • Sb (nd-array) – Nc x 1 vector. Sb(n)=1 indicates that the nth 3-clique is separating.

DBHT.clique3(A)

Computes the list of 3-cliques.

Parameters:

A (nd-array) – N x N sparse adjacency matrix.

Returns:

clique – Nc x 3 matrix. Each row contains the vertices of a 3-clique.

Return type:

nd-array
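The following minimal sketch shows how the 3-cliques of a graph can be listed. For each vertex, it checks pairs of higher-numbered neighbours that are themselves adjacent. The name `list_3_cliques` is hypothetical. The sketch takes a dense NumPy array, so a sparse input would need `.toarray()` first, and the library may order the rows differently.

```python
from itertools import combinations

import numpy as np


def list_3_cliques(A):
    """Return an Nc x 3 integer array whose rows are mutually adjacent vertex triples."""
    A = np.asarray(A) != 0                  # binary adjacency pattern (dense array assumed)
    n = A.shape[0]
    cliques = []
    for i in range(n):
        # neighbours with a larger index, so each triangle is listed exactly once
        nbrs = [j for j in range(i + 1, n) if A[i, j]]
        for j, k in combinations(nbrs, 2):
            if A[j, k]:
                cliques.append((i, j, k))
    return np.array(cliques, dtype=int).reshape(-1, 3)


tetrahedron = np.ones((4, 4)) - np.eye(4)   # complete graph on 4 vertices
```

For the tetrahedron, which is the starting graph of the TMFG construction in PMFG_T2s, the sketch returns its 4 triangular faces.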

DBHT.breadth(CIJ, source)

Implementation of breadth-first search.

Parameters:
  • CIJ (nd-array) – Binary (directed/undirected) connection matrix.

  • source (nd-array) – Source vertex.

Returns:

  • distance (nd-array) – Distance between 'source' and the ith vertex (0 for the source vertex).

  • branch (nd-array) – Vertex that precedes i in the breadth-first search tree (-1 for the source vertex).

Notes

The breadth-first search tree does not contain all paths (or all shortest paths), but it allows the determination of at least one path with minimum distance. The entire graph is explored, starting from the source vertex 'source'.

Olaf Sporns, Indiana University, 2002/2007/2008
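Below is a queue-based sketch of the same breadth-first search. It returns distances and predecessors in the format described above. The name `bfs_tree` is hypothetical. Marking unreachable vertices with an infinite distance and predecessor -1 is this sketch's own convention, and the library may not share it.

```python
from collections import deque


def bfs_tree(CIJ, source):
    """Breadth-first search on a binary connection matrix CIJ (list of lists or array).

    Returns (distance, branch):
      distance[i]: number of edges from source to i (0 for source, inf if unreachable)
      branch[i]: predecessor of i in the BFS tree (-1 for source and unreachable vertices)
    """
    n = len(CIJ)
    distance = [float("inf")] * n
    branch = [-1] * n
    distance[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            # follow edge u -> v only the first time v is reached
            if CIJ[u][v] and distance[v] == float("inf"):
                distance[v] = distance[u] + 1
                branch[v] = u
                queue.append(v)
    return distance, branch


# path 0 - 1 - 2 plus an isolated vertex 3
CIJ = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```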

DBHT.BubbleCluster8s(Rpm, Dpm, Hb, Mb, Mv, CliqList)

Obtains non-discrete and discrete clusterings from the bubble topology of the PMFG.

Parameters:
  • Rpm (nd-array) – N x N sparse weighted adjacency matrix of the PMFG.

  • Dpm (nd-array) – N x N shortest path length matrix of the PMFG.

  • Hb (nd-array) – Undirected bubble tree of the PMFG.

  • Mb (nd-array) – Nc x Nb bubble membership matrix for 3-cliques. Mb(n,bi)=1 indicates that 3-clique n belongs to bubble bi.

  • Mv (nd-array) – N x Nb bubble membership matrix for vertices.

  • CliqList (nd-array) – Nc x 3 matrix of 3-cliques. Each row contains the vertices of a particular 3-clique.

Returns:

  • Adjv (nd-array) – N x Nk cluster membership matrix of the vertices for the non-discrete clustering via the bubble topology. Adjv(n,k)=1 indicates that vertex n belongs to the kth non-discrete cluster.

  • Tc (nd-array) – N x 1 cluster membership vector. Tc(n)=k indicates that vertex n belongs to the kth discrete cluster.

DBHT.DirectHb(Rpm, Hb, Mb, Mv, CliqList)

Computes the direction of each separating 3-clique of a maximal planar graph and from these computes the Directed Bubble Hierarchical Tree (DBHT).

Parameters:
  • Rpm (nd-array) – N x N sparse weighted adjacency matrix of the PMFG.

  • Hb (nd-array) – Undirected bubble tree of the PMFG.

  • Mb (nd-array) – Nc x Nb bubble membership matrix for 3-cliques. Mb(n,bi)=1 indicates that 3-clique n belongs to bubble bi.

  • Mv (nd-array) – N x Nb bubble membership matrix for vertices.

  • CliqList (nd-array) – Nc x 3 matrix of 3-cliques. Each row contains the vertices of a particular 3-clique.

Returns:

Hc – Nb x Nb unweighted directed adjacency matrix of the DBHT. Hc(i,j)=1 indicates a directed edge from bubble i to bubble j.

Return type:

nd-array

DBHT.HierarchyConstruct4s(Rpm, Dpm, Tc, Adjv, Mv)

Constructs the intra- and inter-cluster hierarchy from the bubble hierarchy structure of a maximal planar graph, namely the Planar Maximally Filtered Graph (PMFG).

Parameters:
  • Rpm (nd-array) – N x N weighted adjacency matrix of the PMFG.

  • Dpm (nd-array) – N x N shortest path length matrix of the PMFG.

  • Tc (nd-array) – N x 1 cluster membership vector from DBHT clustering. Tc(n)=z_i indicates the cluster of the nth vertex.

  • Adjv (nd-array) – Bubble cluster membership matrix from BubbleCluster8s.

  • Mv (nd-array) – Bubble membership of vertices from BubbleCluster8s.

Returns:

Z – (N-1) x 4 linkage matrix, in the same format as the output of the MATLAB function 'linkage'.

Return type:

nd-array

OWA Weights Functions

OwaWeights.owa_l_moment(T, k=2)

Calculate the OWA weights of the kth linear moment (L-moment) of a returns series, as shown in [D10].

Parameters:
  • T (int) – Number of observations of the returns series.

  • k (int) – Order of the L-moment. Must be an integer greater than or equal to 1.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array
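To show how such weights can be built, this sketch uses the direct unbiased estimator of the kth sample L-moment. That estimator is a linear combination of the returns sorted in ascending order. Whether the library uses exactly this form is an assumption, and `l_moment_weights` is a hypothetical name.

```python
from math import comb


def l_moment_weights(T, k=2):
    """Weights w such that sum(w_i * x_(i)) is the sample kth L-moment.

    x_(1) <= ... <= x_(T) are the returns sorted in ascending order.
    """
    if k < 1:
        raise ValueError("k must be an integer >= 1")
    w = []
    for i in range(1, T + 1):
        # comb(n, r) is 0 when r > n, which switches off impossible terms
        a = sum(
            (-1) ** j * comb(k - 1, j) * comb(i - 1, k - 1 - j) * comb(T - i, j)
            for j in range(k)
        )
        w.append(a / (k * comb(T, k)))
    return w
```

For k = 1 every weight is 1/T, which gives the sample mean. For k = 2 the weights reduce to (2i - 1 - T)/(T(T - 1)), half of the Gini mean difference weights.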

OwaWeights.owa_gmd(T)

Calculate the OWA weights of the Gini mean difference (GMD) of a returns series, as shown in [D4].

Parameters:

T (int) – Number of observations of the returns series.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array
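The GMD weights can be checked against the pairwise definition. This sketch assumes GMD = 2/(T(T - 1)) Σ_{i<j} |x_i - x_j| and applies the weights to the returns sorted in ascending order. The ith smallest return appears i - 1 times as the larger element of a pair and T - i times as the smaller one, which gives the weight 2(2i - 1 - T)/(T(T - 1)). The helper names are hypothetical, and the library's scaling convention may differ.

```python
import itertools


def gmd_weights(T):
    """OWA weights for the Gini mean difference of T returns sorted ascending."""
    return [2 * (2 * i - 1 - T) / (T * (T - 1)) for i in range(1, T + 1)]


def gmd_pairwise(x):
    """GMD from its definition: 2 / (T (T - 1)) times the sum of |x_i - x_j| over i < j."""
    T = len(x)
    total = sum(abs(a - b) for a, b in itertools.combinations(x, 2))
    return 2 * total / (T * (T - 1))


x = [0.02, -0.01, 0.03, -0.04, 0.00]
# apply the weights to the returns sorted in ascending order
owa_value = sum(w * r for w, r in zip(gmd_weights(len(x)), sorted(x)))
```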

OwaWeights.owa_cvar(T, alpha=0.05)

Calculate the OWA weights of the Conditional Value at Risk (CVaR) of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alpha (float, optional) – Significance level of CVaR. The default is 0.05.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array
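Here is a sketch of CVaR weights under the usual historical (discrete) definition. With the returns sorted in ascending order, the first ceil(αT) - 1 outcomes each get weight -1/(αT), and the next outcome gets whatever is left. The weights therefore sum to -1, and the OWA sum is CVaR expressed as a positive loss. This sign convention and the name `cvar_weights` are assumptions for illustration.

```python
import math


def cvar_weights(T, alpha=0.05):
    """OWA weights w with sum(w * sorted(x)) equal to the historical CVaR at level alpha.

    Returns are assumed sorted ascending, so the first positions hold the worst outcomes.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = math.ceil(T * alpha) - 1        # positions 0..k-1 get the full tail weight
    w = [0.0] * T
    for i in range(k):
        w[i] = -1.0 / (T * alpha)
    w[k] = -1.0 - sum(w[:k])            # position k gets the rest, so sum(w) == -1
    return w


x = list(range(-4, 6))                  # ten toy returns
value = sum(wi * xi for wi, xi in zip(cvar_weights(10, 0.25), sorted(x)))
```

With T = 10 and alpha = 0.25 the tail holds 2.5 observations. The two worst returns get weight -0.4, the third gets -0.2, and value is 0.4·4 + 0.4·3 + 0.2·2 = 3.2.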

OwaWeights.owa_wcvar(T, alphas, weights)

Calculate the OWA weights of the Weighted Conditional Value at Risk (WCVaR) of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alphas (list) – List of significance levels of each CVaR model.

  • weights (list) – List of weights of each CVaR model.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_tg(T, alpha=0.05, a_sim=100)

Calculate the OWA weights of the Tail Gini of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alpha (float, optional) – Significance level of the Tail Gini. The default is 0.05.

  • a_sim (float, optional) – Number of CVaRs used to approximate the Tail Gini. The default is 100.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_wr(T)

Calculate the OWA weights of the worst realization (minimum) of a returns series, as shown in [D4].

Parameters:

T (int) – Number of observations of the returns series.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_rg(T)

Calculate the OWA weights of the range of a returns series, as shown in [D4].

Parameters:

T (int) – Number of observations of the returns series.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array
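The worst realization and range weights are simple enough to write down directly. In this sketch the helper names are hypothetical and the returns are sorted in ascending order. The worst realization puts -1 on the smallest return, so the OWA sum is the largest loss. The range adds +1 on the largest return.

```python
def worst_realization_weights(T):
    """-1 on the smallest return: sum(w * sorted(x)) == -min(x)."""
    w = [0.0] * T
    w[0] = -1.0
    return w


def range_weights(T):
    """-1 on the smallest and +1 on the largest return: max(x) - min(x)."""
    w = [0.0] * T
    w[0], w[-1] = -1.0, 1.0
    return w


def owa(w, x):
    """Apply OWA weights to the returns sorted in ascending order."""
    return sum(wi * xi for wi, xi in zip(w, sorted(x)))
```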

OwaWeights.owa_cvrg(T, alpha=0.05, beta=None)

Calculate the OWA weights of the CVaR range of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alpha (float, optional) – Significance level of the CVaR of losses. The default is 0.05.

  • beta (float, optional) – Significance level of the CVaR of gains. If None, it takes the value of alpha. The default is None.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_wcvrg(T, alphas, weights_a, betas=None, weights_b=None)

Calculate the OWA weights of the WCVaR range of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alphas (list) – List of significance levels of each CVaR of losses model.

  • weights_a (list) – List of weights of each CVaR of losses model.

  • betas (list, optional) – List of significance levels of each CVaR of gains model. If None, it takes the values of alphas. The default is None.

  • weights_b (list, optional) – List of weights of each CVaR of gains model. If None, it takes the values of weights_a. The default is None.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_tgrg(T, alpha=0.05, a_sim=100, beta=None, b_sim=None)

Calculate the OWA weights of the Tail Gini range of a returns series, as shown in [D4].

Parameters:
  • T (int) – Number of observations of the returns series.

  • alpha (float, optional) – Significance level of the Tail Gini of losses. The default is 0.05.

  • a_sim (float, optional) – Number of CVaRs used to approximate the Tail Gini of losses. The default is 100.

  • beta (float, optional) – Significance level of the Tail Gini of gains. If None, it takes the value of alpha. The default is None.

  • b_sim (float, optional) – Number of CVaRs used to approximate the Tail Gini of gains. If None, it takes the value of a_sim. The default is None.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

OwaWeights.owa_l_moment_crm(T, k=4, method='MSD', g=0.5, max_phi=0.5, solver=None)

Calculate the OWA weights of a convex risk measure that considers higher-order linear moments (L-moments), as shown in [D10].

Parameters:
  • T (int) – Number of observations of the returns series.

  • k (int) – Order of the L-moment. Must be an integer greater than or equal to 2.

  • method (str, optional) –

    Method used to calculate the weights that combine the L-moments of order higher than 2. The default value is 'MSD'. Possible values are:

    • 'CRRA': Normalized Constant Relative Risk Aversion coefficients.

    • 'ME': Maximum Entropy.

    • 'MSS': Minimum Sum Squares.

    • 'MSD': Minimum Square Distance.

  • g (float, optional) – Risk aversion coefficient of the CRRA utility function. The default is 0.5.

  • max_phi (float, optional) – Maximum weight constraint of the L-moments. The default is 0.5.

  • solver (str, optional) – Solver available for CVXPY. Used to calculate the 'ME', 'MSS' and 'MSD' weights. The default value is None.

Returns:

value – An OWA weights vector of size T x 1.

Return type:

1d-array

Gerber Statistic Functions

GerberStatistic.gerber_cov_stat0(X, threshold=0.5)

Compute Gerber covariance Statistic 0, the original Gerber statistic [D5]. This statistic is not always positive semidefinite (PSD), so this function fixes the covariance matrix by finding the nearest positive semidefinite covariance matrix.

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • threshold (float) – Threshold, a value between 0 and 1. The default is 0.5.

Returns:

value – Gerber covariance matrix of shape (n_features, n_features), where n_features is the number of features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

GerberStatistic.gerber_cov_stat1(X, threshold=0.5)

Compute Gerber covariance Statistic 1 [D5].

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • threshold (float) – Threshold, a value between 0 and 1. The default is 0.5.

Returns:

value – Gerber covariance matrix of shape (n_features, n_features), where n_features is the number of features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

GerberStatistic.gerber_cov_stat2(X, threshold=0.5)

Compute Gerber covariance Statistic 2 [D5].

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • threshold (float) – Threshold, a value between 0 and 1. The default is 0.5.

Returns:

value – Gerber covariance matrix of shape (n_features, n_features), where n_features is the number of features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

CPP Functions

cppfunctions.duplication_matrix(n: int)

Calculate the duplication matrix of size "n", as shown in [D6].

Parameters:

n (int) – Number of assets.

Returns:

D – Duplication matrix.

Return type:

np.ndarray
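Here is a small NumPy sketch of the duplication matrix D_n of [D6]. It is defined by D_n vech(A) = vec(A) for any symmetric n x n matrix A. Here vec stacks the columns, and vech stacks the lower triangle column by column. The function and helper names are hypothetical, and this is not the cppfunctions implementation.

```python
import numpy as np


def duplication_matrix(n):
    """D_n (n^2 x n(n+1)/2) with D_n @ vech(A) == vec(A) for symmetric A."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    col = 0
    for j in range(n):                  # vech walks the lower triangle column by column
        for i in range(j, n):
            D[i + j * n, col] = 1.0     # position of A[i, j] in vec(A) (column-major)
            D[j + i * n, col] = 1.0     # position of the mirrored entry A[j, i]
            col += 1
    return D


def vech(A):
    """Stack the lower-triangular part of A (diagonal included) column by column."""
    n = A.shape[0]
    return np.array([A[i, j] for j in range(n) for i in range(j, n)])


A = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]])
vec_A = A.flatten(order="F")            # vec(): column-major (Fortran order) stacking
```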

cppfunctions.duplication_elimination_matrix(n: int)

Calculate the duplication elimination matrix of size "n", as shown in [D6].

Parameters:

n (int) – Number of assets.

Returns:

L – Duplication elimination matrix.

Return type:

np.ndarray
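Similarly, here is a sketch of the elimination matrix L_n of [D6]. It satisfies L_n vec(A) = vech(A) for any n x n matrix A, symmetric or not. That cppfunctions.duplication_elimination_matrix returns exactly this L_n is an assumption, and the name `elimination_matrix` is hypothetical.

```python
import numpy as np


def elimination_matrix(n):
    """L_n (n(n+1)/2 x n^2) with L_n @ vec(A) == vech(A) for any n x n matrix A."""
    L = np.zeros((n * (n + 1) // 2, n * n))
    row = 0
    for j in range(n):
        for i in range(j, n):
            L[row, i + j * n] = 1.0     # select A[i, j] (i >= j) from vec(A)
            row += 1
    return L


A = np.arange(9.0).reshape(3, 3)        # need not be symmetric
vech_A = np.array([A[i, j] for j in range(3) for i in range(j, 3)])
```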

cppfunctions.duplication_summation_matrix(n: int)

Calculate the duplication summation matrix of size "n", as shown in [D7].

Parameters:

n (int) – Number of assets.

Returns:

S – Duplication summation matrix.

Return type:

np.ndarray

Auxiliary Functions

AuxFunctions.is_pos_def(cov, threshold=1e-08)

Indicate whether a matrix is positive (semi)definite.

Parameters:

cov (ndarray) – Covariance matrix of shape (n_features, n_features), where n_features is the number of features.

Returns:

value – True if the matrix is positive (semi)definite.

Return type:

bool

Raises:

ValueError when the value cannot be calculated.
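One common way to test positive (semi)definiteness is to compare the eigenvalues of the symmetric matrix with a small tolerance. This is a hypothetical sketch (`is_psd`), and the library's test and its use of threshold may differ.

```python
import numpy as np


def is_psd(cov, threshold=1e-8):
    """Return True if every eigenvalue of the symmetric matrix cov is >= -threshold."""
    cov = np.asarray(cov, dtype=float)
    # eigvalsh is meant for symmetric/Hermitian matrices and returns real eigenvalues
    eigenvalues = np.linalg.eigvalsh(cov)
    return bool(np.all(eigenvalues >= -threshold))
```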

AuxFunctions.cov2corr(cov)

Generate a correlation matrix from a covariance matrix cov.

Parameters:

cov (ndarray) – Covariance matrix of shape n_features x n_features, where n_features is the number of features.

Returns:

corr – A correlation matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.corr2cov(corr, std)

Generate a covariance matrix from a correlation matrix corr and a standard deviation vector std.

Parameters:
  • corr (ndarray) – Assets correlation matrix of shape n_features x n_features, where n_features is the number of features.

  • std (1darray) – Assets standard deviation vector of size n_features, where n_features is the number of features.

Returns:

cov – A covariance matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.
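The conversions performed by cov2corr and corr2cov are element-wise scalings by the standard deviations: corr_ij = cov_ij / (σ_i σ_j) and cov_ij = corr_ij σ_i σ_j. Here is a minimal NumPy sketch with hypothetical names.

```python
import numpy as np


def cov_to_corr(cov):
    """corr_ij = cov_ij / (std_i * std_j), with std taken from the diagonal of cov."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def corr_to_cov(corr, std):
    """cov_ij = corr_ij * std_i * std_j."""
    std = np.asarray(std, dtype=float)
    return np.asarray(corr, dtype=float) * np.outer(std, std)


cov = np.array([[0.04, 0.006], [0.006, 0.09]])   # standard deviations 0.2 and 0.3
corr = cov_to_corr(cov)                           # off-diagonal: 0.006 / 0.06 = 0.1
```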

AuxFunctions.cov_fix(cov, method='clipped', threshold=1e-08)

Fix a covariance matrix so that it is positive definite.

Parameters:
  • cov (ndarray) – Covariance matrix of shape n_features x n_features, where n_features is the number of features.

  • method (str) – The default value is 'clipped'. See cov_nearest for more details.

  • **kwargs

    Other parameters from cov_nearest.

Returns:

cov_ – A positive definite covariance matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.cov_returns(cov, seed=0)

Generate a matrix of returns whose covariance matrix is cov.

Parameters:

cov (ndarray) – Covariance matrix of shape n_features x n_features, where n_features is the number of features.

Returns:

a – A matrix of returns whose covariance matrix is cov.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.commutation_matrix(cov)

Generate the commutation matrix of the covariance matrix cov.

Parameters:

cov (ndarray) – Covariance matrix of shape n_features x n_features, where n_features is the number of features.

Returns:

K – The commutation matrix of the covariance matrix cov.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.
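The commutation matrix K_{m,n} is the permutation matrix that maps vec(A) to vec(A') for an m x n matrix A. The sketch below takes the dimensions explicitly instead of a covariance matrix, which is a hypothetical signature. For an n x n covariance matrix the relevant case is m = n.

```python
import numpy as np


def commutation_matrix(m, n):
    """K_{m,n} (mn x mn) such that K @ vec(A) == vec(A.T) for any m x n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position i + j*m in vec(A) and at j + i*n in vec(A.T)
            K[j + i * n, i + j * m] = 1.0
    return K


A = np.arange(6.0).reshape(2, 3)
# vec() stacks columns, which is NumPy's Fortran ("F") order
```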

AuxFunctions.cokurtosis_matrix(Y)

Calculates the square cokurtosis matrix, as shown in [D7].

Parameters:

Y (ndarray) – Returns series of shape n_sample x n_features.

Returns:

K – The square cokurtosis matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.
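Here is a sketch of a square cokurtosis matrix, assuming the definition (1/T) Σ_t (z_t z_t') ⊗ (z_t z_t'), where z_t is the mean-centred return vector at time t. The normalisation used in [D7] and by the library is an assumption here, and `cokurtosis_square` is a hypothetical name. Each row of the auxiliary matrix ZZ holds z_t ⊗ z_t, so the whole matrix is ZZ' ZZ / T.

```python
import numpy as np


def cokurtosis_square(Y):
    """Square cokurtosis matrix (n^2 x n^2) of a T x n returns matrix Y.

    Assumed definition: (1/T) * sum_t (z_t z_t') kron (z_t z_t'), z_t = y_t - mean(y).
    """
    Y = np.asarray(Y, dtype=float)
    Z = Y - Y.mean(axis=0)                          # centre each asset's returns
    T, n = Z.shape
    # row t of ZZ is z_t kron z_t, i.e. entry i*n + j equals z_ti * z_tj
    ZZ = np.einsum("ti,tj->tij", Z, Z).reshape(T, n * n)
    # (z kron z)(z kron z)' == (z z') kron (z z'), averaged over t
    return ZZ.T @ ZZ / T
```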

AuxFunctions.semi_cokurtosis_matrix(Y)

Calculates the square semi-cokurtosis matrix, as shown in [D7].

Parameters:

Y (ndarray) – Returns series of shape n_sample x n_features.

Returns:

SK – The square semi-cokurtosis matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.block_vec_pq(A, p, q)

Calculates the block vectorization operator, as shown in [D11] and [D12].

Parameters:
  • A (ndarray) – Matrix that will be block vectorized.

  • p (int) – Order p of the block vectorization operator.

  • q (int) – Order q of the block vectorization operator.

Returns:

bvec_A – The block vectorized matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.dcorr(X, Y)

Calculate the distance correlation between two variables [D13].

Parameters:
  • X (1d-array) – Returns series; must have shape n_sample x 1.

  • Y (1d-array) – Returns series; must have shape n_sample x 1.

Returns:

value – The distance correlation between variables X and Y.

Return type:

float

Raises:

ValueError when the value cannot be calculated.
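The sample distance correlation of [D13] can be computed from double-centred pairwise distance matrices. This is a straightforward sketch that uses O(n^2) memory. Its name is hypothetical, and it is not the library's implementation.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (Székely, Rizzo and Bakirov, 2007) of two 1-D series."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                             # pairwise distances |x_i - x_j|
    b = np.abs(y - y.T)
    # double centring: remove row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2_xy = (A * B).mean()                       # squared distance covariance
    dcov2_xx = (A * A).mean()                       # squared distance variances
    dcov2_yy = (B * B).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0:
        return 0.0                                  # defined as 0 for a constant series
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


x = np.linspace(-1.0, 1.0, 21)
```

Unlike the Pearson correlation, which is zero for x and x**2 on a symmetric grid, the distance correlation of that pair is clearly positive.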

AuxFunctions.dcorr_matrix(X)

Calculate the distance correlation matrix of n variables.

Parameters:

X (ndarray) – Returns series of shape n_sample x n_features.

Returns:

corr – The distance correlation matrix of shape n_features x n_features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.numBins(n_samples, corr=None)

Calculate the optimal number of bins for the discretization used to compute mutual information and variation of information.

Parameters:
  • n_samples (int) – Number of samples.

  • corr (float, optional) – Correlation coefficient of the variables. The default value is None.

Returns:

bins – The optimal number of bins.

Return type:

int

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.mutual_info_matrix(X, bins_info='KN', normalize=True)

Calculate the mutual information matrix of n variables.

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • bins_info (int or str) –

    Number of bins used to calculate the mutual information. The default value is 'KN'. Possible values are:

    • 'KN': Knuth's choice method. See more in knuth_bin_width.

    • 'FD': Freedman–Diaconis' choice method. See more in freedman_bin_width.

    • 'SC': Scott's choice method. See more in scott_bin_width.

    • 'HGR': Hacine-Gharbi and Ravier's choice method.

    • int: number of bins chosen by the user.

  • normalize (bool) – Whether to normalize the mutual information. The default value is True.

Returns:

corr – The mutual information matrix of shape n_features x n_features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.var_info_matrix(X, bins_info='KN', normalize=True)

Calculate the variation of information matrix of n variables.

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • bins_info (int or str) –

    Number of bins used to calculate the variation of information. The default value is 'KN'. Possible values are:

    • 'KN': Knuth's choice method. See more in knuth_bin_width.

    • 'FD': Freedman–Diaconis' choice method. See more in freedman_bin_width.

    • 'SC': Scott's choice method. See more in scott_bin_width.

    • 'HGR': Hacine-Gharbi and Ravier's choice method.

    • int: number of bins chosen by the user.

  • normalize (bool) – Whether to normalize the variation of information. The default value is True.

Returns:

corr – The variation of information matrix of shape n_features x n_features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.ltdi_matrix(X, alpha=0.05)

Calculate the lower tail dependence index matrix using the empirical approach.

Parameters:
  • X (ndarray) – Returns series of shape n_sample x n_features.

  • alpha (float, optional) – Significance level for the lower tail dependence index. The default is 0.05.

Returns:

corr – The lower tail dependence index matrix of shape n_features x n_features.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.two_diff_gap_stat(codep, dist, clusters, max_k=10)

Calculate the optimal number of clusters based on the two difference gap statistic [D14].

Parameters:
  • codep (DataFrame) – A codependence matrix.

  • dist (str, optional) – A distance measure based on the codependence matrix.

  • clusters (str, optional) – The hierarchical clustering encoded as a linkage matrix. See linkage for more details.

  • max_k (int, optional) – Maximum number of clusters used by the two difference gap statistic to find the optimal number of clusters. The default is 10.

Returns:

k – The optimal number of clusters based on the two difference gap statistic.

Return type:

int

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.fitKDE(obs, bWidth=0.01, kernel='gaussian', x=None)

Fit a kernel to a series of observations obs and derive the probability of the observations, that is, the empirical Probability Density Function (PDF). x is the array of values on which the fitted KDE will be evaluated. For more information see chapter 2 of [D1].

Parameters:
  • obs (ndarray) – Observations to fit, commonly the diagonal of the eigenvalue matrix.

  • bWidth (float, optional) – The bandwidth of the kernel. The default value is 0.01.

  • kernel (str, optional) –

    The kernel to use. The default value is 'gaussian'. For more information see kernel-density. Possible values are:

    • 'gaussian': gaussian kernel.

    • 'tophat': tophat kernel.

    • 'epanechnikov': epanechnikov kernel.

    • 'exponential': exponential kernel.

    • 'linear': linear kernel.

    • 'cosine': cosine kernel.

  • x (ndarray, optional) – The array of values on which the fitted KDE will be evaluated.

Returns:

pdf – Empirical PDF.

Return type:

pd.Series

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.mpPDF(var, q, pts)

Creates a Marchenko-Pastur Probability Density Function (PDF). For more information see chapter 2 of [D1].

Parameters:
  • var (float) – Variance.

  • q (float) – T/N, where T is the number of rows and N the number of columns.

  • pts (int) – Number of points used to construct the PDF.

Returns:

pdf – Marchenko-Pastur PDF.

Return type:

pd.Series

Raises:

ValueError when the value cannot be calculated.
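For reference, the Marchenko-Pastur density for iid entries with variance var and q = T/N > 1 is supported on [var(1 - sqrt(1/q))^2, var(1 + sqrt(1/q))^2]. On that interval it equals q / (2π var λ) · sqrt((λ_max - λ)(λ - λ_min)). The sketch below evaluates this density on a grid. It returns a NumPy pair instead of the pd.Series listed above, and its name is hypothetical.

```python
import numpy as np


def marchenko_pastur_pdf(var, q, pts):
    """Evaluate the Marchenko-Pastur density on pts points across its support.

    var: variance of the iid entries; q = T / N, assumed > 1.
    Returns (grid of eigenvalues, density values).
    """
    e_min = var * (1 - (1.0 / q) ** 0.5) ** 2     # lower edge of the support
    e_max = var * (1 + (1.0 / q) ** 0.5) ** 2     # upper edge of the support
    e_val = np.linspace(e_min, e_max, pts)
    # (e_max - x)(x - e_min) >= 0 on the support; clip rounding noise at the edges
    inside = np.clip((e_max - e_val) * (e_val - e_min), 0.0, None)
    pdf = q / (2 * np.pi * var * e_val) * np.sqrt(inside)
    return e_val, pdf
```

A quick sanity check is that the density integrates to 1 over its support.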

AuxFunctions.errPDFs(var, eVal, q, bWidth=0.01, pts=1000)

Fit error of the empirical PDF with respect to the Marchenko-Pastur PDF. For more information see chapter 2 of [D1].

Parameters:
  • var (float) – Variance.

  • eVal (ndarray) – Eigenvalues to fit.

  • q (float) – T/N, where T is the number of rows and N the number of columns.

  • bWidth (float, optional) – The bandwidth of the kernel. The default value is 0.01.

  • pts (int) – Number of points used to construct the PDF. The default value is 1000.

Returns:

pdf – Sum of squared errors.

Return type:

float

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.findMaxEval(eVal, q, bWidth=0.01)

Find the maximum random eigenvalue by fitting the Marchenko-Pastur distribution. Every eigenvalue larger than this one is a signal eigenvalue. For more information see chapter 2 of [D1].

Parameters:
  • eVal (ndarray) – Eigenvalues to fit.

  • q (float) – T/N, where T is the number of rows and N the number of columns.

  • bWidth (float, optional) – The bandwidth of the kernel. The default value is 0.01.

Returns:

pdf – The first value is the maximum random eigenvalue and the second is the variance attributed to noise. One minus this variance is one way to measure the signal-to-noise ratio.

Return type:

tuple (float, float)

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.getPCA(matrix)

Gets the eigenvalues and eigenvectors of a Hermitian matrix. For more information see chapter 2 of [D1].

Parameters:

matrix (ndarray or pd.DataFrame) – Correlation matrix.

Returns:

pdf – The first value contains the eigenvalues of the correlation matrix and the second contains the eigenvectors of the correlation matrix.

Return type:

tuple (ndarray, ndarray)

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.denoisedCorr(eVal, eVec, nFacts, kind='fixed')

Remove noise from a correlation matrix, either by fixing the random eigenvalues or with the spectral method. For more information see chapter 2 of [D1].

Parameters:
  • eVal (ndarray) – Eigenvalues.

  • eVec (ndarray) – Eigenvectors.

  • nFacts (float) – The number of factors.

  • kind (str, optional) –

    The denoising method. The default value is 'fixed'. Possible values are:

    • 'fixed': replaces the eigenvalues below the maximum Marchenko-Pastur eigenvalue with their average.

    • 'spectral': sets the eigenvalues below the maximum Marchenko-Pastur eigenvalue to zero.

Returns:

corr – Denoised correlation matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.shrinkCorr(eVal, eVec, nFacts, alpha=0)

Remove noise from a correlation matrix using targeted shrinkage. For more information see chapter 2 of [D1].

Parameters:
  • eVal (ndarray) – Eigenvalues.

  • eVec (ndarray) – Eigenvectors.

  • nFacts (float) – The number of factors.

  • alpha (float, optional) – Shrinkage factor. The default value is 0.

Returns:

corr – Denoised correlation matrix.

Return type:

ndarray

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.denoiseCov(cov, q, kind='fixed', bWidth=0.01, detone=False, mkt_comp=1, alpha=0)

Remove noise from a covariance matrix by fixing the random eigenvalues of its correlation matrix. For more information see chapter 2 of [D1].

Parameters:
  • cov (ndarray or pd.DataFrame) – Covariance matrix of shape n_features x n_features, where n_features is the number of features.

  • q (float) – T/N, where T is the number of rows and N the number of columns.

  • bWidth (float) – The bandwidth of the kernel. The default value is 0.01.

  • kind (str, optional) –

    The denoising method. The default value is 'fixed'. Possible values are:

    • 'fixed': replaces the eigenvalues below the maximum Marchenko-Pastur eigenvalue with their average.

    • 'spectral': sets the eigenvalues below the maximum Marchenko-Pastur eigenvalue to zero.

    • 'shrink': uses the targeted shrinkage method.

  • detone (bool, optional) – If True, removes the first mkt_comp components of the correlation matrix. The detoned correlation matrix is singular, so it cannot be inverted. The default value is False.

  • mkt_comp (float, optional) – Number of leading components that the detone method removes. The default value is 1.

  • alpha (float, optional) – Shrinkage factor. The default value is 0.

Returns:

cov_ – Denoised covariance matrix.

Return type:

ndarray or pd.DataFrame

Raises:

ValueError when the value cannot be calculated.

AuxFunctions.round_values(data, decimals=4, wider=False)

This function helps us round values either toward zero or away from zero.

Parameters:
  • data (np.ndarray, pd.Series or pd.DataFrame) – Data to be rounded.

  • decimals (int) – Number of decimals to round to.

  • wider (bool) – False to round toward zero, True to round away from zero.

Returns:

value – Data rounded using the selected method.

Return type:

np.ndarray, pd.Series or pd.DataFrame

Raises:

ValueError – When the value cannot be calculated.
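This sketch implements one reading of the description above: truncation toward zero when wider=False and rounding away from zero when wider=True. That interpretation and the name `round_toward_or_away` are assumptions. After scaling by 10**decimals, binary floating point can land a value just below an integer boundary, so a production version may need a small tolerance.

```python
import numpy as np


def round_toward_or_away(data, decimals=4, wider=False):
    """Round to `decimals` places toward zero (wider=False) or away from zero (wider=True)."""
    data = np.asarray(data, dtype=float)
    factor = 10.0 ** decimals
    if wider:
        # ceil of the magnitude pushes every value away from zero
        return np.sign(data) * np.ceil(np.abs(data) * factor) / factor
    # trunc drops the extra digits, which moves every value toward zero
    return np.trunc(data * factor) / factor
```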

AuxFunctions.weights_discretizetion(weights, prices, capital=1000000, w_decimal=6, ascending=False)

This function helps us find the number of shares that must be bought or sold to achieve the portfolio weights, given the asset prices and the invested capital.

Parameters:
  • weights (pd.Series or pd.DataFrame) – Vector of weights of size n_assets x 1.

  • prices (pd.Series or pd.DataFrame) – Vector of prices of size n_assets x 1.

  • capital (float, optional) – Capital invested. The default value is 1000000.

  • w_decimal (int, optional) – Number of decimals used to round the portfolio weights. The default value is 6.

  • ascending (bool, optional) – If True, assigns the excess capital to the assets with lower weights; otherwise, to the assets with higher weights. The default value is False.

Returns:

n_shares – Number of shares that must be bought or sold to achieve the portfolio weights.

Return type:

pd.DataFrame

Raises:

ValueError – When the value cannot be calculated.

AuxFunctions.color_list(k)

This function creates a list of colors.

Parameters:

k (int) – Number of colors.

Returns:

colors – A list of colors.

Return type:

list

References

[D1]

Marcos M. López de Prado. Machine Learning for Asset Managers. Elements in Quantitative Finance. Cambridge University Press, 2020. doi:10.1017/9781108883658.

[D2]

Won-Min Song, T. Di Matteo, and Tomaso Aste. Hierarchical information clustering by means of topologically embedded graphs. PLOS ONE, 7(3):1–14, 03 2012. URL: https://doi.org/10.1371/journal.pone.0031929, doi:10.1371/journal.pone.0031929.

[D3]

Wolfram Barfuss, Guido Previde Massara, T. Di Matteo, and Tomaso Aste. Parsimonious modeling with information filtering networks. Physical Review E, Dec 2016. URL: http://dx.doi.org/10.1103/PhysRevE.94.062306, doi:10.1103/physreve.94.062306.

[D4]

Dany Cajas. OWA portfolio optimization: a disciplined convex programming framework. SSRN Electronic Journal, 2021. URL: https://doi.org/10.2139/ssrn.3988927, doi:10.2139/ssrn.3988927.

[D5]

Sander Gerber, Harry Markowitz, Philip Ernst, Yinsen Miao, Babak Javid, and Paul Sargen. The Gerber statistic: a robust co-movement measure for portfolio optimization. SSRN Electronic Journal, 2021. URL: https://doi.org/10.2139/ssrn.3880054, doi:10.2139/ssrn.3880054.

[D6]

Jan R. Magnus and H. Neudecker. The elimination matrix: some lemmas and applications. SIAM Journal on Algebraic Discrete Methods, 1(4):422–449, 1980. URL: https://doi.org/10.1137/0601049, doi:10.1137/0601049.

[D7]

Dany Cajas. Convex optimization of portfolio kurtosis. SSRN Electronic Journal, 2022. URL: https://doi.org/10.2139/ssrn.4202967, doi:10.2139/ssrn.4202967.

[D8]

Guido Previde Massara, T. Di Matteo, and T. Aste. Network filtering for big data: triangulated maximally filtered graph. J. Complex Networks, 5:161–178, 2017.

[D9]

Won-Min Song, T. Di Matteo, and Tomaso Aste. Nested hierarchies in planar graphs. Discrete Applied Mathematics, 159(17):2135–2146, 2011. URL: https://www.sciencedirect.com/science/article/pii/S0166218X11002794, doi:10.1016/j.dam.2011.07.018.

[D10]

Dany Cajas. Higher order moment portfolio optimization with L-moments. SSRN Electronic Journal, 2023. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4393155.

[D11]

missing booktitle in Loan1992

[D12]

Ignacio Ojeda. Kronecker square roots and the block vec matrix. The American Mathematical Monthly, 122(1):60, 2015. URL: https://doi.org/10.4169/amer.math.monthly.122.01.60, doi:10.4169/amer.math.monthly.122.01.60.

[D13]

Gábor J. Székely, Maria L. Rizzo, and Nail K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, 2007. URL: https://doi.org/10.1214/009053607000000505, doi:10.1214/009053607000000505.

[D14]

Shihong Yue, Xiuxiu Wang, and Miaomiao Wei. Application of two-order difference to gap statistic. Transactions of Tianjin University, 14:217–221, 06 2008. doi:10.1007/s12209-008-0039-1.