Hierarchical Clustering Portfolio Optimization

Theoretical Background

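Riskfolio-Lib provides several recent machine-learning-based asset allocation models. The available models are:

  • Hierarchical Risk Parity (HRP) [C1], [C2], [C3]

  • Hierarchical Equal Risk Contribution (HERC) [C4], [C5], [C2]

  • Nested Clustered Optimization (NCO) [C6], [C2]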
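To make the first model concrete, the following sketch implements the recursive bisection step of HRP as described in [C1], using variance as the risk measure. It assumes the asset order has already been produced by the hierarchical clustering step and is simply passed in by hand. The names `cluster_variance` and `recursive_bisection` and the toy covariance matrix are illustrative assumptions, not Riskfolio-Lib functions or data.

```python
def cluster_variance(cov, items):
    """Variance of an inverse-variance portfolio restricted to `items`."""
    inv_var = [1.0 / cov[i][i] for i in items]
    total = sum(inv_var)
    w = [v / total for v in inv_var]
    return sum(
        w[a] * cov[i][j] * w[b]
        for a, i in enumerate(items)
        for b, j in enumerate(items)
    )


def recursive_bisection(cov, order):
    """Split the ordered assets in halves and allocate inversely to cluster variance."""
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:
                continue  # a single asset cannot be split further
            mid = len(cluster) // 2
            left, right = cluster[:mid], cluster[mid:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            for i in left:
                weights[i] *= alpha
            for i in right:
                weights[i] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights


# Toy covariance matrix (assumption): assets 0-1 and 2-3 form two natural clusters.
cov = [
    [0.040, 0.030, 0.002, 0.001],
    [0.030, 0.050, 0.003, 0.002],
    [0.002, 0.003, 0.090, 0.060],
    [0.001, 0.002, 0.060, 0.100],
]
weights = recursive_bisection(cov, order=[0, 1, 2, 3])
print(weights)
```

The weights sum to one by construction, and the lower-variance cluster receives the larger share of capital at each split.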

In the first two cases, HRP and HERC portfolios can be calculated with the naive risk parity model using any of the following 32 risk measures:

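Dispersion risk measures:

  • Standard Deviation.

  • Variance.

  • Square Root Kurtosis.

  • Mean Absolute Deviation (MAD).

  • Gini Mean Difference (GMD).

  • CVaR Range.

  • Tail Gini Range.

  • Range.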
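As a rough illustration of how a risk measure enters the naive risk parity step, the sketch below weights each asset by the inverse of its own sample variance and normalises the result. The function name `naive_risk_parity` and the toy return series are illustrative assumptions; swapping the variance for another measure from the lists on this page should follow the same pattern.

```python
from statistics import variance


def naive_risk_parity(asset_returns):
    """Weight each asset by the inverse of its own variance, normalised to sum to 1."""
    inv_risk = {name: 1.0 / variance(r) for name, r in asset_returns.items()}
    total = sum(inv_risk.values())
    return {name: value / total for name, value in inv_risk.items()}


# Toy return series (assumption): asset B is clearly more volatile than asset A.
returns = {
    "A": [0.01, -0.02, 0.015, 0.005],
    "B": [0.03, -0.05, 0.04, -0.01],
}
weights = naive_risk_parity(returns)
print(weights)
```

The less volatile asset ends up with the larger weight, which is the defining property of this allocation rule.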

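Downside risk measures:

  • Semi Standard Deviation.

  • Square Root Semi Kurtosis.

  • First Lower Partial Moment (Omega Ratio).

  • Second Lower Partial Moment (Sortino Ratio).

  • Value at Risk (VaR).

  • Conditional Value at Risk (CVaR).

  • Entropic Value at Risk (EVaR).

  • Relativistic Value at Risk (RLVaR).

  • Tail Gini.

  • Worst Realization (Minimax).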
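For intuition about the tail-based measures, here is a minimal sketch of empirical VaR and CVaR at significance level `alpha`. It uses a simple textbook estimator (the loss at the `ceil(alpha * n)`-th worst observation, and the average loss over those worst observations); the estimators inside Riskfolio-Lib may treat the tail boundary differently, and the function names and data are assumptions made for illustration.

```python
import math


def hist_var(returns, alpha=0.05):
    """Loss at the ceil(alpha * n)-th worst observation."""
    ordered = sorted(returns)
    k = max(math.ceil(alpha * len(ordered)), 1)
    return -ordered[k - 1]


def hist_cvar(returns, alpha=0.05):
    """Average loss over the ceil(alpha * n) worst observations."""
    ordered = sorted(returns)
    k = max(math.ceil(alpha * len(ordered)), 1)
    return -sum(ordered[:k]) / k


# Toy return series (assumption); alpha=0.2 keeps two observations in the tail.
sample = [0.02, -0.05, 0.01, 0.03, -0.03, 0.00, 0.015, -0.01, 0.025, 0.005]
var_20 = hist_var(sample, alpha=0.2)
cvar_20 = hist_cvar(sample, alpha=0.2)
print(var_20, cvar_20)
```

CVaR is never smaller than VaR at the same level, because it averages over losses at least as large as the VaR threshold.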

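Drawdown risk measures:

  • Average Drawdown for compounded and uncompounded cumulative returns.

  • Ulcer Index for compounded and uncompounded cumulative returns.

  • Drawdown at Risk (DaR) for compounded and uncompounded cumulative returns.

  • Conditional Drawdown at Risk (CDaR) for compounded and uncompounded cumulative returns.

  • Entropic Drawdown at Risk (EDaR) for compounded and uncompounded cumulative returns.

  • Relativistic Drawdown at Risk (RLDaR) for compounded and uncompounded cumulative returns.

  • Maximum Drawdown (Calmar Ratio) for compounded and uncompounded cumulative returns.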
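The distinction between compounded and uncompounded cumulative returns can be sketched as follows. Under the assumption that uncompounded drawdowns are absolute drops of a cumulative sum and compounded drawdowns are relative drops of a cumulative product, both starting from a level of 1, the helper below builds the drawdown series from which the average drawdown, Ulcer Index and maximum drawdown follow. The function name `drawdowns` and the data are illustrative.

```python
import math


def drawdowns(returns, compounded=False):
    """Drawdown series: absolute drops (uncompounded) or relative drops (compounded)."""
    level = 1.0
    peak = 1.0
    series = []
    for r in returns:
        level = level * (1.0 + r) if compounded else level + r
        peak = max(peak, level)
        series.append((peak - level) / peak if compounded else peak - level)
    return series


sample = [0.10, -0.20, 0.05, -0.10]  # toy returns (assumption)

dd_abs = drawdowns(sample, compounded=False)
dd_rel = drawdowns(sample, compounded=True)

max_dd_abs = max(dd_abs)                                    # Maximum Drawdown
avg_dd_abs = sum(dd_abs) / len(dd_abs)                      # Average Drawdown
ulcer_abs = math.sqrt(sum(d * d for d in dd_abs) / len(dd_abs))  # Ulcer Index
max_dd_rel = max(dd_rel)
print(max_dd_abs, avg_dd_abs, ulcer_abs, max_dd_rel)
```

In this sketch the two conventions give different maxima for the same return series, which is why the choice between the plain and `_Rel` variants matters.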

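For the NCO model, we can choose among four objective functions, each of which has its own set of available risk measures:

  • Minimize the selected risk measure.

  • Maximize the utility function \(\mu w - l \phi_{i}(w)\).

  • Maximize the risk-adjusted return ratio based on the selected risk measure.

  • Equal risk contribution portfolio of the selected risk measure.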
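As a worked example of the utility objective, the sketch below evaluates \(\mu w - l \phi_{i}(w)\) for a fixed weight vector, taking the portfolio variance as the risk measure \(\phi_{i}\). The expected returns, covariance matrix and weights are made-up numbers, and `utility` is an illustrative helper rather than a library function.

```python
def utility(weights, mu, cov, risk_aversion=2.0):
    """Expected return minus risk_aversion times portfolio variance."""
    expected_return = sum(w * m for w, m in zip(weights, mu))
    variance = sum(
        wi * cov[i][j] * wj
        for i, wi in enumerate(weights)
        for j, wj in enumerate(weights)
    )
    return expected_return - risk_aversion * variance


mu = [0.08, 0.12]                      # expected returns (assumption)
cov = [[0.04, 0.01], [0.01, 0.09]]     # covariance matrix (assumption)
w = [0.6, 0.4]

# 0.096 expected return, 0.0336 variance, l = 2
value = utility(w, mu, cov, risk_aversion=2.0)
print(value)
```

A larger risk aversion factor `l` penalises risk more heavily, so the maximising portfolio moves toward the minimum-risk solution as `l` grows.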

Module Functions

class HCPortfolio.HCPortfolio(returns=None, alpha=0.05, a_sim=100, beta=None, b_sim=None, kappa=0.3, solver_rl=None, solvers=None, w_max=None, w_min=None, alpha_tail=0.05, gs_threshold=0.5, bins_info='KN')[source]

Class that creates a portfolio object with all properties needed to calculate optimal portfolios.

Parameters:
  • returns (DataFrame, optional) – A dataframe that contains the returns of the assets. The default is None.

  • alpha (float, optional) – Significance level of VaR, CVaR, EVaR, RLVaR, DaR, CDaR, EDaR, RLDaR and Tail Gini of losses. The default is 0.05.

  • a_sim (float, optional) – Number of CVaRs used to approximate Tail Gini of losses. The default is 100.

  • beta (float, optional) – Significance level of CVaR and Tail Gini of gains. If None it duplicates alpha value. The default is None.

  • b_sim (float, optional) – Number of CVaRs used to approximate Tail Gini of gains. If None it duplicates a_sim value. The default is None.

  • kappa (float, optional) – Deformation parameter of RLVaR and RLDaR, must be between 0 and 1. The default is 0.30.

  • solver_rl (str, optional) – Solver available for CVXPY that supports power cone programming. Used to calculate RLVaR and RLDaR. The default value is None.

  • solvers (list, optional) – List of solvers available for CVXPY used for the selected NCO method. The default value is None.

  • w_max (pd.Series or float, optional) – Upper bound constraint for hierarchical risk parity weights [C3].

  • w_min (pd.Series or float, optional) – Lower bound constraint for hierarchical risk parity weights [C3].

  • alpha_tail (float, optional) – Significance level for lower tail dependence index. The default is 0.05.

  • gs_threshold (float, optional) – Gerber statistic threshold. The default is 0.5.

  • bins_info (int or str) –

    Number of bins used to calculate variation of information. The default value is ‘KN’. Possible values are:

    • ’KN’: Knuth’s choice method. See more in knuth_bin_width.

    • ’FD’: Freedman–Diaconis’ choice method. See more in freedman_bin_width.

    • ’SC’: Scott’s choice method. See more in scott_bin_width.

    • ’HGR’: Hacine-Gharbi and Ravier’s choice method.

    • int: integer value chosen by the user.
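To give a feel for what the bins_info options compute, the following sketch implements the textbook Freedman–Diaconis and Scott bin-width rules and converts each width into a bin count. It is an approximation under stated assumptions: quartiles use linear interpolation, the standard deviation is the population one, and the sample is synthetic. The helper functions referenced in the list above may differ in such details, and Knuth's rule is omitted because it requires a numerical optimisation.

```python
import math
import statistics


def freedman_diaconis_bins(data):
    """Bin count from the width 2 * IQR / n**(1/3)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    width = 2.0 * (q3 - q1) / n ** (1.0 / 3.0)
    return max(math.ceil((max(data) - min(data)) / width), 1)


def scott_bins(data):
    """Bin count from the width 3.5 * sigma / n**(1/3)."""
    n = len(data)
    width = 3.5 * statistics.pstdev(data) / n ** (1.0 / 3.0)
    return max(math.ceil((max(data) - min(data)) / width), 1)


# Deterministic, roughly uniform sample on [0, 1] (assumption).
sample = [((i * 37) % 101) / 100.0 for i in range(200)]
fd = freedman_diaconis_bins(sample)
sc = scott_bins(sample)
print(fd, sc)
```

Both rules shrink the bin width at rate n**(-1/3), so the number of bins grows slowly with the sample size. A sample with zero interquartile range would need separate handling in the first function.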

optimization(model='HRP', codependence='pearson', covariance='hist', obj='MinRisk', rm='MV', rf=0, l=2, custom_cov=None, custom_mu=None, linkage='single', k=None, max_k=10, bins_info='KN', alpha_tail=0.05, gs_threshold=0.5, leaf_order=True, d=0.94, **kwargs)[source]

This method calculates the optimal portfolio according to the optimization model selected by the user.

Parameters:
  • model (str, can be {'HRP', 'HERC', 'HERC2' or 'NCO'}) –

    The hierarchical cluster portfolio model used to optimize the portfolio. The default is ‘HRP’. Possible values are:

    • ’HRP’: Hierarchical Risk Parity.

    • ’HERC’: Hierarchical Equal Risk Contribution.

    • ’HERC2’: HERC but splitting weights equally within clusters.

    • ’NCO’: Nested Clustered Optimization.

  • codependence (str, optional) –

    The codependence or similarity matrix used to build the distance metric and clusters. The default is ‘pearson’. Possible values are:

    • ’pearson’: Pearson correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{pearson}_{i,j})}\).

    • ’spearman’: Spearman correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{spearman}_{i,j})}\).

    • ’kendall’: Kendall correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{kendall}_{i,j})}\).

    • ’gerber1’: Gerber statistic 1 correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{gerber1}_{i,j})}\).

    • ’gerber2’: Gerber statistic 2 correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{gerber2}_{i,j})}\).

    • ’abs_pearson’: absolute value Pearson correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{pearson}_{i,j}|)}\).

    • ’abs_spearman’: absolute value Spearman correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{spearman}_{i,j}|)}\).

    • ’abs_kendall’: absolute value Kendall correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{kendall}_{i,j}|)}\).

    • ’distance’: distance correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-\rho^{distance}_{i,j})}\).

    • ’mutual_info’: mutual information matrix. The distance used is the variation of information matrix.

    • ’tail’: lower tail dependence index matrix. Dissimilarity formula: \(D_{i,j} = -\log{\lambda_{i,j}}\).

    • ’custom_cov’: use a custom correlation matrix derived from the custom_cov parameter. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{pearson}_{i,j})}\).

  • covariance (str, optional) –

    The method used to estimate the covariance matrix. The default is ‘hist’. Possible values are:

    • ’hist’: use historical estimates.

    • ’ewma1’: use ewma with adjust=True. For more information see EWM.

    • ’ewma2’: use ewma with adjust=False. For more information see EWM.

    • ’ledoit’: use the Ledoit and Wolf Shrinkage method.

    • ’oas’: use the Oracle Approximation Shrinkage method.

    • ’shrunk’: use the basic Shrunk Covariance method.

    • ’gl’: use the basic Graphical Lasso Covariance method.

    • ’jlogo’: use the j-LoGo Covariance method. For more information see: [C7].

    • ’fixed’: denoise using fixed method. For more information see chapter 2 of [C8].

    • ’spectral’: denoise using spectral method. For more information see chapter 2 of [C8].

    • ’shrink’: denoise using shrink method. For more information see chapter 2 of [C8].

    • ’gerber1’: use the Gerber statistic 1. For more information see: [C9].

    • ’gerber2’: use the Gerber statistic 2. For more information see: [C9].

    • ’custom_cov’: use custom covariance matrix.

  • obj (str, can be {'MinRisk', 'Utility', 'Sharpe' or 'ERC'}) –

    Objective function used by the NCO model. The default is ‘MinRisk’. Possible values are:

    • ’MinRisk’: Minimize the selected risk measure.

    • ’Utility’: Maximize the Utility function \(\mu w - l \phi_{i}(w)\).

    • ’Sharpe’: Maximize the risk adjusted return ratio based on the selected risk measure.

    • ’ERC’: Equal risk contribution portfolio of the selected risk measure.

  • rm (str, optional) –

    The risk measure used to optimize the portfolio. If model is ‘NCO’, the risk measures available depend on the objective function. The default is ‘MV’. Possible values are:

    • ’equal’: Equally weighted.

    • ’vol’: Standard Deviation.

    • ’MV’: Variance.

    • ’KT’: Square Root Kurtosis.

    • ’MAD’: Mean Absolute Deviation.

    • ’MSV’: Semi Standard Deviation.

    • ’SKT’: Square Root Semi Kurtosis.

    • ’FLPM’: First Lower Partial Moment (Omega Ratio).

    • ’SLPM’: Second Lower Partial Moment (Sortino Ratio).

    • ’VaR’: Value at Risk.

    • ’CVaR’: Conditional Value at Risk.

    • ’TG’: Tail Gini.

    • ’EVaR’: Entropic Value at Risk.

    • ’RLVaR’: Relativistic Value at Risk.

    • ’WR’: Worst Realization (Minimax).

    • ’RG’: Range of returns.

    • ’CVRG’: CVaR range of returns.

    • ’TGRG’: Tail Gini range of returns.

    • ’MDD’: Maximum Drawdown of uncompounded cumulative returns (Calmar Ratio).

    • ’ADD’: Average Drawdown of uncompounded cumulative returns.

    • ’DaR’: Drawdown at Risk of uncompounded cumulative returns.

    • ’CDaR’: Conditional Drawdown at Risk of uncompounded cumulative returns.

    • ’EDaR’: Entropic Drawdown at Risk of uncompounded cumulative returns.

    • ’RLDaR’: Relativistic Drawdown at Risk of uncompounded cumulative returns.

    • ’UCI’: Ulcer Index of uncompounded cumulative returns.

    • ’MDD_Rel’: Maximum Drawdown of compounded cumulative returns (Calmar Ratio).

    • ’ADD_Rel’: Average Drawdown of compounded cumulative returns.

    • ’DaR_Rel’: Drawdown at Risk of compounded cumulative returns.

    • ’CDaR_Rel’: Conditional Drawdown at Risk of compounded cumulative returns.

    • ’EDaR_Rel’: Entropic Drawdown at Risk of compounded cumulative returns.

    • ’RLDaR_Rel’: Relativistic Drawdown at Risk of compounded cumulative returns.

    • ’UCI_Rel’: Ulcer Index of compounded cumulative returns.

  • rf (float, optional) – Risk free rate, must be expressed in the same frequency as the asset returns. The default is 0.

  • l (scalar, optional) – Risk aversion factor of the ‘Utility’ objective function. The default is 2.

  • custom_cov (DataFrame or None, optional) – Custom covariance matrix, used when codependence or covariance parameters have value ‘custom_cov’. The default is None.

  • custom_mu (DataFrame or None, optional) – Custom mean vector when NCO objective is ‘Utility’ or ‘Sharpe’. The default is None.

  • linkage (string, optional) –

    Linkage method of hierarchical clustering. For more information see linkage. The default is ‘single’. Possible values are:

    • ’single’.

    • ’complete’.

    • ’average’.

    • ’weighted’.

    • ’centroid’.

    • ’median’.

    • ’ward’.

    • ’DBHT’: Direct Bubble Hierarchical Tree.

  • k (int, optional) – Number of clusters. If provided, this value is used instead of the optimal number of clusters calculated with the two difference gap statistic. The default is None.

  • max_k (int, optional) – Maximum number of clusters considered by the two difference gap statistic when searching for the optimal number of clusters. The default is 10.

  • bins_info (int or str) –

    Number of bins used to calculate variation of information. The default value is ‘KN’. Possible values are:

    • ’KN’: Knuth’s choice method. See more in knuth_bin_width.

    • ’FD’: Freedman–Diaconis’ choice method. See more in freedman_bin_width.

    • ’SC’: Scott’s choice method. See more in scott_bin_width.

    • ’HGR’: Hacine-Gharbi and Ravier’s choice method.

    • int: integer value chosen by the user.

  • alpha_tail (float, optional) – Significance level for lower tail dependence index. The default is 0.05.

  • gs_threshold (float, optional) – Gerber statistic threshold. The default is 0.5.

  • leaf_order (bool, optional) – Indicates whether the clusters are ordered so that the distance between successive leaves is minimal. The default is True.

  • d (scalar) – The smoothing factor of ewma methods. The default is 0.94.

  • **kwargs – Other variables related to covariance estimation. See Scikit Learn and chapter 2 of [C8] for more details.
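The two families of distance formulas listed for the codependence parameter can be compared with a small sketch. The helper `correlation_to_distance` and the toy correlation matrix are illustrative assumptions: the signed variant maps a correlation of -1 to the maximum distance of 1, while the absolute-value variant treats strongly negative and strongly positive correlation as equally close.

```python
import math


def correlation_to_distance(corr, absolute=False):
    """Map a correlation matrix (nested lists) to a distance matrix."""

    def dist(rho):
        value = 1.0 - abs(rho) if absolute else 0.5 * (1.0 - rho)
        return math.sqrt(max(value, 0.0))  # guard against tiny negative rounding errors

    return [[dist(rho) for rho in row] for row in corr]


# Toy correlation matrix (assumption).
corr = [
    [1.0, 0.8, -0.5],
    [0.8, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]
signed = correlation_to_distance(corr)
unsigned = correlation_to_distance(corr, absolute=True)
print(signed[0][2], unsigned[0][2])
```

For the pair with correlation -0.5, the signed formula gives a larger distance than the absolute-value formula, so the two choices can produce different cluster trees from the same data.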

Returns:

w – The weights of the optimal portfolio.

Return type:

DataFrame
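For the ’ewma1’ and ’ewma2’ covariance options described above, the difference between adjust=True and adjust=False can be sketched on a plain exponentially weighted mean; the same weighting schemes apply to exponentially weighted covariances. The sketch assumes that the smoothing factor d maps to the pandas-style parameter alpha = 1 - d, which is an assumption about how the library uses d and is not stated on this page. The function names are illustrative.

```python
def ewma_adjusted(values, alpha):
    """adjust=True style: weights (1 - alpha)**i on the i-th most recent value, normalised."""
    weights = [(1.0 - alpha) ** i for i in range(len(values))]
    newest_first = list(reversed(values))
    return sum(w * x for w, x in zip(weights, newest_first)) / sum(weights)


def ewma_recursive(values, alpha):
    """adjust=False style: y_t = (1 - alpha) * y_{t-1} + alpha * x_t, with y_0 = x_0."""
    smoothed = values[0]
    for x in values[1:]:
        smoothed = (1.0 - alpha) * smoothed + alpha * x
    return smoothed


d = 0.94
alpha = 1.0 - d  # assumption: smoothing factor d corresponds to alpha = 1 - d
series = [1.0, 2.0, 3.0]
adjusted = ewma_adjusted(series, alpha)
recursive = ewma_recursive(series, alpha)
print(adjusted, recursive)
```

On short samples the two variants differ noticeably because the recursive form gives the first observation a large residual weight; the gap shrinks as the sample grows.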

References

[C1]

Marcos López de Prado. Building diversified portfolios that outperform out of sample. The Journal of Portfolio Management, 42(4):59–69, 2016. URL: https://jpm.pm-research.com/content/42/4/59, doi:10.3905/jpm.2016.42.4.059.

[C2]

Daniel Sjöstrand and Nima Behnejad. Exploration of hierarchical clustering in long-only risk-based portfolio optimization. Master's thesis, Copenhagen Business School, Solbjerg Pl. 3, 2000 Frederiksberg, Denmark, May 2020. URL: https://research-api.cbs.dk/ws/portalfiles/portal/62178444/879726_Master_Thesis_Nima_Daniel_15736.pdf.

[C3]

Johann Pfitzinger and Nico Katzke. A constrained hierarchical risk parity algorithm with cluster-based capital allocation. Working Papers 14/2019, Stellenbosch University, Department of Economics, 2019. URL: https://ideas.repec.org/p/sza/wpaper/wpapers328.html.

[C4]

Thomas Raffinot. Hierarchical clustering-based asset allocation. The Journal of Portfolio Management, 44(2):89–99, December 2017. URL: https://doi.org/10.3905/jpm.2018.44.2.089, doi:10.3905/jpm.2018.44.2.089.

[C5]

Thomas Raffinot. The hierarchical equal risk contribution portfolio. August 2018. doi:10.2139/ssrn.3237540.

[C6]

Marcos López de Prado. A robust estimator of the efficient frontier. SSRN Electronic Journal, January 2019. doi:10.2139/ssrn.3469961.

[C7]

Wolfram Barfuss, Guido Previde Massara, T. Di Matteo, and Tomaso Aste. Parsimonious modeling with information filtering networks. Physical Review E, Dec 2016. URL: http://dx.doi.org/10.1103/PhysRevE.94.062306, doi:10.1103/physreve.94.062306.

[C8]

Marcos M. López de Prado. Machine Learning for Asset Managers. Elements in Quantitative Finance. Cambridge University Press, 2020. doi:10.1017/9781108883658.

[C9]

Sander Gerber, Harry Markowitz, Philip Ernst, Yinsen Miao, Babak Javid, and Paul Sargen. The gerber statistic: a robust co-movement measure for portfolio optimization. SSRN Electronic Journal, 2021. URL: https://doi.org/10.2139/ssrn.3880054, doi:10.2139/ssrn.3880054.