Papers
Topics
Authors
Recent
Search
2000 character limit reached

Post-Screening Portfolio Selection

Published 19 Apr 2026 in q-fin.PM and stat.ME | (2604.17593v1)

Abstract: We propose post-screening portfolio selection (PS$2$), a two-step framework for high-dimensional mean--variance investing. First, assets are screened by Lasso-type regression of a constant on excess returns without an intercept. Second, portfolio weights are estimated on the selected set using standard low-dimensional methods. Because strong factors can destroy sparsity in real data, we further introduce PS$2$ with factors (FPS$2$), which defactors returns before screening and allows factor investing in the final step. We establish theoretical guarantees, and simulations and an empirical application show competitive performance, especially when sparse screening is appropriate or strong factors are explicitly accommodated.

Summary

  • The paper introduces a two-stage method where high-dimensional screening recovers the asset support before using low-dimensional estimation to set portfolio weights.
  • It establishes that the support of the optimal weight vector coincides with regression coefficients from a Lasso screening, providing strong theoretical guarantees and error bounds.
  • Empirical and simulation evidence, including tests on S&P 500 data, show that the FPS² variant outperforms benchmarks even in challenging, strong-factor environments.

Post-Screening Portfolio Selection: A Two-Step Approach to High-Dimensional Mean–Variance Investing

Introduction and Motivation

The paper "Post-Screening Portfolio Selection" (2604.17593) develops a conceptual and methodological framework for high-dimensional mean–variance portfolio optimization that rigorously distinguishes between the tasks of asset selection (support recovery) and portfolio weight estimation. In modern settings, the number of investable assets (NN) typically rivals or exceeds the sample size (TT), creating an ill-posed inference problem for conventional estimators of the covariance matrix and mean vector. While shrinkage and regularization strategies have become normative, their one-stage nature confounds support discovery and weight estimation, often amplifying numerical instability and reducing interpretability in the presence of pervasive factors or collinearities.

The authors propose post-screening portfolio selection (PS2^2), a two-stage protocol in which high-dimensional tools are used exclusively for initial identification of assets that matter for portfolio efficiency, followed by weight estimation using standard low-dimensional procedures on the reduced asset set. The conceptual innovation is a support identity: in the mean–variance framework, for any non-zero constant α\alpha, the support of the optimal population portfolio weights coincides with that of the coefficients in a regression of a constant on excess asset returns (without intercept). This reformulation enables rigorous support recovery via methods such as the Lasso, before re-deploying classical estimation in a reduced-dimension context. Figure 1

Figure 1: Schematic of PS2^2: screening a large asset set for eligible candidates, then estimating portfolio weights on the selected subset.

The paper also addresses the crucial breakdown of sparse screening in the presence of strong (pervasive) factors, introducing FPS2^2 (PS2^2 with factors), a version that defactors the asset returns and permits joint investing in both residuals and investable factors.

Theoretical Structure and Methodology

Support Identity and Two-Stage Portfolio Construction

Central to PS2^2 is the theoretical assertion that the support of ww^* — the mean–variance optimal weight vector — coincides with the support of regression coefficients from regressing a constant (without intercept) on excess returns. This allows the high-dimensional problem to be decomposed as:

  1. Support Recovery: Apply Lasso-type penalized regression (of a constant on returns) to recover a superset of the non-zero weights in ww^*.
  2. Low-Dimensional Estimation: Calculate portfolio weights by standard low-dimensional methods (plug-in, OLS) using only the selected assets, setting weights for non-selected assets to zero.

This strategy preserves stability of the final portfolio, even when TT0 is much larger than TT1, as the inversion of ill-conditioned covariance matrices is avoided on the full-dimensional space.

Strong Factor Failure and Residual Defactoring

Empirical asset returns often feature strong pervasive factors, rendering the mean vector and hence the screening target non-sparse and inducing severe multicollinearity (destabilizing Lasso selection). To resolve this, FPSTT2 incorporates a preliminary defactoring step: the asset returns are projected onto the orthogonal complement of the factor space; Lasso screening is performed on the defactored residuals, and the final portfolio includes both selected residuals and investable factors. Figure 2

Figure 2: Empirical correlation matrix of S&P 500 stocks: original (left), after defactoring using Fama–French 3 factors (right), showing reduced cross-sectional dependence.

This decomposition is theoretically justified by the explicit characterization: given the factor model TT3, the optimal portfolio on the augmented asset-factor space is sparse in the residual component, and the population solution reveals the precise contribution of investable factors versus unique “alphas.” Thus, residual-based screening and factor investing jointly reconstruct the efficient frontier in high dimensions.

Screening Methodology and Theoretical Guarantees

The screening step is carried out using Lasso or variants thereof, regressing a constant response on asset returns (or their residuals post-defactoring). The post-Lasso OLS estimator is supported by high-probability bounds under sub-Gaussian or sub-exponential tails and factor model conditions. Figure 3

Figure 3: Lasso screening power vs. signal strength as a function of sample size and non-sparsity; power degrades for stronger common shocks, underscoring the need for defactoring.

The main theoretical results include:

  • Sure-Screening Property: Under minimal signal conditions and regularization scaling TT4 (with factor strength TT5), the probability of capturing the true support in the selected set approaches 1 as TT6.
  • Error Control: The TT7 error of the post-screening weight estimator satisfies

TT8

where TT9 = portfolio sparsity.

  • Performance Preservation: The Sharpe ratio loss decays as 2^20; hence, risk-adjusted performance is retained under the stipulated high-dimensional conditions.

Comprehensive conditions for approximate sparsity (i.e., identification of portfolios with small but nonzero true weights) and for the validity of the defactoring correction are also elaborated.

Empirical Validation and Simulation Evidence

Monte Carlo Analysis

Simulation studies across four DGPs (idiosyncratic-only, weak-factors, strong-factors, and combined factor structures) systematically compare PS2^21, FPS2^22, and state-of-the-art alternatives (e.g., MAXSER, QIS, Graphical Lasso).

Key findings:

  • When sparse screening is plausible (idiosyncratic/weak-factor DGPs): PS2^23 identifies the true support with high power and low FDR as 2^24 increases. The mean squared error (MSE) and achieved Sharpe ratios notably outperform one-stage regularization-based methods, especially when 2^25 or 2^26.
  • When strong factors are present: PS2^27 fails unless defactoring is used. FPS2^28 (with factor investing) achieves screening power and estimation error close to the oracle (i.e., perfect support recovery), with only modest loss in Sharpe ratio even for moderate 2^29.

Empirical Application to S&P 500 Constituents

Using weekly S&P 500 returns and Fama–French 3 factors (2000–2025), rolling-window out-of-sample evaluation is performed for FPSα\alpha0 and competing portfolios.

  • Gross Sharpe Ratios: FPSα\alpha1 robustly outperforms both the factor-only (FF3) and equal-weighted portfolios pre-pandemic and in stable regimes, and delivers competitive performance relative to state-of-the-art plug-in and one-stage shrinkage estimators.
  • Net After Transaction Costs: The superiority of dynamic screening-based strategies attenuates but remains present in stable periods; performance degrades during volatile/post-pandemic periods, indicating sensitivity of cross-sectional signal extraction to regime shifts.

FPSα\alpha2 thus interpolates between the instability of naive high-dimensional inversion and the incompleteness of factor-only models, while preserving interpretability and computational feasibility.

Implications and Future Directions

The separation of support recovery from weight estimation offers a principled methodology for portfolio construction in the regime α\alpha3, reconciling the aspirations of sparse investing with the statistical challenges posed by high dimensionality and pervasive factor structure. The explicit decomposition clarifies why regularization alone is not a panacea and motivates residual-based approaches for dealing with strong factor-induced multicollinearity.

Implications:

  • Practical: PSα\alpha4 and FPSα\alpha5 enable scalable, interpretable, and stable portfolio construction for large asset universes, with direct utility for practitioners facing dimensionality bottlenecks and turnover/cost constraints.
  • Theoretical: The probabilistic screening guarantees provide a template for designing inference strategies in other high-dimensional economic or econometric settings where support recovery is more tractable than coefficient estimation.
  • Extensions: There is scope for developing extensions incorporating heavy-tailed returns, temporal dependence, non-linear or dynamic factor models, richer signals or constraints, and alternative screening invariants. Particularly, adaptive procedures for determining the factor structure and sectoral/industry hierarchies can further improve applicability and robustness.

Conclusion

"Post-Screening Portfolio Selection" delivers a structural rethinking of high-dimensional portfolio choice, rigorously formalizing a two-step approach that leverages scalable high-dimensional inference to reduce the burden on estimation and improves both empirical performance and interpretability (2604.17593). The framework elucidates failure modes of conventional sparse estimators in the presence of pervasive factors and constructs effective remedies via residualization and factor investing. Empirical and simulation results demonstrate significant gains over standard benchmarks in appropriate regimes, highlighting the potential for support-recovery-driven portfolio optimization in both academic research and applied quantitative asset management.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.