ETFs: Which Is Better, Full Replication or Representative Sampling?

Passive investing has grown in popularity relative to active investing in recent decades. The argument is that it is difficult, if not impossible, to outperform the market, a view supported by numerous academic papers finding that average mutual fund returns are equal to or lower than market returns. Passive investors therefore seek to match market returns while minimizing costs. These investors often favor exchange-traded funds (ETFs), which offer ownership of a large subset of the market by passively tracking the price movements of an underlying benchmark index. As a result, ETF returns are usually close to the market return, and their fees are typically low, especially relative to more actively managed mutual funds.

ETFs employ two fundamentally distinct methods to replicate their benchmark index. The more conventional method, physical replication, involves holding either all constituent securities of the index (full replication) or a representative sample of them (representative sampling). In contrast, synthetic replication delivers the benchmark return through a total return swap or another derivative contract with a counterparty, typically a large investment bank. As we have previously discussed, there is no significant difference in long-term tracking ability between physical and synthetic ETFs. However, synthetic ETFs show larger tracking errors after a sudden increase in counterparty risk, while being less affected by liquidity shocks than physically replicated ones. And while that article compares physical and synthetic ETFs, it does not address the differences between full replication ETFs and sampling ETFs. This raises a natural question: “When selecting a physically replicated ETF, which method is better, full replication or representative sampling?”

A novel paper by Dyer and Guest (2022) offers several insights on this topic by examining the tracking ability of 3,365 U.S.-based physically replicating equity index funds, both ETFs and mutual funds, from 2010 through 2020. Among the studied funds, 52% use full replication, 37% use representative sampling, and the remainder are “hybrids” that typically employ full replication but may switch to sampling under certain conditions. The study shows that sampling funds have higher turnover and expenses while earning worse returns than full replication funds. In particular, these differences in costs and returns translate into about 60 basis points per year of lower net returns for samplers. This finding is not driven by niche indices, as the authors find similar results in the subsample of funds tracking the S&P 500 and other market-cap-based indices. However, the differences between samplers and replicators disappear for funds following indices with many constituents (roughly 1,000 to 3,000 stocks), suggesting that sampling can reduce replication costs in certain situations.
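A gap such as “60 basis points per year” is usually read through two standard tracking metrics: the average tracking difference (fund return minus index return) and the tracking error (the volatility of that difference). The minimal Python sketch below computes both from periodic return series. It is not the paper's methodology; the function name, the simple arithmetic annualization of the mean, and the square-root-of-time scaling of the standard deviation are our assumptions, and the monthly returns are made up, chosen so the hypothetical sampler lags by about 60 bps a year, echoing the paper's headline figure.

```python
import math
import statistics


def tracking_stats(fund_returns, index_returns, periods_per_year=12):
    """Return (annualized mean tracking difference, annualized tracking error).

    Hypothetical helper: tracking difference = fund return - index return per period;
    tracking error = sample standard deviation of those differences.
    Annualization (mean * periods, stdev * sqrt(periods)) is a simplifying assumption.
    """
    if len(fund_returns) != len(index_returns):
        raise ValueError("return series must have equal length")
    diffs = [f - i for f, i in zip(fund_returns, index_returns)]
    mean_td = statistics.mean(diffs) * periods_per_year
    te = statistics.stdev(diffs) * math.sqrt(periods_per_year)
    return mean_td, te


# Made-up monthly returns, for illustration only
index = [0.010, -0.020, 0.015, 0.005]
replicator = [r - 0.0002 for r in index]        # steady 2 bps/month lag, e.g. fees
sampler_gaps = [-0.001, 0.0, -0.002, 0.001]     # noisier, larger average lag
sampler = [r + g for r, g in zip(index, sampler_gaps)]

rep_td, rep_te = tracking_stats(replicator, index)
smp_td, smp_te = tracking_stats(sampler, index)
print(f"replicator: TD {rep_td:.2%} per year, TE {rep_te:.2%}")
print(f"sampler:    TD {smp_td:.2%} per year, TE {smp_te:.2%}")
```

In this toy case the replicator lags by a steady 0.24% a year with essentially zero tracking error, while the sampler lags by 0.60% a year with a tracking error of about 0.45%, showing that the two metrics capture different things.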

The replication vs. sampling distinction should be of interest to ETF investors, who are often encouraged to focus on factors such as expenses and fees when selecting investments. While these factors are certainly crucial, investors may have little awareness of how much fund expenses and returns are driven by the mechanics their fund managers use to track a benchmark index, including the choice between replication and sampling.
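To make these mechanics concrete, here is a minimal Python sketch contrasting the two physical approaches on a made-up five-stock index. Full replication holds every constituent at its index weight, while the “sampler” here simply keeps the three largest names and rescales their weights to sum to one. That top-N rule, the function names, and all numbers are hypothetical simplifications chosen for illustration; real sampling funds typically use optimization or stratified sampling, and this is not the method of any particular fund.

```python
def full_replication(index_weights):
    """Hold every constituent at exactly its index weight."""
    return dict(index_weights)


def top_n_sample(index_weights, n):
    """Hypothetical sampler: keep the n largest weights, rescaled to sum to 1."""
    top = sorted(index_weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(w for _, w in top)
    return {ticker: w / total for ticker, w in top}


def portfolio_return(weights, returns):
    """One-period return: weighted sum of constituent returns."""
    return sum(w * returns[ticker] for ticker, w in weights.items())


# Toy five-stock index; weights and returns are made up for illustration
index_weights = {"A": 0.40, "B": 0.25, "C": 0.15, "D": 0.12, "E": 0.08}
period_returns = {"A": 0.02, "B": -0.01, "C": 0.03, "D": 0.05, "E": -0.04}

index_ret = portfolio_return(index_weights, period_returns)
full_ret = portfolio_return(full_replication(index_weights), period_returns)
sample_ret = portfolio_return(top_n_sample(index_weights, 3), period_returns)

print(f"index {index_ret:.2%}  full replication {full_ret:.2%}  sample {sample_ret:.2%}")
```

With these toy numbers, full replication matches the index return of 1.28% exactly, while the three-stock sample earns 1.25%, missing by about 3 bps in a single period because it omits stocks D and E. Whether such misses average out or accumulate depends on how the sample is chosen, which is the question the paper studies empirically.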

Authors: Travis Dyer and Nicholas Guest

Title: A Tale of Two Index Funds: Full Replication vs. Representative Sampling



We examine the two approaches used by equity index funds to track their benchmark index. The first, full replication, mimics the index with exactness. The second, representative sampling, holds a subset of the index. We find that samplers trade 3-4 times more, have 30-50% higher expenses and fees, and earn 50-70 basis points lower annual returns, which is substantial given index funds’ mandate to limit tracking error to a few basis points. Samplers’ underperformance is not purely driven by higher expenses and transaction costs, but also poor stock picking. Overall, our analyses suggest representative sampling is detrimental to index investors.

As always, we present several interesting figures and tables:

Notable quotations from the academic research paper:

“In the purest form of passive equity investing, an investor’s portfolio includes each stock in the market in exact proportion to its weight in the market (i.e., the total stock market index). However, for several reasons, including that it is impractical for most investors to hold several thousand stocks, funds typically attempt to replicate only a subset of the market, known as an index. They do so using one of two methods.

First, owning each stock in proportion to the underlying index is known as full replication. This strategy is challenging for many reasons, including that it typically requires adjustments to all (i.e., tens, hundreds, or thousands) of the portfolio’s positions each time an index adds or removes a stock. Many of the required adjustments are small and pertain to relatively illiquid stocks, which creates the potential for large trading costs that reduce the benefits of replication.

The second approach, called representative sampling, selects only a subset of index components for inclusion in the investor’s portfolio, but retains the goal of matching index returns. Of course, sampling creates the potential for even greater tracking errors and thus strays farther from the passive ideal. However, because the strategy requires holding fewer stocks, it may reduce trading costs, which would enhance returns. For example, because they do not hold the entire index, samplers might be able to avoid the most illiquid stocks or avoid trading following many instances of index reconstitution.

We show that sampling funds have higher turnover than replicating funds. This suggests that the active component of sampling, or the selection of stocks using variables other than index weights, more than offsets any reduction in trading arising from holding fewer positions. We also find that sampling funds have higher expense ratios and management fees, consistent with the costs of active selection more than outweighing the benefits of holding fewer positions, and with fund managers seeking compensation from investors for their efforts to actively invest. However, our examination of fund returns suggests these higher expenses and fees are not warranted because the sampling fund managers do not appear to be skilled at active investing. In particular, sampling funds’ returns are lower than replicating funds.

Several additional analyses support and extend our main results. First, our results hold in subsamples of S&P 500 indexers and other market-cap-based indexers, which helps rule out concerns that our findings are driven by one or a few peculiar indices, by “style” or “sector” funds, or by unobservable cross-index differences. Second, we find that our results are strongest among funds following indices with fewer constituent stocks, and that they entirely disappear for samplers following indices with 1,000 or more stocks. This suggests sampling is not harmful only when it can drastically reduce the number of stocks held in the portfolio. Third, we find that investors’ funds increasingly flow to samplers relative to replicators over our sample period, which is puzzling given our cost and return results.

The differences in costs, returns, and flows we document are economically significant. For example, replicators outperform samplers by about 60 basis points (bps) per year on a net return basis. To illustrate the potential wealth effects of this difference, consider a hypothetical investor who makes a one-time index investment of $100K at 35 years old and holds the investment for the next 30 years. Assuming a constant 8% annual return, the investor’s holding will be worth about $1,000K at age 65. However, if annual returns are 60 bps lower (i.e., 7.4%), then the value of the investor’s holding would only be about $850K at age 65. This $150K, or 15%, difference in portfolio value is approximately equivalent to losing the last two years of returns over the 30-year horizon.

Most importantly, our findings should be useful to fund managers trying to decide how to track an index, to plan sponsors selecting investment options for an organization’s employees, and to the ultimate investors trying to evaluate their index fund managers. The disparate approaches and outcomes of replication vs. sampling have been surprising to financial economists (including both academics and practitioners) with whom we have shared our results thus far. To us, this suggests that most mom-and-pop investors, and even many finance professionals, are likely similarly unaware of the distinctions.”
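The authors' $100K wealth illustration can be checked with simple annual compounding. The Python sketch below assumes, as the quotation does, a single lump-sum investment and a constant annual return; the function name is our own.

```python
import math


def future_value(principal, annual_return, years):
    """Value of a one-time investment compounded annually at a constant rate."""
    return principal * (1 + annual_return) ** years


principal, years = 100_000, 30
base = future_value(principal, 0.08, years)    # 8% per year
lower = future_value(principal, 0.074, years)  # 60 bps lower per year
gap = base - lower

# Years of 8% growth needed to reach the lower ending value
years_equiv = math.log(lower / principal) / math.log(1.08)

print(f"8.0%: ${base:,.0f}  7.4%: ${lower:,.0f}  gap: ${gap:,.0f} ({gap / base:.1%})")
print(f"7.4% ending value = {years_equiv:.1f} years of growth at 8%")
```

This gives roughly $1.006M at 8% versus about $851K at 7.4%, a gap of about $155K (15%). The lower ending value corresponds to roughly 27.8 years of growth at 8%, consistent with the authors' estimate of losing about the last two years of returns.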
