Index Mapping For ETF Proxies

In order to present results as realistic as possible in our PAA-paper, we constructed long-term end-of-month data series for popular ETF proxies, like SPY, GLD and TLT (see paper appendix on SSRN). All data series start December 1969. For the pre-inception history, the proxies are derived from suitable indices. As part of a complete revision of the long-term data set, we recently improved the construction of the data series by mapping the underlying index through a linear formula to arrive at the best fit over the life span of the ETF to be replicated. The construction process is demonstrated below for EFA. The link to an example spreadsheet with all the necessary calculations is published at the end of this post.


EFA seeks to track the investment results of the MSCI EAFE index which is composed of large- and mid-capitalization developed market equities, excluding the U.S. and Canada. The index data is available as free download from the MSCI website. Comparing EFA’s historical data record against the various index levels supported by MSCI like Price, Gross, Net, reveals the MSCI EAFE Net index as underlying index. Historical dividend adjusted data for EFA itself is offered by Yahoo Finance, also for free. For constructing a long-term EFA proxy the data from both sources is required.

With the data readily available in Excel, the next step is to derive the data for the ETF-proxy from the underlying index for the In-Sample (IS) period. The goal is to map the underlying to arrive at the best fit over the life span (=IS) of the ETF through a linear formula: r+ = b * r + a, where “r” is the return of the index and “r+” is the return for the proxy. The values for the coefficients “a” and “b” are determined through Excel’s Solver add-in by minimizing the unexplained sum of squared deviations for the return series of EFA and the ETF-proxy.


After Solver finishes the calculation cycles, the found coefficients result in high R-Squared and correlation readings.