<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: When r2 Regression is Very High</title>
	<atom:link href="http://landshape.org/enm/when-r2-regression-is-very-high/feed/" rel="self" type="application/rss+xml" />
	<link>http://landshape.org/enm/when-r2-regression-is-very-high/</link>
	<description>The power of numeracy</description>
	<lastBuildDate>Thu, 29 Jul 2010 16:43:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Martin Ringo</title>
		<link>http://landshape.org/enm/when-r2-regression-is-very-high/comment-page-1/#comment-2172</link>
		<dc:creator>Martin Ringo</dc:creator>
		<pubDate>Thu, 27 Jul 2006 01:53:51 +0000</pubDate>
		<guid isPermaLink="false">http://landshape.org/enm/?p=124#comment-2172</guid>
		<description>David,
I shouldn&#039;t be saying this, at least in the sense that I am supposed to be an econometrician, but there are some quick and dirty ways to gauge the magnitude of the serial correlation.  Simply drop a lag (Y(t-1), Y(t-2), ... where Y is the dependent variable) into the equation and look at the effect on the coefficients of the X&#039;s.  If the effect is significant (relative to the standard errors in the equation with the lag structure), then get serious about the serial correlation: use Cochrane-Orcutt or Prais-Winsten for first-order autocorrelation, then move to second order, etc., or just go to an ARMA estimating routine.  R has them, although I don&#039;t know how well they work.  Remember that the equation now includes an implicit lag structure.
Steve McIntyre&#039;s point that a poor Durbin-Watson statistic is cause for rejecting a regression result is not really correct.  First, remember that OLS estimates are unbiased even with serial correlation (a point many of his readers apparently never understood).  The issue is one of efficiency (the size of the variance), hence check against a model with a lag structure (or, maybe better, just estimate the first-order autocorrelation from the residuals and see if it scares you).  Second, in many cases for non-time-series data, regressions produce horrible D-W statistics.  Just ignore them.  For time series, of course, the D-W statistic (or, even better, the Breusch-Godfrey serial correlation Lagrange Multiplier test or its ARCH version) should be used, but with a sample size as small as 10, one shouldn&#039;t be doing time series analysis at all.  Yes, I have done it, but I was living in sin at the time, although given what I have seen in climate reconstructions, I am beginning to think I might have been a saint. :-)</description>
		<content:encoded><![CDATA[<p>David,<br />
I shouldn&#8217;t be saying this, at least in the sense that I am supposed to be an econometrician, but there are some quick and dirty ways to gauge the magnitude of the serial correlation.  Simply drop a lag (Y(t-1), Y(t-2), &#8230; where Y is the dependent variable) into the equation and look at the effect on the coefficients of the X&#8217;s.  If the effect is significant (relative to the standard errors in the equation with the lag structure), then get serious about the serial correlation: use Cochrane-Orcutt or Prais-Winsten for first-order autocorrelation, then move to second order, etc., or just go to an ARMA estimating routine.  R has them, although I don&#8217;t know how well they work.  Remember that the equation now includes an implicit lag structure.<br />
Steve McIntyre&#8217;s point that a poor Durbin-Watson statistic is cause for rejecting a regression result is not really correct.  First, remember that OLS estimates are unbiased even with serial correlation (a point many of his readers apparently never understood).  The issue is one of efficiency (the size of the variance), hence check against a model with a lag structure (or, maybe better, just estimate the first-order autocorrelation from the residuals and see if it scares you).  Second, in many cases for non-time-series data, regressions produce horrible D-W statistics.  Just ignore them.  For time series, of course, the D-W statistic (or, even better, the Breusch-Godfrey serial correlation Lagrange Multiplier test or its ARCH version) should be used, but with a sample size as small as 10, one shouldn&#8217;t be doing time series analysis at all.  Yes, I have done it, but I was living in sin at the time, although given what I have seen in climate reconstructions, I am beginning to think I might have been a saint. <img src='http://landshape.org/enm/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
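<p>Below is a minimal pure-Python sketch of the residual checks described in this comment, run on simulated data. The function names, the AR(1) error process with rho = 0.7, the sample size and the seed are all illustrative assumptions, not anything from the original discussion. The sketch fits OLS, computes the Durbin-Watson statistic and the first-order autocorrelation of the residuals, then re-estimates with an iterated Cochrane-Orcutt procedure. It omits standard errors and the Prais-Winsten treatment of the first observation.</p>

```python
import random


def ols(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y, a, b):
    """Residuals y_t - a - b * x_t."""
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2, roughly 2 * (1 - rho_hat)."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)


def lag1_rho(e):
    """First-order residual autocorrelation: slope of e_t on e_{t-1} (no intercept)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterated Cochrane-Orcutt for AR(1) errors; drops the first observation."""
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        # Re-estimate rho from residuals of the current fit on the original data.
        new_rho = lag1_rho(residuals(x, y, a, b))
        # Quasi-difference: y*_t = y_t - rho*y_{t-1}, x*_t = x_t - rho*x_{t-1}.
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols(xs, ys)
        a = a_star / (1 - new_rho)  # transformed intercept equals a * (1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho


# Hypothetical simulation setup (assumed values, for illustration only).
random.seed(42)
n, true_a, true_b, true_rho = 500, 1.0, 2.0, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) / (1 - true_rho ** 2) ** 0.5]  # stationary start
for _ in range(n - 1):
    u.append(true_rho * u[-1] + random.gauss(0, 1))  # AR(1) errors
y = [true_a + true_b * xi + ui for xi, ui in zip(x, u)]

a_ols, b_ols = ols(x, y)
e = residuals(x, y, a_ols, b_ols)
print(f"OLS slope: {b_ols:.3f}")
print(f"Durbin-Watson: {durbin_watson(e):.3f}, residual rho: {lag1_rho(e):.3f}")
a_co, b_co, rho_co = cochrane_orcutt(x, y)
print(f"Cochrane-Orcutt slope: {b_co:.3f}, rho: {rho_co:.3f}")
```

<p>The regressor in this setup is exogenous, so the OLS and Cochrane-Orcutt slopes should both land near the true value of 2, which is the unbiasedness point. What the serial correlation changes is the efficiency of the estimates and the reliability of the usual OLS standard errors. The Durbin-Watson value should sit close to 2*(1 - rho_hat).</p>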
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://landshape.org/enm/when-r2-regression-is-very-high/comment-page-1/#comment-2114</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Mon, 24 Jul 2006 14:00:53 +0000</pubDate>
		<guid isPermaLink="false">http://landshape.org/enm/?p=124#comment-2114</guid>
		<description>Hi Martin, thanks for that.  Regarding Steve&#039;s post excerpted from Percival, I think it is a very clear cautionary tale of exactly the problem it was trying to illustrate: small N and high-rho series.  Rhos this high are seen in temperature series, so it&#039;s not a moot point.  The remedies you suggest, while things that should be done, are often not done, and that is the point of the article.  When I speak to ecologists about autocorrelation, I don&#039;t think they realize the magnitude of error possible.  And N=10 is not uncommon.  That said, the specifics of each situation are different and need to be looked at case by case.</description>
		<content:encoded><![CDATA[<p>Hi Martin, thanks for that.  Regarding Steve&#8217;s post excerpted from Percival, I think it is a very clear cautionary tale of exactly the problem it was trying to illustrate: small N and high-rho series.  Rhos this high are seen in temperature series, so it&#8217;s not a moot point.  The remedies you suggest, while things that should be done, are often not done, and that is the point of the article.  When I speak to ecologists about autocorrelation, I don&#8217;t think they realize the magnitude of error possible.  And N=10 is not uncommon.  That said, the specifics of each situation are different and need to be looked at case by case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martin Ringo</title>
		<link>http://landshape.org/enm/when-r2-regression-is-very-high/comment-page-1/#comment-1974</link>
		<dc:creator>Martin Ringo</dc:creator>
		<pubDate>Wed, 19 Jul 2006 21:51:28 +0000</pubDate>
		<guid isPermaLink="false">http://landshape.org/enm/?p=124#comment-1974</guid>
		<description>David,

Your &quot;Spurious #5: Variance of Autocorrelated Process&quot; is a reference that one should read only after a good grounding in the Classical Linear Model of regression and in how an autoregressive error term violates that model&#039;s assumption of zero serial correlation.  Percival, the author of &quot;Three Curious Properties of the Sample Variance and Autocovariance for Stationary Processes with Unknown Mean,&quot; is making a big deal about some things that are seldom a problem.

His biggest point is that the variance of a subsegment of a series seriously underestimates the process variance.  Yes, that is true, but we only ever have subsegments, i.e. finite N.  If N is small relative to 1/(1-rho^2) (rho being the true autocorrelation coefficient), then there are real problems.  Percival&#039;s example has 1/(1-rho^2) = 167 and N = 10, hence the problems he shows: N is only 6% of 1/(1-rho^2).  When that ratio, N*(1-rho^2), is, say, over 10, there isn&#039;t the extreme degree of underestimation.  With Percival&#039;s rho of 0.997, using an N of 1670 (ten times 167) will still leave an underestimate of about 20-30%.

Further, if the autocorrelation really were 0.997, the series would, for practical purposes, behave as if it had a unit root, which means the researcher should be working either with first differences or with a cointegrated model (if one can be found).

Finally, Percival &quot;cheats&quot; a bit in his example of a process X(t) = b*t.  If that were the process, it would be easily seen; furthermore, it is not stationary, so the ACF does not have its expected meaning.  Thus, the example makes a point about the estimated ACF, but the example itself has no real practical purpose.

My criticisms of Percival should not be taken to mean that I don&#039;t think the article is worth reading.  Rather, it should not be read by people who don&#039;t have a bit more than a basic understanding of time series models.</description>
		<content:encoded><![CDATA[<p>David,</p>
<p>Your &#8220;Spurious #5: Variance of Autocorrelated Process&#8221; is a reference that one should read only after a good grounding in the Classical Linear Model of regression and in how an autoregressive error term violates that model&#8217;s assumption of zero serial correlation.  Percival, the author of &#8220;Three Curious Properties of the Sample Variance and Autocovariance for Stationary Processes with Unknown Mean,&#8221; is making a big deal about some things that are seldom a problem.</p>
<p>His biggest point is that the variance of a subsegment of a series seriously underestimates the process variance.  Yes, that is true, but we only ever have subsegments, i.e. finite N.  If N is small relative to 1/(1-rho^2) (rho being the true autocorrelation coefficient), then there are real problems.  Percival&#8217;s example has 1/(1-rho^2) = 167 and N = 10, hence the problems he shows: N is only 6% of 1/(1-rho^2).  When that ratio, N*(1-rho^2), is, say, over 10, there isn&#8217;t the extreme degree of underestimation.  With Percival&#8217;s rho of 0.997, using an N of 1670 (ten times 167) will still leave an underestimate of about 20-30%.</p>
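<p>The size of the underestimate can be checked directly. Here is a minimal sketch, assuming a stationary AR(1) process with autocorrelation rho^k at lag k and a sample variance whose mean is estimated from the same N points. The expected sum of squared deviations is N times the process variance minus N times the variance of the sample mean, which is then divided by N - 1. The function name and the chosen N values are illustrative assumptions.</p>

```python
def expected_sample_variance_ratio(rho: float, n: int, ddof: int = 1) -> float:
    """E[s^2] / process variance for a stationary AR(1) series of length n.

    s^2 = sum((x_t - mean)^2) / (n - ddof), with the mean estimated from the same n points.
    """
    # Var(sample mean) / sigma^2 = (1/n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho^k]
    acf_sum = sum((1 - k / n) * rho ** k for k in range(1, n))
    var_mean_ratio = (1 + 2 * acf_sum) / n
    # E[sum (x_t - mean)^2] = n * sigma^2 - n * Var(mean)
    return n * (1 - var_mean_ratio) / (n - ddof)


rho = 0.997
scale = 1 / (1 - rho ** 2)  # about 167 for rho = 0.997
for n in (10, 167, 1670):
    ratio = expected_sample_variance_ratio(rho, n)
    print(f"N = {n:5d}  N*(1-rho^2) = {n / scale:6.2f}  E[s^2]/sigma^2 = {ratio:.3f}")
```

<p>With rho = 0.997 and N = 10, this formula gives an expected sample variance of only about 1% of the process variance, which is the effect Percival highlights. The ratio climbs toward 1 as N*(1-rho^2) grows.</p>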
<p>Further, if the autocorrelation really were 0.997, the series would, for practical purposes, behave as if it had a unit root, which means the researcher should be working either with first differences or with a cointegrated model (if one can be found).</p>
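<p>Here is a minimal illustration of why first differences help when rho is this close to 1. It is not a formal unit-root test; a real analysis would use something like an augmented Dickey-Fuller test. The sketch simulates an AR(1) series with rho = 0.997 and compares the sample lag-1 autocorrelation of the levels with that of the first differences. The series length, the seed and the unit-variance Gaussian shocks are arbitrary assumptions.</p>

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation with the mean estimated from the data."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((xt - m) ** 2 for xt in x)
    return num / den


def simulate_ar1(rho, n, seed=0):
    """x_t = rho * x_{t-1} + e_t, e_t ~ N(0, 1), started from the stationary distribution."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0, 1))
    return x


# Illustrative assumptions: n = 2000, unit-variance shocks, fixed seed.
levels = simulate_ar1(0.997, 2000, seed=1)
diffs = [levels[t] - levels[t - 1] for t in range(1, len(levels))]
print(f"lag-1 autocorrelation, levels: {lag1_autocorr(levels):.3f}")
print(f"lag-1 autocorrelation, first differences: {lag1_autocorr(diffs):.3f}")
```

<p>Differencing x_t = rho*x_{t-1} + e_t gives a series whose theoretical lag-1 autocorrelation is -(1 - rho)/2. Here that is -0.0015, so the differenced series is close to white noise, while the levels remain strongly persistent.</p>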
<p>Finally, Percival &#8220;cheats&#8221; a bit in his example of a process X(t) = b*t.  If that were the process, it would be easily seen; furthermore, it is not stationary, so the ACF does not have its expected meaning.  Thus, the example makes a point about the estimated ACF, but the example itself has no real practical purpose.</p>
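<p>Here is a small sketch of the point about X(t) = b*t. The standard sample ACF of a pure linear trend is close to 1 at short lags whatever the value of b. This holds even though the series is not stationary, so the estimate has no population ACF behind it. The series length n = 100 and the values of b are arbitrary choices for illustration.</p>

```python
def sample_acf(x, k):
    """Standard sample autocorrelation at lag k (mean and variance from the full series)."""
    n = len(x)
    m = sum(x) / n
    den = sum((xt - m) ** 2 for xt in x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return num / den


n = 100
for b in (0.01, 1.0, 50.0):
    trend = [b * t for t in range(1, n + 1)]  # X(t) = b*t, no noise
    print(f"b = {b:6.2f}: r1 = {sample_acf(trend, 1):.3f}, r10 = {sample_acf(trend, 10):.3f}")
```

<p>For n = 100, the lag-1 value works out to exactly 0.97 for every nonzero b, because the scale factor b cancels in the ratio.</p>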
<p>My criticisms of Percival should not be taken to mean that I don&#8217;t think the article is worth reading.  Rather, it should not be read by people who don&#8217;t have a bit more than a basic understanding of time series models.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

