The Michigan Secular Student Alliance: A First Look at Ross McKitrick's "A First Look"

Saturday, June 6, 2015

[edit: Ross has been kind enough to reply here, and in fact continued to be kind so I will adjust some of the language here. He has written an updated version of his article, which can be read here. I'll also add edits as necessary to reflect our exchange (or language fixes) in the comments; they shall be in red. - 06/08/15, 3:00pm EST]

Recent corrections to buoy and ship temperature measurements have resulted in a revised global temperature dataset that shows the global warming "hiatus" since 1998 probably doesn't exist. Karl et al. (2015) comment on the effects of three corrections in particular:

(1) Ship data have been shown to be consistently warmer than buoy data when they measure from the same region; this is important because the prevalence of buoy data has greatly increased within the past couple of decades, which introduces a known cooling bias into the raw data for those recent years. This correction was carried out by bringing the buoy measurements up by the global mean of these regional differences. It would have also been possible to correct this by bringing the ship measurements down; more on why this doesn't matter, and is in fact less preferable, later.
(2) In addition to being more prevalent, buoy measurements are also more precise: they are subject to less noise than ship measurements are. As such, since we prefer to have precise measurements, they were weighted more relative to ship data, by a factor equal to the ratio of the two methods' measurement variances.
(3) Finally, the ship data themselves are wrong as well. Two main methods exist for ship measurements: engine room intake, which overestimates sea surface temperatures due to exposure to heat in the engine room; and bucket haul measurements, which underestimate sea surface temperatures due to a heightened rate of heat loss from the bucket as it is hoisted from the ocean. Changes in the prevalence of certain methods relative to each other, as well as changes in the prevalence of insulated v. non-insulated buckets, affect temperature measurements over time. While corrections had previously been applied only to ship data from before the start of American involvement in World War II, ship metadata (basically, data about measurement method) show that large changes in the relative frequency of these methods still occurred after the war. This insulated v. non-insulated bucket bias in particular was corrected by continuing a colocation comparison to nighttime marine air temperature measurements to the present day, instead of ending it in 1941 as before.
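
To make the ship-buoy offset correction concrete, here is a minimal sketch in Python using entirely synthetic numbers. Only the 0.12˚C ship-minus-buoy offset comes from the papers discussed here; the underlying warming rate, the growth in buoy coverage, and the function names are assumptions of mine for illustration, and this is not the ERSST.v4 procedure. It is meant to show two things: a growing share of cooler-reading buoys drags the raw blended trend down, and raising the buoys or lowering the ships by the same constant gives the same trend.

```python
import numpy as np

SHIP_MINUS_BUOY = 0.12  # deg C, global mean offset quoted in the text

years = np.arange(1980, 2015)
t = years - years[0]

# Assumptions for illustration only: a steady 0.10 C/decade "true" warming
# and a buoy share of observations growing linearly from 0% to 80%.
truth = 0.010 * t
buoy_fraction = np.linspace(0.0, 0.8, years.size)

ship = truth + SHIP_MINUS_BUOY  # ships read warm relative to buoys
buoy = truth.copy()


def blend(ship_series, buoy_series, frac):
    """Average of the two platforms, weighted by the buoy share."""
    return frac * buoy_series + (1.0 - frac) * ship_series


def trend_per_decade(series):
    """Least-squares linear trend in deg C per decade."""
    return 10.0 * np.polyfit(t, series, 1)[0]


raw = blend(ship, buoy, buoy_fraction)
buoys_raised = blend(ship, buoy + SHIP_MINUS_BUOY, buoy_fraction)
ships_lowered = blend(ship - SHIP_MINUS_BUOY, buoy, buoy_fraction)

for name, series in [("truth", truth), ("raw blend", raw),
                     ("buoys raised", buoys_raised),
                     ("ships lowered", ships_lowered)]:
    print(f"{name:>14}: {trend_per_decade(series):.3f} C/decade")
```

In this toy setup either correction recovers the assumed trend. The two corrected series differ only by a constant 0.12˚C, which drops out of any anomaly or trend calculation.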

These corrections were part of the newly updated Extended Reconstructed Sea Surface Temperature (ERSST) record, now at Version 4. The first was detailed by Kennedy et al. (2011B), and the second and third by Huang et al. (2015A) (who provided the final ERSST.v4). As Karl et al. state in their paper, the contributions of the three corrections to the increased 2000-2014 trend of 0.064˚C/decade over ERSST.v3b were 0.014˚C/dec, 0.012˚C/dec, and 0.030˚C/dec, respectively.

People familiar with the wider array of indicators we have for global heat accumulation will know that the "hiatus" was an artifact of surface and atmospheric temperatures only. The largest heat sink, the global oceans (shown below), shows no sign of such a hiatus, which is why, instead of the "skeptic" cry of "global warming stopped in 1998", serious scientific inquiries have tried to explain the alleged hiatus through minor changes in surface radiative forcing, changes in atmospheric circulation, or, in fact, an increase in ocean uptake.

The newest paper from Karl et al. essentially calls into question the idea of a hiatus at all, and instead blames it on incomplete sampling and a change in sea surface temperature measurement methodology over the same time period.

Watts Up With That has been hard at work trying to find some fault with the paper, and as shown by Sou at HotWhopper, Anthony probably isn't trying his best to provide an objective approach to analyzing the paper. (I greatly understate that.) But a somewhat serious person, Dr. Ross McKitrick, has commented on the paper there all the same, and I'd like to respond to some of his "first looks" at the new paper.

Minor Points

To get a couple of small things out of the way: Ross starts off his article by saying that the case for a hiatus is made by examining several datasets, all of them some version of surface temperatures or atmospheric temperatures (lower tropospheric in particular). His last dataset, he says, is from the 0-2000m Argo float network; actually, my dataset graphed above is from that. The caption of the figure he cited specifically says that it is from 5-m data, again a surface record, and I am particularly disappointed that the graph started at the peak of the 1997-1998 El Niño, but so be it.

That all of these show the same "hiatus" is more an indication that they're all basically measuring the same thing. But even then, one has to wonder what McKitrick means by "examination", since especially for the first several datasets he links to, it's not at all clear that there has been a hiatus, especially with the rather (ahem) inconvenient truth of the data from the most recent ~12 months. But even without the 2014 data, has McKitrick done any analysis as Tamino has performed on the GISTEMP data? Change point analysis, ANOVA, fitting polynomials to residuals? My guess is he has not, just given that Tamino found, for each test, results that were (and I paraphrase only slightly) "not even close" to significant.

The "hiatus" appears to be clear if you compare model predictions with observations—but it really does help to know why models predict what they do, and what happens if you give the models the correct inputs and correct ENSO pattern, something "skeptics" don't seem keen on discussing in detail. Kevin Cowtan at Skeptical Science has done that in spades.

Major Points

Small potatoes aside, let me get to some of the more substantive points of McKitrick's post. His first point with regard to correction (1) concerns what he thinks is a large uncertainty in the bias between buoys and ships (emphasis McKitrick's original):

However, Kennedy et al. note that the estimate is very uncertain: it is 0.12±1.7˚C!

McKitrick refers to Table 5 of Kennedy et al., which shows that the mean global estimate is 0.12˚C, with a standard deviation of 0.85˚C (so he obtained 1.7 by multiplying that by 2, which is fine). This is a rather curious mistake for someone who knows about statistics to make, because what was being calculated was a sample mean. (edit: revamped this next sentence for clarity) For sample means, you do not use the standard deviation of the samples, but the square root of the variance divided by the sample size (i.e. √[var(x)/n] ). Table 5 gives the standard error values as well, and it is those values that should be used for the estimate of the uncertainty of the mean. (If one wants to calculate the standard errors for each value, they can again refer to Table 5 where the overlap counts, i.e. the sample sizes, are clearly included in the right-most column.) The estimate is thus not very uncertain, but in fact very certain: 0.12±0.02˚C. Karl et al. used this global mean value.

[edit: Ross' correction is to clarify that the regional uncertainty is high for these values. I agree with him, but I do not think that this solves the standard error v. standard deviation problem, as sample size helps to fix that problem when calculating a mean. I can probably write a post to illustrate this.]
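
As a quick illustration of the standard deviation v. standard error point, here is a small simulation sketch. The 0.12˚C mean and 0.85˚C standard deviation are the Table 5 values quoted above; the sample sizes, the normal distribution, and the assumption that the collocated differences are independent are mine, chosen only for illustration (the real overlap counts are in Table 5 of Kennedy et al.).

```python
import numpy as np

MEAN_DIFF = 0.12  # deg C, global mean ship-minus-buoy difference (quoted above)
SD_DIFF = 0.85    # deg C, standard deviation of individual differences (quoted above)


def summarize(sample):
    """Return the sample mean, sample standard deviation, and standard error of the mean."""
    n = sample.size
    sd = sample.std(ddof=1)
    se = sd / np.sqrt(n)  # sqrt(var(x) / n)
    return sample.mean(), sd, se


rng = np.random.default_rng(0)
results = {}

# Hypothetical sample sizes, not the actual overlap counts from Table 5.
for n in (100, 1_000, 10_000):
    sample = rng.normal(MEAN_DIFF, SD_DIFF, size=n)
    results[n] = summarize(sample)
    mean, sd, se = results[n]
    print(f"n={n:>6}: mean={mean:+.3f}  sd={sd:.3f}  2*se={2 * se:.3f}")
```

The spread of the individual differences stays near 0.85˚C however many pairs you collect, while the uncertainty of their mean shrinks like 1/√n. If the collocated pairs were correlated, the effective sample size would be smaller and the standard error larger than this sketch suggests.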

McKitrick's next point about (1) is actually to contest the use of the global mean itself, instead pointing out how the differences vary by region and that other analyses from the Hadley group (i.e. Kennedy et al.) and Hirahara et al. (2014) used the regional analyses.

He doesn't really provide any analysis to show why the use of the global mean would cause a very important deviation from the use of the regional means, which is understandable since, after all, these are new papers and McKitrick probably doesn't have ready access to all of the data; but throwing in this faux uncertainty based off of nothing more than two other papers using the regional values should be discouraged. Or rather, one paper: Hirahara et al. used the global value for their analysis (emphasis mine):

The mean ERI bias of +0.13˚C is obtained and is within the range for the global region listed in Table 5 of Kennedy et al. (2011). The biases appear to vary regionally and seasonally with large biases in the Central North Pacific and the southern oceans. However, sampling data are insufficient to attribute the features to the ERI bias. Thus, only the global mean bias is used.

So Hirahara et al. seem to be of the opinion that regional analysis may not be appropriate. Either way, we should remember a common platitude at this point: all models are wrong, but some are useful. I think that, unless McKitrick would like to provide some alternative analysis that shows a very large difference between using the global mean value v. the regional values, the model used by Karl et al. (and Hirahara et al.) can be considered useful for what it intends to do: fix the known ship biases.

McKitrick does not contest correction (2), but says of correction (3) that:

However, this particular step has been considered before by Kennedy et al. and Hirahara et al., who opted for alternative methods in part because, as Kennedy et al. and others have pointed out, the NMAT data have their own "pervasive systematic errors", some of which were mentioned above.

Some more context would show this is incorrect though. The very next sentence in the Kennedy (2013) paper he cites, for instance, is this:

The use of NMAT to adjust SST data is, to an extent, unavoidable as the heat loss from a bucket does depend on the air-sea temperature difference.

The Kennedy (2013) paper discusses two methods that were proposed by previous researchers, both of which use NMAT data to figure out the relative fraction of insulated v. non-insulated buckets. The later one, proposed by Smith and Reynolds (2002) and used by Karl et al. (and also by Huang et al., as McKitrick fails to point out), relies on collocated measurements; the earlier one was proposed by Folland and Parker (1995), who used:

[...] a simplified physical model of the buckets used to make SST measurements combined with fields of climatological air-temperature, SST, humidity, wind and solar radiation. [...] The fractional contributions of canvas and wooden buckets were estimated by assuming a linear change over time from a mix of wooden and canvas buckets to predominantly canvas buckets by 1920. The rate of this change was estimated by minimizing the air-sea temperature difference in the tropics. The same method was also used in Rayner et al. (2006) and Kennedy et al. (2011C).

In other words, the Kennedy et al. (2011) papers also used NMAT in some fashion to estimate the fractional contribution of insulated v. non-insulated buckets. Furthermore, again from Hirahara et al. in their section 4(c), they also used the Folland and Parker method, which uses NMAT; however, to Ross' point as well, Hirahara et al. used metadata in addition to NMAT to estimate the fraction of insulated v. non-insulated measurements, in particular after 1971 and in fact exclusively between 1941 and 1971.

So what does all of this mean? Essentially every bucket correction used NMAT in some fashion: Smith and Reynolds, Folland and Parker, Rayner et al., Hirahara et al., Kennedy et al., Huang et al., and Karl et al. All of the papers say this. In particular, discussion of Karl et al. should include discussion of Huang et al., which to my understanding used exclusively NMAT corrections to calculate the bucket biases; Karl et al. replicate their analysis directly. Ross suggests this method is not preferred, but only argues this based off of the use of another method by a different group. The comparisons between these two methods are shown in the first figure in the next section; the results seem similar between Huang et al. and Hirahara et al. in the 1998-2000 period in question, so I'm disinclined to agree there's a problem here.

"Numerical Example"

Several of Ross' simulated corrections that he provided in his numerical example do not seem to be justified given the types of corrections applied above. To start, he introduces a linearly increasing negative adjustment to the pseudo-ship data starting in 1940, and why? Who knows why; it doesn't seem at all relevant to any of the three corrections above, unless he thinks that is what the continued NMAT correction is. Ross does not really discuss the Huang et al. paper, where Karl et al. said they got the new ERSST.v4 corrections from. However, it should be brought up because from their Figure 6, which shows the full effect of the continued NMAT correction, we can see it is not a linear increase with time.

It doesn't even matter that much in the final model since the simulated ship fraction goes down with time, but since McKitrick seems to be so critical of Karl et al. for not choosing the particular NMAT method that he likes (for no particular reason), we should probably expect some due diligence here too.

This particular correction would cause a downward trend, though it doesn't really matter in the long run for two reasons. First, the buoy fraction takes over in Ross' example. Second, the reason the model starts to diverge toward the end of the series is the massive over-correction applied to the pseudo-buoy series. Why does McKitrick assume that the correction should be a massive over-statement like this? He gives zero reason anywhere in his post, though since the quasi-independent estimates given by Kennedy et al. and Hirahara et al. are very close to each other, we might be safe in saying they're much more accurate than the overcorrection in Ross' model would imply. So, I'll choose the "correct" correction. What's more, when you remove that ~~absolutely~~ false growing negative correction on the ship series, you get a more accurate model, as such.

The dip toward the end is due to the similarly massive negative "corrections" applied to the ship data in 1990 and 2000, which McKitrick thinks are necessary, though according to Huang et al. above they are not, at least for 1990. But McKitrick's model does not seriously consider the actual reason that the NMAT correction is needed in the first place, which is to divine the fractional composition of insulated v. non-insulated bucket measurements. So it's not at all clear that this correction should be simulated in this fashion; in fact it's almost certain that's not how it should be done.

If the accuracy of the NMAT correction is in question, then let's hear why, not these misleading/ambiguous statements about which researchers prefer one method over another. And let's certainly not vastly overestimate the correction in a numerical example to make a misleading point as well. Until the time when McKitrick (or anyone else at WUWT) wants to take this paper seriously, I think the best thing to do is to consider the authors as knowing what they're talking about.

8 comments:

Victor Venema, June 6, 2015 at 4:09 PM
"Table 5 gives the standard error values"
I was a little confused here because all info is in Table 5, just in different columns, maybe you could write:
"Table 5 also gives the standard error values"

A statistician confusing the sample standard deviation and the standard error of the mean, that is quite something. Scary what mitigation scepticism does to a person.

Ross, June 8, 2015 at 10:44 AM
I have posted a revised version of the document at http://www.rossmckitrick.com/uploads/4/8/0/8/4808045/mckitrick_comments_on_karl2015_r1.pdf that addresses some of your criticisms. I will send it to Anthony to post at WUWT.
Regarding the use of SD or SE, the site-specific uncertainty is as indicated by the SD. I have revised the text to clarify that the bias uncertainty applies to the specific location.

As to whether it matters using region-specific information or the global mean, in several places you ask whether this or that adjustment matters. There is a prima facie case that the 3 main adjustments in K15 must matter since they themselves attribute their new results to them. This presumably includes, in part, using +0.12 everywhere rather than using variable adjustments based on metadata.

You are correct that Hirahara et al use a global mean adjustment, but they also use a model of variations in bucket type over time to change the weights. However they do not use regional variations--I misread them on that point and I've corrected that statement.

You are misreading my paragraph on adjustment #3 and your accusation of lying is uncalled for. For the post-1941 period Kennedy and Hirahara use MAT data to estimate aspects of their adjustment calculations but they also use metadata and they do not rely exclusively on NMAT. The K15 group rely on achieving a trendless difference between SST and NMAT in collocations to calibrate their bias coefficients, and they themselves state that this introduces a large change in the 1998-2000 interval that has a big effect on the end-of-sample trend. I have revised the text to clarify the point that other teams don't solely rely on fitting to NMAT but also use metadata.

I have nothing against NMAT or any other data set (in fact I've added the NMAT graphs to my document). The point I am making is that the K15 results arose not due to new data but new adjustments based on their expert judgment, and it is not immediately obvious that these are the correct decisions. The numerical example is not meant to replicate the construction of SST data, but to show how estimated adjustments can introduce trends not observed in the underlying data. You could construct any number of examples that go any way you like.