Opinion Solidly Addresses the Second of Two Major Claims Made by Sheriff Joe Arpaio’s Press Conference in July.
On July 18, Sheriff Joe Arpaio of Maricopa County, Arizona (“America’s Toughest Sheriff”) stated, at a press conference, that President Obama’s birth certificate was a “proven” forgery.
In doing so, Arpaio relied almost entirely on two things. The first of these two major claims has been examined here.
The second was the opinion of two individuals — Garrett Papit and Tim Selaty, Jr. — that the characteristics of the birth certificate PDF simply could not be innocently explained.
Mr. Papit’s report was published by Arpaio’s office. But although the Sheriff’s office mentioned Mr. Selaty’s name, they didn’t publish his report or give him much prominence. Instead, Mr. Selaty had to post the report on his own web site. That seemed odd at the time. However, it seems to me that this was probably for two reasons.
First, Selaty didn’t reach all of the same conclusions that some of the other individuals quoted by the Posse did. It appears that publishing Selaty’s report would have raised questions about why this was the case.
In other words, the testimony of their “experts” was not consistent (!)
Secondly, Mr. Selaty is approximately 22 years old. One might easily wonder (with no disrespect intended to Selaty, who seems to have decent enough general computer skills for someone at the beginning of a career) why, exactly, Arpaio’s Posse was relying on a 22-year-old for an “expert opinion” regarding the birth certificate of the President of the United States.
Personally, if I were accusing the President of the United States of having a forged birth certificate, I would rely on real, recognized experts in the appropriate fields. And most certainly, no 22-year-olds would be in the mix. Unless, of course, they were legitimate, indisputable, widely recognized, certifiable geniuses who had finished their first PhD by, say, age 18.
Again, no disrespect intended to Tim Selaty. But this is not the case here.
Garrett Papit Claims to Have “Proven” that Obama’s PDF File Has Been “Tampered With.”
It didn’t take long for Garrett Papit, the more prominently-showcased individual, to show up at this blog and proclaim with confidence that he had proven Mr. Obama’s birth certificate PDF file had been “tampered with.”
He was very emphatic about that point.
For any readers who are unfamiliar with the background of the issue, my own technical evaluation — published just over a year ago after some 500 hours of completely independent investigation — was that every single characteristic of the PDF file that was known at that time could easily be explained by innocent processes. The most prominent of these was optimization (or compression) of the file.
In fact, I found the technical evidence to be overwhelming in that direction. In other words, not one of the significant birther allegations — and I examined all of them — stood up under honest and competent scrutiny.
And going beyond those to test Obama’s PDF in every additional way I could come up with, I was unable to uncover any other genuine evidence that would demonstrate any manner of “forgery.”
Garrett Papit, of course, claimed that I was absolutely wrong about the technical evidence. In fact, he went further than merely disagreeing with me technically. Bizarrely, Mr. Papit publicly and falsely accused me of having “lied” about myself “on multiple occasions.”
Challenged to prove his charges, Papit could not do so. He eventually retracted his false accusation and apologized. However, it took several days for him to do so.
The Essence of Garrett Papit’s Technical Claims
The gist of Mr. Papit’s paper (the one publicized by Arpaio’s office) was that there are really only two forms of file optimization or compression, and that the characteristics of Obama’s file most assuredly fit neither.
Probably most importantly, he claimed that “mixed raster content” compression never produces more than one bitmap layer. Since Obama’s PDF file has 8 bitmap layers, this claim — if true — could raise some legitimate questions.
After reviewing Mr. Papit’s paper, I personally found his technical claims to be dubious — especially in regard to “mixed raster content” file compression. They seemed to be more assertions on Papit’s part than solid statements backed up by any documented technical evidence. And there were other strong reasons (perhaps a topic for another post) to doubt Papit’s claims.
Nonetheless, I did not have at my hands any immediate and concrete proof that would categorically show his technical claims to be wrong.
In fact, while I have broad experience in many different aspects of computer technology dating back more than 30 years, I am not an expert in the specific field of mixed raster content (MRC) compression.
Neither, by the way, is Garrett Papit.
Papit is basically a computer programmer for JC Penney. He has a Bachelor’s Degree in Computer Information Systems, and an MBA with a concentration in Managerial Information Systems.
In his paper, he states that he is “also familiar with the methodology involved in PDF optimization and compression.”
I’m sorry, but “familiar with” doesn’t cut it for the type of blanket claims that Papit has made in his paper — and on this site as well. I’m “familiar with” carburetors, but I can’t give you a definitive opinion of what one is and is not technically capable of. And if I were to pretend to do so, I would be far exceeding my level of expertise regarding carburetors. That would be unprofessional… at best.
Evaluating Papit’s Claims
In order to evaluate Garrett Papit’s claims regarding mixed raster content file processing, I searched for US patents on the basic technology — and found well over 100 of them.
It seems that MRC compression is not quite as simple and cut-and-dried a process as Mr. Papit, in the half-dozen sparse pages he wrote about the technology, would have people believe.
There were two possible ways to evaluate the truth of Papit’s claims. The first would have been to invest a great deal of time to really learn and understand the various possible ins and outs of mixed raster content (MRC) compression. And the second option was to consult a real, existing expert in the field.
I chose the second option, for two reasons. First, I’ve invested far too much of my time into this issue already. I didn’t want to invest the time to develop an entire body of knowledge about MRC compression, without any need or intent to apply that knowledge in my own profession. And secondly, I felt that the opinion of a recognized expert in this field would be of far more value than my own opinion anyway.
For These Reasons, I Contacted a Real Expert. In Fact, I Contacted One of the Foremost Experts in the World.
As I examined patents and technical papers written on MRC compression, one name in particular seemed to pop up again and again — that of Ricardo de Queiroz.
Ricardo de Queiroz is one of the primary fathers of this entire technology.
The very first “mixed raster content” patent in the United States was granted to Leon Bottou and Yann Andre LeCun… But the 2nd, 4th, 5th, 7th, 8th, and 13th patents were granted to Ricardo de Queiroz and his team. That’s about half of the first dozen or so patents. And some of his team members and students have also gone on to further develop the technology.
In addition, Professor de Queiroz appears again and again as an author of the available technical papers on MRC compression.
Now there are certainly many other individuals who have contributed to the development of this technology; and several in particular have made really big contributions. But I decided, based on what I read in the patent filings and technical papers, that if I were going to contact one expert in the world on this particular technology, the person I would pick would be Ricardo de Queiroz.
So I contacted him. And Dr. de Queiroz was gracious enough to reply — for which I thank him. In clarifying what compression technology is capable of, he has rendered a genuine service to all who have held any interest in this controversy.
Before I present Professor de Queiroz’s response to my inquiries, we should briefly note a couple of other things.
1) Professor de Queiroz did not simply volunteer an opinion on Obama’s birth certificate PDF. His expression of an opinion was a response to being asked for an opinion.
2) American politics has little impact on a Brazilian living in Brazil. This being the case, it would be very difficult to attribute any political motive to a technical opinion expressed by Dr. de Queiroz.
3) In any event, a technical expert of de Queiroz’s stature, generally speaking, gives technical opinions, not political ones. And such an expert, generally speaking, would never risk the reputation he’s built up over decades by issuing an opinion that would be easily shot down by the other top experts in his field.
4) In terms of expertise in this specific field, Garrett Papit is to Ricardo de Queiroz:
As a high school football player is to Tim Tebow.
As a Sunday golfer is to Tiger Woods.
As a high school physics teacher is to Stephen Hawking.
Well, you get the idea.
Now it’s very possible that Garrett Papit might well be a better banjo player than Ricardo de Queiroz. He might even be better at wood carving, or Volkswagen engine repair.
But when it comes to MRC compression, he doesn’t even approach being in the same league. (Nor, incidentally, do I — or virtually anybody else who’s ever previously commented publicly on the artifacts in the PDF.)
Professor de Queiroz’s evaluation was expressed in an email letter to me. I have boldfaced some of the most important points. I have also added a few notes of my own, in brackets.
Evaluation of Obama PDF File by Professor Ricardo de Queiroz
Dear Mr. Woodman,
There is no possible way I can tell if the PDF of President Obama’s birth certificate (POBC) made available by the White House is a “forgery” or not. The forgery can happen before being processed not to mention that the paper document itself could be forged, before the scanning. Thus, this is not the point.
[Note: This is very similar to what I said in my book on the birth certificate — JW.]
The question is whether all these artifacts we see after rendering the PDF of POBC are signs of forgery. I do not see that. I see them more likely as a result of inadequate processing.
The document has poor quality and it has been aggressively processed, no questions about it. The question is whether the corruptive processing was individual with the intent of forging it, or if it was automated within regular MRC segmentation.
If it was a forgery it was a very sloppy job. Any photoshop-knowledgeable person, of the garden variety, can do a much better job than that. If it is automated, it is a lousy job too, but bear in mind that algorithms for these jobs are not trained on specific documents. They were more likely developed, trained and tested on magazine pages and books. A US birth certificate is unlikely to give good results because it may be an outlier in the big picture of all documents they had in mind when developed their MRC tool.
MRC is about separating the single-image document into multiple layers, hopefully each one with a given characteristic. This has to be done automatically, in what we call segmentation. What I see in the document are signs of MRC segmentation consistent with strategies in line with the techniques pioneered by DjVu. I (and my students) do not advocate doing the segmentation that way, but that is not the point either. In fact, I would not be surprised if the software which segmented the WH document was derived from some DjVu tool.
They first try to “lift” the text to another layer. They can find more than one type of text and place them in different layers. The rest is background and they compress with standard image compression methods. In the POBC [President Obama Birth Certificate] I see lots of signs of that. It missed a lot of text, like the R in BARACK and in many other places. The missed text is aggressively compressed with JPEG for example, which justifies the damage to those text parts.
About the halos around some text: I am not sure why they do it, but it may be trying to suppress another halo problem caused by “lifting” scanned text that leaves some of the foreground in the background and vice-versa causing trouble to compress the layers. We wrote some papers about it. You can still see background through inside some “O” letters and inside the check boxes.
There might be morphological dilation around the text mask or the segmentation is block-based. The halo could be caused by the foreground in a dilated mask, or by processing the background. One plausible alternative is that the algorithm finds text as the letters with a bit of the surrounding background for safety. Some Adobe tools do that.
Furthermore, the text is lifted to the foreground and sharpened (nearly binarized) making the background surroundings to disappear. When the text layer is pushed back onto the background plane the letter surroundings become halo. There is also some grayish lifted text, which was perhaps found to have different statistics and was then treated differently. The mask is binary, the foreground (text) can have any color or texture, or even parts of the background around the text. All these are conjectures; different algorithmic choices might produce similar results.
I took a birth certificate which has a similar background pattern, scanned and compressed using an older DjVu tool. It has shown the same problems as POBC, like text letters that were missed and sent to background, and multiple text styles. It didn’t have halo, though, because its algorithm decided to obliterate the whole background pattern. Perhaps if I had time to toy around with packages and parameters I might find something very close to what was used to generate the document shown by the WH, but I unfortunately do not have the time right now.
In summary I can only say I see much stronger signs of common MRC algorithmic processing of the image rather than some intentional manipulation.
Ricardo L. de Queiroz
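[Note: For readers who want to see the idea in miniature, the following is a toy sketch of the kind of automated segmentation the letter describes: text-like pixels are "lifted" into one-bit masks, more than one kind of text can end up in its own layer, and whatever the algorithm misses stays in the background. This is emphatically not the algorithm used by DjVu, Adobe, or whatever software produced the White House file. The function names, the brightness thresholds, and the tiny sample "scan" are all hypothetical, chosen only for illustration.]

```python
# Toy MRC-like segmentation. NOT DjVu's or Adobe's actual algorithm.
# Every name and threshold below is a made-up assumption for illustration.

def segment(gray, bands):
    """Split a grayscale image into 1-bit masks plus a background.

    gray  -- 2D list of ints, 0 (black) .. 255 (white)
    bands -- list of (name, low, high, flat_color); a pixel whose value
             satisfies low <= value < high is "lifted" into that mask
    """
    height, width = len(gray), len(gray[0])
    masks = {name: [[0] * width for _ in range(height)]
             for name, _, _, _ in bands}
    background = [row[:] for row in gray]
    for y in range(height):
        for x in range(width):
            for name, low, high, _ in bands:
                if low <= gray[y][x] < high:
                    masks[name][y][x] = 1
                    # The lifted pixel leaves a hole; fill it with white here.
                    background[y][x] = 255
                    break
    return masks, background


def render(masks, background, bands):
    """Paint each mask's single flat color back over the background."""
    out = [row[:] for row in background]
    for name, _, _, color in bands:
        for y, row in enumerate(masks[name]):
            for x, bit in enumerate(row):
                if bit:
                    out[y][x] = color
    return out


# Two hypothetical kinds of "text": dark text and grayish text.
BANDS = [("black_text", 0, 60, 0), ("gray_text", 60, 130, 96)]

# A 3x6 "scan": dark text pixels (20), grayish stamp pixels (100),
# one faint text pixel (150) that the thresholds miss, and paper (230).
SCAN = [
    [230, 20, 20, 230, 100, 230],
    [230, 20, 150, 230, 100, 230],
    [230, 230, 230, 230, 230, 230],
]

masks, background = segment(SCAN, BANDS)
page = render(masks, background, BANDS)
```

Even this crude version produces two separate single-color bitmap layers from one scanned image with no human intervention, and the faint pixel it fails to lift stays behind in the background, where a real tool would then compress it along with everything else.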
Questions and Answers
Professor de Queiroz was also gracious enough to answer a few questions I asked him regarding his opinion. Again, I have italicized his replies, and boldfaced the more important points.
1) I understand (and since initially writing you, I have also found an example of this in a paper you wrote) that MRC compression can have multiple bitmasks without necessarily having to have corresponding foreground layers. I also understand that there is no particular reason why a bitmask would need to be more than one color. And it seems to me that having multiple single-color bitmasks is likely a space-saving technique. Would you say all this is correct?
Yes and no. MRC can have a flat color for the whole foreground layer. This is equivalent of a mask without a foreground plane as a bitmap? Yes. But there is an implicit foreground plane, with all pixels at the same color. If there is a foreground plane/layer or not is a semantic question. Just to add to your confusion, these DJVu like algorithms do not necessarily follow the MRC standard. They are MRC-like. As for the space saving, it depends on what is done. Because of the redundancy of multiple layers there are many ways to generate the same rendered image. This question perhaps needs to be worked a little more before I can give you a good answer.
[Professor de Queiroz’s comments above highlight a very fundamental problem with Garrett Papit’s paper, and his claims. In Papit’s world, there are only a couple of different compression algorithms, and every computer program in the world follows those 2 basic algorithms. In the real world, there are almost limitless possible variations. There are, for example, adaptive compression algorithms, algorithms that fully follow the MRC standard, MRC-like algorithms, similar DJVu algorithms, DJVu-like algorithms, and so forth.]
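[Note: The professor's point about the "implicit foreground plane" can be illustrated with a small sketch. It assumes a simplified model in which a page is rendered by letting a one-bit mask choose, pixel by pixel, between a foreground and a background. The function names and sample values are hypothetical, and no real PDF or DjVu renderer is being reproduced here.]

```python
# Simplified model of MRC rendering; names and data are illustrative assumptions.

def render_flat(mask, background, color):
    """Mask plus one flat color: no foreground bitmap is stored at all."""
    return [[color if bit else bg for bit, bg in zip(mask_row, bg_row)]
            for mask_row, bg_row in zip(mask, background)]


def render_three_layer(mask, background, foreground):
    """Three-layer form: the mask picks foreground or background per pixel."""
    return [[fg if bit else bg
             for bit, fg, bg in zip(mask_row, fg_row, bg_row)]
            for mask_row, fg_row, bg_row in zip(mask, foreground, background)]


MASK = [[0, 1, 1],
        [0, 1, 0]]
BACKGROUND = [[230, 228, 231],
              [229, 230, 232]]
COLOR = 0  # flat black "ink"

# An explicit foreground plane in which every pixel has the same color.
FOREGROUND = [[COLOR] * 3 for _ in range(2)]

flat = render_flat(MASK, BACKGROUND, COLOR)
layered = render_three_layer(MASK, BACKGROUND, FOREGROUND)
```

In this model the two results are identical, which is why whether a single-color bitmask "has" a foreground layer is, as the professor says, a semantic question. It is also a small instance of his remark that there are many ways to generate the same rendered image.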
2) You referred to “multiple text styles.” What did you mean by that? (Another person claimed that different fonts were present in the document’s typed information; I earlier analyzed that at length and found no significant evidence this might be the case.)
What text styles you see in the document? As a human, I would wild guess the following: (a) title is one, (b) the typewritten data is another, (c) the boxes titles like “1a Child’s first name” is another, (d) the signatures are another, and same goes for the (e) handwritten stuff and the (f) stamped dates. Not sure I missed any other. But the segmentation algorithm apparently decided to divide the document into few types of text information.
First there is what it thought it was black text (a; b with mistakes like the R in BARACK; c with mistakes like the 1 in box number 10; parts of the signatures as the last letters of Dunham’s signature, some handwritten stuff in boxed 18b and 19b). Second, there are a few gray text or whatever quality it thought to make any distinction from the first case. Parts of the boxes 17a, 20 an 22 seems to be put in one pot. The stuff on the bottom is all placed in another pot, even though there are lots of different styles there. But there are a few sub-parts of that that might be yet another text color. I think this is a stamp. Thirdly, everything else is background, which includes a lot of the text and graphics.
3) I understand your overall conclusion to be that the things you see (including the bitmask layers, etc.) are explainable by MRC compression; and you do not see anything that appears to you likely to have been the product of manual manipulation. Is this correct?
So there we have it. The bottom line? Not only is it NOT “proven” that the bitmap layers and other obvious artifacts in Obama’s PDF are the products of manual manipulation; one of the fathers of the kind of compression technology used in the birth certificate has now examined the file, and he sees no indication that the artifacts — including the multiple bitmap layers — are anything BUT the result of innocent, technological processes.
In other words: The Arpaio Posse’s most recent claims, just like their earlier claims, are worthless. And after a year and 5 months of literally dozens upon dozens of such claims by birthers, there’s still no real indication in Obama’s PDF of any kind of manual manipulation or “forgery…” at all.
Meanwhile, Joe Arpaio / Jerome Corsi / Joseph Farah / WorldNetDaily (and other birthers) continue to studiously ignore and brush aside all real and competent technical opinions in favor of whatever “expert” they can find — whether 22 years old or not — who will back the claims they want to promote.