Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image/jpeg: add options to partially decode or tolerantly decode invalid images? #10447

Open
pranavraja opened this issue Apr 14, 2015 · 37 comments
Labels
FeatureRequest NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@pranavraja
Copy link

go version devel +ce43e1f Mon Apr 13 23:27:35 2015 +0000 linux/amd64

Attempted to use jpeg.Decode on the below image:
https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg

Expected the image to decode successfully, as it displays in a browser.

Actual result:
invalid JPEG format: short Huffman data

@nigeltao
Copy link
Contributor

It seems to decode fine for me, on tip. I know your Go version is only a few days old, but can you "git sync" and re-try?

@pranavraja
Copy link
Author

Strange, I updated to go version devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000 linux/amd64, and am still getting the same error. Here's my test program:

package main

import (
        "image/jpeg"
        "net/http"
        "fmt"
)

func main() {
        res, err := http.Get("https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg")
        if err != nil {
                panic(err)
        }
        defer res.Body.Close()
        img, err := jpeg.Decode(res.Body)
        if err != nil {
                panic(err)
        }
        fmt.Println(img.Bounds())
}

And here's the output:

panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
main.main()
        /usr/share/fix-images/check.go:17 +0x182

@minux
Copy link
Member

minux commented Apr 15, 2015 via email

@pranavraja
Copy link
Author

Added fmt.Println(runtime.Version())

devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000
panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
main.main()
        /usr/share/fix-images/check.go:19 +0x28a

Anyway, as long as this is fixed on tip I'm happy to close this.

@tenorok
Copy link

tenorok commented Jun 13, 2016

Hello!
I try to use golang v1.6.2 and with image of @pranavraja everything okay, but with image http://bubble.ru/system/magazines/mg_n11_01_original.jpg I have the same error.
Please, can you explain what is problem?

@nigeltao
Copy link
Contributor

I'm not sure what the problem is, but it's not a regression: I see the same error on the stable release (Go 1.6). We're in code freeze for the upcoming 1.7 release; I'll take a look at it once the tree opens again for 1.8.

@nigeltao nigeltao reopened this Jun 14, 2016
@nigeltao nigeltao added this to the Go1.8 milestone Jun 14, 2016
@tenorok
Copy link

tenorok commented Jun 24, 2016

I'm uploaded image with problem to github for safety, because hoster can remove her.
https://cloud.githubusercontent.com/assets/1322855/16350692/f67e1266-3a68-11e6-96a1-205b396b1ace.jpg

@cctse
Copy link

cctse commented Aug 18, 2016

I met the same problem. It raise "OSError: image file is truncated (53 bytes not processed)" when I use python to download a jpeg file, the binary data can be save to file with ImageFile.LOAD_TRUNCATED_IMAGES=True, but the truncated pixel will set to black. with go, all pixel process well and not truncate but it will raise "invalid JPEG format: short Huffman data" when decode it. the jpeg show full in safari but truncated in chrome. maybe go need truncate the jpeg as python does

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 7, 2016
@elektroid
Copy link

elektroid commented Oct 9, 2016

I have the same issue with pictures from my phone, I attach one in case it helps
https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg

$ go version
go version go1.7.1 freebsd/amd64

@mattn
Copy link
Member

mattn commented Oct 9, 2016

package main

import (
    "fmt"
    "image/jpeg"
    "net/http"
)

func main() {
    urls := []string{
        "https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg",
        "https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg",
    }
    for _, u := range urls {
        res, err := http.Get(u)
        if err != nil {
            panic(err)
        }
        defer res.Body.Close()
        img, err := jpeg.Decode(res.Body)
        if err != nil {
            panic(err)
        }
        fmt.Println(img.Bounds())
    }
}

I got fail in the second @elektroid mentioned.

(0,0)-(1920,1080)
panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
panic(0x5f5c00, 0xc0421a4060)
        c:/dev/go/src/runtime/panic.go:527 +0x1ae
main.main()
        c:/dev/go-sandbox/jpeg.go:22 +0x21d
exit status 2

first.

9ae536dd97d2d92fc17a6590509a51c0.jpg: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=0], baseline, precision 8, 1920x1080, frames 3

second.

2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg: JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=9, datetime=2016:09:29 20:09:59, GPS-Data, model=Aquaris M5.5, resolutionunit=2, yresolution=155, xresolution=163], baseline, precision 8, 3120x4160, frames 3

@elsonwu

This comment has been minimized.

@nigeltao
Copy link
Contributor

@elektroid I'll try to find some time next week to look at it, but FWIW, I get e-mail for every comment on this issue, and somewhere along the mail pipeline, or in my browser's JPEG decoder, that attachment doesn't look like a valid JPEG. I've attached a screenshot from my mail, where I've added a pink ring to emphasize where it breaks down.

invalid

@nigeltao
Copy link
Contributor

@elsonwu can you give more details than "some special jpg will cause this problem"? Can you attach an example? Do other programs (e.g. web browsers, Photoshop) handle those special JPEGs OK or do they also reject them?

@bradfitz
Copy link
Contributor

bradfitz commented Nov 7, 2016

@nigeltao, what's the status here?

@elektroid
Copy link

I switched to "gopkg.in/gographics/imagick.v1/imagick" hoping it would cope with my improper files but it fails to load them too.

@nigeltao
Copy link
Contributor

nigeltao commented Nov 8, 2016

Sorry, I didn't find the time to make a detailed investigation, and there have been no recent changes to Go's image/jpeg package, but it sounds like non-Go software is also reporting errors with some or all of these cases.

@bradfitz
Copy link
Contributor

Yeah, but I can open it in Chrome. I thought we tried to match whatever browsers do.

@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
@special
Copy link

special commented Nov 19, 2016

All of the failing testcases here and others that I've found are truncated. They don't have complete SOS segments, don't contain an EOI, and raise warnings with other decoders. In @elsonwu's case, it's fixed by appending \x00\xff\xd9. The others are missing more data.

Still, there is a bug in that there's no way to decode truncated images, which seem relatively common and are readable with most other decoders.

My first thought is to return a partially decoded image (if there is one) along with the error; see special@c7a05f3. I don't mind adding docs/tests and submitting that if the approach is ok.

Otherwise, it seems slightly inappropriate to silence legitimate decoding errors, and there's no API for decoding options, so I'm not sure what else to do.

@nigeltao
Copy link
Contributor

We could possibly change jpeg.Decode (and the other image codecs) to return (non-nil Image, non-nil error) with partial results if it encounters an error, although that's unusual in general for functions returning (T, err), and certainly not going to happen for Go 1.8.

As for matching whatever browsers do, what browsers do influences how far down the Postel's law slope we slip, but I'm wary of the slippery slope, and according the JPEG spec, these are invalid images.

@jlongman
Copy link

jlongman commented Feb 1, 2017

Just did a quick look at this for my reasons and summarized it like this:

  • sid and nancy : incorrectly terminated - no missing bytes
  • parking garage : truncated and incorrectly terminated - missing bytes
  • the guy : strange icc_profile and incorrectly terminated - no missing bytes
  • russian comic book : incorrectly terminated - no missing bytes

I'm saying the files are not truncated by examining the bottom right block and seeing pixels - as opposed to parking garage which is clearly truncated a couple of lines of blocks early. I didn't check metadata vs lines of pixels however, so whole lines of blocks could be missing.

The ICC_PROFILE may be something that's accepted in JPEG formats, or the format descriptions I was looking at weren't showing the working standard (as opposed to the published standard). It clearly decodes in other decoders in any case.

Anyways, I don't have a solution to your quandary about returning an error and an image, I suppose you could put the decoded image in the error (ack), but my thought is that the issue might better be described as instead of "image/jpeg: Unable to decode valid JPEG image" as "image/jpeg: Unable to decode invalid/truncated JPEG image". Cheers!

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.9 May 23, 2017
@korya
Copy link

korya commented Jun 21, 2017

I have another example of invalid JPEG image. The problem with the file is that the second half of the file is filled with garbage bytes:

$ xxd bad-image.jpg
00000000: ffd8 ffe0 0010 4a46 4946 0001 0100 0001  ......JFIF......
00000010: 0001 0000 ffdb 0043 0005 0304 0404 0305  .......C........
 ... valid JPEG contents ...
0001fff0: 7ca0 574a 75cf 835d 4b12 cffd 9a0e 8fa1  |.WJu..]K.......
00020000: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......
 ... same 16-byte pattern ...
0003f4e0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......
0003f4f0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......

image.Decode fails with invalid JPEG format: missing 0xff00 sequence, but the browsers display the image:

screen shot 2017-06-21 at 09 12 28

@rasky
Copy link
Member

rasky commented Nov 29, 2017

@bradfitz can we change the title of this issue? This is now about doing something for invalid images, not valid ones.

@bradfitz bradfitz changed the title image/jpeg: Unable to decode valid JPEG image image/jpeg: add options to partially decode or tolerantly decode invalid images? Nov 29, 2017
@bradfitz bradfitz added FeatureRequest NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Nov 29, 2017
@gopherbot gopherbot removed the NeedsFix The path to resolution is known, but the work has not been done. label Nov 29, 2017
@bradfitz
Copy link
Contributor

@rasky, done.

@sheerun
Copy link

sheerun commented Oct 9, 2018

Here's another example: __20

@dvaldivia
Copy link

still running into this issue in go version go1.11.4 darwin/amd64

robbynshaw added a commit to robbynshaw/textile-explore that referenced this issue Feb 27, 2019
Currently experiencing an error on the go side with image decoding from flickr.

See golang/go#10447
@zzwx
Copy link

zzwx commented Apr 15, 2019

VanGoghStarryNight___
(VCG Wilson / Corbis via Getty Images)

Attempt to decode leads to:
invalid JPEG format: missing SOI marker

While the image is easily shown in major browsers

go version
go version go1.12.1 windows/amd64

@bradfitz
Copy link
Contributor

@sheerun's example looks like a progressive JPEG that's truncated. I only say that because it's so blurry and has that JPEG blocky look to it and I assume another pass would sharpen it. I didn't look at its structure.

@sheerun
Copy link

sheerun commented Apr 16, 2019

yes indeed. I guess it's out of scope of go-lang to nicely handle such case? I think ideal API would return both partially loaded image and flag that tells that image is corrupted

@nigeltao
Copy link
Contributor

I guess it's out of scope of go-lang to nicely handle such case?

Yeah, I'd still say what I said a couple of years ago:
#10447 (comment)

@yybmsrs
Copy link

yybmsrs commented Mar 10, 2020

go version go1.13.4 windows/amd64
this problem still exists.

@tlelson
Copy link

tlelson commented Oct 26, 2020

Would it be reasonable to have a few error types in this package so that a user can selectively identify and deal with a specific error such as ErrHuffmanCode ?

@whorfin
Copy link

whorfin commented May 18, 2021

I just thought I'd leave this here, in the hopes of being helpful.
In reference to the navidrome use-case I linked above, I made a very simply change to src/image/jpeg/reader.go
It's not suitable for a PR, clearly, given the larger conversation here about "the right thing to do". In my case, this works perfectly on my images, returning visually the same image I'd see loading the image (example linked above) in a web browser [gm identify says it has premature EOF, identify does not]
The idea is dangerous and simple - don't bail on "short Huffman data", and consider a premature EOF to be the same as EOI - escape the parsing loop and let the outer image return code sort it out.
So rather than fully parsing and handling all premature errors specially, the idea would be (which I have clearly not fully done) to carefully and incrementally construct a valid image while decoding, being careful to always leave the image structure in as valid a state as possible. This seems to be the spirit of what is there now.
If we run out of image data, return what we have.
Simple patch for the "short Huffman data" case ... which I hope helps someone:

--- reader.go-orig	2021-05-17 16:59:44.565429376 +0000
+++ reader.go-new	2021-05-18 11:31:08.323759100 +0000
@@ -12,6 +12,7 @@
 	"image/color"
 	"image/internal/imageutil"
 	"io"
+	"errors"
 )
 
 // TODO(nigeltao): fix up the doc comment style so that sentences start with
@@ -538,7 +539,12 @@
 	for {
 		err := d.readFull(d.tmp[:2])
 		if err != nil {
-			return nil, err
+			// consider unexpected EOF to be same as EOI, let outer code
+			// find out if we have an image to return
+			if !errors.Is(err, io.ErrUnexpectedEOF) {
+				return nil, err
+			}
+			break
 		}
 		for d.tmp[0] != 0xff {
 			// Strictly speaking, this is a format error. However, libjpeg is
@@ -648,7 +654,10 @@
 			}
 		}
 		if err != nil {
-			return nil, err
+			// don't bail on short Huffman data
+			if !errors.Is(err, FormatError("short Huffman data")) {
+				return nil, err
+			}
 		}
 	}
 

@metakeule
Copy link

So to recap:

  1. This issue is now 8 years old and have never been fixed.
  2. It is a showstopper for any use of the image library on a server where random images are uploaded.
  3. This library behaves worse than any reasonable image processing software out there.
  4. The issue could have been easily fixed, by returning a proper error message and the processed image up to the point where the unexpected error happened.
  5. The maintainer was not sure about what to do and avoided any decision for 7 years.
  6. In this time nobody from the Go developers stepped up to help our fix the issue by himself.
  7. After the maintainer prevented any solution from happening for 7 years, he now is unassigned from the issue.
  8. Now nobody is assigned to the issue and nobody fixed it.

Great job!

@metakeule
Copy link

metakeule commented Apr 13, 2023

In case, anybody does care, I have forked now the jpeg package and fixed some of the issues with the help of the code of @whorfin

https://gitlab.com/golang-utils/image2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests