net/http: DetectContentType not working for text/html with line feed character #71584

Closed
russinholi opened this issue Feb 6, 2025 · 2 comments
Labels
BugReport Issues describing a possible bug in the Go implementation.

@russinholi

Go version

go version 1.23.5

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/reginaldo/.cache/go-build'
GOENV='/home/reginaldo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/reginaldo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/reginaldo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/snap/go/10818'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/snap/go/10818/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.5'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/reginaldo/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2302776111=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I'm using http.DetectContentType to get the content type of an HTML file that starts with these lines:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...

So I call http.DetectContentType passing the first 512 bytes of the file.

Here is a Go Playground link that reproduces the issue: https://go.dev/play/p/cM_Wy5pEiYT
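For readers who cannot open the link, the report can be sketched roughly as follows. This is not the playground program itself; the helper `sniff` and the constant names are made up for illustration. It compares the DOCTYPE followed by a line feed with the same DOCTYPE followed by a space.

```go
package main

import (
	"fmt"
	"net/http"
)

// doctypeRest is the remainder of the DOCTYPE declaration from the report.
const doctypeRest = "     PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
	"     \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"

const (
	// withLF has a line feed right after "<!DOCTYPE html", as in the report.
	withLF = "<!DOCTYPE html\n" + doctypeRest
	// withSP has a space there instead (hypothetical variant for comparison).
	withSP = "<!DOCTYPE html " + doctypeRest
)

// sniff is a hypothetical helper that wraps http.DetectContentType.
func sniff(prefix string) string {
	return http.DetectContentType([]byte(prefix))
}

func main() {
	fmt.Println(sniff(withLF)) // text/plain; charset=utf-8
	fmt.Println(sniff(withSP)) // text/html; charset=utf-8
}
```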

What did you see happen?

Because a \n character follows <!DOCTYPE html, the content type returned by the function is text/plain; charset=utf-8.
This appears to happen because the isTT function does not treat the line feed as a tag-terminating byte, so it returns false when the file content is matched against the HTML signature string:

// isTT reports whether the provided byte is a tag-terminating byte (0xTT)
// as defined in https://mimesniff.spec.whatwg.org/#terminology.
func isTT(b byte) bool {
	switch b {
	case ' ', '>':
		return true
	}
	return false
}
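
To make the effect concrete, here is a simplified sketch of how a signature check built on isTT behaves. It is modeled on the description above and is not copied from the net/http source; `matchHTMLSig` is a hypothetical name. The signature is compared case-insensitively, and the byte right after it must be a tag-terminating byte.

```go
package main

import "fmt"

// isTT mirrors the function quoted in the report.
func isTT(b byte) bool {
	switch b {
	case ' ', '>':
		return true
	}
	return false
}

// matchHTMLSig is a hypothetical, simplified signature check: data must start
// with sig (uppercase letters in sig match either case) and the next byte
// must be a tag-terminating byte.
func matchHTMLSig(data []byte, sig string) bool {
	if len(data) < len(sig)+1 {
		return false
	}
	for i := 0; i < len(sig); i++ {
		b, db := sig[i], data[i]
		if 'A' <= b && b <= 'Z' {
			db &= 0xDF // fold ASCII lowercase to uppercase before comparing
		}
		if b != db {
			return false
		}
	}
	return isTT(data[len(sig)])
}

func main() {
	const sig = "<!DOCTYPE HTML"
	fmt.Println(matchHTMLSig([]byte("<!DOCTYPE html\n PUBLIC"), sig)) // false
	fmt.Println(matchHTMLSig([]byte("<!DOCTYPE html PUBLIC"), sig))  // true
	fmt.Println(matchHTMLSig([]byte("<!DOCTYPE html>"), sig))        // true
}
```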

What did you expect to see?

The correct content type returned should be text/html; charset=utf-8.

gabyhelp added the BugReport label Feb 6, 2025
@seankhliao
Member

This follows the WHATWG mimesniff spec: https://mimesniff.spec.whatwg.org/#tag-terminating-byte

A tag-terminating byte (abbreviated 0xTT) is any one of the following bytes: 0x20 (SP), 0x3E (">").

closing as working as intended.
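
Since the standard library behavior is intended, callers who want such files treated as HTML would need their own check. A minimal sketch, assuming it is acceptable to deviate from the WHATWG rule on the caller side: `detectLenient` is a hypothetical wrapper, not part of net/http, and it only upgrades a text/plain result when a DOCTYPE is followed by ASCII whitespace other than a space.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// detectLenient is a hypothetical caller-side wrapper around
// http.DetectContentType. It deliberately goes beyond the WHATWG definition
// of a tag-terminating byte by also accepting tab, LF, FF and CR after
// "<!DOCTYPE html".
func detectLenient(data []byte) string {
	ct := http.DetectContentType(data)
	if ct != "text/plain; charset=utf-8" {
		return ct // keep every result the standard sniffer is sure about
	}
	const sig = "<!doctype html"
	// Skip the same leading whitespace bytes the sniffer ignores.
	trimmed := bytes.TrimLeft(data, "\t\n\x0c\r ")
	if len(trimmed) <= len(sig) {
		return ct
	}
	if !bytes.EqualFold(trimmed[:len(sig)], []byte(sig)) {
		return ct
	}
	switch trimmed[len(sig)] {
	case '\t', '\n', '\x0c', '\r':
		return "text/html; charset=utf-8"
	}
	return ct
}

func main() {
	fmt.Println(detectLenient([]byte("<!DOCTYPE html\n     PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">")))
	fmt.Println(detectLenient([]byte("just some text")))
}
```

Whether this trade-off is appropriate depends on the application; content that the standard sniffer classifies as plain text will be reported as HTML by this wrapper.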

seankhliao closed this as not planned Feb 6, 2025