os: NTFS deduped file changed from regular to irregular #63429

Closed
imsodin opened this issue Oct 6, 2023 · 15 comments
Labels: NeedsFix, OS-Windows

@imsodin

imsodin commented Oct 6, 2023

This issue is a followup to a golang-nuts discussion: https://groups.google.com/g/golang-nuts/c/YtBDdIgxazY/m/-X1Gc0xWBwAJ
@qmuntal asked me to file an issue here.

What version of Go are you using (go version)?

$ go version
go version go1.21.1 windows/amd64

Does this issue reproduce with the latest release?

I didn't try, but given the underlying issue (see below) it's unlikely to have changed - I don't have a system to repro.

What operating system and processor architecture are you using (go env)?

Same as above: the repro did not happen on my own system. A Windows system with an NTFS filesystem with deduplication enabled is needed.

What did you do?

  1. Create a file on an NTFS filesystem with deduplication enabled, copy it, and wait for deduplication to happen.
  2. Run os.Lstat on the deduplicated file.

What did you expect to see?

In go <=1.20 the result of os.Lstat gave .IsRegular() == true, which is expected.

What did you see instead?

In go1.21 it's false and it's considered a symlink.

This broke syncing of those files in syncthing, where this issue was originally reported: syncthing/syncthing#9120

Quoting @qmuntal from golang-nuts:

Having said this, I've done a quick web search looking at what other frameworks and applications do with IO_REPARSE_TAG_DEDUP, and the consensus seems to be that it can be treated as a regular file: Windows handles it transparently without user intervention.
For example, boost recently started treating it as a regular file: Treat dedup files as regular files on Windows. · boostorg/filesystem@141727b (github.com).

Please file an issue proposing to special-case IO_REPARSE_TAG_DEDUP as a regular file. We can continue the discussion there. Meanwhile, or if the proposal is not approved, you can still manually get the reparse tag for all non-regular files (e.g. src/os/types_windows.go#L51-L62), and special case whatever reparse tags you need.

Also, given that go <=1.20 already had the desired behaviour (a deduped file was considered regular), I'd personally consider this a regression and would appreciate it if a fix could be backported to go1.21.

@seankhliao added the OS-Windows and NeedsInvestigation labels Oct 6, 2023
@bcmills
Contributor

bcmills commented Oct 11, 2023

In go1.21 it's false and it's considered a symlink.

I don't think that's accurate. A symlink would have the ModeSymlink bit set, but as of https://go.dev/cl/460595 I believe you will find that the ModeSymlink bit remains unset, and the ModeIrregular bit is what is set for dedup files. The two bits mean different things.

@bcmills
Contributor

bcmills commented Oct 11, 2023

POSIX defines a “regular file” as “a randomly accessible sequence of bytes, with no further structure imposed by the system.” Under that definition (which I think is the one the Go os package should use), it isn't clear to me which reparse points (if any) should be considered regular files.

Reparse points by definition do have “further structure imposed by the system” — namely, they have reparse tags and are handled by file system filters associated with those reparse tags. And some kinds of reparse points, such as IO_REPARSE_TAG_MOUNT_POINT and IO_REPARSE_TAG_LX_FIFO, certainly do not represent a randomly accessible sequence of bytes.

On the other hand, I would expect that at least some kinds of reparse points do represent “a randomly accessible sequence of bytes” from the perspective of an application. If we can explicitly enumerate which reparse points have that property, perhaps it would be ok to treat them as regular files? But they do still have “further structure imposed by the system”, so maybe not — I'm really not sure.

In the particular case of syncthing, perhaps there is some property other than regularity that syncthing could use instead?

(CC @golang/windows)

@bcmills added this to the Backlog milestone Oct 12, 2023
@bcmills
Contributor

bcmills commented Oct 12, 2023

Compare #61893, #42184.

@gopherbot

Change https://go.dev/cl/536655 mentions this issue: os,path/filepath: treat all non-symlink reparse points as irregular files

@qmuntal
Contributor

qmuntal commented Oct 20, 2023

It is probably expecting too much from os to special-case reparse points other than IO_REPARSE_TAG_SYMLINK. The number of possible reparse points is unbounded: not only Microsoft can define them, but also vendors and third parties. Also, others might be interested in special-casing IO_REPARSE_TAG_DEDUP instead of treating it as a regular file.
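As a side illustration of the point about who defines reparse tags: the Windows SDK reserves the top bit of a tag for Microsoft-owned tags (the IsReparseTagMicrosoft macro in winnt.h). The sketch below mirrors that check. The helper name is made up, and 0x00001234 is a purely hypothetical third-party tag, not a real registered one.

```go
package main

import "fmt"

// Tag values as defined in the Windows SDK headers.
const (
	ioReparseTagMountPoint uint32 = 0xA0000003
	ioReparseTagSymlink    uint32 = 0xA000000C
	ioReparseTagDedup      uint32 = 0x80000013
)

// isMicrosoftTag mirrors the IsReparseTagMicrosoft macro: bit 31 is set
// for tags owned by Microsoft.
func isMicrosoftTag(tag uint32) bool { return tag&0x80000000 != 0 }

func main() {
	// 0x00001234 is a hypothetical third-party tag used only for contrast.
	tags := []uint32{ioReparseTagMountPoint, ioReparseTagSymlink, ioReparseTagDedup, 0x00001234}
	for _, tag := range tags {
		fmt.Printf("%#08x microsoft=%v\n", tag, isMicrosoftTag(tag))
	}
}
```

The bit only says who owns the tag, not how the file behaves, so it does not by itself answer whether a tag can be treated as a regular file.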

A good compromise could be to provide an easy way to access the reparse tag that is currently unexported in fs.FileInfo, so users can decide what to do with irregular files without having to open the file again and stating it using raw syscall functions. Maybe something like this:

package fs

type ReparseTagFileInfo interface {
	FileInfo
	ReparseTag() uint32
}

package os

func (fs *fileStat) ReparseTag() uint32 {
	return fs._ReparseTag
}

Which would be used like this:

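fi, err := os.Lstat(name)
if err != nil {
	// handle the error
}
if fi.Mode()&fs.ModeIrregular != 0 {
	// fs.ReparseTagFileInfo is the interface proposed above; it does not exist yet.
	if fix, ok := fi.(fs.ReparseTagFileInfo); ok {
		switch fix.ReparseTag() {
		case syscall.IO_REPARSE_TAG_DEDUP:
			// custom logic
		}
	}
}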

@imsodin
Author

imsodin commented Oct 20, 2023

That would also be useful for us indeed - we currently do just what you say, we use syscalls on every irregular file to detect reparse points to treat them like regular ones.

@bcmills changed the title from "os: NTFS deduped file changed from regular to symlink" to "os: NTFS deduped file changed from regular to irregular" Oct 24, 2023
@rsc
Contributor

rsc commented Oct 24, 2023

I agree with the Boost developers that turning on NTFS deduplication should not make ordinary files look "irregular". People will turn this on to save some disk space, and then Go programs that use .IsRegular() to avoid opening printer devices and such will stop processing their files. That's not right for a portable os layer.

Whatever we decide for #61893, I think we should special-case the dedup reparse points and report them as regular files. If the behavior is new in Go 1.21 (wasn't present in Go 1.20), I think we should probably treat this as a critical bug fix and backport it.

@gopherbot

Change https://go.dev/cl/537915 mentions this issue: os: report IO_REPARSE_TAG_DEDUP files as regular in Stat and Lstat

@bcmills
Contributor

bcmills commented Oct 26, 2023

@imsodin, I have a proposed patch in https://go.dev/cl/537915 but I can't easily test it. (I don't have a Windows Server instance with deduplication enabled.)

If you can build that commit from source (https://go.dev/doc/install/source) and confirm that it fixes the problem for you or your users, I will request a backport to Go 1.21.

(Note that Gerrit has a “Download patch” option in the menu to download a pending CL.)

@imsodin
Author

imsodin commented Oct 27, 2023

@bcmills Your CL is equivalent to the workaround we added after upgrading to go1.21 and encountering the issue, and the user already confirmed it did fix the issue for them: syncthing/syncthing@9553365.
I hope that's good enough?

@bcmills
Contributor

bcmills commented Oct 27, 2023

I agree that the approach is equivalent — mostly I want to rule out bugs in my implementation of it. 😅

But I'll go ahead and start the backport process based on that information.

@bcmills
Contributor

bcmills commented Oct 27, 2023

@gopherbot, please backport to Go 1.21. This was a behavior change in Go 1.21 impacting users of the os package, and its impact was not realized at the time.

@gopherbot

Backport issue(s) opened: #63764 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot

Change https://go.dev/cl/538218 mentions this issue: [release-branch.go1.21] os: report IO_REPARSE_TAG_DEDUP files as regular in Stat and Lstat

@dmitshur added the NeedsFix label and removed the NeedsInvestigation label Nov 2, 2023
gopherbot pushed a commit that referenced this issue Nov 8, 2023
…lar in Stat and Lstat

Prior to CL 460595, Lstat reported most reparse points as regular
files. However, reparse points can in general implement unusual
behaviors (consider IO_REPARSE_TAG_AF_UNIX or IO_REPARSE_TAG_LX_CHR),
and Windows allows arbitrary user-defined reparse points, so in
general we must not assume that an unrecognized reparse tag represents
a regular file; in CL 460595, we began marking them as irregular.

As it turns out, the Data Deduplication service on Windows Server runs
an Optimization job that turns regular files into reparse files with
the tag IO_REPARSE_TAG_DEDUP. Those files still behave more-or-less
like regular files, in that they have well-defined sizes and support
random-access reads and writes, so most programs can treat them as
regular files without difficulty. However, they are still reparse
files: as a result, on servers with the Data Deduplication service
enabled, files could arbitrarily change from “regular” to “irregular”
without explicit user intervention.

Since dedup files are converted in the background and otherwise behave
like regular files, this change adds a special case to report DEDUP
reparse points as regular.

Fixes #63764.
Updates #63429.

No test because to my knowledge we don't have any Windows builders
that have the deduplication service enabled, nor do we have a way to
reliably guarantee the existence of an IO_REPARSE_TAG_DEDUP file.

(In theory we could add a builder with the service enabled on a
specific volume, write a test that encodes knowledge of that volume,
and use the GO_BUILDER_NAME environment variable to run that test only
on the specially-configured builders. However, I don't currently have
the bandwidth to reconfigure the builders in this way, and given the
simplicity of the change I think it is unlikely to regress
accidentally.)

Change-Id: I649e7ef0b67e3939a980339ce7ec6a20b31b23a1
Cq-Include-Trybots: luci.golang.try:go1.21-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/538218
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot

Change https://go.dev/cl/565136 mentions this issue: os: don't treat mount points as symbolic links

gopherbot pushed a commit that referenced this issue Mar 4, 2024
This CL changes the behavior of os.Lstat to stop setting the
os.ModeSymlink type mode bit for mount points on Windows. As a result,
filepath.EvalSymlinks no longer evaluates mount points, which was the
cause of many inconsistencies and bugs.

Additionally, os.Lstat starts setting the os.ModeIrregular type mode bit
for all reparse tags on Windows, except for those that are explicitly
supported by the os package, which, since this CL, doesn't include mount
points. This helps to identify files that need special handling outside
of the os package.

This behavior is controlled by the `winsymlink` GODEBUG setting.
For Go 1.23, it defaults to `winsymlink=1`.
Previous versions default to `winsymlink=0`.

Fixes #39786
Fixes #40176
Fixes #61893
Updates #63703
Updates #40180
Updates #63429

Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-longtest,gotip-windows-arm64
Change-Id: I2e7372ab8862f5062667d30db6958d972bce5407
Reviewed-on: https://go-review.googlesource.com/c/go/+/565136
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>