Skip to content

feat: add tier-1 platform support for change_time#128256

Draft
juliusl wants to merge 2 commits intorust-lang:mainfrom
juliusl:pr/support-tier-1-change-time
Draft

feat: add tier-1 platform support for change_time#128256
juliusl wants to merge 2 commits intorust-lang:mainfrom
juliusl:pr/support-tier-1-change-time

Conversation

@juliusl
Copy link
Contributor

@juliusl juliusl commented Jul 26, 2024

View all comments

fix: remove en-us from doc links to support globalization

Related: #121478
r? tgross35

fix: remove `en-us` from doc links to support globalization
@rustbot rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 26, 2024
@juliusl
Copy link
Contributor Author

juliusl commented Jul 26, 2024

@tgross35 unfortunately there are still cases where None would be returned, but I updated the documentation to clarify when that is the case

@juliusl
Copy link
Contributor Author

juliusl commented Jul 26, 2024

Note: Currently the doc-url I link to that explains ChangeTime is dev blog. I have a PR out to update the official docs so that it reflects the information from there MicrosoftDocs/sdk-api#1863

/// This will return `None` if the `Metadata` instance was not created using the `FILE_BASIC_INFO` type.
/// Returns the value of the `ChangeTime{Low,High}` field from the
/// [`FILE_BASIC_INFO`] struct associated with the current file handle.
/// [`ChangeTime`], is the last time file metadata was changed, such as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// [`ChangeTime`], is the last time file metadata was changed, such as
/// [`ChangeTime`] is the last time file metadata was changed, such as

Spurious comma

/// [`WIN32_FIND_DATA`]: https://learn.microsoft.com/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw
/// [`FindFirstFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findfirstfilea
/// [`FindNextFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findnextfilea
/// [`ChangeTime`]: https://devblogs.microsoft.com/oldnewthing/20100709-00/?p=13463#:~:text=I%E2%80%99m%20told%20that%20the%20difference%20is%20metadata.%20The,attributes%20%28hidden%2C%20read-only%2C%20etc.%29%20or%20renaming%20the%20file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// FILE_BASIC_INFO contains additional fields not returned by GetFileInformationByHandle
let basic_info = self.basic_info()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the failure conditions are here, is accessing FILE_BASIC_INFO expected to never fail? Looks like uwp already does this.

If failures are possible, you can just do self.basic_info().ok().map(|info| c::FILETIME { ... }).

@tgross35
Copy link
Contributor

Cool, thanks! The above looks reasonable to me with the possible failure concern addressed.

Since this works on all platforms, is a test possible now?

@ChrisDenton
Copy link
Member

Hm, this adds an extra sys call for every use of metadata? I'd rather avoid that, especially as change time is more niche. Multiple sys calls were done for UWP targets out of necessity but for our tier 1 platforms we do try to be performant.

@tgross35
Copy link
Contributor

Per usual with Windows

r? @ChrisDenton

@rustbot rustbot assigned ChrisDenton and unassigned tgross35 Jul 27, 2024
@juliusl
Copy link
Contributor Author

juliusl commented Jul 30, 2024

@ChrisDenton

Hm, this adds an extra sys call for every use of metadata?

Yeah, I thought about that over the weekend, going to do a bit more tinkering to see if I can find a way to avoid an extra call.

Alternatively, how do you feel about an extended_metadata() api so that the perf cost is opt-in?

@juliusl
Copy link
Contributor Author

juliusl commented Jul 31, 2024

Okay wow this was a trip to research. So apparently cpython ran into a similar issue which eventually led windows to add this api:

https://learn.microsoft.com/windows/win32/api/winbase/nf-winbase-getfileinformationbyname

This API can return an information class type that returns a struct called FILE_STAT_BASIC_INFORMATION. This struct has all of the previous information, plus the ChangeTime I'm trying to get to.

Here's the real cool part. According to documentation, this api doesn't open the file in order to return metadata, which means that it should actually improve performance along this code path since GetFileInformationByHandle/GetFileInformationByHandleEx actually need to open/close a file handle.

This routine returns the requested information about the file specified by FileName. It does so without opening and returning a handle to the file.

Also, the struct includes the reparse tag if set, meaning the current additional syscall can be avoided when the reparse attribute is set.

For now, I'll try it out and see if I can get it working and I'll update the PR accordingly.

@juliusl
Copy link
Contributor Author

juliusl commented Jul 31, 2024

So slightly blocked by this PR getting merged microsoft/win32metadata#1934 in order for bindings.txt/windows-bindgen to pick up the new API. (https://github.com/microsoft/win32metadata/blob/3b275fdb05e7dbab989c00b8a86e985853c6b65e/generation/WinSDK/RecompiledIdlHeaders/um/WinBase.h#L9394)

However, I confirmed that the API already exists in winbase.h, by just running my own bindgen, which means I could at least test the functionality w/ this Draft PR.

@juliusl
Copy link
Contributor Author

juliusl commented Aug 1, 2024

Additional prior art -- libuv/libuv#4327

@riverar
Copy link
Contributor

riverar commented Aug 16, 2024

We can also look at getting this API in earlier than microsoft/win32metadata#1934, will file a separate issue to track that.

@ChrisDenton
Copy link
Member

I've been looking into it using this. One concern is how new the API is. The vast majority of users will not have it and the API docs are still displaying a stability warning (though given Python's unusually early use of the API, I'm not sure how seriously to take that). We could fallback to NtQueryInformationByName which is supported since 1709 (for FILE_STAT_INFORMATION). We do tend to avoid NT APIs in std unless necessary but using it as fallback should be fine. Though even this still leaves Windows Server 2016 out in the cold.

Another issue is that not all filesystem drivers support it. Notably fat32 but also third party drivers or (worse) things that hook the filesystem APIs. Maybe using a FILE_BASIC_INFO fallback there would not be as bad because performance in such cases is already going to be affected.

To be clear, I'm not against it at all. Just that there's enough question that I'd rather think/ask/test a bit more.

@ChrisDenton
Copy link
Member

In any case, there's no reason to block fixing the links. Could you either update this PR to just do the link changes or open a new PR for that? And separately open an issue for using GetFileInfromationByName? I don't mind if you also want to experimentally implement it in a PR but I'd rather not tie it to the link changes.

@riverar
Copy link
Contributor

riverar commented Sep 9, 2024

Right, GetFileInformationByName(..., FileStatBasicByNameInfo, ...) doesn't succeed on FAT32 volumes.

@juliusl
Copy link
Contributor Author

juliusl commented Sep 9, 2024

@ChrisDenton - Created a new PR for the links, #130168, I'll start a new issue for this particular thing

@bors
Copy link
Collaborator

bors commented Sep 12, 2024

☔ The latest upstream changes (presumably #130253) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2024
@Dylan-DPC
Copy link
Member

@juliusl what's the status of this? thanks

@juliusl
Copy link
Contributor Author

juliusl commented Nov 4, 2024

@Dylan-DPC hey there, was waiting on the windows build that includes this API to get to retail, I'll need to circle back to see what the status is but I think it's either released or is close to releasing.

@Dylan-DPC
Copy link
Member

@juliusl any further updates on this? thanks

@Dylan-DPC
Copy link
Member

@juliusl it's been a while, any updates on this pr? or is there anything that's blocking it? thanks

@juliusl
Copy link
Contributor Author

juliusl commented Feb 17, 2026

@Dylan-DPC hey there, sorry for the late reply. Last I was waiting on was a new version of windows to propagate. I can check to see where that's at currently.

@juliusl
Copy link
Contributor Author

juliusl commented Mar 3, 2026

@Dylan-DPC Okay the good news is that the API we've been looking for should be pretty common. I continued my investigation and there are a couple of things to consider before I think we can move forward.

The main issue that caused us to pause this PR for tier-1 was the concern @ChrisDenton brought up regarding the extra syscall in sys::fs::File::file_attr. Which is where the idea of using GetFileInformationByName came from. Since this API is now available I was able to benchmark compare the performance of several different solutions.

  1. Current implementation: Open + GetFileInformationByHandle
  2. windows_change_time implementation: Open + GetFileInformationByHandle + GetFileInformationByHandleEx (basic_info)
  3. GetFileInformationByName - This would mean that only Path::metadata(..) would be able to get change_time as File::file_attr is being called from an opened file-handle. It also means that calls to stat(..) would benefit from the fast path on all non-FAT32 regular files.
  4. I also tested what would happen if we used NtQueryInformationFile to populate FileAttrs. However, since there isn't a info struct that completely satisfies FileAttrs, it would mean at least 2 calls would always be used.

For now, I'll discuss just 1) 2) 4) since 3) is sort of a whole thing of it's own.

Regular files

stat (follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
file_open_and_nt_query_stat_info CreateFileW → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 74,928 8,563 -0.6%
file_open_and_get_info_by_handle CreateFileW → GetFileInformationByHandle 75,382 9,799 baseline
file_open_and_get_info_by_handle_plus_basic_info CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) 77,026 7,586 +2.2%

lstat (do not follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
file_lstat_open_and_get_info_by_handle CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle 73,317 5,846 baseline
file_lstat_open_and_nt_query_stat_info CreateFileW(OPEN_REPARSE_POINT) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 75,959 6,358 +3.6%
file_lstat_open_and_get_info_by_handle_plus_basic_info CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) 78,138 9,010 +6.6%

Junctions (reparse points)

stat (follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
junction_stat_open_and_get_info_by_handle CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 90,427 8,581 baseline
junction_stat_open_and_nt_query_stat_info CreateFileW (follows junction) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 91,684 12,094 +1.4%
junction_stat_open_and_get_info_by_handle_plus_basic_info CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) → GetFileInformationByHandleEx(BasicInfo) 94,645 14,196 +4.7%

lstat (do not follow symlinks)

Benchmark Syscall sequence ns/iter ±
junction_lstat_open_and_get_info_by_handle CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 70,103 5,205

Devbox Specs:

  • CPU: Intel Core i7-8700 @ 3.20GHz (6 cores / 12 threads)
  • RAM: 32 GB

Takeaways

  • Adding BasicInfo for ChangeTime costs ~2-7% overhead on top of the current approach. This would mean 5µs on a single file, 15-50ms for 10,000 files, 150-500ms for 100,000 files, etc.
  • Calling NtQueryInformationFile directly is about the same as the current implementation, except we would get back all required fields and it would remove the conditional logic that the current implementation has today. <- (This was surprising to find out)

--

GetFileInformationByName Results

So this wouldn't be a complete solution, however it would provide change_time on files at the file path level rather than the file handle level (although that's arguably better given what the use cases for change_time are). However, it's important to note the caveats are not being supported by FAT32 and symbolic-links.

Benchmark Syscall sequence ns/iter ± vs baseline
file_get_info_by_name GetFileInformationByName 66,871 5,941 -11.3%
junction_get_info_by_name GetFileInformationByName (detects reparse) → CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 149,286 11,159 +65.1%

11% is a pretty nice speed-up for regular files, however in the junction case, since we still need to open the file to get file size we get hit w/ the 65.1% slowdown.

I'll put these results in the dedicated issue I have but just wanted to link the contexts together.

--

Now with an idea of performance, designing the solution is a bit less hand wavy. I have two ideas at the moment:

  1. We implement this by keeping the implementation inside of the current metadata function. There will be overhead. From my perspective it's a tiny bit of overhead, but it's overhead.

  2. We add a an _extended set of api's to both sys::fs::File and fs::File in order to silo the overhead. For example, we'd add a sys::fs::File::extended_file_attrs, and call that from fs::MetadataExt::extended_metadata. This would mean adding 1 new function to the existing trait. I'm not aware of how the library team would feel about this, but I'm including this idea since it seems simplest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

O-windows Operating system: Windows S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants