Speed up retrieval of files from stores #928

@lfrank

Feature Request

Problem

Right now, hashing large files in a store can take quite a long time. For example, on a ~200GB raw data file, the hash took >200 seconds, which results in a long delay before data from that file can be used.

Requirements

Ideally the system would store the hash, the size, and the modification date, and would first check the size and date for a match. A matching size and date seems sufficient to ensure that it's the same file. This could also be a .config option for the store.
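To make the request concrete, here is a minimal sketch of the check I have in mind. It is not DataJoint code: `FileRecord`, `make_record`, `is_unchanged`, and the `trust_size_and_mtime` flag are hypothetical names, and SHA-256 is only a stand-in for whatever hash the store actually uses.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata saved when a file is first added to a store."""
    checksum: str  # full content hash (SHA-256 here, an assumption)
    size: int      # file size in bytes
    mtime_ns: int  # modification time in nanoseconds


def full_checksum(path, chunk_size=1 << 20):
    """Hash the whole file in chunks; this is the slow step on large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path):
    """Compute the hash once and keep size and mtime alongside it."""
    st = os.stat(path)
    return FileRecord(full_checksum(path), st.st_size, st.st_mtime_ns)


def is_unchanged(path, record, trust_size_and_mtime=True):
    """Return True if the file at `path` still matches `record`.

    With trust_size_and_mtime=True (the proposed config option), a matching
    size and modification time is accepted without re-hashing.
    """
    st = os.stat(path)
    if st.st_size != record.size:
        return False  # a different size can never be the same content
    if trust_size_and_mtime and st.st_mtime_ns == record.mtime_ns:
        return True   # fast path: skip the full hash
    return full_checksum(path) == record.checksum  # fall back to the full hash
```

With the option turned off, the function always re-hashes, which would match the current behavior. The fast path would miss an in-place edit that keeps both the size and the modification time, which is why it seems better as an opt-in setting per store.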

Labels

enhancement
