Save repository_type to LfsObjectsProject
What does this MR do?
Overview
This MR adds a new property repository_type
to the table lfs_objects_projects
to record the repository type (one of 'project'
, 'wiki'
or 'design'
) that the corresponding blob of the LFS object lives in.
As we are enabling the ability for design
and wiki
repositories to store files in LFS (https://gitlab.com/gitlab-org/gitlab-ee/issues/9490 and https://gitlab.com/gitlab-org/gitlab-ce/issues/43721), recording the repository_type
will make it easier to prune unreferenced LFS objects in the future.
Note that this property is currently allows null values. This discussion thread decided to introduce this new column, ensure that all LFS files for the new 'design' repository are correctly saving with the right repository_type
, and allow there to be null
values (past and future), knowing that these records will all be for the regular 'project' repository. This technical debt will be addressed starting in https://gitlab.com/gitlab-org/gitlab-ce/issues/43721 where all parts of the app that create LfsObjects will be consolidated to save a repository_type
value. There will also need to be an additional data migration and not null
constraint added to the column after that piece of work.
More detail
https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/13389 adds the ability for images in a project's design
repository (project.design_repository
) to be stored in LFS.
This discussion thread raised the issue of how in the future we want to prune unreferenced LFS objects, and as LFS pointer files can now be in multiple repositories, to do so efficiently we should record which repository the blob lives in.
As an LFS object's oid
is determined by the data in the blob file, it's
possible now for a project to have the same LfsObject
for blobs in three
separate repositories ('project'
, 'wiki'
or 'design'
). This means up to three LfsObjectsProject
records with the same project
and lfs_object
but different repository_type
s. As this model is the many-to-many join table the has_many
relationships between lfs_objects
and projects
now need a distinct
scope to limit the records returned to
non-duplicates.
Lfs::FileTransformer
can now create LfsObjectsProject
with
repository_type
saved. This transformer is used by DesignManagement::SaveDesignsService
, so LFS for blobs in the design repo will always have a repository_type
saved.
Migration output
== 20190602014139 AddRepositoryTypeToLfsObjectsProject: migrating =============
-- add_column(:lfs_objects_projects, :repository_type, :integer, {:limit=>1, :null=>true})
-> 0.0026s
== 20190602014139 AddRepositoryTypeToLfsObjectsProject: migrated (0.0027s) ====
EXPLAIN output
See this Slack thread where I've asked for input into the performance of distinct
in these queries.
These two EXPLAIN queries compare master
branch with 9490-record-repository_type-on-lfs_objects_projects
branch.
EXPLAIN for project.lfs_objects query in master
:
=> EXPLAIN for: SELECT "lfs_objects".* FROM "lfs_objects" INNER JOIN "lfs_objects_projects" ON "lfs_objects"."id" = "lfs_objects_projects"."lfs_object_id" WHERE "lfs_objects_projects"."project_id" = $1 [["project_id", 33]]
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.30..16.52 rows=7 width=96)
-> Index Scan using index_lfs_objects_projects_on_project_id on lfs_objects_projects (cost=0.15..7.28 rows=7 width=4)
Index Cond: (project_id = 33)
-> Index Scan using lfs_objects_pkey on lfs_objects (cost=0.15..1.31 rows=1 width=96)
Index Cond: (id = lfs_objects_projects.lfs_object_id)
(5 rows)
EXPLAIN for project.lfs_objects query in this MR source branch 9490-record-repository_type-on-lfs_objects_projects
:
=> EXPLAIN for: SELECT DISTINCT "lfs_objects".* FROM "lfs_objects" INNER JOIN "lfs_objects_projects" ON "lfs_objects"."id" = "lfs_objects_projects"."lfs_object_id" WHERE "lfs_objects_projects"."project_id" = $1 [["project_id", 28]]
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2.24..2.28 rows=2 width=158)
-> Sort (cost=2.24..2.24 rows=2 width=158)
Sort Key: lfs_objects.id, lfs_objects.oid, lfs_objects.size, lfs_objects.created_at, lfs_objects.updated_at, lfs_objects.file, lfs_objects.file_store
-> Hash Join (cost=1.11..2.23 rows=2 width=158)
Hash Cond: (lfs_objects.id = lfs_objects_projects.lfs_object_id)
-> Seq Scan on lfs_objects (cost=0.00..1.07 rows=7 width=158)
-> Hash (cost=1.09..1.09 rows=2 width=4)
-> Seq Scan on lfs_objects_projects (cost=0.00..1.09 rows=2 width=4)
Filter: (project_id = 28)
(9 rows)
Database checklist
-
Conforms to the database guides
When adding migrations:
-
Updated db/schema.rb
-
Added a down
method so the migration can be reverted(uses a change method that is rollback-able)
-
Added the output of the migration(s) to the MR body - [-] Added tests for the migration in
spec/migrations
if necessary (e.g. when migrating data)
When adding or modifying queries to improve performance:
- [-] Included data that shows the performance improvement, preferably in the form of a benchmark
- [-] Included the output of
EXPLAIN (ANALYZE, BUFFERS)
of the relevant queries
When adding foreign keys to existing tables:
- [-] Included a migration to remove orphaned rows in the source table before adding the foreign key
- [-] Removed any instances of
dependent: ...
that may no longer be necessary
When adding tables:
- [-] Ordered columns based on the Ordering Table Columns guidelines
- [-] Added foreign keys to any columns pointing to data in other tables
- [-] Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs
When removing columns, tables, indexes or other structures:
- [-] Removed these in a post-deployment migration
- [-] Made sure the application no longer uses (or ignores) these structures
General checklist
- [-] Changelog entry added, if necessary
- [-] Documentation created/updated
-
Tests added for this feature/bug -
Conforms to the code review guidelines -
Conforms to the merge request performance guidelines -
Conforms to the style guides