Skip to content

Bug: Identity Values not set correctly with Bulk Insert from Sql Server due to Incorrect Sorting of data. #691

@cajuncoding

Description

@cajuncoding

Bug Description

There is a critical issue with the existing implementation of Bulk Insert for Sql Server whereby the Identity values are not correctly set on the models passed into RepoDb. In the existing implementation the models are passed in and RepoDb mutates the models to set their Identity values that are generated by Sql Server and returned via the Output clause of the Merge queries. This has been specifically observed in SqlAzure.

However, the processing order of SqlBulkCopy API and/or the Output clause is not guaranteed and the order of items returned may be different than the order of the original entities specified, which results in mismatched Identity column values being set on the entities specified.

And, mutating the list to result in a different sort order than that which is specified would be really bad for consuming client code and pose many other problems. So the intuitive process and result would be for the original entity enumerable set to remain in the exact order specified (as it does now), but with 100% guarantee that the Idenity values will be correctly set on all entities.

This can be due to multiple possible issues such as:

  • SqlBulkCopy.WriteToServer() method may affect the order where the order of data being written is not guaranteed, and there is still no support for the ORDER() hint. In my testing this has resulted in the data being written to the temp Sql Server table in inverted order.
    • This appears to often invert the order, but ultimately cannot be relied upon, and data needs to be sortable by a known factor that matches the entity ordinal sort order as specified.
  • Sql Merge query Output clause may not write the results in the exact same order of the source data due to Sql Server Merge query optimizations and indexes on the source/target tables.
    • This can be quite complex to determine if it will impact due to many underlying reasons, but ultimately cannot be relied upon without an explicit Order By clause.

This is most easily observed when there are various Indexes set on the target tables for which Bulk Insert is writing data into.

Note: Even though a clustered is being added to the temp table after data is inserted, but before the merge query is executed, this clustered index provides a sorted index based on qualifiers, and may not match the order enumerable list of entities specified to the original BulkMerge() method.

Recommended Solution:
Regardless of the potential cause of different ordering, the solution to guarantee the order of data from Sql Server is to always specify and Order By clause of the results being retrieved. An example of retrieving the output data in the correct order so that is always 100% of the time matches the original enumerable entity list order is here in the SqlBulkHelpers library:
https://github.com/cajuncoding/SqlBulkHelpers/blob/7770338ad94aa933b57637662be5c2949b898c5a/NetStandard.SqlBulkHelpers/SqlBulkHelpers/QueryProcessing/SqlBulkHelpersMergeQueryBuilder.cs#L91

The easiest way to guarantee that Identity values are ordered correctly is to provide a unique key in the Temp Table that is the ordinal position (INT) of each enumerable entity added into the temp table record at execution time. So each item in the enumerable list would be added into the the temp table with an additional column that denotes it's ordinal position from the enumerable list.

This is done here in a library that's dedicated to Sql Server bulk inserting for identities -- and adding the ordinal position as the data is converted into a DataTable (slightly less efficient, but provides expected behavior) -- also sample from the SqlBulkHelpers library:

https://github.com/cajuncoding/SqlBulkHelpers/blob/7770338ad94aa933b57637662be5c2949b898c5a/NetStandard.SqlBulkHelpers/SqlBulkHelpers/QueryProcessing/SqlBulkHelpersObjectMapper.cs#L67

Schema and Model:

Please share to us the schema of the table (not actual) that could help us replicate the issue if necessary.

/****** Object:  Table [dbo].[SqlBulkHelpersTestElements]    Script Date: 1/3/2021 2:28:17 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[SqlBulkTestEntities](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Key] [nvarchar](max) NULL,
	[Value] [nvarchar](max) NULL,
        CONSTRAINT [PK_SqlBulkTestEntities] PRIMARY KEY CLUSTERED ( [Id] ASC ) 
                WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX IDX_Id_Includes_Value
	ON [dbo].[SqlBulkTestEntities] ([Id])
	INCLUDE (Value);
GO

And also the model that corresponds the schema.

    [Map("SqlBulkTestEntities")]
    public class SqlBulkTestEntity
    {
        public int Id { get; set; }
        public string Key { get; set; }
        public string Value { get; set; }

        public override string ToString()
        {
            return $"Id=[{Id}], Key=[{Key}]";
        }
    }

Library Version:
RepoDb v1.12.4 and RepoDb.SqlServer v1.1.1

But likely affects the later RepoDb v1.12.5 also based on code review....

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingdeployedFeature or bug is deployed at the current releasefixedThe bug, issue, incident has been fixed.priorityTop priority feature or things to dotodoThings to be done in the future

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions