In Joyful Craftsmen we provide, among other things, technical support to our customers, which from time to time involves us in very interesting cases. One of them occurred after a migration of a critical stock trading system database to SQL Server 2017. Every night, when there are no trades on the market, a sanitary job runs and deletes old data from some tables in one of its steps. After the migration, this job started to fail with an interesting error message:
Executed as user: DOMAIN\XYZ. Error: The query processor could not produce a query plan. For more information, contact Customer Support Services. [SQLSTATE 42000] (Error 8624). The step failed.
The error also caused a crash of the thread and a minidump creation:
After some research, we were able to get to the critical part of the code, where the error was occurring:
DELETE TER
FROM [trd].[EntityReferences] AS TER
INNER JOIN [trd].[TransactionBuffers] AS T
ON TER.Id = T.ID AND TER.Type = 10670
WHERE TER.Type = 10670
AND T.Modified < @ToDate
The interesting thing was that the error occurred only when the code had run inside the SQL Server Agent’s job. When we copied the exact same lines of code into SSMS, everything was working properly and the statement finished in a few seconds. As a logical conclusion we tried to align the connection settings of SSMS to the connection options of SQL Server Agent and then the query started to run immediately. This helped us to get the job working but another question arose – why the query had failed with such a strange error under the SQL Server Agent’s connection options?
Connection options vs query optimizer
Before we get to the explanation, let’s have a brief look on what some of these connection options mean and how they are set. The fact that different connectors are using different connection options is pretty well known and on the internet we can find several resources which show differences for most typical connectors. One of them can be found in Erland Sommarskog’s great article Query Plan Mysteries.
ANSI Setting | Applications using ADO .Net, ODBC or OLE DB | SSMS | SQLCMD, OSQL, BCP, SQL Server Agent | ISQL, DB-Library |
ANSI_NULL_DFLT_ON | ON | ON | ON | OFF |
ANSI_NULLS | ON | ON | ON | OFF |
ANSI_PADDING | ON | ON | ON | OFF |
ANSI_WARNINGS | ON | ON | ON | OFF |
CONACT_NULLS_YIELD_NULL | ON | ON | ON | OFF |
QUOTED_IDENTIFIER | ON | ON | OFF | OFF |
ARITHABORT | OFF | ON | OFF | OFF |
In a table taken from Erland’s article I have highlighted two options which differ between SSMS and SQL Server Agent – ARITHABORT and QUOTED_IDENTIFIER.
An ARITHABORT option sets how to handle an overflow or divide-by-zero errors. More specifically, it sets if a number divided by null or zero is a null (when ARITHABORT is OFF) or an error (when ARITHABORT is ON). To make it more complicated this behavior is true only when SET ANSI WARNINGS is OFF. The following diagram visualizes this behavior graphically:
A QUOTED_IDENTIFIER option in simple words means – should a query optimizer consider double quotes as an identifier just as it does for square brackets? All of us know that [] are used in T-SQL to identify keywords or characters against T-SQL syntax. If QUOTED_IDENTIFIER is set to ON, the double quotes will represent the same thing as the square brackets. That’s the reason why both following statements fail:
SET QUOTED_IDENTIFIER ON
GO
SELECT [1]
GO
SELECT „1“
GO
Msg 207, Level 16, State 1, Line 3
Invalid column name ‚1‘.
Msg 207, Level 16, State 1, Line 5
Invalid column name ‚1‘.
And why the query optimizer for statements
SET QUOTED_IDENTIFIER OFF
GO
SELECT [1]
GO
SELECT „1“
GO
returns an error only for the first one.
By the way, do different connection options for SSMS and SQL Server Agent mean that they are using different drivers to connect to the engine itself? Not at all! As you can see in the following figure, these settings are changed by calling an explicit query statement whenever a new connection is created.
In the first step, SQL Server Agent sets all the connection options during the login phase to „ODBC defaults“ which are immediately rewritten by calling set quoted_identifier off. This happens every single time when a new job step opens a connection to SQL Server.
The exact same thing is happening within the SSMS where default driver-based options are also rewritten by a consequent statement as shown in the next figure.
This behavior can be understood in case of SSMS, where the query aligns many settings from a query options dialog, but it really doesn’t make much sense for SQL Server Agent’s connection. Especially when there is no option for how to globally change it.
Root cause
Now that we know how these options are set and what their impact is, let’s focus back on the Internal Query Processor Error. It’s important to say that the table where the delete was performed was something like a crossroads of a database, which linked many tables together. Developers were very conscientious in creating foreign key constraints between all linked tables, which is great for ensuring the integrity of the data but not that useful if you are concerned about performance…
A following figure shows what query plan was generated for that simple delete:
Dozens of relationships for referenced tables mean that for every single row dozens of other tables have to be checked. This is not something that could cause an error by itself, but it led us to the idea of checking the underlying objects in each of the referenced tables. And then we found them…
One of the most discussed things related to connection options are filtered indexes. If you want to create and use them, your connection options have to have QUOTED_IDENTIFIER set to ON. Surprisingly, this information is missing in the documentation for filtered indexes but you can find it in the documentation for QUOTED_IDENTIFIER and for general indexing.
You can display filtered indexes easily by calling:
SELECT * FROM sys.indexes i WHERE i.has_filter = 1
When we crosschecked the output with objects referencing the table where deletion has occurred, we found two filtered indexes. Once we dropped them, the error was gone.
A similar thing could happen with XML indexes as well.
Is it a bug? Under normal circumstances, when you try to perform an operation which would touch a filtered index, you get a nice error message saying „DELETE failed because of SET options“:
USE AdventureWorks2016
GO
SET QUOTED_IDENTIFIER ON;
CREATE NONCLUSTERED INDEX FI_Production_Weight
ON Production.Product ([Weight],[ProductSubcategoryID])
WHERE [Weight] IS NOT NULL;
GO
SET QUOTED_IDENTIFIER OFF;
DELETE
FROM Production.Product
WHERE ProductSubcategoryID = 1;
Msg 1934, Level 16, State 1, Line 10
DELETE failed because the following SET options have incorrect settings: ‚QUOTED_IDENTIFIER‘. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
But in our case, when the filtered index was „far“ in a tree of referencing tables, a mysterious Internal Query Processor Error appeared instead and the thread crashed. This is definitely not an intended behavior and most likely a bug.
How is this case connected with an upgrade to SQL Server 2017 which I mentioned at the beginning of the post? It isn’t! It just confused us, because the upgrade was connected with some indexes creation (including filtered indexes) from developers and local DBAs didn’t know about it. It’s a good example how Database Lifecycle Management shouldn’t be handled, but that is another story. The important thing is that the connection options for SQL Server Agent remain unchanged in all current versions of SQL Server and an upgrade had nothing to do with the issue we experienced.
Takeaways
A key takeaway is that filtered indexes and T-SQL code inside T-SQL job step in SQL Server Agent don’t play together. Therefore, the first statement in any T-SQL job step should always be SET QUOTED_IDENTIFIER ON. Otherwise we run the risk that the job will fail with a nice or way less nice error message. This is especially true if you are not using stored procedures to execute your code from SQL Server Agent.
If stored procedures or other programmable objects are used (which is good practice anyway), don’t forget to check their connection options by:
WITH c AS (
SELECT OBJECTPROPERTY([object_id], ‚IsQuotedIdentOn‘) AS IsQuotedIdentOn, OBJECTPROPERTY([object_id], ‚ExecIsQuotedIdentOn‘) AS ExecIsQuotedIdentOn, *
FROM sys.objects o
WHERE o.type NOT IN ( ‚D‘, ‚C ‚ )
)
SELECT *
FROM c
WHERE c.IsQuotedIdentOn <> 1
OR c.ExecIsQuotedIdentOn <> 1;
GO
If any rows are returned, check them and try to set proper connection settings for them.