Home Dashboard Directory Help
Search

The Scalar Expression function would speed performance while keeping the benefits of functions. by Andrew Novick


Status: 

Active


373
3
Sign in
to vote
Type: Suggestion
ID: 273443
Opened: 4/25/2007 1:47:12 PM
Access Restriction: Public
Duplicates: 524983 544121
3
Workaround(s)
view

Description

SQL Server scalar User-Defined Functions (UDFs) have a performance problem that could be solved with an enhancement to the SQL Server database engine. Because SQL Server must execute each function on every row, using any function incurs a cursor like performance penalty.  However, since many UDFs are simple, they can often be converted to a single expression.  This suggestion proposes that a new type of UDF, the Scalar Expression UDF, be added to SQL Server.  Doing so would often eliminate the performance penalty paid when using scalar UDFs and allow them to be used more widely.  Using UDFs has the advantage of abstraction that makes code easier to maintain and so making them easier to use has substantial benefits.

No matter how simple a UDF there’s a large performance penalty paid when they’re used.  This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more.  The penalty is incurred because the SQL Server database engine must create its own internal cursor like processing.  It must invoke each UDF on each row.  If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows.  If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing. It’s the row by row processing that seems to slow SQL Server the most. I’ve documented the performance penalty in chapter 11 of my book, Transact-SQL User-Defined Functions, and in the accompanying script. I've posted a PDF with the relevent section of the book and a complete example script on my site at:
http://www.novicksoftware.com/Articles/sql-server-udf-performance-problem/sql-server-udf-performance-problem.htm
Details
Sign in to post a comment.
Posted by Andrew Novick on 11/15/2013 at 7:02 AM
Hi Will,
This is still a useful idea. I use the inline functions workaround all the time but it makes the query difficult to read.
And, yes it would be similar to C macros and the same, or similar to using an inline table workaround, posibly without the joins and other SELECT specific capabilities, just the expression.
So I'm Looking forward to it.
Regards,
Andy

“He that can have patience can have what he will.”
― Benjamin Franklin
Posted by Will Rayer on 11/15/2013 at 6:23 AM
Microsoft has not updated this for almost 6 years - please can we have an update? If the inline scalar UDF is simply expanded in the same way as C macros it should hopefully not be too complicated to implement. Many keen and enthusiastic SQL Server users and evangelists would love to see this idea, so please can Microsoft show their continued committment to the core SQL engine by implementing this idea.
Posted by JRStern on 4/27/2013 at 4:56 PM
An excellent idea. We've all been waiting for the deterministic/non-deterministic difference to be worked correctly since it was announced for SQL 2000!

Like the inline TVF solution, just treat them like macros.

Or hey, implement nearly the same function AS a macro - except that I suppose we then want some way to make the macros global.

BTW I just (re)posted "constants for TSQL" as another suggestion. Using even scalar functions may or may not allow them to be used for constant values, unless either a constant keyword is introduced or the compiler gets a lot smarter.
Posted by Kent Waldrop on 2/5/2013 at 8:04 AM
This issue has been "in the oven" for a while; where exactly does this stand? Is this being looked at for the next iteration of SQL Server?
Posted by DATAPOINT on 10/2/2012 at 8:29 AM
I would also like to see this fixed. Using scalar funciton is sometimes very straightforward solution, but if this compromise performace they are quite useless.
Posted by Mark Guinness on 9/14/2012 at 3:25 PM
This is also an obstacle for using scalar functions for constant values:

http://devplanet.com/blogs/brianr/archive/2008/02/13/sql-server-constants-the-constant-struggle.aspx
Posted by marsovo on 1/25/2012 at 12:42 PM
It's ludicrous that after so many years we still only have such a useless implementation of scalar functions and not an inline version. I wonder how many total man-hours have been wasted on this, when an inline scalar UDF seems like such a staggeringly simple thing to implement in the database engine compared to the other UDF variants. The existing ones are almost guaranteed to screw anyone who comes across them, at least at first, since there's no indication that they slow your performance down by orders of magnitude.
Posted by Professional The on 11/3/2011 at 12:45 PM
I'd like to be able to declare an inline scalar function (above a query). The compiler could desugar it if it wanted to. It would make things more readable in certain cases.
Posted by Arnoud van Bers on 6/23/2011 at 3:04 AM
What is Microsofts view on this as it is now one of the most requested improvements?
Posted by Bob Frasca- on 12/24/2010 at 12:12 PM
I'm a little confused. It isn't a problem specifically with user defined functions. It's ANY scalar function used in the SELECT list or WHERE clause including built-in functions like ISNULL and COALESCE. If a value in a column has to be processed by a function (regardless of the source of the function, i.e. UDF or built-in) in order to determine whether the row needs to be filtered from the result set then every row must be processed. This isn't an engine problem, it's a design problem.

If you have to process a specific column of every row through some kind of filter then I'm not sure how the proposal is going to resolve that. Typically, I've found that this is an architectural issue. For example, if you don't allow NULL's you won't need to use ISNULL or COALESCE. Of course, that isn't always possible and may not always be advisable. The point is that a little foresight during the data architecture phase can help alleviate some of these issues before they happen.

The two workarounds already proposed are excellent choices where these scenarios already exist.
Posted by Gert-Jan Strik on 12/24/2010 at 7:14 AM
I like this proposed solution. However, I'd prefer another solution, or maybe a combination of the two. I propose that the use of deterministic functions is cached.

With a scalar UDF cache in place, if the query selects a lot of rows with (probably) many duplicate input values, the optimizer might choose to transform the query with the scalar UDF into a selfjoined query (spool) that will fetch the value from cache instead of computing the output value. Even with a poor cache hit rate, the deduplicating can potentially speed up the query a lot (depending on the cost of the UDF).

I expected caching of scalar UDF output as early as SQL Server 2005, but it is still lacking.
Posted by abair34 on 6/3/2010 at 3:03 PM
As a DBA I see alot of developers who aren't big sql guys that say hey we can create functions in SQL? Cool now I can wrap this small little piece of reusable code in a function and call it from my queries. It feels native to them and then I have to reel them in and let them know the downsides of it. I'd love to see this fixed!
Posted by Reader Man Me on 5/1/2010 at 12:06 AM
I did not think this would take all these years to resolve, i think MS should look for the features that affect its big product and make them more efficient.
Posted by CheetahAfoot on 3/25/2010 at 2:47 PM
Yep. It would be pretty useful to get this problem solved cause writing table function instead of naturally scalar function is not what I want to do. It's a kind of trick but not a right way obviously. Thanks.
Posted by Jethro on 3/23/2010 at 8:58 PM
Why the microsoft leave this problem so long --almost 4 years, It is very simple to resove it for those UDF that has no data access.
Posted by Microsoft on 12/11/2007 at 11:19 AM
Hello,

Thank you for sending your feedback to us. We will consider this enhancement in a future release of SQL Server.

Thanks,
-Vineet Rao.
Posted by Mighty-O on 10/15/2007 at 11:17 AM
I echo the desire for SQL-invoked routine and understand one of the elements of the CLR implementation is to satisfy this requirement. The rub is CLR functionality is quite convoluted compared Andrew’s suggestion. The substitution aspect is particularly appealing to this diehard Transact-SQL coder.
Posted by SQLWork on 10/11/2007 at 8:45 AM
In other programming languages, this might be a macro expansion, where an include file contains the macro, which expands into inline code at compile time. (I believe that this was discussed as a possibility in the very early days of Yukon.) In any case, however implemented, Inline Scalar UDF would be a major improvement over the speed-bump that exists now.

I was going to post this myself, but happily add my vote to the existing suggestion.
Posted by Mark Yudkin on 10/11/2007 at 4:28 AM
ANS SQL 99 actualy defines a syntax (section 11.49) for function definition that can be used for an inline scalar function (body consisting of a single RETURN statement) - inlining being an implementation rather than a definitional issue. There are other database products (e.g. IBM DB2 UDB) that already support this, so going the ANS way would assist portability.

The enhancement is clearly necessary.
Sign in to post a workaround.
Posted by Tom Groszko on 2/12/2013 at 6:39 PM
If the function can be written as a single statement put it in an inline table function and use the apprpriate apply statement. There is nothing preventing the table returned from an inline table function from having only a single column. Compared to a UDF you will get a significant performance benefit.
Posted by Vikram Gopalan on 2/26/2009 at 1:56 PM
I had the same problem when I used scalar UDF in join column, the performance was horrible. After I replaced the UDF with temp table that contains the results of UDF and used it in join clause, the performance was order of magnitudes better. MS team should fix UDF's to be more reliable.
Posted by Adam Machanic on 4/25/2007 at 5:17 PM
See:

http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx

File Name Submitted By Submitted On File Size  
sql-server-udf-performance-problem.zip (restricted) 4/25/2007 -