The Scalar Expression function would speed performance while keeping the benefits of functions. - by Andrew Novick

Status : 


Sign in
to vote
ID 273443 Comments
Status Active Workarounds
Type Suggestion Repros 11
Opened 4/25/2007 1:47:12 PM
Duplicates 524983 544121 Access Restriction Public


SQL Server scalar User-Defined Functions (UDFs) have a performance problem that could be solved with an enhancement to the SQL Server database engine. Because SQL Server must execute each function on every row, using any function incurs a cursor like performance penalty.  However, since many UDFs are simple, they can often be converted to a single expression.  This suggestion proposes that a new type of UDF, the Scalar Expression UDF, be added to SQL Server.  Doing so would often eliminate the performance penalty paid when using scalar UDFs and allow them to be used more widely.  Using UDFs has the advantage of abstraction that makes code easier to maintain and so making them easier to use has substantial benefits.

No matter how simple a UDF there’s a large performance penalty paid when they’re used.  This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more.  The penalty is incurred because the SQL Server database engine must create its own internal cursor like processing.  It must invoke each UDF on each row.  If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows.  If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing. It’s the row by row processing that seems to slow SQL Server the most. I’ve documented the performance penalty in chapter 11 of my book, Transact-SQL User-Defined Functions, and in the accompanying script.  I've posted a PDF with the relevent section of the book and a complete example script on my site at:
Sign in to post a comment.
Posted by Steven Hibble on 11/2/2017 at 11:27 AM
Joe Sack mentioned in-lining scalar UDFs is being actively worked on, possibly for vNext.
This was mentioned during a PASS Summit 2017 session on Adaptive Query Processing.
Posted by Will Rayer on 2/3/2017 at 1:34 AM
It will soon be the 10th birthday of this idea. Please can Microsoft implement it?.....
Posted by Mike5th on 6/28/2016 at 6:01 AM
Cmon its 2016... In memory doesn't help. Inline is only a 25% bump. This would be awesome for building dimensions. For example Select ProductNum, GetProdDesc(ProductNum), GetProdAttrib(Productnum, Attrib1) as Color, getProdAttrib.... etc etc.

I don't want to create 30 joins to get these attribs on the dimension and I could go etl less. This is a mess in SSIS...

I tried In Memory(Made it slower?) and In line (Select attrib from GetPRodDescIL(ProductNum, Attrib1) ) as Color, ... and its not really any faster and certainly much worse for readability...
Posted by Fotis Mylonas on 4/3/2015 at 4:10 AM
This can be used to improve code readability, if for example you have a certain value for a status or something, its better to write "where status=ActiveStatus()" than "where status=123456789"
Those two expressions should produce the same execution plan, if the only thing that the expression "ActiveStatus" does is to return a fixed integer. If you implement this now by using scalar value functions, you fall into the trap of the performance problems they have. in short, i would like to have the sql server equivelant of the public static const used in programming languages. I know i could implement this by table value functions but i would like to have something even cheaper to just gain code readability in certain cases
Posted by Randy in Marin on 3/2/2015 at 2:18 PM
It really is time to fix the UDF. However, for those cases where we just don't want to repeat an expression in a single query, a UDF is still a bad idea. I don't want to reference an external object just to avoid repeating the expression. Would it not be better to allow an inline variable - query or statement scoped - of some sort to reference the expression? The implementation would be much easier - replace variable with expression and compile. It might not take 15 years.
Posted by Andrew Novick on 11/15/2013 at 7:02 AM
Hi Will,
This is still a useful idea. I use the inline functions workaround all the time but it makes the query difficult to read.
And, yes it would be similar to C macros and the same, or similar to using an inline table workaround, posibly without the joins and other SELECT specific capabilities, just the expression.
So I'm Looking forward to it.

“He that can have patience can have what he will.”
― Benjamin Franklin
Posted by Will Rayer on 11/15/2013 at 6:23 AM
Microsoft has not updated this for almost 6 years - please can we have an update? If the inline scalar UDF is simply expanded in the same way as C macros it should hopefully not be too complicated to implement. Many keen and enthusiastic SQL Server users and evangelists would love to see this idea, so please can Microsoft show their continued committment to the core SQL engine by implementing this idea.
Posted by JRStern on 4/27/2013 at 4:56 PM
An excellent idea. We've all been waiting for the deterministic/non-deterministic difference to be worked correctly since it was announced for SQL 2000!

Like the inline TVF solution, just treat them like macros.

Or hey, implement nearly the same function AS a macro - except that I suppose we then want some way to make the macros global.

BTW I just (re)posted "constants for TSQL" as another suggestion. Using even scalar functions may or may not allow them to be used for constant values, unless either a constant keyword is introduced or the compiler gets a lot smarter.
Posted by Kent Waldrop on 2/5/2013 at 8:04 AM
This issue has been "in the oven" for a while; where exactly does this stand? Is this being looked at for the next iteration of SQL Server?
Posted by DATAPOINT on 10/2/2012 at 8:29 AM
I would also like to see this fixed. Using scalar funciton is sometimes very straightforward solution, but if this compromise performace they are quite useless.
Posted by Mark Guinness on 9/14/2012 at 3:25 PM
This is also an obstacle for using scalar functions for constant values:
Posted by marsovo on 1/25/2012 at 12:42 PM
It's ludicrous that after so many years we still only have such a useless implementation of scalar functions and not an inline version. I wonder how many total man-hours have been wasted on this, when an inline scalar UDF seems like such a staggeringly simple thing to implement in the database engine compared to the other UDF variants. The existing ones are almost guaranteed to screw anyone who comes across them, at least at first, since there's no indication that they slow your performance down by orders of magnitude.
Posted by Professional The on 11/3/2011 at 12:45 PM
I'd like to be able to declare an inline scalar function (above a query). The compiler could desugar it if it wanted to. It would make things more readable in certain cases.
Posted by Arnoud van Bers on 6/23/2011 at 3:04 AM
What is Microsofts view on this as it is now one of the most requested improvements?
Posted by Bob Frasca- on 12/24/2010 at 12:12 PM
I'm a little confused. It isn't a problem specifically with user defined functions. It's ANY scalar function used in the SELECT list or WHERE clause including built-in functions like ISNULL and COALESCE. If a value in a column has to be processed by a function (regardless of the source of the function, i.e. UDF or built-in) in order to determine whether the row needs to be filtered from the result set then every row must be processed. This isn't an engine problem, it's a design problem.

If you have to process a specific column of every row through some kind of filter then I'm not sure how the proposal is going to resolve that. Typically, I've found that this is an architectural issue. For example, if you don't allow NULL's you won't need to use ISNULL or COALESCE. Of course, that isn't always possible and may not always be advisable. The point is that a little foresight during the data architecture phase can help alleviate some of these issues before they happen.

The two workarounds already proposed are excellent choices where these scenarios already exist.
Posted by Gert-Jan Strik on 12/24/2010 at 7:14 AM
I like this proposed solution. However, I'd prefer another solution, or maybe a combination of the two. I propose that the use of deterministic functions is cached.

With a scalar UDF cache in place, if the query selects a lot of rows with (probably) many duplicate input values, the optimizer might choose to transform the query with the scalar UDF into a selfjoined query (spool) that will fetch the value from cache instead of computing the output value. Even with a poor cache hit rate, the deduplicating can potentially speed up the query a lot (depending on the cost of the UDF).

I expected caching of scalar UDF output as early as SQL Server 2005, but it is still lacking.
Posted by abair34 on 6/3/2010 at 3:03 PM
As a DBA I see alot of developers who aren't big sql guys that say hey we can create functions in SQL? Cool now I can wrap this small little piece of reusable code in a function and call it from my queries. It feels native to them and then I have to reel them in and let them know the downsides of it. I'd love to see this fixed!
Posted by Reader Man Me on 5/1/2010 at 12:06 AM
I did not think this would take all these years to resolve, i think MS should look for the features that affect its big product and make them more efficient.
Posted by CheetahAfoot on 3/25/2010 at 2:47 PM
Yep. It would be pretty useful to get this problem solved cause writing table function instead of naturally scalar function is not what I want to do. It's a kind of trick but not a right way obviously. Thanks.
Posted by Jethro on 3/23/2010 at 8:58 PM
Why the microsoft leave this problem so long --almost 4 years, It is very simple to resove it for those UDF that has no data access.
Posted by Microsoft on 12/11/2007 at 11:19 AM

Thank you for sending your feedback to us. We will consider this enhancement in a future release of SQL Server.

-Vineet Rao.
Posted by Mighty-O on 10/15/2007 at 11:17 AM
I echo the desire for SQL-invoked routine and understand one of the elements of the CLR implementation is to satisfy this requirement. The rub is CLR functionality is quite convoluted compared Andrew’s suggestion. The substitution aspect is particularly appealing to this diehard Transact-SQL coder.
Posted by SQLWork on 10/11/2007 at 8:45 AM
In other programming languages, this might be a macro expansion, where an include file contains the macro, which expands into inline code at compile time. (I believe that this was discussed as a possibility in the very early days of Yukon.) In any case, however implemented, Inline Scalar UDF would be a major improvement over the speed-bump that exists now.

I was going to post this myself, but happily add my vote to the existing suggestion.
Posted by Mark Yudkin on 10/11/2007 at 4:28 AM
ANS SQL 99 actualy defines a syntax (section 11.49) for function definition that can be used for an inline scalar function (body consisting of a single RETURN statement) - inlining being an implementation rather than a definitional issue. There are other database products (e.g. IBM DB2 UDB) that already support this, so going the ANS way would assist portability.

The enhancement is clearly necessary.