Home Dashboard Directory Help
Search

Add a built-in table of numbers by Erland Sommarskog


Status: 

Active


131
5
Sign in
to vote
Type: Suggestion
ID: 258733
Opened: 2/18/2007 1:26:18 PM
Access Restriction: Public
4
Workaround(s)
view

Description

There are several SQL problems that can be solved with a table of numbers.
This is a simple one-column table with numbers from 1 and up. It's easy to
construct and fill such a table. However, there are a couple of problems
with it:
1) If you query needs more numbers than there are in the table, you
get incorrect results.
2) Same thing happens if some smartass delets rows in the middle - or adds
a zero to the table.
3) There are queries where the optimizer does not know what the table is for
and makes incorrect assumptions.

The latter calls for some explanation. Assume this function:

CREATE FUNCTION inline_split_me(@param nvarchar(MAX))
RETURNS TABLE AS
RETURN(SELECT ltrim(rtrim(convert(nvarchar(4000),
                 substring(@param, Number,
                            charindex(N',' COLLATE Slovenian_BIN2,
                                     @param + N',', Number) -
                            Number)
             ))) AS Value
     FROM Numbers
     WHERE Number <= convert(int, len(@param))
         AND substring(N',' + @param, Number, 1) = N',' COLLATE Slovenian_BIN2)

This function is used to convert a comma-separated list of numbers to a
table. If I say.

CREATE PROCEDURE get_company_names_inline @customers nvarchar(2000) AS
SELECT C.CustomerID, C.CompanyName
FROM Northwind..Customers C
JOIN inline_split_me(@customers) s ON C.CustomerID = s.Value
go
EXEC get_company_names_inline 'ALFKI,BONAP,CACTU,FRANK'

and have one million rows in my Numbers table, the optimizer thinks that
the Numbers table will return 300000 rows. Had Numbers been a
built-in table, the optimzier would have known the hit rate exactly.
Details
Sign in to post a comment.
Posted by Cade Roux on 3/19/2013 at 1:14 PM
Please, please, please.
Posted by SAinCA on 11/27/2012 at 12:04 PM
Having had the "joy" of splitting a multi-XML-document proprietary 3rd-party supplied file of over 3MB, my Numbers table was 12M+ rows, just in case a heavy ordering day bumped the file size...

On SQL2012, using page-level compression, it consumes over 130MB.

If, as current best practices appear to recommend, our SQL Servers are stacked with huge quantities of RAM, 130MB in, say 120GB isn't going to matter a great deal...

If Microsoft gave us a "standard set" of Numbers that is configurable, but which could be bumped on demand and shrunken automatically to the standard-set-size, memory pressure would be reduced while still delivering high-speed data-access.

My Dates table stretches from 2008 (birth of system) to 2099, comprising 33K rows of 32 columns (at present) and 16MB compressed, with alternate indexes. Again, a triflingly small memory "hog".

I'm less persuaded that a Microsoft-supplied Dates table is a "need" simply because the slices and representations of dates in short/long/week/month/quarter, etc. are really site-specific (hence my 32 columns!).

+1 for the Numbers table...
Posted by Andomar on 6/11/2012 at 1:19 PM
Neither identity nor sequence will help here. It is hard to believe you even read the question.

+1
Posted by Martin Smith on 3/14/2012 at 4:17 PM
I see Postgres has the "generate_series" function for this need.
Posted by Martin Smith on 1/1/2011 at 6:00 PM
Glad to see this suggestion. This functionality would help with many of the perennial problems that come up on news groups and forums.
Posted by MichaelSmith on 12/14/2010 at 2:54 PM
Please read feedback id 347442. Such a built-in function would have (potentially) significant performance impacts, lacking in user defined functions/solutions (mentioned feedback id has examples).
Posted by Toni Salomäki on 11/25/2010 at 11:53 PM
A little bit off-topic, but could I also get the same thing for dates? It just seems so stupid to create table that has dates so that you can join it with other tables to get one row for each day even if data doesn't exist for that day.
Posted by Erland Sommarskog on 8/5/2008 at 12:58 PM
Indeed, the table would only be logical, it would not be stored on disk.
Posted by Erik Eckhardt on 7/15/2008 at 9:59 AM
Why would anything need to be stored on disk? Make the query engine intelligently produce the numbers on demand. It would be an in-memory operation, perhaps not even needing all rows to be materialized, as in the case of splitting a string on a delimiter where the majority of the numbers are discarded.

The optimizer could pay attention to how the numbers are used and do an in-memory loop through the string rather than creating a table, temp or otherwise. The table is uniform and regular, and re-generating it each time in a tight +1 loop is probably faster than building the whole thing. It's like a kind of native index where you know in advance where everything is because the lookup index is exactly equal to the lookup value.

Look at it this way. What would the query engine have to do to satisfy this query:

SELECT Num FROM Numbers WHERE Num = 123

It wouldn't have to hit the Numbers table at all: the input IS the result. The answer can be given at parse time. Similarly,

SELECT Num FROM Numbers WHERE Num BETWEEN 1 AND 100000

Again, why hit the table at all? It already knows the desired result, the numbers 1- 100000. It would be far more efficient to just generate the values in memory than go read them somewhere. (give me 1st value: okay, got 1. Give me 2nd value, okay, got 2. ...)

Many problems are easily solvable with a numbers table that are difficult or impossible otherwise.
Posted by Jeff Moden on 7/7/2008 at 5:11 AM
Adding a built in table of numbers from 1 to 2 billion would certainly take up a lot of disk space especially if you consider the fact that most people never use one with a count over 8000 and those that do almost never use one over a million.

What I'd rather see is a whole number "sequence generator" where you could identify the StartValue, EndValue, and optional Increment which would default to 1. And, it should be able to order the values in an ascending or descending fashion as if it had a Clustered Primary Key doing so.
Posted by Adam Haines on 7/3/2008 at 8:58 AM
This would be a great asset and I believe it to be worth the storage requirement. 100,000 is a good number to default to and if more numbers are needed; I am sure some sort of system sp that builds the table dynamically. The bottom line is a lot of people are not acclimated with the concept of a numbers table and are missing out of all the performance gains from using one. Having MS built-in numbers table not only exposes the numbers table, but also allows those who do not want to manage an additional table to reap the benefits.
Posted by SQL_Menace on 7/3/2008 at 6:11 AM
I would say 100000 by default
Another option would be if that 100000 is not good enough for you there would be a built in proc that will generate this table for you, you would have to pass in the start and end value and the proc would generate this table
Posted by MarcF on 2/27/2008 at 11:40 AM
Where do you stop with the content of the numbers table?

1-100? 1-1000000? What if I need 10billion? What if I need numbers up 2^64?? More?

Seems like an awful waste of space for something that may or may not be used.

As per workaround, use a function. (would be nice if that function was built-in..maybe that's what they mean by 'sequence'?)
Posted by Erland Sommarskog on 2/2/2008 at 2:11 AM
It does not seem that you understood the request. This definitely has nothing to do with
IDENTITY. I am less versed about sequences, but I don't think they will cut it either. If you think
it does, maybe you could provide an example? Take this problem: For the Orders table in
Northwind, write a query that lists the number of orders for all days in 1997. The result set should
include all 365 days, and list zero for days without a number.

This is a typical problem where you need a table of of numbers (or dates). While it's easy to
create such a table, I argue in this request that there would be a number of advantages of
having such a table (function) builtin. For more details on the optimization problems, see my
article on Arrays and Lists in SQL Server, http://www.sommarskog.se/arrays-in-sql.html.

Posted by Microsoft on 2/1/2008 at 7:18 PM
Thanks for the valuable suggestion.

This seems more like adding the sequence support which we're seriously considering for the next major release.

Meanwhile, would using identity column help?
Sign in to post a workaround.
Posted by Jeff Moden on 6/25/2013 at 10:34 PM
I'm not sure if this is the right place to post this but using a rcursive CTE for this is NOT a viable workaround. The whole purpose of a Tally table is performance and a recurisive CTE is going to be as bad or worse than a While loop and it's going to use a whole lot more resources. Please see the following article for why you should NOT use recursive CTE's that count.
http://www.sqlservercentral.com/articles/T-SQL/74118/

Posted by Bob Sovers1 on 8/9/2010 at 1:52 PM
Here is a VERY fast CLR Streaming Table-Valued function of the most generic function (inputs: min, max, step)

    [Microsoft.SqlServer.Server.SqlFunction(
        DataAccess = DataAccessKind.None,
        IsDeterministic = true, IsPrecise = true,
        SystemDataAccess = SystemDataAccessKind.None,
        FillRowMethodName = "FillRow_ReturnValues",
        TableDefinition = "n Int")]
    public static IEnumerable Tf_SeqStep(SqlInt32 min, SqlInt32 max, SqlInt32 step)
    {
        if ((min.IsNull) || (max.IsNull) || (step.IsNull))
        {
            yield break;
        }
        ReturnValues Vals = new ReturnValues();

        for (int i = min.Value; i <= max.Value; i += step.Value)
        {
            Vals.Value = i;
            yield return Vals;
        }
    }

    public struct ReturnValues
    {
        public int Value;
    }

    private static void FillRow_ReturnValues(object obj, out SqlInt32 TheValue)
    {
        ReturnValues ReturnVals = (ReturnValues)obj;
        TheValue = ReturnVals.Value;
    }

Posted by Vladimir Nesterovsky on 2/25/2007 at 12:49 AM
On the other hand you can use a function:

/*
Returns numbers table.
Table has a following structure: table(value int not null);
value is an integer number that contains numbers from 1 to a specified value.
*/
create function dbo.Numbers
(    
/* Number of rows to return. */
@count int
)
returns table
as
return
with Number4(Value) as
(
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0
),
Number8(Value) as
(
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4
),
Number32(Value) as
(
    select 0 from Number8 N1, Number8 N2, Number8 N3, Number8 N4
)
select top(@count) row_number() over(order by Value) Value from Number32;

It's faster than numbers table.
Posted by Vladimir Nesterovsky on 2/19/2007 at 8:50 AM
You can use a function:

/*
Returns numbers table.
Table has a following structure: table(value int not null);
value is an integer number that contains numbers from 1 to a specified value.
*/
create FUNCTION dbo.[Numbers]
(    
/* Number of rows to return. */
@count int
)
RETURNS TABLE
AS
RETURN
with numbers(value) as
(
select 0
union all
select value * 2 + 1 from numbers where value < @count / 2
union all
select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
row_number() over(order by U.v) value
from
numbers cross apply (select 0 v) U

See also: http://www.nesterovsky-bros.com/weblog/PermaLink,guid,9892ab3e-2a3a-4255-b3be-39fda53bdf16.aspx