Add a built-in table of numbers - by Erland Sommarskog

Status : 

 


206
5
Sign in
to vote
ID 258733 Comments
Status Active Workarounds
Type Suggestion Repros 6
Opened 2/18/2007 1:26:18 PM
Access Restriction Public

Description

There are several SQL problems that can be solved with a table of numbers.
This is a simple one-column table with numbers from 1 and up. It's easy to
construct and fill such a table. However, there are a couple of problems
with it:
1) If you query needs more numbers than there are in the table, you 
   get incorrect results.
2) Same thing happens if some smartass delets rows in the middle - or adds
   a zero to the table.
3) There are queries where the optimizer does not know what the table is for 
  and makes incorrect assumptions.

The latter calls for some explanation. Assume this function:

CREATE FUNCTION inline_split_me(@param nvarchar(MAX))
RETURNS TABLE AS
RETURN(SELECT ltrim(rtrim(convert(nvarchar(4000),
                  substring(@param, Number,
                            charindex(N',' COLLATE Slovenian_BIN2,
                                      @param + N',', Number) -
                            Number)
              ))) AS Value
       FROM   Numbers
       WHERE  Number <= convert(int, len(@param))
         AND  substring(N',' + @param, Number, 1) = N',' COLLATE Slovenian_BIN2)

This function is used to convert a comma-separated list of numbers to a
table. If I say.

CREATE PROCEDURE get_company_names_inline @customers nvarchar(2000) AS
   SELECT C.CustomerID, C.CompanyName
   FROM   Northwind..Customers C
   JOIN   inline_split_me(@customers) s ON C.CustomerID = s.Value
go
EXEC get_company_names_inline 'ALFKI,BONAP,CACTU,FRANK'

and have one million rows in my Numbers table, the optimizer thinks that
the Numbers table will return 300000 rows. Had Numbers been a 
built-in table, the optimzier would have known the hit rate exactly.
Sign in to post a comment.
Posted by Wim SQL Server on 4/2/2015 at 7:56 AM
CREATE FUNCTION [dbo].[fn_Nums](@n AS BIGINT)
RETURNS TABLE
WITH SCHEMABINDING,ENCRYPTION
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A, L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A, L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A, L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A, L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5)
SELECT n FROM Nums WHERE n <= @n;
GO
Posted by way0utwest on 12/1/2014 at 7:55 AM
This is fairly trivial to develop, but also the issues raised above (integrity) are potentially a problem in app code. Think SQL Injection or some other tampering that wouldn't be the case in a native numbers table.

While this would be a nice backport, I would argue this is fairly minor work for SQL Server 2014+ (in an SP) and giving us a table with an integer's worth of numbers, would be very handy. I would suggest that the table be a DMV visible in every database.
Posted by Jason Brimhall (MCM) on 11/26/2014 at 6:31 AM
This would be handy even if it it were no larger than a single record and additional functionality were added to SQL Server such as "Connect BY" and so forth to allow for larger virtual numbers tables could be accessed. Many uses are out there for such a table and if there is any doubt, take a look at the DUAL table in Oracle.
Posted by Ed Wagner on 11/26/2014 at 4:42 AM
Personally, I think this relatively simple addition is long overdue. The Tally table has been called the Swiss Army Knife of SQL and I have to agree. It can be used for everything from splitting strings to date generation. An ITVF that's built into the product would allow for optimization that's probably beyond what DBAs can do. I've been using a table of numbers for years and consider it completely normal, as do many others in the SQL Server community. Please consider the addition of this feature to the product.
Posted by Jeff Moden on 11/24/2014 at 9:27 PM
Howdy there, Microsoft,

This suggestion is going to be 8 years old in 3 months. It would be wonderful to celebrate that anniversary with a CU for both 2012 and 2014 that contains this wonderful suggested tool. It shouldn't take you long.

I'll also suggest that it take two parameters so that folks can start it "0" if they need to (which is more common than you'd think).

Thank you for your time.
Posted by Cade Roux on 3/19/2013 at 1:14 PM
Please, please, please.
Posted by SAinCA on 11/27/2012 at 12:04 PM
Having had the "joy" of splitting a multi-XML-document proprietary 3rd-party supplied file of over 3MB, my Numbers table was 12M+ rows, just in case a heavy ordering day bumped the file size...

On SQL2012, using page-level compression, it consumes over 130MB.

If, as current best practices appear to recommend, our SQL Servers are stacked with huge quantities of RAM, 130MB in, say 120GB isn't going to matter a great deal...

If Microsoft gave us a "standard set" of Numbers that is configurable, but which could be bumped on demand and shrunken automatically to the standard-set-size, memory pressure would be reduced while still delivering high-speed data-access.

My Dates table stretches from 2008 (birth of system) to 2099, comprising 33K rows of 32 columns (at present) and 16MB compressed, with alternate indexes. Again, a triflingly small memory "hog".

I'm less persuaded that a Microsoft-supplied Dates table is a "need" simply because the slices and representations of dates in short/long/week/month/quarter, etc. are really site-specific (hence my 32 columns!).

+1 for the Numbers table...
Posted by Andomar on 6/11/2012 at 1:19 PM
Neither identity nor sequence will help here. It is hard to believe you even read the question.

+1
Posted by Martin Smith on 3/14/2012 at 4:17 PM
I see Postgres has the "generate_series" function for this need.
Posted by Martin Smith on 1/1/2011 at 6:00 PM
Glad to see this suggestion. This functionality would help with many of the perennial problems that come up on news groups and forums.
Posted by MichaelSmith on 12/14/2010 at 2:54 PM
Please read feedback id 347442. Such a built-in function would have (potentially) significant performance impacts, lacking in user defined functions/solutions (mentioned feedback id has examples).
Posted by Toni Salomäki on 11/25/2010 at 11:53 PM
A little bit off-topic, but could I also get the same thing for dates? It just seems so stupid to create table that has dates so that you can join it with other tables to get one row for each day even if data doesn't exist for that day.
Posted by Erland Sommarskog on 8/5/2008 at 12:58 PM
Indeed, the table would only be logical, it would not be stored on disk.
Posted by Erik Eckhardt on 7/15/2008 at 9:59 AM
Why would anything need to be stored on disk? Make the query engine intelligently produce the numbers on demand. It would be an in-memory operation, perhaps not even needing all rows to be materialized, as in the case of splitting a string on a delimiter where the majority of the numbers are discarded.

The optimizer could pay attention to how the numbers are used and do an in-memory loop through the string rather than creating a table, temp or otherwise. The table is uniform and regular, and re-generating it each time in a tight +1 loop is probably faster than building the whole thing. It's like a kind of native index where you know in advance where everything is because the lookup index is exactly equal to the lookup value.

Look at it this way. What would the query engine have to do to satisfy this query:

SELECT Num FROM Numbers WHERE Num = 123

It wouldn't have to hit the Numbers table at all: the input IS the result. The answer can be given at parse time. Similarly,

SELECT Num FROM Numbers WHERE Num BETWEEN 1 AND 100000

Again, why hit the table at all? It already knows the desired result, the numbers 1- 100000. It would be far more efficient to just generate the values in memory than go read them somewhere. (give me 1st value: okay, got 1. Give me 2nd value, okay, got 2. ...)

Many problems are easily solvable with a numbers table that are difficult or impossible otherwise.
Posted by Jeff Moden on 7/7/2008 at 5:11 AM
Adding a built in table of numbers from 1 to 2 billion would certainly take up a lot of disk space especially if you consider the fact that most people never use one with a count over 8000 and those that do almost never use one over a million.

What I'd rather see is a whole number "sequence generator" where you could identify the StartValue, EndValue, and optional Increment which would default to 1. And, it should be able to order the values in an ascending or descending fashion as if it had a Clustered Primary Key doing so.
Posted by Adam Haines on 7/3/2008 at 8:58 AM
This would be a great asset and I believe it to be worth the storage requirement. 100,000 is a good number to default to and if more numbers are needed; I am sure some sort of system sp that builds the table dynamically. The bottom line is a lot of people are not acclimated with the concept of a numbers table and are missing out of all the performance gains from using one. Having MS built-in numbers table not only exposes the numbers table, but also allows those who do not want to manage an additional table to reap the benefits.
Posted by SQL_Menace on 7/3/2008 at 6:11 AM
I would say 100000 by default
Another option would be if that 100000 is not good enough for you there would be a built in proc that will generate this table for you, you would have to pass in the start and end value and the proc would generate this table
Posted by MarcF on 2/27/2008 at 11:40 AM
Where do you stop with the content of the numbers table?

1-100? 1-1000000? What if I need 10billion? What if I need numbers up 2^64?? More?

Seems like an awful waste of space for something that may or may not be used.

As per workaround, use a function. (would be nice if that function was built-in..maybe that's what they mean by 'sequence'?)
Posted by Erland Sommarskog on 2/2/2008 at 2:11 AM
It does not seem that you understood the request. This definitely has nothing to do with
IDENTITY. I am less versed about sequences, but I don't think they will cut it either. If you think
it does, maybe you could provide an example? Take this problem: For the Orders table in
Northwind, write a query that lists the number of orders for all days in 1997. The result set should
include all 365 days, and list zero for days without a number.

This is a typical problem where you need a table of of numbers (or dates). While it's easy to
create such a table, I argue in this request that there would be a number of advantages of
having such a table (function) builtin. For more details on the optimization problems, see my
article on Arrays and Lists in SQL Server, http://www.sommarskog.se/arrays-in-sql.html.

Posted by Microsoft on 2/1/2008 at 7:18 PM
Thanks for the valuable suggestion.

This seems more like adding the sequence support which we're seriously considering for the next major release.

Meanwhile, would using identity column help?