PowerShell should have set operations - by Joel 'Jaykul' Bennett

ID: 844926
Status: Active
Type: Suggestion
Repros: 0
Opened: 4/1/2014 5:15:45 PM
Access Restriction: Public

Description

It would be great if PowerShell had (at least simple) support for set operations.

Specifically, these sorts of pipelines are very slow:

$SelectedItems = $UserInput | where { $_ -in $AllValidItems } # intersection
$InvalidItems = $UserInput | where { $_ -notin $AllValidItems } # difference
$AllItems = @($AllItems) + @($UserInput) | Select -unique # union

# Even set equality is complicated:
$AreEqual = $ValidItems.Length -eq $UserInput.Length -and $(
    for ($i = 0; $i -lt $UserInput.Length; $i++) { $UserInput[$i] -eq $ValidItems[$i] } ) -notcontains $False

# Even if I don't care about order:
$AreEqual = $ValidItems.Length -eq $UserInput.Length -and ($UserInput | ? { $_ -in $ValidItems }).Length -eq $UserInput.Length 


These would all be much easier to write CORRECTLY if we could write set unions, intersections, differences, etc.
These would also perform MUCH FASTER if they were based on http://msdn.microsoft.com/en-us/library/bb546153.aspx
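
For illustration, here is a rough sketch of how the four operations above can be written today by calling System.Linq.Enumerable and HashSet directly. The sample values and the [string[]] typing are assumptions made for the example, not part of the original request. Note that these .NET methods compare strings case-sensitively by default, unlike -in and -notin, so the results are not always identical to the pipelines above.

```powershell
# Hypothetical sample data, for illustration only.
[string[]]$AllValidItems = 'AAA', 'BBB', 'CCC', 'DDD'
[string[]]$UserInput     = 'BBB', 'DDD', 'ZZZ', 'BBB'

# Intersection: distinct items of $UserInput that are also in $AllValidItems
$SelectedItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Intersect($UserInput, $AllValidItems))

# Difference: distinct items of $UserInput that are NOT in $AllValidItems
$InvalidItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Except($UserInput, $AllValidItems))

# Union: distinct items found in either collection
$AllItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Union($AllValidItems, $UserInput))

# Set equality, ignoring order and duplicates
$ValidSet = New-Object 'System.Collections.Generic.HashSet[string]' -ArgumentList (,$AllValidItems)
$AreEqual = $ValidSet.SetEquals($UserInput)
```

With this sample data, $SelectedItems holds BBB and DDD, $InvalidItems holds ZZZ, $AllItems holds five items, and $AreEqual is False. The verbosity of these calls is arguably the point of the suggestion: the fast path exists, but it is awkward to reach from a pipeline.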




As an example of why we need these, Select -Unique takes over 12 seconds to do something that Enumerable.Distinct().ToArray() does in about a hundredth of a second (see the timings below). That means that the obvious pipeline method listed above for doing a union takes roughly 1000x as long as calling Enumerable.Union(...).ToArray() on the same data.

## This command is my own, and the details of its implementation are irrelevant (it does what it says on the box)
<# 493: PS#> $initials = Get-RandomString -Length 3 -Chars 'A-Z' -Count 10000 -Unique
<# 494: PS#> $tests = Get-RandomString -Length 3 -Chars 'A-Z' -Count 25

<# 495: PS#> $AllItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Union([psobject[]]$Initials, [psobject[]]$tests))
<# 496: PS#> $AllItems.length
10012

<# 497: PS#> $AllItems = @($Initials) + @($Tests) | Select -unique # union
<# 498: PS#> $AllItems.length
10012

<# 499: PS#> $AllItems = @($Initials) + @($Tests) #| Select -unique # union
<# 500: PS#> $AllItems.length
10025

<# 501: PS#> $AllItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Distinct([psobject[]]$AllItems))
<# 502: PS#> $AllItems.length
10012

<# 503: PS#> $AllItems = @($Initials) + @($Tests) #| Select -unique # union
<# 504: PS#> $AllItems = $AllItems | Select -unique # union

## This command, like Get-RandomString above, is my own, and the details of its implementation are irrelevant:
<# 505: PS#> Get-PerformanceHistory -id 495, 497, 499, 501, 504 | ft id, Duration, Command* -auto

 Id Duration  Command
 -- --------  -------
495 0.01097s  $AllItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Union([psobject[]]$Initials, [psobject[]]...
497 12.37078s $AllItems = @($Initials) + @($Tests) | Select -unique # union
499 0.00302s  $AllItems = @($Initials) + @($Tests) #| Select -unique # union
501 0.01203s  $AllItems = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Distinct([psobject[]]$AllItems))
504 12.40679s $AllItems = $AllItems | Select -unique # union
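
Since Get-RandomString and Get-PerformanceHistory are private commands, the comparison above cannot be rerun as written. The following is a minimal sketch that approximates it with built-in pieces only. The way the data is generated, the overlap of 10 items, and all variable names are assumptions made for the example, and the timings will vary by machine and PowerShell version.

```powershell
# Hypothetical stand-in for Get-RandomString: build every three-letter
# string from 'AAA' to 'ZZZ' (17576 unique values) and slice it.
$letters = [char[]](65..90)
$all = foreach ($a in $letters) { foreach ($b in $letters) { foreach ($c in $letters) { "$a$b$c" } } }

[string[]]$Initials = $all[0..9999]        # 10000 unique items
[string[]]$Tests    = $all[9990..10014]    # 25 items, 10 of which are also in $Initials

# Time the LINQ union
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$viaLinq = [System.Linq.Enumerable]::ToArray([System.Linq.Enumerable]::Union($Initials, $Tests))
$sw.Stop()
$linqSeconds = $sw.Elapsed.TotalSeconds

# Time the pipeline union
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$viaPipe = @($Initials) + @($Tests) | Select-Object -Unique
$sw.Stop()
$pipeSeconds = $sw.Elapsed.TotalSeconds

# Both approaches should agree on the number of distinct items
"LINQ:     {0} items in {1:N5}s" -f $viaLinq.Length, $linqSeconds
"Pipeline: {0} items in {1:N5}s" -f @($viaPipe).Count, $pipeSeconds
```

Both approaches should report 10015 distinct items; only the elapsed times are expected to differ.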