
Invoke-WebRequest / getElementsByTagName is incredibly slow on some machines by DanTup


Status: Active

Type: Bug
ID: 778371
Opened: 2/2/2013 11:00:00 AM
Access Restriction: Public
Workarounds: 1
Users who can reproduce this bug: 1

Description

I've posted full details and a sample on StackOverflow here:

http://stackoverflow.com/questions/14202054/why-is-this-powershell-code-invoke-webrequest-getelementsbytagname-so-incred/14657508?iemail=1

I have some code that uses Invoke-WebRequest to fetch a web page and then uses .ParsedHtml.body.getElementsByTagName to scrape some info from the site.

However, the getElementsByTagName call seems to take a *long* time for me (around 1.5 seconds per call) on all three machines I've tested it on (Windows 7 Home Premium, Windows 7 Enterprise, Windows 8). Other people see sub-100 millisecond times running the same code.
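
For anyone who wants to compare numbers, here is a minimal timing sketch. `Measure-CallTime` is a hypothetical helper name (not part of the original script), and the URL in the commented usage is a placeholder. `ParsedHtml` is only available in Windows PowerShell, so the usage lines are left as comments.

```powershell
# Hypothetical helper: run a script block several times and summarize the timings.
function Measure-CallTime {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $Iterations = 5
    )

    $samples = foreach ($i in 1..$Iterations) {
        # Measure-Command returns a TimeSpan for one execution of the block.
        (Measure-Command -Expression $Action).TotalMilliseconds
    }

    $stats = $samples | Measure-Object -Average -Minimum -Maximum
    [pscustomobject]@{
        Iterations = $Iterations
        MinMs      = [math]::Round($stats.Minimum, 1)
        AvgMs      = [math]::Round($stats.Average, 1)
        MaxMs      = [math]::Round($stats.Maximum, 1)
    }
}

# Example use against a parsed page (Windows PowerShell only; placeholder URL):
# $page = Invoke-WebRequest -Uri 'http://example.com/'
# Measure-CallTime -Action { $page.ParsedHtml.body.getElementsByTagName('tr') | Select-Object -Skip 1 }
# Measure-CallTime -Action { @($page.ParsedHtml.body.getElementsByTagName('tr')) }
```

Running the same block in the console and in ISE gives figures that can be compared directly across machines.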

There's more discussion about this issue on Google+ here:
https://plus.google.com/communities/114336958783305019912

I can't find any common pattern between machines that do/don't have this issue. It occurs for me with no PS profile, in both ISE and the console. I have VS 2012 (and therefore .NET 4.5) installed on two of the machines, but not the third.
Details
Posted by Cyreli on 3/12/2014 at 10:40 AM
Do we have an explanation of why it works fine on x86 but not on x64?
Posted by Jeffrey P Snover [MSFT] on 3/10/2014 at 3:04 PM
Lee Holmes was able to repro the issue and here is his writeup:

The issue is that he’s piping COM objects into another cmdlet – in this case, Select-Object. When that happens, we attempt to bind parameters by property name. Enumerating property names of a COM object is brutally slow – so we’re spending 86% of our time on two very basic CLR API calls:

(…)
// Get the function description from a COM type
typeinfo.GetFuncDesc(index, out pFuncDesc);
(…)
// Get the function name from a COM function description
typeinfo.GetDocumentation(funcdesc.memid, out strName, out strDoc, out id, out strHelp);
(…)

We might be able to do something smart here with caching.
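
To illustrate the caching idea only, here is a rough sketch in PowerShell. It is not how the engine works internally: `Get-CachedMemberName`, the string type key, and the sleeping script block that stands in for the slow `GetFuncDesc`/`GetDocumentation` walk are all assumptions made for the example.

```powershell
# Hypothetical cache: type key -> list of member names.
$script:memberNameCache = @{}
$script:slowLookups = 0

function Get-CachedMemberName {
    param(
        [Parameter(Mandatory)] [string] $TypeKey,
        [Parameter(Mandatory)] [scriptblock] $SlowLookup
    )

    if (-not $script:memberNameCache.ContainsKey($TypeKey)) {
        # Cache miss: pay the expensive enumeration once per type.
        $script:slowLookups++
        $script:memberNameCache[$TypeKey] = @(& $SlowLookup)
    }
    $script:memberNameCache[$TypeKey]
}

# Stand-in for the expensive type-info walk (assumed cost: 50 ms).
$walkTypeInfo = { Start-Sleep -Milliseconds 50; 'innerText', 'className', 'tagName' }

# 100 objects of the same type trigger only one slow lookup.
1..100 | ForEach-Object {
    Get-CachedMemberName -TypeKey 'HTMLTableRow' -SlowLookup $walkTypeInfo
} | Out-Null

$script:slowLookups
```

The point of the sketch is that the cost is paid per type rather than per object, which is where the time goes when many rows of the same COM type are piped.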

A workaround is to not pipe into Select-Object, but instead use language features:

    # Grab the rows from the table, skipping the first row (column headers)
    $allRows = @($slotTable.getElementsByTagName("tr"))
    $rows = $allRows[1..$allRows.Count]
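
A small sketch of why the slice is equivalent, using a plain string array as a stand-in for the COM collection so it runs anywhere. It assumes the original script used `Select-Object -Skip 1` to drop the header row.

```powershell
# Stand-in for @($slotTable.getElementsByTagName("tr")): a plain array of strings.
$allRows = @('header', 'row 1', 'row 2', 'row 3')

# Pipeline form (assumed shape of the original): every element passes through Select-Object.
$viaPipeline = $allRows | Select-Object -Skip 1

# Language-feature form from the workaround: a range index, no cmdlet involved.
# With strict mode off (the default), indexes past the end of the array are
# dropped from the slice, so an upper bound of Count is harmless here.
$viaSlice = $allRows[1..$allRows.Count]

# Both forms yield the same rows for plain .NET values.
"$viaPipeline"
"$viaSlice"
```

With real COM rows, only the second form avoids sending each element through cmdlet parameter binding.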


I hope that helps
Posted by Cyreli on 3/9/2014 at 8:31 PM
I ran some tests on this and can reproduce the problem, based on this script: http://stackoverflow.com/questions/14202054/why-is-this-powershell-code-invoke-webrequest-getelementsbytagname-so-incred/14657508?iemail=1

The problem (see below for details) appears only in 64-bit mode; everything is fast in x86 mode.


On PowerShell 4.0:

Name             : Windows PowerShell ISE Host
Version          : 4.0
InstanceId       : c0cca308-450e-4922-9836-26f6f16b3b4c
UI               : System.Management.Automation.Internal.Host.InternalHostUserInterface
CurrentCulture   : en-US
CurrentUICulture : en-US
PrivateData      : Microsoft.PowerShell.Host.ISE.ISEOptions
IsRunspacePushed : False
Runspace         : System.Management.Automation.Runspaces.LocalRunspace


Powershell -STA
Processing time is between 900 ms and 1600 ms

Powershell -MTA
Processing time is between 900 ms and 1600 ms

Running the same script within ISE (64 bit)
Processing time is between 900 ms and 1600 ms

Running the same script within ISE (x86)
Processing time is between 15 ms and 30 ms !!!
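
To confirm which mode a given session is in before timing, something like the following can be used. `Get-HostBitnessInfo` is a hypothetical helper name, and `test.ps1` in the commented line is a placeholder for the script under test.

```powershell
# Hypothetical helper: report the facts that differed between the test runs above.
function Get-HostBitnessInfo {
    [pscustomobject]@{
        Is64BitProcess = [Environment]::Is64BitProcess
        Is64BitOS      = [Environment]::Is64BitOperatingSystem
        PointerSize    = [IntPtr]::Size   # 8 in a 64-bit process, 4 in a 32-bit one
        ApartmentState = [System.Threading.Thread]::CurrentThread.GetApartmentState()
        PSVersion      = $PSVersionTable.PSVersion.ToString()
    }
}

Get-HostBitnessInfo

# On 64-bit Windows, the 32-bit Windows PowerShell console normally lives here:
# & "$env:SystemRoot\SysWOW64\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -File .\test.ps1
```

Recording this output next to each timing makes it clear whether a slow run came from a 64-bit or a 32-bit host.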
Posted by Jeffrey P Snover [MSFT] on 3/10/2014 at 3:05 PM
The problem is piping COM objects into another cmdlet – in this case, Select-Object. When that happens, we attempt to bind parameters by property name. Enumerating property names of a COM object is brutally slow.
A workaround is to not pipe into Select-Object, but instead use language features:

    # Grab the rows from the table, skipping the first row (column headers)
    $allRows = @($slotTable.getElementsByTagName("tr"))
    $rows = $allRows[1..$allRows.Count]
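
If later steps still need the pipeline, one option is to copy the values of interest into plain objects with a `foreach` statement first, so that only .NET objects are piped. This is a sketch under assumptions: the stand-in rows below replace the real COM elements so the code runs anywhere, and `innerText` and `className` are just example properties to copy.

```powershell
# Stand-in data; with the real page this would be $rows from the workaround above.
$rows = @(
    [pscustomobject]@{ innerText = 'Mon 09:00'; className = 'slot' }
    [pscustomobject]@{ innerText = 'Mon 10:00'; className = 'slot' }
)

# foreach is a language statement, so no cmdlet parameter binding happens per row.
$plainRows = foreach ($row in $rows) {
    [pscustomobject]@{
        Text  = $row.innerText
        Class = $row.className
    }
}

# The copies are ordinary PowerShell objects and can be piped as usual.
$plainRows | Select-Object -First 1
```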