Channel: PowerShell

PowerShell script to read HTML content using Html-Agility-Pack

Apologies for my limited knowledge of PowerShell. I am trying to use Html-Agility-Pack to read HTML content from a website and write it out as a CSV file.

Here is the link to Html-Agility-Pack

I can successfully download the whole HTML page with this PowerShell script, but it throws errors in the parsing part:

$url = "http://cloudmonitor.ca.com/en/ping.php?varghost=www.silogix.fr&vhost=_&vaction=ping&ping...;
$Path = "c:\temp\Pingtest.htm"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path

Add-Type -Path "C:\temp\HtmlAgilityPack.1.4.6\Net20\HtmlAgilityPack.dll"

$webGraber = New-Object -TypeName HtmlAgilityPack.HtmlWeb
$webDoc = $webGraber.Load("c:\temp\Pingtest.htm")
$Thetable = $webDoc.DocumentNode.ChildNodes.Descendants('table') | where {$_.XPath -eq '/div[3]/div[1]/div[5]/table[1]/table[1]'}

$trDatas = $Thetable.ChildNodes.Elements("tr")

Remove-Item "c:\temp\Pingtest.csv"

foreach ($trData in $trDatas)
{
  $tdDatas = $trData.elements("td")
  $line = ""
  foreach ($tdData in $tdDatas)
  {
    $line = $line + $tdData.InnerText.Trim() + ','
  }
  $line.Remove($line.Length -1) | Out-File -FilePath "c:\temp\Pingtest.csv" -Append
}
$ie.Quit()
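
A side note on the inner loop above: appending a comma after every cell and then cutting the last character with Remove fails when the line is empty, because the start index becomes -1. The sketch below is one possible alternative, not the original script. It joins the trimmed cells with -join and quotes each value so that a cell containing a comma stays in one column. The `$cells` array is a hypothetical stand-in for the InnerText values of one table row.

```powershell
# Hypothetical stand-in for the InnerText values of one row's <td> cells.
$cells = @(' Stockholm, Sweden (sesto01): ', 'Okay', '21.8', '21.8', '21.9')

# Trim each cell, double any embedded quotes, wrap the value in quotes,
# then join with commas. There is no trailing comma to strip afterwards.
$line = ($cells | ForEach-Object { '"' + $_.Trim().Replace('"', '""') + '"' }) -join ','
$line

# For comparison: the Remove() approach on an empty line throws,
# because the start index evaluates to -1.
$empty  = ''
$failed = $false
try {
    $empty.Remove($empty.Length - 1) | Out-Null
} catch {
    $failed = $true
    Write-Host "Remove failed: $($_.Exception.Message)"
}
```

Quoting matters here because a location such as "Stockholm, Sweden (sesto01):" contains a comma of its own and would otherwise be split into two columns.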

The downloaded HTML looks something like this:

class="light-grey-bg"> ........
 ........
  
class="right-dotted-border">Stockholm, Sweden (sesto01):class="right-dotted-border">id="cp20">Okayclass="right-dotted-border">id="minrtt20">21.8class="right-dotted-border">id="avgrtt20">21.8class="right-dotted-border">id="maxrtt20">21.9id="ip20">2a00:1288:f00e:1fe::3001

But the script produces these errors:

You cannot call a method on a null-valued expression.
At C:\temp\test_script.ps1:21 char:41
+ $trDatas = $Thetable.ChildNodes.Elements <<<< ("tr")
    + CategoryInfo          : InvalidOperation: (Elements:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
 
Remove-Item : Cannot find path 'C:\temp\Pingtest.csv' because it does not exist.
At C:\temp\test_script.ps1:23 char:12
+ Remove-Item <<<< "c:\temp\Pingtest.csv"
    + CategoryInfo          : ObjectNotFound: (C:\temp\Pingtest.csv:String) [Remove-Item], ItemNotFoundExc 
   eption
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.RemoveItemCommand
 
You cannot call a method on a null-valued expression.
At C:\temp\test_script.ps1:27 char:30
+   $tdDatas = $trData.elements <<<< ("td")
    + CategoryInfo          : InvalidOperation: (elements:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
 
You cannot call a method on a null-valued expression.
At C:\temp\test_script.ps1:31 char:43
+     $line = $line + $tdData.InnerText.Trim <<<< () + ','
    + CategoryInfo          : InvalidOperation: (Trim:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
 
Exception calling "Remove" with "1" argument(s): "StartIndex cannot be less than zero.
Parameter name: startIndex"
At C:\temp\test_script.ps1:33 char:15
+   $line.Remove <<<< ($line.Length -1) | Out-File -FilePath "c:\temp\Pingtest.csv" -Append
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException

I couldn't figure out what these errors mean.
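
Reading the log from top to bottom, the errors look like a cascade from one cause rather than five separate problems. The first error suggests `$Thetable` is empty, most likely because the Where-Object filter matched no table. The `<<<<` markers indicate PowerShell 2.0, where foreach over a null value still runs its body once, so the later "null-valued expression" errors and the final Remove failure would follow from that. The Remove-Item error is unrelated: the CSV file simply does not exist yet. The sketch below illustrates the guard pattern only. The path strings are hypothetical stand-ins, not values taken from the real page, and the temporary file name is invented for the example.

```powershell
# Hypothetical stand-ins for the XPath values of the tables found in the page.
$paths  = @('/html[1]/body[1]/div[3]/table[1]', '/html[1]/body[1]/div[4]/table[1]')
$wanted = '/div[3]/table[1]'   # a guessed path that does not match exactly

# An exact -eq comparison returns nothing when the guess is slightly off.
$theTable = $paths | Where-Object { $_ -eq $wanted }

if ($null -eq $theTable) {
    # Stop here instead of letting every later line fail on a null value.
    Write-Warning "No table matched '$wanted'; skipping the row loop."
} else {
    Write-Host "Matched: $theTable"
}

# Delete the old CSV only if it exists, so a first run raises no error.
# The file name is hypothetical.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'Pingtest-sketch.csv'
if (Test-Path $csv) { Remove-Item $csv }
```

With a guard like this, a filter that matches nothing produces one clear warning instead of a chain of follow-on errors.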

