Better, stronger, faster … arrays

Using System.Collections.ArrayList to make faster arrays

PowerShell variables are extremely versatile, especially as arrays, even if you are just using $var=@() and += to add new elements. One thing you will soon notice, however, is that additions become very slow as the array grows. This is because an array is a fixed size on creation, so to get past the size limit, += creates a new array containing a copy of the old data plus the new data. Every addition therefore copies the whole array again, and the process slows down as the array grows.
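To see the fixed-size behaviour for yourself, here is a minimal sketch. The variable names ($fixed, $before and so on) are just illustrative and not part of the original examples.

```powershell
# A plain PowerShell array is a fixed-size .NET array (System.Object[]).
$fixed = @(1, 2, 3)
$isFixed = $fixed.IsFixedSize        # True

# Calling Add() directly fails because the array cannot grow in place.
$addFailed = $false
try {
    $fixed.Add(4)
} catch {
    $addFailed = $true
}

# += succeeds, but only by building a new, larger array and copying into it.
$before = $fixed
$fixed += 4
$sameObject = [object]::ReferenceEquals($before, $fixed)   # False: $fixed now points at a new array
```

After this runs, $before still holds the original three elements while $fixed holds four, which is the copy-on-add behaviour that makes += slow on large arrays.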

So, how do we get around this? Well, there are a couple of options.

For ease of use, I prefer the System.Collections.ArrayList .NET class, as you do not need to specify data types (and I’ve become used to it). Below are examples of quickly using each method, sending 10001 new bits of data into the array.

# Initialise the arrays
$alist = New-Object System.Collections.ArrayList
$b=@()

# Add 10001 elements (0 to 10000) to each
0..10000 | %{$alist.add($_)} | out-null
0..10000 | %{$b+=$_}

Why the | out-null? Because the Add method returns the index of the element just added, and PowerShell would otherwise write that index to the console.

PS D:\RichosPowerShell> $alist = New-Object System.Collections.ArrayList
PS D:\RichosPowerShell> $alist.add("hello")
0
PS D:\RichosPowerShell> $alist.add("world")
1
PS D:\RichosPowerShell> $alist[1]
world
PS D:\RichosPowerShell>
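If you would rather not pipe to out-null, there are other common ways to throw away the returned index. This is a small sketch, with $list and $index as illustrative names only.

```powershell
$list = New-Object System.Collections.ArrayList

# Three common ways to discard the index that Add() returns.
$list.Add("piped") | Out-Null     # pipe the result to Out-Null
[void]$list.Add("cast")           # cast the result to [void]
$null = $list.Add("assigned")     # assign the result to $null

# The index is still available if you actually want it.
$index = $list.Add("kept")        # fourth element, so the index is 3
```

All three do the same job, so pick whichever reads best in your script.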

So you have just tried this and, yeah, the list is a tad faster, but so what? It's only about half a second quicker. Well then, let's try this.

for ($i = 0; $i -lt 3; $i++){
	Measure-Command {0..300000 | %{$alist.add($_)} | out-null} | select seconds,milliseconds
}
for ($i = 0; $i -lt 3; $i++){
	Measure-Command {0..10000 | %{$b+=$_} | out-null} | select seconds,milliseconds
}

I’ve run each command 3 times. Each run took around 2.5 seconds for the ArrayList, while the normal method took 2.2, 8.8 and lastly 15.9 seconds. The += times climb because $b is not reset between runs, so each run appends to an ever larger array. But did you also notice that I increased the number of elements added on each ArrayList run by 290,000?
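The loops above keep appending to the same $alist and $b on every run. If you want a like-for-like comparison, one option is to start from an empty collection each time and use the same element count for both. The sketch below assumes a count of 10000 so that it finishes quickly; $count, $fresh and $freshArray are illustrative names, and the timings will vary by machine and PowerShell version.

```powershell
$count = 10000   # assumed size, small enough to run quickly

# Time ArrayList.Add() on a fresh list.
$listTime = Measure-Command {
    $fresh = New-Object System.Collections.ArrayList
    0..$count | ForEach-Object { [void]$fresh.Add($_) }
}

# Time += on a fresh array with the same number of elements.
$arrayTime = Measure-Command {
    $freshArray = @()
    0..$count | ForEach-Object { $freshArray += $_ }
}

"ArrayList: {0:N0} ms" -f $listTime.TotalMilliseconds
"Array +=:  {0:N0} ms" -f $arrayTime.TotalMilliseconds
```

Raise $count to see how the gap changes as the collections get bigger.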

[Screenshot: Array_Performance]

Proper fast.

Also of note, both methods consumed the same percentage of CPU on this quad-core, hyper-threaded machine, but for a total of 7.5 seconds vs about 27. The data used in this example isn’t going to tax the system that much, but memory can become a concern if you’re manipulating large data sets, which is another plus for ArrayList.

You can find more information on this, and other methods at https://powershell.org/2013/09/16/powershell-performance-the-operator-and-when-to-avoid-it/
