≡ Menu

SCCM collection evaluations take a long time to evaluate

We have a decent sized Config Manager 2012 SP1 environment (40,000+ clients), and a fair amount of device collections. As more of our software distribution folks created queries for their device queries, we started to see more frequent hourglasses on collections.

With a large environment, the first thing we did was to disable incremental updates on pretty much all of our device collections. There are certain times where this setting is useful (even necessary), but too many  collections with it will cause performance issues.  See Best Practices for Collections in Configuration Manager for more information

Even with incremental updates disabled, we still would have periods of long delays with collection evaluations.  There were a few steps I took to track down troublesome collections and move toward getting the collection evaluator under control.

A few quick assumptions:

  1. If you have a CAS, then you have already verified that replication is working effectively. Replication issues will delay collection evaluations.
  2. Your SQL server is appropriately scoped for your environment, and it is not under a heavy load.
  3. This is only looking at the collection evaluator process itself, and finding the collections containing queries that were performing poorly. If you don’t find anything out of the ordinary with these methods, then open a ticket with Microsoft.

Preparation: Increase the size of the Collection Evaluator logs

Configure the SMS_Collection_Evaluator component logs to be larger. By default, they are set to 2 MB, and I set ours to 30 MB to keep a few days worth of information.

  1. In the Config Manager console, go to Monitoring, System Status, and select Component Status.
  2. From the ribbon at the top, click Start –> Configuration Manager Service Manager (CMSM).
  3. Under Components, select SMS_COLLECTION_EVALUATOR.
  4. On the right pane, right-click it and choose Logging. Modify as needed and click OK. You can then close the CMSM.

Method 1: The Historical Method

With the logs increased, we are able to see how long collections have taken to evaluate recently, or in the recent past. You will notice entries in the colleval.log file similar to this:

PF: [Primary Evaluator] successfully evaluated collection [ABC00274] and used 0.343 seconds

Note that it may also show entries for Express Evaluator, or Auxilliary Evaluator. You can open up the colleval.log file in Notepad (you Wally Mead fans) and search for seconds in the log, but this will take some time if you have a lot of collections and/or entries. A better method is to open it in CMTrace and filter on lines containing seconds , but you are still viewing more that you need. The best way is to only view only the worst times, and I used PowerShell for this:

$i = 0
$hash = @{}
Switch -Regex (Get-Content -Path "W:\Program Files\Microsoft Configuration Manager\Logs\colleval.log")
{'^.*\d{3,*}\.\d{1,3} seconds' {$evalline = $switch.current
                                $evalline | Out-File c:\temp\longeval.txt -Append
                                $hash.Add($i, $evalline)
                                $i += 1
                                }

}
$hash

I basically searched the collecval.log for any lines over 99 seconds (three digits or more before the decimal, in the decimal number preceeding the word seconds).  Any line that matched, I dumped it to a hash table to display immediately (or do whatever you want to do), and then logged it to c:\temp\longeval.txt to view later.

Method 2: The “In the Moment” Method

Your manager comes to you and states that he is trying to update the collection he created, and it is taking FOREVER to get rid of the hourglass icon. He wants to know what is happening NOW.

First of all, check the colleval.log file to see if you see anything strange, like many, many collections evaluating sequentially.  If that is not happening, then you can run a T-SQL query to see what is happening right now with the Collection Evaluator process. This can be a bit tricky, because by the time you get SQL Server Management Studio opened and are ready to run the query, you may have missed the problem. If you are already in there and waiting for any problems, or you have some really nasty queries taking upwards of 30 minutes to evaluate (yes, we had a few), then you can use this method.

First, run the following query against your CM database:

select distinct spid, program_name, cpu
from sys.sysprocesses
where program_name like ‘SMS_COLLECTION_EVALUATOR_%’ and cpu > 1000

This should show you any collection evaluator process that has been chugging for a while. If it returned something, then take the SPID number — the first column returned — and immediately run (for SPID 73, for example):

dbcc inputbuffer(73)

The EventInfo column will have a SQL Query as its contents.  For easier reading, I highlight it, and Ctrl-C Ctrl-V to a text editor.  Your query may look something like this:

;WITH LimitedBaseTable as (
 select res.* from [vSMS_R_System] as res 
join CollectionMembers as cm on res.ItemKey=cm.MachineID and cm.SiteID=N'ABC0000D' 
)
 insert into #CollTemp(MachineID,ArchitectureKey,Name,SMSID,Domain ,IsClient) 
select  all SMS_R_SYSTEM.ItemKey,SMS_R_SYSTEM.DiscArchKey,SMS_R_SYSTEM.Name0,SMS_R_SYSTEM.SMS_Unique_Identifier0,SMS_R_SYSTEM.Resource_Domain_OR_Workgr0,SMS_R_SYSTEM.Client0 from LimitedBaseTable AS SMS_R_System INNER JOIN v_CH_ClientSummary AS SMS_G_System_CH_ClientSummary ON SMS_G_System_CH_ClientSummary.ResourceID = SMS_R_System.ItemKey  INNER JOIN SoftwareInventoryStatus AS SMS_G_System_LastSoftwareScan ON SMS_G_System_LastSoftwareScan.ClientId = SMS_R_System.ItemKey  INNER JOIN WorkstationStatus_DATA AS ___System_WORKSTATION_STATUS0 ON ___System_WORKSTATION_STATUS0.MachineID = SMS_R_System.ItemKey
where ((SMS_R_System.Netbios_Name0 not  in (select  distinct SMS_R_System.Netbios_Name0 from LimitedBaseTable AS SMS_R_System INNER JOIN vSMS_G_System_SoftwareFile AS SMS_G_System_SoftwareFile ON SMS_G_System_SoftwareFile.ClientId = SMS_R_System.ItemKey   where SMS_G_System_SoftwareFile.FileName = N'123456.exe') AND SMS_R_System.Obsolete0 <> N'1') AND SMS_G_System_CH_ClientSummary.ClientActiveStatus = 1)

Note that this particular query wan’t too bad, but I will use it as an example for the next part.  Notice that the last line contains:

  where SMS_G_System_SoftwareFile.FileName = N’123456.exe’

 I want to know what collections have queries for 123456.exe.  Now I run the following T-SQL query:

select * from v_Collections
where CollectionID in
(select CollectionID from collection_rules where collectionID in
(select collectionid from collection_rules_sql
where sql like‘%123456.exe%’
))

This will list the collections that are referencing 123456.exe in their query, and should allow you to narrow down to which one it is.

Method 3: The Predictive Method

Many times, a poorly running collection is because of a costly selection criteria, such as using NOT LIKE. With this in mind, you can run the following T-SQL query to find potentially poorly performing collection evaluations:

(select vc.SiteID as [CollID],
vc.CollectionName,
‘Not_Like’ as CAT,
s.SQL
from v_Collections vc
inner join Collection_Rules_SQL s
on vc.CollectionID=s.CollectionID
where
s.SQL like ‘%not like%’)

UNION

(select vc.SiteID as [CollID],
vc.CollectionName,
‘Not_Equal’
as CAT,
s.SQL
from v_Collections vc
inner join Collection_Rules_SQL s
on vc.CollectionID=s.CollectionID
where s.SQL like ‘%<>%’)

Note that this will give you collections that PROBABLY will take some time to evaluate.  You can use the first method to see exactly how long a particular collection takes (Update Membership on the collection in the console, and then wait for it to show up in the colleval.log).

Real-life example: someone once was looking for computers that did not have a particular software installed.  The appropriate method would have been to create a collection of devices that have that software installed, and create a second collection for all devices that should have that software installed, and exclude the first collection.  However, this person decided to go straight for the NOT LIKE  collection, with criteria something akin to machines with installed software’s name was NOT LIKE %ABC Software% (NOTE: don’t ever do this):

select *  from  SMS_R_System inner join SMS_G_System_INSTALLED_SOFTWARE on SMS_G_System_INSTALLED_SOFTWARE.ResourceId = SMS_R_System.ResourceId where SMS_G_System_INSTALLED_SOFTWARE.ProductName not like “%ABC Software%”

This nasty query returns every device in the limiting collection (since every computer has at least ONE piece of software that does not match %ABC Software%), plus it had the added bonus of running for about an hour in our environment, since its return set on installed software was the entire installed software recordset of a few million rows (minus a handful of records)!

Hopefully these couple of methods will help you track down long-running collection evaluations.

{ 0 comments… add one }

Leave a Comment