Blog de Dario Leonel May: Performance and Threshold Counters for Exchange Server 2010

Good morning!!
Knowing the most important performance counters and their thresholds is critical to establishing a performance baseline and a monitoring plan, so that you can proactively monitor your Exchange 2010 environment and troubleshoot and resolve issues when they arise.


Counters	Threshold	Troubleshooting

Processor and Process Counters

*Processor(_Total)\% Processor Time*	Should be less than 75% on average.
Shows the percentage of time that the processor is executing application or operating system processes. This is when the processor is not idle.

*Processor(_Total)\% User Time*	Should remain below 75%.
Shows the percentage of processor time that is spent in user mode.
User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems.

*Processor(_Total)\% Privileged Time*	Should remain below 75%.
Shows the percentage of processor time that is spent in privileged mode. Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and all memory.

*Process(*)\% Processor Time*		If total processor time is high, use this counter to determine which process is causing high CPU.
Shows the percentage of elapsed processor time that all process threads used to execute instructions. An instruction is the basic unit of execution in a computer; a thread is the object that executes instructions; and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

*System\Processor Queue Length (all instances)*	Should not be greater than 5 per processor.	On a computer with a single processor, observations where the queue length is greater than 5 are a warning that there is frequently more work available than the processor can handle readily. When this number is greater than 10, it is a strong indicator that the processor is at capacity, particularly when coupled with high CPU utilization. On multiprocessor systems, divide the queue length by the number of physical processors. On a multiprocessor system configured with hard processor affinity (processes are assigned to specific CPU cores), large queue length values can indicate that the configuration is unbalanced. Although Processor Queue Length typically is not used for capacity planning, it can be used to identify whether systems within the environment are capable of running the loads or whether additional or faster processors should be purchased for future servers.
Indicates the number of threads each processor is servicing.
Processor Queue Length can be used to identify if processor contention or high CPU utilization is caused by the processor capacity being insufficient to handle the workloads assigned to it. Processor Queue Length shows the number of threads that are delayed in the Processor Ready Queue and are waiting to be scheduled for execution. The value listed is the last observed value at the time the measurement was taken.
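
The per-processor arithmetic for Processor Queue Length can be sketched in a few lines of Python. This is only an illustration: the function name and the returned labels are made up for this post, and the cut-offs of 5 and 10 are simply the ones quoted in the table.

```python
def classify_queue_length(queue_length, physical_processors):
    # Normalize a System\Processor Queue Length sample by the number of
    # physical processors, then apply the 5 (warning) and 10 (at capacity)
    # cut-offs from the table. The labels are hypothetical.
    if physical_processors < 1:
        raise ValueError("physical_processors must be at least 1")
    per_processor = queue_length / physical_processors
    if per_processor > 10:
        return "at capacity"
    if per_processor > 5:
        return "warning"
    return "ok"


# A queue length of 24 on a 4-processor server is 6 per processor.
print(classify_queue_length(24, 4))
```

A single sample says little because the counter is a last observed value, so in practice you would apply this to many samples and look at how often the warning level is reached.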

Memory Counters

*Memory\Available MBytes*	Should remain above 100 MB at all times.
Shows the amount of physical memory, in megabytes (MB), immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. For a full explanation of the memory manager, refer to Microsoft Developer Network (MSDN) or "System Performance and Troubleshooting Guide" in the Windows Server 2003 Resource Kit.

*Memory\Pool Nonpaged Bytes*	Not applicable.	Normally not looked at, unless connection counts are very high because each TCP connection consumes nonpaged pool memory.
Consists of system virtual addresses that are guaranteed to be resident in physical memory at all times and can thus be accessed from any address space without incurring paging input/output (I/O). Like paged pool, nonpaged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

*Memory\Pool Paged Bytes*	Not applicable.	Monitor for increases in pool paged bytes indicating a possible memory leak.
Shows the portion of shared system memory that can be paged to the disk paging file. Paged pool is created during system initialization and is used by kernel-mode components to allocate system memory.

*Memory\Cache Bytes*	Not applicable.	Should remain steady after applications cache their memory usage. Check for large dips in this counter, which could be attributed to working set trimming and excessive paging. Used by the content index catalog and continuous replication log copying.
Shows the current size, in bytes, of the file system cache. By default, the cache uses up to 50 percent of available physical memory. The counter value is the sum of Memory\System Cache Resident Bytes, Memory\System Driver Resident Bytes, Memory\System Code Resident Bytes, and Memory\Pool Paged Resident Bytes.

*Memory\Committed Bytes*	Not applicable.	Determines the amount of committed bytes in use.
Shows the amount of committed virtual memory, in bytes. Committed memory is the physical memory that has space reserved on the disk paging files. There can be one or more paging files on each physical drive. This counter displays the last observed value only; it is not an average.

*Memory\% Committed Bytes In Use*		If this value is very high (more than 90 percent), you may begin to see commit failures. This is a clear indication that the system is under memory pressure.
Shows the ratio of Memory\Committed Bytes to the Memory\Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file. If the paging file is enlarged, the commit limit increases, and the ratio is reduced. This counter displays the current percentage value only; it is not an average.
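
To make the ratio behind % Committed Bytes In Use concrete, here is a minimal sketch. The function names are hypothetical, the 90 percent default comes from the row above, and the sample sizes are invented for the example.

```python
GIB = 1024 ** 3


def committed_bytes_in_use_pct(committed_bytes, commit_limit):
    # Ratio of Memory\Committed Bytes to Memory\Commit Limit, as a percentage.
    if commit_limit <= 0:
        raise ValueError("commit_limit must be positive")
    return 100.0 * committed_bytes / commit_limit


def under_memory_pressure(committed_bytes, commit_limit, threshold_pct=90.0):
    # True when the ratio is above the "very high" mark from the table.
    return committed_bytes_in_use_pct(committed_bytes, commit_limit) > threshold_pct


# Invented numbers: 29 GiB committed against a 32 GiB commit limit.
print(committed_bytes_in_use_pct(29 * GIB, 32 * GIB))  # 90.625
# Enlarging the paging file raises the commit limit and lowers the ratio.
print(committed_bytes_in_use_pct(29 * GIB, 40 * GIB))  # 72.5
```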

Memory Paging Counters

*Memory\Transition Pages RePurposed/sec*	Should be less than 100 on average. Spikes should be less than 1,000.
Indicates system cache pressure.

*Memory\Page Reads/sec*	Should be less than 100 on average.
Indicates data must be read from the disk instead of memory. Indicates there is not enough memory and paging is beginning. A value of more than 30 per second means the server is no longer keeping up with the load.

*Memory\Pages/sec*	Should be below 1,000 on average.	The values that are returned by the Pages/sec counter may be more than you expect. These values may not be related to either paging file activity or cache activity. Instead, these values may be caused by an application that is sequentially reading a memory-mapped file. Use Memory\Pages Input/sec and Memory\Pages Output/sec to determine page file I/O.
Shows the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.

*Memory\Pages Input/sec*	Should be below 1,000 on average.
Shows the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\Pages Input/sec to the value of Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation.

*Memory\Pages Output/sec*	Should be below 1,000 on average.
Shows the rate at which pages are written to disk to free space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, and not code. A high rate of pages output might indicate a memory shortage. Microsoft Windows writes more pages back to disk to free up space when physical memory is in short supply. This counter shows the number of pages, and can be compared to other counts of pages, without conversion.
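
Two relationships between the paging counters in this section are easy to check by hand: Pages/sec is the sum of Pages Input/sec and Pages Output/sec, and Pages Input/sec divided by Page Reads/sec gives the average number of pages per read operation. A small sketch, with hypothetical function names and invented sample values:

```python
def pages_per_sec(pages_input_per_sec, pages_output_per_sec):
    # Memory\Pages/sec is described above as the sum of the two rates.
    return pages_input_per_sec + pages_output_per_sec


def pages_per_read(pages_input_per_sec, page_reads_per_sec):
    # Average number of pages brought into memory by each read operation.
    if page_reads_per_sec == 0:
        return 0.0  # no reads in the interval, so there is nothing to average
    return pages_input_per_sec / page_reads_per_sec


print(pages_per_sec(240, 60))   # 300
print(pages_per_read(240, 30))  # 8.0
```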

Process Memory Consumption Counters

*Process(*)\Private Bytes*	Not applicable.	This counter can be used for determining any memory leaks against processes. For the information store process, compare this counter value with database cache size to determine if there is a memory leak in the information store process. An increase in information store private bytes, together with the same increase in database cache, equals correct behavior (no memory leak).
Shows the current number of bytes this process has allocated that cannot be shared with other processes.

*Process(*)\Virtual Bytes*	Not applicable.	Used to determine if processes are consuming a large amount of virtual memory.
Represents (in bytes) how much virtual address space the process is currently consuming.
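
The Private Bytes comparison for the information store process can be sketched as follows. This is only an illustration under stated assumptions: both lists are samples taken at the same times, the function names are hypothetical, and the 256 MiB tolerance is an arbitrary value chosen for the example, not a documented threshold.

```python
MIB = 1024 * 1024


def growth(samples):
    # Net change between the first and the last sample of a series.
    return samples[-1] - samples[0]


def unexplained_private_growth(private_bytes, database_cache_bytes):
    # Growth in Private Bytes that is not matched by database cache growth.
    return growth(private_bytes) - growth(database_cache_bytes)


def possible_leak(private_bytes, database_cache_bytes, tolerance_bytes=256 * MIB):
    # Assumed rule of thumb: flag growth beyond the cache growth plus a tolerance.
    return unexplained_private_growth(private_bytes, database_cache_bytes) > tolerance_bytes


private = [8000 * MIB, 9000 * MIB, 10000 * MIB]
cache_tracking = [6000 * MIB, 7000 * MIB, 8000 * MIB]   # cache grows in step
cache_flat = [6000 * MIB, 6100 * MIB, 6200 * MIB]       # cache barely grows

print(possible_leak(private, cache_tracking))  # False
print(possible_leak(private, cache_flat))      # True
```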

Process Working Set Counter

*Process(_Total)\Working Set*	Not applicable.	Large increases or decreases in working sets cause paging. Ensure that the paging file is set to the recommended value of RAM+10 MB. If working sets are being trimmed, add Process(*)\Working Set to see what processes are affected. This counter could indicate either system-wide or process-wide issues. Cross-reference this counter with Memory\System Cache Resident Bytes to see if system-wide working set trimming is occurring.
Shows the current size, in bytes, of the working set of this process. The working set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the working set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from working sets. If they are needed, they will be soft-faulted back to the working set before leaving main memory.

Process Handle Counter

*Process(*)\Handle Count*	Not applicable.	An increase in handle counts for a particular process may be the symptom of a faulty process with handle leaks, which is causing performance issues on the server. This is not necessarily a problem, but is something to monitor over time to determine if a handle leak is occurring.
Shows the total number of handles currently open by this process. This number is the sum of the handles currently open by each thread in this process.
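
One simple way to "monitor over time" is to fit a straight line through evenly spaced Handle Count samples and look at the slope. The sketch below uses a plain least-squares fit; the function name and the sample values are invented, and what counts as a worrying slope is left to your own baseline.

```python
def slope_per_sample(values):
    # Least-squares slope of evenly spaced samples (x = 0, 1, 2, ...).
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Handle counts that climb by about 10 per sample hint at a leak.
print(slope_per_sample([100, 110, 120, 130]))  # 10.0
# A flat series has a slope of zero.
print(slope_per_sample([500, 500, 500]))       # 0.0
```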

.NET Framework Counters

*.NET CLR Memory(*)\% Time in GC*	Should be below 10% on average.	If this counter increases to a high value, there might be some objects that are surviving Gen 1 garbage collections and being promoted to Gen 2. Gen 2 collections require a full garbage collection for cleanup. Add other .NET memory counters to determine if this is the case.
Shows when garbage collection has occurred. When the counter exceeds the threshold, it indicates that CPU is cleaning up and is not being used efficiently for load. Adding memory to the server would improve this situation.

*.NET CLR Exceptions(*)\# of Exceps Thrown / sec*	Should be less than 5% of total RPS (Web Service(_Total)\Connection Attempts/sec * .05).	Exceptions should only occur in rare situations and not in the normal control flow of the program. This counter was designed as an indicator of potential performance problems due to a large (>100/sec) rate of exceptions thrown. This counter is not an average over time; it displays the difference between the values observed in the last two samples divided by the duration of the sample interval.
Displays the number of exceptions thrown per second. These include both .NET exceptions and unmanaged exceptions that get converted into .NET exceptions. For example, the null pointer reference exception in unmanaged code would get thrown again in managed code as a .NET System.NullReferenceException; this counter includes both handled and unhandled exceptions.

*.NET CLR Memory(*)\# Bytes in all Heaps*	Not applicable.	These regions of memory are of type MEM_COMMIT. (For details, see Platform SDK documentation for VirtualAlloc.) The value of this counter is always less than the value of Process\Private Bytes, which counts all MEM_COMMIT regions for the process. Private Bytes minus # Bytes in all Heaps is the number of bytes committed by unmanaged objects. Used to monitor possible memory leaks or excessive memory usage of managed or unmanaged objects.
Shows the sum of four other counters: Gen 0 Heap Size, Gen 1 Heap Size, Gen 2 Heap Size, and the Large Object Heap Size. This counter indicates the current memory allocated in bytes on the GC Heaps.
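
Two of the .NET rows above boil down to simple arithmetic: the exception threshold is 5% of the connection attempts per second, and the unmanaged share of a process is Private Bytes minus # Bytes in all Heaps. A minimal sketch, with hypothetical function names and invented sample values:

```python
def exception_threshold(connection_attempts_per_sec, fraction=0.05):
    # 5% of total requests per second, as stated in the threshold column.
    return connection_attempts_per_sec * fraction


def exceptions_too_high(exceptions_per_sec, connection_attempts_per_sec):
    # True when the exception rate is above the 5% threshold.
    return exceptions_per_sec > exception_threshold(connection_attempts_per_sec)


def unmanaged_committed_bytes(private_bytes, bytes_in_all_heaps):
    # Private Bytes minus the managed GC heaps leaves the unmanaged commit.
    return private_bytes - bytes_in_all_heaps


# Invented numbers: 200 connection attempts/sec allows roughly 10 exceptions/sec.
print(exceptions_too_high(12, 200))  # True
print(exceptions_too_high(4, 200))   # False
print(unmanaged_committed_bytes(1_500_000_000, 900_000_000))  # 600000000
```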

Network Counters

*Network Interface(*)\Bytes Total/sec*	For a 100-Mbps network adapter, should be below 6–7 MBps.
Indicates the rate at which the network adapter is processing data bytes.	For a 1000-Mbps network adapter, should be below 60–70 MBps.
This counter includes all application and file data, in addition to protocol information such as packet headers.

*Network Interface(*)\Packets Outbound Errors*	Should be 0 at all times.
Indicates the number of outbound packets that could not be transmitted because of errors.

*TCPv4\Connections Established*	Not applicable.	Determines current user load.
*TCPv6\Connections Established*
Shows the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.
The number of TCP connections that can be established is constrained by the size of the nonpaged pool. When the nonpaged pool is depleted, no new connections can be established.

*TCPv4\Connection Failures*	An increasing number of failures, or a consistently increasing rate of failures, can indicate a bandwidth shortage.
*TCPv6\Connection Failures*
Shows the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.

*TCPv4\Connections Reset*	An increasing number of resets or a consistently increasing rate of resets can indicate a bandwidth shortage.	Some browsers send TCP reset (RST) packets, so be cautious when using this counter to determine reset rate.
*TCPv6\Connections Reset*
Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
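
Going back to the Bytes Total/sec row at the top of this section: a 100-Mbps adapter moves at most 12.5 MB per second, so a 6–7 MBps threshold is roughly half of line rate. The sketch below turns a Bytes Total/sec sample into a utilization percentage. The function name is hypothetical, and it assumes decimal units (1 Mbps = 1,000,000 bits per second).

```python
def utilization_pct(bytes_total_per_sec, link_speed_mbps):
    # Convert the link speed from megabits to bytes per second, then
    # express the observed byte rate as a percentage of that capacity.
    if link_speed_mbps <= 0:
        raise ValueError("link_speed_mbps must be positive")
    link_bytes_per_sec = link_speed_mbps * 1_000_000 / 8
    return 100.0 * bytes_total_per_sec / link_bytes_per_sec


# 6.5 MB/s on a 100-Mbps adapter and 65 MB/s on a 1000-Mbps adapter
# both come out at about 52% of line rate.
print(utilization_pct(6_500_000, 100))
print(utilization_pct(65_000_000, 1000))
```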

Exchange Domain Controller Connectivity Counters

*MSExchange ADAccess Caches(*)\LDAP Searches/Sec*	Not applicable.	Used to determine current LDAP search rate.
Shows the number of Lightweight Directory Access Protocol (LDAP) search requests issued per second.

*MSExchange ADAccess Domain Controllers(*)\LDAP Read Time*	Should be below 50 ms on average.
Shows the time in milliseconds (ms) to send an LDAP read request to the specified domain controller and receive a response.	Spikes (maximum values) should not be higher than 100 ms.

*MSExchange ADAccess Domain Controllers(*)\LDAP Search Time*	Should be below 50 ms on average.
Shows the time (in ms) to send an LDAP search request and receive a response.	Spikes (maximum values) should not be higher than 100 ms.

*MSExchange ADAccess Processes(*)\LDAP Read Time*	Should be below 50 ms on average.
Shows the time (in ms) to send an LDAP read request to the specified domain controller and receive a response.	Spikes (maximum values) should not be higher than 100 ms.

*MSExchange ADAccess Processes(*)\LDAP Search Time*	Should be below 50 ms on average.
Shows the time (in ms) to send an LDAP search request and receive a response.	Spikes (maximum values) should not be higher than 100 ms.

*MSExchange ADAccess Domain Controllers(*)\LDAP Searches timed out per minute*	Should be below 10 at all times for all roles.
Shows the number of LDAP searches that returned LDAP_Timeout during the last minute.	Higher values may indicate issues with Active Directory resources.

*MSExchange ADAccess Domain Controllers(*)\Long running LDAP operations/Min*	Should be less than 50 at all times.
Shows the number of LDAP operations on this domain controller that took longer than the specified threshold per minute. (Default threshold is 15 seconds.)	Higher values may indicate issues with Active Directory resources.
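
The LDAP Read Time and LDAP Search Time rows in this section each carry two limits: an average below 50 ms and spikes no higher than 100 ms. Here is a minimal sketch of checking a list of samples against both. The function name, the returned keys, and the sample values are made up for illustration.

```python
def check_ldap_latency(samples_ms, avg_limit_ms=50.0, spike_limit_ms=100.0):
    # Average must stay below 50 ms; the largest sample must not exceed 100 ms.
    if not samples_ms:
        raise ValueError("need at least one sample")
    average = sum(samples_ms) / len(samples_ms)
    peak = max(samples_ms)
    return {
        "average_ok": average < avg_limit_ms,
        "spikes_ok": peak <= spike_limit_ms,
    }


# The average looks healthy (45 ms), but one 150 ms spike breaks the second limit.
print(check_ldap_latency([10, 10, 10, 150]))
```

Checking both limits matters because a single slow domain controller response can hide behind a good average.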

-Dario


Tuesday, March 22, 2011

Performance and Threshold Counters for Exchange Server 2010 - Common Counters
