In this entry I am going to take you through my journey with Capacity Planning. Given the topic, this will be quite a lengthy post, so if you’re in a big hurry and don’t care to understand the intricate details of the capacity planning equation, you can skip this article and go directly to the link for SPCAP – SharePoint Capacity Planning Tool.
Now, as any seasoned SharePoint Architect would tell you, one of the key components of planning your SharePoint deployment is Capacity Planning. Unfortunately, Capacity Planning is a confusing topic and there isn’t a whole lot of really good documentation around it; what is available can be confusing, to say the least. In addition, so many factors affect your planning in this area that it is almost impossible for anyone to come up with a solid number. Your performance will be affected by the server hardware, the client hardware, the concurrency rate of your users, the required throughput rate, the response time your organization deems acceptable, and so on. These are just the factors that we actually CAN calculate into our formulae. Your general network load and other sources of interference will also wreak havoc with your stats from time to time, but for the most part we can make a good judgment call about the type of farm setup required.
I STRONGLY recommend that you at least try to read the Capacity Planning Guide on TechNet, found here: http://technet2.microsoft.com/Office/en-us/library/eb2493e8-e498-462a-ab5d-1b779529dc471033.mspx?mfr=true
Once you’ve read it, you may come away with a clear understanding of your capacity needs, but if you’re like most of us, me included, you’ll come away even more confused than before.
So why did I recommend you read it?
Because I do not want you to blindly rely on my formulae as the golden rule for capacity planning.
My formulae are generally on the conservative side. For example, instead of using the “Read” transaction numbers in my calculations, I use the “Mixed” numbers, which are about half as high. As a result, my formulae will push you to a 2×1 farm long before a calculation based on the “Read” numbers would. I believe that, especially in an intranet scenario, transactions will in fact be “Mixed” rather than “Read”. In an Internet scenario, the “Read” numbers would in most cases be accurate, but by playing it safe, the formulae would rather give you too much horsepower than too little. Now on to the guide…
When you get to the “Estimate performance and capacity requirements…” section, the first thing you’ll notice is the hardware specification. The servers used to produce the actual transaction throughput benchmarks are all 64-bit machines, so their performance will be far superior to that of 32-bit servers. If you’re deploying 32-bit servers, keep this in mind, since every number in the guide is tied to the hardware specified there. The test hardware was:
- WFE Server: 2 x Dual Core Intel Xeon 2.8 GHz CPU, 4 GB RAM, 64-bit
- SQL Server: 4 x Dual Core Intel Xeon 2.8 GHz CPU, 32 GB RAM, 64-bit
- Client PC: 1 x Pentium 3 1.2 GHz CPU, 1 GB RAM, 32-bit
Note the massive amount of memory in the SQL Server. If your server has less memory, expect lower performance.
Next you’ll find a section on “Usage profile”, which isn’t really relevant to our calculations; it only gives you some perspective on the test environment from which the statistics were drawn. Go ahead and skip this section.
Next comes a section on “Hardware recommendations”. Many people find this confusing because there are now two sets of hardware specs on the same page. All this section does is list the recommended specs for your SharePoint farm, even though the test environment hardware was vastly superior. Go ahead and skip this section.
The “Estimating throughput targets” section describes how to estimate your required throughput. What makes it confusing is that it refers to both Requests per Second and Requests per Hour. The important part here is the “Analyzing Log Files (IIS 6.0)” subsection, which will help you analyze your current SharePoint 2003 traffic and determine the targets you need.
The “Estimate throughput targets” section is important for us because it contains the performance matrix that serves as the basis of our formulae for calculating the farm requirement. We’re only really interested in the first two columns: the farm configuration in the first and the Requests per Second in the second. These are the base numbers we use in our formulae. Given the drop-off in performance beyond a 5×1 configuration, we don’t consider anything beyond that.
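If you’d rather script the log-analysis step than do it by hand, here is a minimal Python sketch (not the guide’s own procedure) that counts requests per hour in an IIS 6.0 log written in W3C Extended format. It assumes the log’s `#Fields:` directive includes the `date` and `time` fields; `requests_per_hour` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def requests_per_hour(lines: Iterable[str]) -> Counter:
    """Count requests per (date, hour) in a W3C Extended format log.

    Hypothetical helper; assumes the #Fields: directive lists 'date' and 'time'.
    """
    counts: Counter = Counter()
    fields: list = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines; #Fields: gives the column order for what follows.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if "date" not in fields or "time" not in fields:
            continue  # no usable field layout seen yet
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed entries
        row = dict(zip(fields, values))
        hour = row["time"][:2]  # "hh" from "hh:mm:ss"
        counts[(row["date"], hour)] += 1
    return counts
```

Pass it an open log file (or any iterable of lines), and `counts.most_common(1)` gives you the busiest hour, which is one reasonable starting point for an RPH target. The count includes every logged hit, so you may want to filter out static files such as images before treating it as page requests.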
In the “Estimate user response time” section, the important thing to note is the three available levels of response time: 3-5 seconds, 1-2 seconds, and sub-1-second. There’s a grid that attempts to tie throughput and response times together, but it just confuses most people. Just note the response time options for now.
In the “Estimate concurrency rate” section, the really tricky question comes up: how many users are using the system at the same time? Unless you have tools that do statistical analysis and can give you the actual numbers, you’ll have to estimate this, but a 10% concurrency rate is generally a good rule of thumb.
The subsequent sections cover the indexing window, disk space requirements, and performance monitoring. You can skip these for now.
OK, so we’ve got all this data. How do we convert it into useful information, i.e., a server farm requirement? This is where I had to dust off my good old high school algebra. The first principle in working with algebraic formulae is to get everything into a consistent unit of measure, i.e., compare apples to apples, not apples to oranges.
We begin with the Estimated Throughput Target (ETT) which is simply a question of how heavily your users use the environment. ETT is usually expressed in Requests per Hour (RPH) and is based on a Throughput Concurrency Rate (TCR). As we mentioned, a TCR of 10% is a good rule of thumb value to use. Based on a TCR of 10%, usage levels are defined as follows:
- Light usage = 20 RPH
- Typical usage = 36 RPH
- Heavy usage = 60 RPH
- Extreme usage = 120 RPH
The Farm Throughput Capacity (FTC) grid gives us an RPS value for each farm configuration. Apples to apples, remember? So we need to multiply each number by 3,600 (the seconds in an hour) to turn it into an RPH number that we can use alongside our ETT values. Doing that, we get:
- 1×1 = 180,000 RPH
- 2×1 = 356,400 RPH
- 3×1 = 414,000 RPH
- 4×1 = 432,000 RPH
- 5×1 = 489,600 RPH
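The RPS-to-RPH conversion is simple enough to script. In this sketch the RPS values are back-calculated from the RPH figures above (RPH ÷ 3,600) rather than copied from the guide’s grid, so substitute the grid values you’re working from. `farm_rph` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3_600

# Farm throughput in requests per second, back-calculated from the RPH list
# above (RPH / 3,600); replace with the values from the guide's grid.
FARM_RPS = {"1x1": 50, "2x1": 99, "3x1": 115, "4x1": 120, "5x1": 136}


def farm_rph(rps_by_farm: dict) -> dict:
    """Convert each farm's requests per second to requests per hour."""
    return {farm: rps * SECONDS_PER_HOUR for farm, rps in rps_by_farm.items()}


for farm, rph in farm_rph(FARM_RPS).items():
    print(f"{farm} = {rph:,} RPH")
```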
Now we take the Estimated Response Time (ERT) values grid. If we use the 1,000-user line as our benchmark, we take the 0.7 RPS, multiply it by the 3,600 seconds in an hour, and divide by 1,000 users to get an RPH value per user. Doing the same for all three response time levels, we get:
- Slow = 3-5 seconds, server target = 2.52 RPH per user
- Recommended = 1-2 seconds, server target = 3.6 RPH per user
- Fast = <1 second, server target = 4.32 RPH per user
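Here is the same per-user conversion as a small Python sketch. Only the 0.7 RPS figure comes from the text; the 1.0 and 1.2 RPS values are back-calculated from the 3.6 and 4.32 RPH-per-user targets, and `rph_per_user` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3_600
BENCHMARK_USERS = 1_000

# RPS on the guide's 1,000-user line. 0.7 is quoted in the text; 1.0 and 1.2
# are back-calculated from the 3.6 and 4.32 RPH-per-user targets.
BENCHMARK_RPS = {"Slow": 0.7, "Recommended": 1.0, "Fast": 1.2}


def rph_per_user(rps: float, users: int = BENCHMARK_USERS) -> float:
    """Per-user server target in requests per hour."""
    return rps * SECONDS_PER_HOUR / users


for level, rps in BENCHMARK_RPS.items():
    print(f"{level}: {rph_per_user(rps):.2f} RPH per user")
```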
Lastly, we come back to User Concurrency (UC), e.g., 1,000 users with 100 actively using the system = 10% concurrency. Don’t mistake this for the Throughput Concurrency Rate (TCR); they are not the same. Each is represented in our formulae, but they should have the same value: if you have a UC of 20%, then your TCR must also be adjusted to 20% to keep your statistics accurate.
As the guide mentions, determine the volume of data that can be indexed during the off-hours window, when no users are on the system, so that indexing doesn’t affect performance during the day.
It is now time to put all this into an equation. Our starting point is that the Farm Throughput Capacity (FTC), divided by the product of the number of concurrent users, the Estimated Throughput Target (ETT), and the ratio of User Concurrency (UC) to Throughput Concurrency Rate (TCR), equals the Estimated Response Time (ERT). Since we have a stats matrix of FTC values, all we need to do is solve for our scenario’s FTC value and match it against the table to determine the farm we need. The equation translates thus:
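$$\frac{\text{FTC}}{(\text{Users} \times \text{UC}) \times \text{ETT} \times \frac{\text{UC}}{\text{TCR}}} = \text{ERT}$$

$$\text{FTC} = \text{ERT} \times (\text{Users} \times \text{UC}) \times \text{ETT} \times \frac{\text{UC}}{\text{TCR}}$$

$$\text{FTC} = \frac{\text{ERT} \times \text{Users} \times \text{UC} \times \text{ETT} \times \text{UC}}{\text{TCR}}$$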
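Translated directly into Python, the rearranged equation becomes a one-line function. The parameter names mirror the abbreviations in this post, and `required_ftc` is a hypothetical name. This is a sketch of my formula, not a general capacity model.

```python
def required_ftc(users: int, uc: float, ett: float, ert: float, tcr: float) -> float:
    """Solve FTC = ERT * (Users * UC) * ETT * (UC / TCR).

    users: total user count
    uc:    User Concurrency as a fraction (0.10 for 10%)
    ett:   Estimated Throughput Target in RPH (e.g. 60 for Heavy)
    ert:   Estimated Response Time target in RPH per user (e.g. 3.6)
    tcr:   Throughput Concurrency Rate as a fraction (should match uc)
    """
    concurrent_users = users * uc
    return ert * concurrent_users * ett * (uc / tcr)


# 1,000 users, 10% concurrency, Typical usage, Recommended response time.
print(f"{required_ftc(1000, 0.10, 36, 3.6, 0.10):,.0f} RPH")
```

With UC and TCR both at 10%, that scenario works out to 3.6 × 100 × 36 = 12,960 RPH.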
In our example, we want to know what farm 2,000 users at 25% concurrency need for Recommended (1-2 second) response times under Heavy usage, with the TCR set to 25% to match the UC. Substituting the values into the equation, we get:
$$\text{FTC} = \frac{3.6 \times (2000 \times 0.25) \times 60 \times 0.25}{0.25} = 108{,}000 \text{ RPH}$$

This can be handled by a 1×1 farm (180,000 RPH).
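The “match it against the table” step can be sketched as picking the smallest farm whose RPH capacity, taken from the table earlier in this post, is at least the computed FTC. That matching rule is my assumption about how to read the table, and `pick_farm` is a hypothetical name.

```python
from typing import Optional

# Farm capacity in requests per hour (RPS x 3,600), smallest farm first.
FARM_CAPACITY_RPH = [
    ("1x1", 180_000),
    ("2x1", 356_400),
    ("3x1", 414_000),
    ("4x1", 432_000),
    ("5x1", 489_600),
]


def pick_farm(ftc: float) -> Optional[str]:
    """Return the smallest farm whose capacity is at least the required FTC."""
    for farm, capacity in FARM_CAPACITY_RPH:
        if ftc <= capacity:
            return farm
    return None  # needs more than 5x1, which this method doesn't consider


# The worked example: 2,000 users, 25% UC/TCR, Heavy (60 RPH), Recommended (3.6).
ftc = 3.6 * (2000 * 0.25) * 60 * 0.25 / 0.25
print(f"{ftc:,.0f} RPH -> {pick_farm(ftc)}")
```

Returning `None` rather than the largest farm makes it explicit when a scenario falls outside the 5×1 ceiling discussed earlier.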
OK, so to make this whole thing easier and to save everyone else a whole lot of time, I created a quick web application that prompts you for the input and then determines the final farm configuration for you. I call it SPCAP. Try it and leave your feedback here.