What a difference a CPU makes
Beyond a certain point, increasing SQL Server's performance is very
difficult--and expensive. You get bottlenecks that you can't get rid of no
matter what you do. SQL Server bottlenecks exist in many places both inside and
outside the physical box. Inside, SQL Server can choke on disk I/O, CPU
resources, memory, and network performance (or other interrupt-driven
peripherals). Outside, SQL Server can suffer from client performance
(workstation horsepower), overall network capacity, software efficiency, and the
ever-present user element, dragusperformancedownium. Filtering particles
of this annoying element out of your system is every administrator's nightmare.
But you can compensate for bottlenecks by tuning your server to its maximum
potential. (For tuning tips, see "More Easy SQL Server Performance Tips," page 88.)
In the Windows NT Magazine Lab, I tweaked and tuned my way to some
performance numbers that show the differences--or lack thereof--between the
Pentium and Pentium Pro microprocessors and Windows NT Server 3.51 and NT Server
4.0 running SQL Server 6.5 on single, dual, and quad-processor systems. For
information about the systems I tested, see "Compaq ProLiant 5000,"
and "NEC ProServa SH." For information about my testbed, see "The Testbed."
Graphs 1A through 8B show two tests on the different systems. Graphs 1A through 4B show performance for single-read activity, and graphs 5A through 8B show mixed read/write activity. The results were surprising:
- You get a performance boost with faster CPUs, but only under heavy load.
- More cache is better. However, 256KB of Level 2 cache on a Pentium Pro is
not enough; 512KB is better.
- SQL Server 6.5 scales evenly to four-CPU configurations. The transactions
per second (TPS) graphs, Graphs 1A through 8A, show that with more CPUs, you can
generally handle more transactions.
- NT Server 4.0 is not faster than NT 3.51 under this type of application
load.
- Pentium Pro scales better than Pentium, and response times are faster. But
the reported enhancements in NT Server 4.0 for Pentium Pro optimization don't
show themselves under SQL Server. On both Pentium and Pentium Pro, NT 3.51 and
NT 4.0 perform virtually the same, as TPS graphs 1A through 8A show.
- Five or six drives for a data volume is enough. Beyond that number, you
won't improve performance. The biggest difference comes from separating all the
types of I/O activity onto different physical drives.
Cache
More cache is better. In a test described in "Diamond Flower Doubleshot"
(February 1996), Andy Smith found that a dual-Pentium server with a shared 512KB
cache (the DFI Doubleshot) performed much worse than a dual-processor server
with independent cache modules (the ProLiant 4500). The latter is a more
expensive solution, but it significantly boosts performance. Now you have to
choose how much cache per CPU you need.
My tests for this month's issue showed that a Pentium Pro (the NEC ProServa
SH) with the on-chip 512KB Level 2 cache did not produce a significantly faster
response time on a higher transaction count than the Pentium Pro with the 256KB
on-chip cache under heavy load at the same clock speed. For example, Graphs 8A
and 8B show no marked improvement over Graphs 7A and 7B under heavy load.
However, the design of the NEC may have been the limiting factor: For example,
the ProLiant 5000 with the 512KB Level 2 cache showed a significant difference
in scaleability from the NEC ProServa SH, as you can see by comparing Graph 6A
to Graph 8A. The Compaq ProLiant 4500--a Pentium system with 2MB cache modules
for each CPU--showed lower TPS and higher average response time (ART) than the
Pentium Pros, but it still performed well in these tests. As Screen 1 shows, the
4500's overall CPU utilization for single read and mixed read/write activity was
higher than the ProLiant 5000. However, although the ProLiant 4500 did not have
as high a TPS value or as low a response time, it handled the load without
breaking down.
You really see a difference at four CPUs: The ProLiant Pentium Pro with the
512KB Level 2 cache wins by a landslide, with the best TPS and response times.
The 256KB Pentium Pro has a real problem in two- and four-CPU configurations
under heavy load, as graphs 3 and 7 show. CPU utilization is 100 percent at 300
users on all systems we tested, but the 256KB Pentium Pro system falls down on
the job. The right side of Screen 2 shows that the 256KB Pentium Pro simply
can't handle the load. Toward the end of each test run under heavy load, the
system basically stopped processing requests.
Intel is aware of this problem and recommends that you use 256KB Pentium
Pros only in single- and dual-processor servers and workstations. Many vendors,
such as NEC and Compaq, have responded by ceasing sales of symmetric
multiprocessing (SMP) enterprise servers with more than two CPUs with 256KB
cache. The price differential to get 512KB is more than $2000 per CPU, but the
performance problems you face otherwise make increased cache essential.
Disks
For your data volumes, five or six drives are the effective limit for
improving performance. Compaq has found that more than six drives in a RAID 0
stripe set do not improve performance. The limitations on SCSI (six devices on
one standard SCSI channel) and performance hits for duplexing multiple
controllers mean that adding drives to a data volume won't help.
My tests confirmed this result: For my 500MB data set, I saw no system
enhancement going from two, to four, to five drives for the data volume on the
Compaq ProLiant 4500. Screen 3 shows %Disk Time maxed out during the mixed
read/write tests. You can see that disk usage on the Compaq ProLiant 5000 was
high, although I/Os per second (not shown) were within an acceptable level (less
than 20 per second, well within the tolerance of 80 I/Os per second for a system
with a hardware RAID controller).
If you need to increase capacity, look at controllers with multiple
channels on which you can create large arrays, use multiple controllers, or use
larger drives such as 4.3GB Fast and Wide drives or the new 9GB ones. To improve
performance, you can use faster drives and faster controllers, but remember that
disk I/O is only one component of overall system performance.
Operating System
Microsoft made significant claims about the performance enhancements in
Windows NT 4.0 over NT 3.51, including better SMP scaleability, optimization for
Pentium Pro, and overall speed improvements. Pre-release and beta information
claimed that NT 4.0 would be as much as two times faster than NT 3.51.
I found that with SQL Server 6.5, NT 4.0 offers only slightly better
performance than NT 3.51, on either Pentium or Pentium Pro, as TPS graphs 1A
through 8A show. Is this lack of improvement a limitation of NT or the
application running on top of it? The answer is both.
Windows NT 4.0 has a smaller footprint and better network throughput than
NT 3.51. But additional features, such as the new GUI and extra services (new
network protocols, etc.) negate these advantages.
The OS can go only as fast as the application it runs, and even Microsoft
has found that SQL Server doesn't perform better on NT 4.0 than on NT 3.51. If
an application isn't programmed to use the new features and enhancements in NT
4.0, the OS isn't going to run better. Perhaps SQL Server 7.0 or another
database program will respond differently--we'll just have to wait and see.
What about NT 4.0's optimization for Pentium Pro? This feature means that
NT 4.0 runs faster than NT 3.51 on a Pentium Pro than on the Pentium. But I
didn't get that result. The improvement I saw was only about 5 percent to 10
percent, and even that improvement wasn't consistent. On the Pentium Pro with
the 256KB Level 2 cache, NT 4.0 responded with lower TPS and higher ART at high
loads than did NT 3.51, as graphs 3A and 7A and 3B and 7B show. In screens 2 and
4, you can see that the NEC ProServa SH with 256KB of Level 2 cache performed
much better under NT 3.51 than under NT 4.0, as evidenced by the smooth scaling
of CPU usage. Upgrading to 512KB of cache smoothed performance under both NT
3.51 and NT 4.0. You get more features with NT 4.0, but not better performance.
Pentium Pro offers about a 60 percent capacity improvement over the Pentium.
Under mixed read/write activity, the ProLiant Pentium Pro ran at 33 TPS, and the
ProLiant Pentium stopped at about 21 TPS. The differences were about the same
for single-read activity.
Again, CPU utilization shows the differences: The Pentium worked harder
with less performance than the Pentium Pro, as shown in screens 1 and 5. As you
can see in Screen 5, the ProLiant 5000 scaled evenly from low to high user
loads, and overall performance didn't degrade at the high end, although response
times rose. Machine slowdown under heavy load is expected, but the computer
shouldn't choke. The Pentium Pro system operated at much lower response times,
even under very heavy load. The cache in the Pentium Pro makes a difference to
TPS values, but not so much to response time.
Many systems will perform the same under light load, because the system is
not fully stressed. So when you look for differences between architectures or
systems from different vendors, you need to fully stress each subsystem. For
example, if you want to see how much better one CPU will perform than another,
you have to run it at nearly 100 percent CPU usage, or you won't see any
difference in performance. You can see this difference in my TPS graphs. For 150
or fewer users, all systems perform similarly, so a faster CPU doesn't buy you
performance improvement. Under light load, upgrading CPUs from Pentium to
Pentium Pro will give you slightly lower response times, but you won't get a
huge jump in performance. Under heavy load, you can add capacity and
significantly improve performance by upgrading the CPUs.
SMP Scaleability
The good news is that SQL Server 6.5 on both Windows NT 3.51 and NT 4.0
scales, and unlike Exchange Server 4.0 (described in "Optimizing Exchange
to Scale on NT," November 1996), SQL Server scales consistently from one to
four CPUs. Throwing more CPUs at a SQL system is a good investment for handling
more transactions and reducing response times.
Contrary to what you'd expect, scaleability is the same for both NT 3.51
and NT 4.0. And, let me repeat that the 256KB Pentium Pro has problems at
four-CPU configurations.
Under a Microscope
What does all this information mean to you, your upgrading and purchasing
decisions, and your users? Well, how much money do you have?
Upgrading database servers is a complicated issue. With some thoughtful
analysis, you can cut through the hype and mumbo jumbo to find what's right for
you. Here are some things to keep in mind.
My tests are by no means the be-all and end-all of the SQL world, but they
point to some interesting issues. If you are not already bottlenecking on CPU or
memory resources, disk I/O, or network throughput, how do you improve database
server performance? You can redesign and optimize your SQL applications (better
code, more efficient procedures) as I describe in "More Easy SQL Server
Tuning Tips," you can tune SQL somewhat, or you can throw more memory at
the application. But you'll reach a point after which you can't improve
performance. Only buying more stuff for your server will make a difference. So,
what do you buy?
In midrange environments (about 300 users) with moderate data sets (5GB to
20GB), extra CPUs are a good investment, as is memory. Get plenty of memory so
that SQL has enough to operate in, maybe even enough to run with TempDB fully in
RAM, and enough so that NT does not do any paging. Use Perfmon to watch SQL and
system counters, and tune appropriately.
Upgrading to Windows NT 4.0 will buy you features but not performance.
However, upgrading won't hurt you, either. You need to weigh the cost of new
licensing fees against the easier and more efficient administration of NT 4.0.
Do you need to replace your existing Pentium servers with Pentium Pro
systems? This decision depends on how much money you've got. If your Perfmon
analyses don't show that you are already bottlenecking on CPU resources, adding
faster CPUs will give you only an incremental performance improvement. Weigh
this option against the cost of replacing an entire server. If you have a heavy
load, Pentium Pros will probably improve response times and give you more
capacity, depending, of course, on your transaction mix. Compare graph pairs 1
and 5 to 2 and 6 as an example of how a move to Pentium Pros can improve
performance. A mostly disk-intensive environment still won't get a big boost
from more CPU resource. Be aware of the cache issues described earlier: 200MHz
Pentium Pros with 256KB cache modules drag in SMP systems.
The decision to add drives is a little easier because you can do so without
replacing your whole system. A new, faster disk or RAID controller will help,
especially if you use NT's software-based RAID capabilities or have an old
server. Many new systems ship with Fast and Wide SCSI controllers. Improving
disk I/O is generally a great way to improve system performance inexpensively,
but only to a point (six drives for one data volume). Try breaking up SQL and
system operations across multiple drives and controllers or SCSI channels.
Beyond upgrading CPUs, you can change the brand of your server. Compare
graph pairs 2 and 6 to 4 and 8. My tests show you'll find performance
differences between platforms from certain vendors using the same types of
hardware (CPUs, disk controllers, etc.). The scope of the differences depends on
your load and the specific design and architectures of the machines.
CPU-to-memory bus width, bus design, and many other factors influence system
performance. You'll also find differences in upgradeability, feature sets,
service and support, and price. The choice depends on your needs.
Capacity vs. Raw Performance or Speed
When analyzing your system's performance or the new system you want to buy,
be clear about what you're looking for. Capacity addresses how many users you
can support or how many transactions per second your server can offer.
Performance or speed, in my mind, represents how fast your server processes
requests and returns results (i.e., average response time): What is your
processing power, and what overhead is available?
You, of course, want both capacity and speed. But you generally can't have
your cake and eat it, too. You must consider the tradeoffs between how much work
your system can do and how fast your system can do that work.
Graphs 1 through 8 show that, with certain system types and configurations
(and, depending on load and transaction types), you can have low response times
with steadily increasing TPS values. You can throw more work at the system and
not take a performance hit.
You reach a point where the TPS values level off and the response times
start rising markedly, that point is where your capacity runs out. If you're
lucky, your system won't start losing transactions at that point; it will just
get slower. Now is when you need to add CPUs, memory, or other resources
according to where you find the first performance bottlenecks.
You can squeeze more performance out of your database server with some
intelligent tuning, system configuration, and upgrades. First, look at your
setup to determine whether you have the optimal configuration. Do you have
enough network bandwidth (do you need to add a segment to the server?) and
enough memory allocated to Windows NT and SQL Server (is it balanced?), and are
the disks laid out properly? Next, have you done all you can through software to
fully optimize your system: Have you tuned SQL parameters and NT and written
efficient SQL code?
If you've done all these things, look at upgrading the hardware: Add disks,
add memory, add CPUs, or replace the server with a faster architecture (new
generation Intel chips or RISC). Always look for bottlenecks in your whole
system, on the client side and server side, before buying new stuff. Be sure the
components you buy will actually improve system performance; if you aren't
already maxing out a given resource, upgrading it won't help much.
End of Article
--Kirk R. Erichsen
Kirk R. Erichsen August 12, 1999