40/sec to 500/sec

Introduction

Surprised by the title? Well, this is a tour of how we cracked the scalability jinx, going from handling a meagre 40 records per second to 500 records per second. Beware: most of the problems we faced were straightforward, so experienced people might find this superfluous.

Contents

* 1.0 Where were we?

1.1 Memory hits the sky
1.2 Low processing rate
1.3 Data loss :-(
1.4 Mysql pulls us down
1.5 Slow Web Client

* 2.0 Road to Nirvana

2.1 Controlling memory!
2.2 Streamlining processing rate
2.3 What data loss uh-uh?
2.4 Tuning SQL Queries
2.5 Tuning database schema
2.6 Mysql helps us forge ahead!
2.7 Faster...faster Web Client

* 3.0 Bottom line

Where were we?

Initially we had a system which could scale only up to 40 records/sec. I can even recollect the discussion about "what should be the ideal rate of records?". Finally we decided that 40/sec was the ideal rate for a single firewall. To go to market we needed to support at least 3 firewalls, so we decided that 120/sec would be the ideal rate. Based on the data from our competitor(s), we came to the conclusion that they could support around 240/sec. We thought that was OK for a first release, because all the competitors talked about the number of firewalls they supported but not about the rate.

Memory hits the sky

Our memory was always hitting the sky, even at 512MB (OutOfMemory exception)! We blamed cewolf's in-memory caching of the generated images. But we could not escape for long! Whether or not we connected the client, we used to hit the sky within a couple of days, 3-4 days at most. Interestingly, this was reproducible when we sent data at what were (then) very high rates of around 50/sec. You guessed it right: an unlimited buffer which grows until it hits the roof.

Low processing rate

We were processing records at the rate of 40/sec. We were using bulk update of dataobject(s), but it did not give the expected speed! Because of this we started to hoard data in memory, which in turn hogged memory.

Data Loss :-(

At very high speeds we used to miss many packets. At first we seemed to have little data loss, but that came at the cost of a memory hog. After some tweaking to limit the buffer size, we started having a steady data loss of about 20% at very high rates.

Mysql pulls us down

We faced a tough time when we imported a log file of about 140MB. Mysql started to hog resources, the machine started crawling, and sometimes it even stopped responding. Above all, we started getting deadlock(s) and transaction timeout(s), which eventually reduced the responsiveness of the system.

Slow Web Client

Here again we blamed the number of graphs we showed in a page as the bottleneck, ignoring the fact that there were many other factors pulling the system down. After 4 days at the Internet Data Center, a page with 6-8 graphs and tables used to take 30 seconds to load.

Road To Nirvana

Controlling Memory!

We tried to put a limit of 10,000 records on the buffer size, but it did not last for long. The major flaw in the design was that we assumed that a buffer of around 10,000 would suffice, i.e. that we would process the records before the buffer reached 10,000. In line with the principle "if something can go wrong, it will go wrong!", it went wrong. We started losing data. Subsequently we decided to go with flat-file-based caching, wherein the data was dumped into a flat file and then loaded into the database using "load data infile". This was many times faster than a bulk insert via the database driver. You might also want to check out some possible optimizations with load data infile. This fixed our problem of the ever-increasing buffer of raw records.
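The flat-file approach can be sketched roughly as follows. This is a minimal illustration in Python rather than the original application code; the table name `raw_log`, the file name, and the record fields are made-up placeholders, and values are assumed to contain no tabs or newlines.

```python
import os
import tempfile


def spool_records(records, spool_path):
    """Append records to a tab-delimited flat file instead of an in-memory buffer."""
    with open(spool_path, "a") as handle:
        for record in records:
            # Assumption: field values never contain tabs or newlines.
            handle.write("\t".join(record) + "\n")


def load_statement(spool_path, table):
    """Build the bulk-load statement for the spooled file (table name is hypothetical)."""
    return (
        f"LOAD DATA INFILE '{spool_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
    )


spool_dir = tempfile.mkdtemp()
spool_file = os.path.join(spool_dir, "raw_records.tsv")
spool_records([("10.0.0.1", "80", "ALLOW"), ("10.0.0.2", "22", "DENY")], spool_file)
statement = load_statement(spool_file, "raw_log")
```

The point of the design is that memory use no longer depends on how far the database falls behind: a slow load only makes the file on disk bigger.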

The second problem we faced was the growth of cewolf's in-memory caching mechanism. By default it used "TransientSessionStorage", which caches the image objects in memory; there seemed to be some problem in cleaning up the objects, even after the references were lost! So we wrote a small "FileStorage" implementation which stores the image objects in a local file and serves them as and when a request comes in. Moreover, we also implemented a cleanup mechanism to remove stale images (images older than 10 mins).
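The idea behind such a file-backed store with stale-image cleanup might look like the sketch below. It is written in Python for illustration only and is not the cewolf storage interface; the class name, method names, and `.img` suffix are all assumptions.

```python
import os
import tempfile
import time


class FileImageStorage:
    """Hypothetical stand-in for the "FileStorage" idea: images live on disk, not in memory."""

    def __init__(self, directory, max_age_seconds=600):
        self.directory = directory
        self.max_age_seconds = max_age_seconds  # 10 minutes, as in the text

    def _path(self, image_id):
        return os.path.join(self.directory, f"{image_id}.img")

    def store(self, image_id, data):
        with open(self._path(image_id), "wb") as handle:
            handle.write(data)

    def load(self, image_id):
        with open(self._path(image_id), "rb") as handle:
            return handle.read()

    def cleanup(self, now=None):
        """Delete images whose modification time is older than max_age_seconds."""
        now = time.time() if now is None else now
        removed = []
        for name in os.listdir(self.directory):
            path = os.path.join(self.directory, name)
            if now - os.path.getmtime(path) > self.max_age_seconds:
                os.remove(path)
                removed.append(name)
        return removed


storage = FileImageStorage(tempfile.mkdtemp())
storage.store("chart1", b"fresh image bytes")
storage.store("chart2", b"stale image bytes")
# Backdate chart2 by 11 minutes to simulate a stale image.
stale_time = time.time() - 660
os.utime(storage._path("chart2"), (stale_time, stale_time))
removed = storage.cleanup()
```

In a real deployment the cleanup would run periodically (for example from a timer thread) rather than being called by hand.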

Another interesting aspect we found here was that the garbage collector seemed to have the lowest priority, so the objects created for each record were hardly cleaned up. Here is a little math to explain the magnitude of the problem. Whenever we received a log record we created ~20 objects (hashmap, tokenized strings etc.), so at the rate of 500/sec the number of objects created in 1 second was 10,000 (20*500*1). Due to the heavy processing, the garbage collector never had a chance to clean up the objects. So all we had to do was a minor tweak: we assigned "null" to the object references once we were done with them. Voila! The garbage collector was never tortured, I guess ;-)

Streamlining processing rate

The processing rate was at a meagre 40/sec, which means that we could hardly withstand even a small outburst of log records! The memory control gave us some solace, but the actual problem was with the application of the alert filters over the records. We had around 20 properties for each record, and we used to search for a match on every one of them, even when we had no alert criteria configured. We changed the implementation to match only those properties we had criteria for! Moreover, we also had a memory leak in the alert filter processing: we maintained a queue which grew forever. So we had to dump the parsed objects to a flat file instead, which also avoided re-parsing the records to form objects.
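The filter change can be illustrated with a small sketch: iterate over the configured criteria rather than over every property of the record, and skip the scan entirely when nothing is configured. The property names and the equality-only matching are assumptions made for the example.

```python
def matches(record, criteria):
    """Return True when the record satisfies every configured criterion.

    Only the properties named in `criteria` are inspected; the other
    properties of the record are never touched.
    """
    return all(record.get(name) == expected for name, expected in criteria.items())


def find_alerts(records, criteria):
    # With no criteria configured there is nothing to match, so skip the scan.
    if not criteria:
        return []
    return [record for record in records if matches(record, criteria)]


records = [
    {"src": "10.0.0.1", "port": 22, "action": "DENY", "proto": "tcp"},
    {"src": "10.0.0.2", "port": 80, "action": "ALLOW", "proto": "tcp"},
]
alerts = find_alerts(records, {"action": "DENY", "port": 22})
```

With 20 properties per record and two criteria, this does two comparisons per record instead of twenty.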

What data loss uh-uh?

Once we fixed the memory issues in receiving data, i.e. dumping into a flat file, we never lost data! In addition to that, we had to remove a couple of unwanted indexes on the raw table to avoid the overhead while dumping data. We had indexes on columns which could have a maximum of 3 possible values, which made the inserts slower and was not useful.

Tuning SQL Queries

Your queries are your keys to performance. Once you start nailing the issues, you will see that you might even have to de-normalize the tables. We did it! Here are some of the key learnings:

* Use "Explain" to identify how mysql executes the query. This will give you insight into why the query is slow, i.e. whether it is using the correct indexes, whether it is doing a table-level scan etc. ("Analyze table" is the related command that refreshes the index statistics the optimizer relies on.)

* Never delete rows when you deal with huge data in the order of 50,000 records in a single table. Always try to do a "drop table" as much as possible. If it is not possible, redesign your schema, that is your only way out!

* Avoid unwanted join(s), and don't be afraid to de-normalize (i.e. duplicate the column values). Joins tend to pull your query down. One hidden advantage of de-normalizing is that it imposes simplicity on your queries.

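* If you are dealing with bulk data, always use "load data infile". There are two forms here: plain "load data infile", where the mysql server reads the file from its own machine, and "load data local infile", where the client reads the file and sends it to the server. Use the plain form if mysql and the application are on the same machine; otherwise use the local form.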

* Try to split your complex queries into two or three simpler queries. The advantage of this approach is that the mysql resources are not hogged for the entire process. Tend to use temporary tables instead of a single query which spans 5-6 tables.

* When you deal with a huge amount of data, i.e. you want to process say 50,000 records or more in a single query, try using limit to batch process the records. This will help you scale the system to new heights.

* Always use smaller transaction(s) instead of large ones, i.e. ones spanning "n" tables. Large transactions lock up mysql resources, which might slow the system down even for simple queries.

* Use join(s) on columns with indexes or foreign keys

* Ensure that the queries from the user interface have criteria or a limit.

* Also ensure that the criteria column is indexed.

* Do not put numeric values in sql criteria within quotes, because mysql then has to do a type cast.

* Use temporary tables as much as possible, and drop them when you are done.

* An insert fed by a select (or a delete) takes a double table lock, i.e. both tables are locked... be aware...

* Take care that you do not pain the mysql database with the frequency of your updates. We had a typical case: we used to dump to the database after every 300 records. So when we started testing at 500/sec, we saw that mysql was literally dragging us down. That is when we realized that at the rate of 500/sec there is more than one "load data infile" request every second to the mysql database. So we changed to dumping the records every 3 minutes rather than every 300 records.
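To put numbers on the last point, here is a little arithmetic sketch comparing the count-based flush (every 300 records) with the time-based flush (every 3 minutes) at a steady 500 records/sec. The function names are made up for the illustration.

```python
def loads_per_hour_by_count(records_per_second, records_per_load):
    """Bulk loads per hour when a load fires after a fixed number of records."""
    return records_per_second * 3600 / records_per_load


def loads_per_hour_by_time(flush_interval_seconds):
    """Bulk loads per hour when a load fires on a fixed timer."""
    return 3600 / flush_interval_seconds


def records_per_timed_load(records_per_second, flush_interval_seconds):
    """Records carried by each load when flushing on a timer."""
    return records_per_second * flush_interval_seconds


count_based = loads_per_hour_by_count(500, 300)   # 6000 loads per hour
time_based = loads_per_hour_by_time(180)          # 20 loads per hour
batch_size = records_per_timed_load(500, 180)     # 90000 records per load
```

The count-based trigger scales the number of database requests with the input rate, while the timer keeps the request frequency constant and lets the batch size grow instead.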

Tuning database schema

When you deal with a huge amount of data, always ensure that you partition your data. That is your road to scalability. A single table with say 10 lakh (1 million) rows can never scale when you intend to execute queries for reports. Always have two levels of tables: raw tables for the actual data, and another set of report tables (the tables which the user interface queries!). Always ensure that the data in your report tables never grows beyond a limit. In case you are planning to use Oracle, you can try out partitioning based on criteria. Unfortunately mysql does not support that, so we have to do it ourselves: maintain a meta table in which you keep the header information, i.e. which table to look in for a given set of criteria, normally time.
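The meta table lookup could work along these lines. This is a sketch only: the per-day table names and the in-memory list standing in for the meta table are assumptions, and in practice the lookup would itself be a query against the meta table.

```python
from datetime import datetime

# Hypothetical meta table: one row per raw table with the time range it covers.
META_TABLE = [
    ("raw_2005_08_29", datetime(2005, 8, 29), datetime(2005, 8, 30)),
    ("raw_2005_08_30", datetime(2005, 8, 30), datetime(2005, 8, 31)),
    ("raw_2005_08_31", datetime(2005, 8, 31), datetime(2005, 9, 1)),
]


def tables_for_range(start, end, meta=META_TABLE):
    """Return the tables whose half-open range [table_start, table_end) overlaps [start, end)."""
    return [
        name
        for name, table_start, table_end in meta
        if table_start < end and start < table_end
    ]


# A report covering 30 Aug 18:00 to 31 Aug 06:00 only needs two of the three tables.
tables = tables_for_range(datetime(2005, 8, 30, 18), datetime(2005, 8, 31, 6))
```

A side benefit of splitting by time is that expiring old data becomes a "drop table" on the oldest table instead of a large delete.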

* We had to walk through our database schema: we added some indexes, deleted some, and even duplicated column(s) to remove costly join(s).

* Going forward we realized that having the raw tables as InnoDB was actually an overhead to the system, so we changed them to MyISAM.

* We also went to the extent of reducing the number of rows in static tables involved in joins

* NULL in database tables seems to cause some performance hit, so avoid them

* Don't have indexes on columns which have only 2-3 allowed values.

* Cross-check the need for each index in your table; they are costly. If the tables are InnoDB, double-check their need, because InnoDB tables seem to take around 10-15 times the size of MyISAM tables.

* Use MyISAM whenever the queries on a table are mostly of one kind (either select or insert). If both inserts and selects are going to be heavy, it is better to make the table InnoDB.

Mysql helps us forge ahead!

Tune your mysql server ONLY after you fine tune your queries/schemas and your code. Only then can you see a perceivable improvement in performance. Here are some of the parameters that come in handy:

* Set the buffer sizes, which will enable your queries to execute faster: --innodb_buffer_pool_size=64M for InnoDB and --key-buffer-size=32M for MyISAM.

* Even simple queries started taking more time than expected. We were actually puzzled! We realized that mysql seems to load the index of any table it starts inserting into. So what typically happened was that any simple query on a table with 5-10 rows took around 1-2 secs. On further analysis we found that a "load data infile" had happened just before the simple query. This disappeared when we changed the raw tables to MyISAM, because the buffer sizes for InnoDB and MyISAM are two separate configurations.

For more configurable parameters, see the mysql server documentation.

Tip: start your mysql with the option --log-error; this will enable error logging.

Faster...faster Web Client

The user interface is the key to any product, and the perceived speed of the page is especially important! Here is a list of solutions and learnings that might come in handy:

* If your data is not going to change for say 3-5 minutes, it is better to cache your client side pages

* Tend to use iframe(s) for inner graphs etc.; they make your pages feel faster. Better still, use a javascript-based content loading mechanism. This is something you might want to do when you have say 3+ graphs in the same page.

* Internet Explorer displays the whole page only when all the contents are received from the server. So it is advisable to use iframes or javascript for content loading.

* Never use multiple/duplicate entries of the CSS file in the html page. Internet Explorer tends to load each CSS file as a separate entry and applies it to the complete page!
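The caching tip at the top of this list boils down to a time-to-live check. Below is a minimal sketch of that idea; the `PageCache` class, the 180-second default, and the injectable clock are all assumptions for illustration, and the same logic applies whether the cache sits in the browser or in front of the page renderer.

```python
import time


class PageCache:
    """Cache rendered pages for a fixed time-to-live (hypothetical helper)."""

    def __init__(self, ttl_seconds=180, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expiry_time, page)

    def get(self, key, render):
        """Return the cached page while it is fresh, otherwise re-render and cache it."""
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        page = render()
        self.entries[key] = (now + self.ttl_seconds, page)
        return page


fake_now = [0.0]      # fake clock so the example does not have to sleep
render_calls = []


def render_dashboard():
    render_calls.append(1)
    return f"dashboard v{len(render_calls)}"


cache = PageCache(ttl_seconds=180, clock=lambda: fake_now[0])
first = cache.get("dashboard", render_dashboard)
fake_now[0] = 60.0    # one minute later: still served from the cache
second = cache.get("dashboard", render_dashboard)
fake_now[0] = 200.0   # past the 3-minute window: rendered again
third = cache.get("dashboard", render_dashboard)
```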

Bottom line

Your queries and schema make the system slower! Fix them first and then blame the database!

See Also

* High Performance Mysql

* Query Performance

* Explain Query

* Optimizing Queries

* InnoDB Tuning

* Tuning Mysql

Categories: Firewall Analyzer | Performance Tips

This page was last modified 18:00, 31 August 2005.

-Ramesh-
