1.0 Warm up.
Firstly my mentor shared the following question with me:
Why the image gets reversed in mirror horizontally, but vertically.
I asked many people about this question, but seldom gave me the accurate answer in 3 mins. Some interesting answers here:
1. People’s eye is horizontal not vertical. (Also cited in some “interview bible”)
2. If we change the mirror by 90 degree, then the result changes
3. We live in the Northern Hemisphere
From the tech part, the knowledge involved here is only mirror reflection, far simple than Windows memory management. This case just shows, besides the knowledge itself, problem solving needs the way of clear thinking.
1.1 Hopeless performance. The ADO.NET 2.0 is slower than ADO 1.1!
Problem Description:
Based on the following article, the customer decides to upgrade to CLR2 for performance:
DataSet and DataTable in ADO.NET 2.0
http://msdn.microsoft.com/msdnmag/issues/05/11/DataPoints/default.aspx
But based the customer’s test, the performance get worse with CLR2.
Look at the code, quite simple:
OracleConnection conn = new OracleConnection();
conn.ConnectionString = "...";
conn.Open();
OracleCommand cmd = new OracleCommand();
cmd.Connection = conn;
OracleDataAdapter dap = new SOracleDataAdapter("select * from mytesttable", conn);
DataTable dt = new DataTable();
DateTime start = System.DateTime.Now;
dap.Fill(dt);
TimeSpan span = DateTime.Now - start;
conn.Close();
Console.WriteLine(span.ToString());
Console.WriteLine("The Columns.Count is" +dt.Columns.Count.ToString());
Console.WriteLine("The Rows.Count is"+dt.Rows.Count.ToString());
The data used for test is quite simple either. 250000 rows, 4 fields. By checking the result of span.ToString, ADO.NET 2 is 1 time slower than ADO1.1. dap.Fill method takes 6 seconds in CLR2, comparing with 3 seconds in CLR1.
Hopeless
How to handle it?
When I finished the test locally, my feeing is hopeless only. Look at what I could do:
1. The backend datatable is quite simple. 4 fields are int, no index, no primiary key, no foreigen key. In other words, the problem does not relate to data schema, purely client side problem.
2. The code is very simple. There is nothing we could further fine tune.
3. The test machines gets both CLR1 and CLR2 installed. The test and comparing are done in the same machine. The hardware and software environment is the same. The only difference is the CLR version, which is the customer’s concern.
Hopeless, no way to try. I have to confess that “No idea. ADO.NET 2 dev did a bad thing! Sorry”. And also say “BTW, the MSDN paper is based on CLR2 beta2, not accurate.”
Change the Role
The good thing is that I did not give up my trying. After careful research, I found:
1. The sample code in MSDN does do a better than CLR 1.1. However, the schema of the data is a bit complex, while the customers is simple
2. The performance is worse in above sample. However, in other cases, the ADO.NET 2.0 does a better job to gain 10 times better result in the same customer’s project.
Meanwhile, when discussing with my colleague, they do not care about the percentage of how worse it is. They concern about why the customer wants to operate on 250000 rows of data at the same time. After careful discussion, got the following points:
1. The performance is 3 seconds worse with 250000 rows, 8 microseconds per row. From the tech part, db reading operation needs to send request to DB client engine firstly. The DB client engine pushes the request to DB server, and then fill the result back to client application. In the whole chain, any little change is likely to be enlarged, based on butterfly effect. In other words, if we look at percentage, the performance is indeed 100% worse. However, if we look at the delay of seconds, and compare with the data amount, we get further understanding on this problem.
2. The customer’s code reads 250000 rows at the same time. What kind of situation that the dev wants to write code like this? Such kind of code is likely to run in performance sensitive situation? Let’s cacuate, 250000 rows* 4 fileds*20 bytes each=19MB data. If each request involves 3 seconds delay, plus 20MB memory cost, and heavy network operations, the design is bad. In normal condition, such kind of code should be used to initialize some global data, which may be put in cache. And this kind of code should only run once per application’s life. In other words, the performance hit in real situation is really visible?
3. We did not evaluate all the data operations. The customer only tests the performance of fetching data. We do not consider how to use the data later. If ADO.NET 2 sacrifice the performance on loading, to benefit the performance in other places like iteration or deleting, we gain better overall performance.
Based on above analysis, I decide to ask the customer what the real situation it is. Meanwhile, I used the following method to seek the root cause:
Troubleshooting:
First I use reflector to check the implementation on DataAdapter.Fill. (I am not CPR and I cannot see the source code)
Reflector for .NET
http://www.aisto.com/roeder/dotnet/
I found the DataAdapter.Fill method can be divided in the following two parts:
1. Use DataReader to read data
2. Construct DataTable and fill the data in.
It means the DataAdapter.Fill just combine DataReader.Read and DataTable.Insert. So I can create my own app to use the two functions to replace DataAdapter.Fill for test:
static void TestReader()
{
&