Thursday, April 22, 2010

Programming Literacy In Physics

Lately, I have been exposed more and more to the lack of computer programming literacy in the field of physics. Today, most physicists need to be able to program and will spend at least 1/4, if not 1/3, of their time programming. This is because, in this day and age, so much data is being acquired that you have no choice but to use a computer to process it all. And yet, there are far too many physicists who know next to nothing about programming. They know enough of the basics to write their algorithms, but the quality of their code is nowhere near what it should be.

This rant was triggered by a conversation I had with a colleague yesterday. I am currently developing software to sort experimental data coming from our data acquisition system in real time. I was telling him that I had originally designed the code to be implemented using threads, and so a lot of my design choices were driven by trying to avoid race conditions wherever possible (it turns out that the algorithm is fast enough that we don't need to thread it, but that's beside the point). At this point in the conversation, he indicated to me that he didn't know what a race condition was. I was a little disappointed, but I tried to explain it to him using an example.
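For readers in the same position: a race condition is what happens when two or more threads read and update the same data without coordination, so the result depends on timing. Here is a minimal Python sketch of the idea. It is not the data-sorting code described above, and `count_events`, its parameters, and the shared counter are all made up for illustration.

```python
import threading


def count_events(n_threads, n_increments, use_lock):
    """Increment one shared counter from several threads and return its final value.

    Hypothetical illustration only; the names are not from any real codebase.
    """
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_increments):
            if use_lock:
                # The lock makes the read-modify-write a single indivisible step.
                with lock:
                    state["count"] += 1
            else:
                # Unprotected read-modify-write: another thread may update the
                # counter between the read and the write, and that update is lost.
                current = state["count"]
                state["count"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print("with lock:   ", count_events(4, 100_000, use_lock=True))
    print("without lock:", count_events(4, 100_000, use_lock=False))
```

With the lock, the total always equals the number of threads times the number of increments. Without it, the total can come up short, and by a different amount on each run, which is exactly what makes these bugs so hard to track down.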

I started off by saying, "Suppose you have a binary tree..." At that point he chuckled a little bit and I could see the confused look in his eyes; he didn't know what a binary tree was. I then decided to explain to him how a binary tree works. I described to him how the complexity of searching a balanced binary search tree is O(log n), as opposed to a linear search, which is O(n). Again, he chuckled because he had no idea what I meant by complexity. At this point I was almost in shock. How can a person who has to deal with C code all day have no idea what computational complexity is? However, this is not an isolated occurrence. A few days ago, I had to show my boss that there is a pre-implemented sorting function, qsort, in the C standard library. If I hadn't shown him that it was there, he was most likely going to implement a bubble sort.
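To make the O(log n) versus O(n) point concrete, here is a rough sketch that counts how many elements each kind of search has to look at. As a simplifying assumption, it runs a binary search over a sorted list rather than walking an actual binary tree (the halving argument is the same), and the function names are hypothetical.

```python
def linear_search_steps(items, target):
    """Scan from the front; return how many elements were examined."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Halve the search range each time; return how many elements were examined."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return steps


if __name__ == "__main__":
    data = list(range(1_000_000))  # already sorted
    print("linear:", linear_search_steps(data, 999_999))
    print("binary:", binary_search_steps(data, 999_999))
```

For a million sorted values, the linear scan examines every element to find the last one, while the binary search needs at most 20 probes, because each probe throws away half of what is left.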

Unfortunately, I don't see this problem going away any time in the near future. The problem stems from the fact that most physics programs include only very basic computer programming classes. At the University of Windsor, you have 2 choices when it comes to computer courses. The hard option is to take Introduction to Algorithms I & II. The easy option is to take C for Beginners and then Introduction to the Internet.

First of all, there should be no easy option that lets you off with a course on how to write HTML and a C class that teaches you nothing except perhaps what a for loop is. In a field where programming is a mandatory part of the job, that sort of knowledge is insufficient. Second of all, Introduction to Algorithms I & II are woefully lacking in the number of algorithms that they teach. I think the most complex algorithm we wrote was a bubble sort. This is not a problem that I can attribute to the physics department; it is the fault of the computer science department. However, in the undergraduate physics curriculum, there is no opportunity to do any sort of scientific computing at any point. Thus, unless you are self-motivated enough to do programming yourself, you get no experience writing any sort of code except in those two VERY basic programming courses.

From what I have heard from physics students at other universities, this problem seems to be the norm more than the exception (even though it comes in many forms). Some students will get a scientific computing class, but they have to write everything in FORTRAN. This stems from the fact that a lot of physicists did most of their computing 20 years ago in FORTRAN and never bothered to learn a new programming language. Just to give perspective, FORTRAN was originally run on punch cards. Others are writing all of their code in Maple (a computer algebra system for solving equations analytically). This is fine if you are teaching some mathematical analysis; however, you will never learn about performance concerns and efficiency, since these issues are handled by the Maple backend.

If it were up to me, I would make it mandatory that all physics students learn about basic complexity analysis and simple sorting and searching algorithms. If you taught them these things, I would bet that half of the scientific computing code out there could cut its run-time by about 25%, if not 33%. This wouldn't be a hard change: simply make them take a computer science course that teaches these two concepts. Also, if you are going to teach a scientific computing course, do NOT teach it in FORTRAN. I would say the best language to teach that sort of course in would be Python, or failing that, C/C++.
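As a small illustration of what is at stake with sorting, here is a hand-rolled bubble sort that counts its own comparisons, checked against Python's built-in `sorted`. This is only a sketch: `bubble_sort` and the random test data are hypothetical, and nothing here is a measurement of real scientific code.

```python
import random


def bubble_sort(values):
    """Return (sorted copy, number of comparisons) using a plain bubble sort."""
    items = list(values)
    comparisons = 0
    # After each outer pass, the largest remaining value has moved to index `end`.
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items, comparisons


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs use the same data
    data = [rng.random() for _ in range(1000)]
    result, comparisons = bubble_sort(data)
    print("matches built-in sort:", result == sorted(data))
    print("comparisons:", comparisons)
```

This version always performs n(n-1)/2 comparisons, so doubling the amount of data roughly quadruples the work. The built-in `sorted` runs in O(n log n) time and is one line to call, which is the whole argument for knowing what the standard library already offers.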

Either way, I think this is one of the biggest problem areas in all of physics today. If you don't know even the most basic of algorithms, you are doomed to implement inefficient, limited solutions that will leave you re-inventing the wheel every step of the way. Also, this type of code becomes nearly impossible to fix because it is typically nigh unreadable, even to the program's author.
