Tuesday, June 24, 2008

Ruby -- How to turn a CSV file into a list of hashes

There are convenience libraries for this type of thing, but I thought I would plunk down my solution.
Say you have a csv string and you need to just turn it into a proper Ruby data structure. A list of hashes would be nice since this is the same type of data structure you typically work with when you query a DBMS. Here is a function to do this.

def turn_csv_into_list_of_hashes(string)
  returned_list = []
  # the first line should be the header
  rows = string.split("\n")
  header = rows.shift.split(',')
  rows.each do |row|
    row_hash = {}
    # key each field by the header name in the same column
    row.split(",").each_with_index { |item, i| row_hash[header[i]] = item }
    returned_list << row_hash
  end
  return returned_list
end
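
For comparison, here is a more compact sketch of the same idea built on zip, followed by a small usage example. The name csv_to_hashes and the sample data are made up for illustration. Like the function above, it assumes plain comma-separated values with no quoted fields or embedded commas.

```ruby
# Hypothetical compact variant: pair each row's fields with the header names.
def csv_to_hashes(string)
  rows = string.split("\n")
  header = rows.shift.split(",")
  # zip pairs header[i] with the i-th field; to_h turns the pairs into a hash
  rows.map { |row| header.zip(row.split(",")).to_h }
end

# Made-up sample data
csv = "name,language\nmatz,ruby\njoe,erlang"
people = csv_to_hashes(csv)
puts people.first["language"]
```

One behavioral difference worth knowing: with zip, a row that has fewer fields than the header gets nil for the missing keys, and extra fields beyond the header are dropped.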

Thursday, June 12, 2008

Erlang movie and other cultural turning points

Not sure if you've seen it. The Erlang movie shatters all notions of what human beings can accomplish with a bit of film, some astounding acting, and a script that descended right out of heaven. There are two time periods in human evolution, pre and post Erlang movie.

Tuesday, June 3, 2008

How environment variables really work (in POSIX systems)

Environment variables are strange animals, straddling the system (not kernel) and the application world. I will use the analogy from the movie "The Matrix". If you have not seen "The Matrix", stop reading this and run to a video procurement establishment and get it. In the Matrix, Morpheus shows Neo "the construct". This was a blank space or "environment" from which to load anything they needed. From huge racks of guns to grenades to cool leather jackets and sunglasses.
From that environment, they could have all the tools they needed to take on the agents. The supplies would always be there no matter where they went inside the matrix. This is an excellent example of what environment variables really are. They are containers or "racks" that hold stuff or "information" that actors (like the real actors) can use when inside the environment (matrix).
In POSIX systems like Linux, there are some rules that we must know about to know where and when environment variables are loaded.
Environment variables cannot just hang out without "being hosted" by another application. In Linux, there is ALWAYS a hierarchy of processes (applications, if you like). The Linux kernel is the mac daddy "process", but it is not a process per se because it runs entirely in a space that humans cannot access. I bring up the kernel as a process because it really does run on the computer and launches the first real usable process, called init. Init is always process number 1 and ends up spawning all the other processes in the system such as X11, sshd, and everything else that runs in userspace. If you are knowledgeable in Linux topics, you may have noticed some holes in the above, but for beginners, this explanation will give a decent primer.
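
To make the "hosting" idea concrete, here is a minimal Ruby sketch, assuming a POSIX sh is on the PATH. It shows that a child process starts with a copy of its parent's environment, and that changes made in the child never travel back up. The variable name CONSTRUCT_ITEM is made up for the example.

```ruby
# CONSTRUCT_ITEM is a made-up variable name for this example.
ENV["CONSTRUCT_ITEM"] = "sunglasses"

# A child process starts with a copy of its parent's environment.
seen_by_child = `sh -c 'printf %s "$CONSTRUCT_ITEM"'`

# Changes made inside the child stay in the child's own copy.
system("sh", "-c", "CONSTRUCT_ITEM=grenades; export CONSTRUCT_ITEM")
seen_by_parent = ENV["CONSTRUCT_ITEM"]

puts "child saw:   #{seen_by_child}"
puts "parent sees: #{seen_by_parent}"
```
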
Generally, applications in the Linux world know about the concept of environment variables and can make use of them, but the init process is different. It does not know about environment variables (because essentially there is no environment when it starts, except from the kernel). On the other hand, init does know about arguments that are passed on the kernel command line at boot. If you boot the kernel with something like vmlinuz foo=bar, the kernel will boot, examine the key foo, discover foo means nothing to the kernel, and send foo=bar to the init process.
So init is special because it has no traditional environment. When init "spawns" its processes as dictated by init scripts or configuration in inittab, those processes will launch what is known as an interactive or non-interactive shell. This is where, I think, people get confused.

Now, the information below is BASH-centric. BASH is the primary shell these days for Linux/Unix type systems. Lots and lots of people use other shells too, but if you are reading this, you probably are not interested in learning shells other than BASH. The concepts for the other shells are surprisingly similar, but those shells read slightly different files.

There are only two types of shells that you can have in a Unix-type system: interactive and non-interactive. The difference between interactive and non-interactive shells is that interactive shells involve a human or a process that needs the same things as a human would need when working inside a bash shell, for example. In other words, interactive shells require that something interact with the shell directly and not simply fork off and do its own thing.
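
One way to see which kind of shell you have is to look at bash's option flags in $-, which include the letter "i" only when the shell is interactive. The sketch below, which assumes bash is installed and on the PATH, starts a bash with -c (a command string and nobody at the keyboard) and checks for that letter.

```ruby
require "open3"

# Ask a bash started with -c (no human attached) for its option flags.
flags, _status = Open3.capture2("bash", "-c", 'printf %s "$-"')

# Bash includes the letter "i" in $- only when the shell is interactive.
puts "flags: #{flags}"
puts flags.include?("i") ? "interactive" : "non-interactive"
```

Typing echo $- at a normal bash prompt should show an "i" among the flags, which makes for a quick side-by-side comparison.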

Interactive shells:

To further confuse the issue, there are two types of interactive shells, login and non-login. Now pay attention, this is the good part. The files that interactive login shells read and interactive non-login shells read are DIFFERENT. This is why you should care to read this section. Let's look at the files that login shells read:


These files are read in that order. Other files can be read, but they will be referenced in the above files. Now long timers might exclaim:
"But what about if you run bash with the command 'sh' and use the --norc option?"
To this I say fooey. We aren't remaking the bash man page here.

For non-login shells (we are still interactive here) these files are read:


You might be asking when an interactive non-login shell might be used. Well, that is an excellent question. Non-login shells are used when you are already logged into a shell or even X and you need another interactive shell from which to launch scripts or issue commands. The system assumes that you do not need to reload /etc/profile or the above list because it was loaded when you logged in! Now you might ask, so what if I changed something in my /etc/profile, but now I need the new variable and it is not there? This is because when you launched the shell, it was an interactive *non-login* shell. So now you are thinking "Christ, why so complicated?" No answer for that one, but you are preaching to the choir, brother/sister. There are two things you can do about this.
1. The application you are starting may have an arg that allows a login shell startup. xterm does this when you launch it as xterm -ls
2. Just throw your environment variables in an rc file such as ~/.bashrc and call it a day.
For number 2, I am sure shell purists are just ready to shoot me, but you know what? The whole thing is overly complex and kind of silly, so my theory is: understand how the system works, then make it work the way you think it should, and tell people why your system is better. If your way is not better, then people will fill you in as to why.
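
Bash has its own version of option 1: the --login flag. Here is a rough experiment, assuming bash is installed, that points bash at a throwaway home directory holding a ~/.bash_profile and compares what a login shell and a plain shell end up seeing. The variable name RACK_ITEM, the file contents, and the helper name compare_login_and_plain are all made up for the test.

```ruby
require "open3"
require "tmpdir"

# Run bash against a throwaway home directory and report what each shell sees.
# RACK_ITEM and the profile contents are made up for this experiment.
def compare_login_and_plain
  Dir.mktmpdir do |home|
    File.write(File.join(home, ".bash_profile"), "export RACK_ITEM=guns\n")
    env = { "HOME" => home }
    probe = 'printf %s "$RACK_ITEM"'

    # --login asks bash to act as a login shell and read the profile files.
    login_out, _err, _st = Open3.capture3(env, "bash", "--login", "-c", probe)
    # Without --login, ~/.bash_profile is not read.
    plain_out, _err, _st = Open3.capture3(env, "bash", "-c", probe)

    { login: login_out, plain: plain_out }
  end
end

result = compare_login_and_plain
puts "login shell saw: #{result[:login].inspect}"
puts "plain shell saw: #{result[:plain].inspect}"
```

The login shell also runs the system-wide /etc/profile, so on some machines it may print extra text ahead of the value.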

Non-interactive shells

A non-interactive shell would be for system "users", such as a web server, mail server, or a cron daemon. We like to put these users of the system into a shell environment where they can have just as much access to the system as they need to do their jobs and no more. They do not get to read /etc/profile because:
1. There is stuff in there about a human's environment (where the games are, possibly) and is of no consequence to them.
2. We do not want them knowing too much. If a cracker were to compromise that account, we do not want too much info there in that construct.
When you think of non-interactive shells, think of launching a shell script or running a Perl/Ruby/Python script from the shell. Also, you can think of your rc scripts that run when the system is booted. Those scripts still need an environment to work from, but should not be given the same environment as your bash prompt. You can also have non-interactive login and non-interactive non-login shells. Wow, this is confusing. The difference in the non-interactive context is that the login version simply looks to read:


and the non-login version looks for a $BASH_ENV environment variable and attempts to source the file it names. The $BASH_ENV variable must hold the full path to that file, because bash does not search $PATH for it.
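
The $BASH_ENV behavior is easy to check for yourself. The following sketch, assuming bash is installed, writes a small setup file and a one-line script into a temporary directory, then runs the script with and without $BASH_ENV pointing at the setup file. The names run_job, setup.sh, job.sh, and RACK_ITEM are invented for the example.

```ruby
require "open3"
require "tmpdir"

# Run a tiny script with bash, optionally pointing BASH_ENV at a setup file.
# RACK_ITEM, setup.sh, and job.sh are made-up names for this example.
def run_job(use_bash_env)
  Dir.mktmpdir do |dir|
    setup = File.expand_path(File.join(dir, "setup.sh")) # full path, as required
    job = File.join(dir, "job.sh")
    File.write(setup, "export RACK_ITEM=grenades\n")
    File.write(job, "printf %s \"$RACK_ITEM\"\n")

    # A nil value removes BASH_ENV from the child's environment.
    env = { "BASH_ENV" => (use_bash_env ? setup : nil) }
    out, _status = Open3.capture2(env, "bash", job)
    out
  end
end

puts "with BASH_ENV:    #{run_job(true).inspect}"
puts "without BASH_ENV: #{run_job(false).inspect}"
```

The script itself never mentions the setup file; bash sources it before running the script, only because $BASH_ENV names it.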

Bottom line for this post is that environment variables can be absolutely maddening. If you understand the constraints of the types of environments that you can have (interactive, non-interactive), then this goes a long way toward figuring out where your variables will be loaded from and when. To help yourself keep all of this straight, I recommend one of two things.

1. Make a cheat sheet for yourself.
2. Memorize this info (or at least some of it) by taking an hour and experimenting with your shell.

The way shells and environments were laid out is very difficult to keep straight, but if you do not have the gumption to redesign it all, then I hope this post helps.

Sunday, June 1, 2008

Rails Conf -- Impressions

Really smart people, but unfortunately most were reinventing the wheel. Some knew they were creating things already available, but didn't care; others toiled needlessly. Obie's talk was clearly the best by pointing out that using your abstractions properly is clearly something to be valued.

Ruby VMs are not interesting. At least not to me. If you want a VM like the JVM, just use the JVM; you will be much happier in the end and maybe get to enjoy life more. Creating a faster/multi-threaded VM is a good learning experience, but does not mean much even in the short term.
Ruby does nothing more for software safety than any other imperative language. Although Ruby "makes programmers happy", this does not mean a hill of beans in improving our customers' lives. If happy programmer == well-tested code that meets the specs, then great. But as Obie Fernandez points out, this is not frequently the case. Living the 80/20 rule through a world full of broken code stinks. I really like some of the research going on to allow Ruby to make applications that are more concurrent, fault tolerant, and still be, well, Ruby. I hope some of these things make it into Rails Conf next year.