The Sleepless Geek: August 2010

Sunday, August 22, 2010

Seeing where Ruby methods come from

When it comes to reflection - asking an object or class to tell you about itself - Ruby makes a lot of sense. If you want to know what methods an object has, for example, you just ask it:

someobject.methods

.

Any array has lots of handy, built-in methods, which it gets from the Kernel module (mixed in by every object), the Enumerable module, and Array itself. (For example, somearray.shuffle returns a randomized version of the array.)

To see what comes from where, we'll make an array, and ask it: "array, please get me an alphabetized list of your methods, go through that list, and print a statement saying where each one was defined." (Thanks to several respondents on Stackoverflow for showing me this.)

The resulting list presents all kinds of opportunities for exploration. (For example, "tainted?" is very interesting.)

irb(main):001:0> a = Array.new
=> []
irb(main):002:0> a.methods.sort.collect {|m| puts "#{m} defined by #{a.method(m).owner}"}
& defined by Array
* defined by Array
+ defined by Array
- defined by Array
<< defined by Array
<=> defined by Array
== defined by Array
=== defined by Kernel
=~ defined by Kernel
[] defined by Array
[]= defined by Array
__id__ defined by Kernel
__send__ defined by Kernel
all? defined by Enumerable
any? defined by Enumerable
assoc defined by Array
at defined by Array
choice defined by Array
class defined by Kernel
clear defined by Array
clone defined by Kernel
collect defined by Array
collect! defined by Array
combination defined by Array
compact defined by Array
compact! defined by Array
concat defined by Array
count defined by Array
cycle defined by Array
delete defined by Array
delete_at defined by Array
delete_if defined by Array
detect defined by Enumerable
display defined by Kernel
drop defined by Array
drop_while defined by Array
dup defined by Kernel
each defined by Array
each_cons defined by Enumerable
each_index defined by Array
each_slice defined by Enumerable
each_with_index defined by Enumerable
empty? defined by Array
entries defined by Enumerable
enum_cons defined by Enumerable
enum_for defined by Kernel
enum_slice defined by Enumerable
enum_with_index defined by Enumerable
eql? defined by Array
equal? defined by Kernel
extend defined by Kernel
fetch defined by Array
fill defined by Array
find defined by Enumerable
find_all defined by Enumerable
find_index defined by Array
first defined by Array
flatten defined by Array
flatten! defined by Array
freeze defined by Kernel
frozen? defined by Array
grep defined by Enumerable
group_by defined by Enumerable
hash defined by Array
id defined by Kernel
include? defined by Array
index defined by Array
indexes defined by Array
indices defined by Array
inject defined by Enumerable
insert defined by Array
inspect defined by Array
instance_eval defined by Kernel
instance_exec defined by Kernel
instance_of? defined by Kernel
instance_variable_defined? defined by Kernel
instance_variable_get defined by Kernel
instance_variable_set defined by Kernel
instance_variables defined by Kernel
is_a? defined by Kernel
join defined by Array
kind_of? defined by Kernel
last defined by Array
length defined by Array
map defined by Array
map! defined by Array
max defined by Enumerable
max_by defined by Enumerable
member? defined by Enumerable
method defined by Kernel
methods defined by Kernel
min defined by Enumerable
min_by defined by Enumerable
minmax defined by Enumerable
minmax_by defined by Enumerable
nil? defined by Kernel
nitems defined by Array
none? defined by Enumerable
object_id defined by Kernel
one? defined by Enumerable
pack defined by Array
partition defined by Enumerable
permutation defined by Array
pop defined by Array
private_methods defined by Kernel
product defined by Array
protected_methods defined by Kernel
public_methods defined by Kernel
push defined by Array
rassoc defined by Array
reduce defined by Enumerable
reject defined by Array
reject! defined by Array
replace defined by Array
respond_to? defined by Kernel
reverse defined by Array
reverse! defined by Array
reverse_each defined by Array
rindex defined by Array
select defined by Array
send defined by Kernel
shift defined by Array
shuffle defined by Array
shuffle! defined by Array
singleton_methods defined by Kernel
size defined by Array
slice defined by Array
slice! defined by Array
sort defined by Array
sort! defined by Array
sort_by defined by Enumerable
taint defined by Kernel
tainted? defined by Kernel
take defined by Array
take_while defined by Array
tap defined by Kernel
to_a defined by Array
to_ary defined by Array
to_enum defined by Kernel
to_s defined by Array
transpose defined by Array
type defined by Kernel
uniq defined by Array
uniq! defined by Array
unshift defined by Array
untaint defined by Kernel
values_at defined by Array
zip defined by Array
| defined by Array

Sunday, August 15, 2010

Visualizing our example setup

Some people reading my post on setting up Apache, PHP, MySQL, Filezilla Server and PHPMyAdmin have seemed confused about how the whole setup works, or what the pieces are for. I put together this diagram a while back for a coworker and thought it might be helpful to readers here, too. Click the image for a larger version, suitable for zooming or printing out.

The basic idea that I want to convey is that web traffic has two sides: a client and a server. All conversation between them passes over a network. When someone visits a web page, their client, the browser, says "hey, send me your web page, please!" And on the other end of the network, a server gets the request and responds with the web page.

What's a server?

Before we go further, let's clear up some confusion about terms. Specifically, what is a server?

People use the term "server" to mean different things. In some cases, we mean "the program that responds to requests." With that meaning in mind, we'd call Apache a web server, MySQL a database server, and Filezilla Server an FTP server. In other cases, we mean "the computer that the server program(s) run on." With that in mind, we might say "our server sits on a rack in the server room."

In some setups, MySQL might be running on one server machine and Apache on another. In complex setups, both will be running on multiple machines.

The tutorial was all about setting up the various pieces of server software. If you set them up on your own computer, then visit your site from that same computer, then you refer to the server you're connecting to as "localhost" - local in the sense that it's on the same machine. In that case, the top part (the server) and the bottom part (the client) are really the same computer, and the network request never travels outside the computer you're working on. But of course most web sites you visit are served from elsewhere, and the network request might travel around the world.

Now, to look at the components we set up. Let's start by looking at the right side of the diagram, the HTTP side.

HTTP

This is the conversation between a browser and the server program - in our case, Apache. HTTP is the protocol they use to communicate. This just means that each of them expects requests and responses to be formatted a certain way - with headers, a body, and various sub-components of each.

When a user requests a URL, it's up to Apache to decide what to send back. In basic cases, if the user asks for "http://servername/foo.html", Apache will look for foo.html, read it, and send back its contents. If you ask for "bar.html" and no such file exists, it will tell you so. But you could have Apache do anything you like. For example, it could send back the same message no matter what URL was requested. Or it could map URLs that match certain patterns to certain responses, without there being a one-to-one relationship between requests and files on the server. (Ruby on Rails has a complex "routes" system that does this.)

Now imagine that the browser has requested a page, and Apache sends it back. What exactly does it send back? Just the HTML (in an HTTP "envelope", of course). The browser then looks at the HTML and says, "oh, I see that this lists some images, and a CSS file, and a Javascript file." It's then up to the browser (and the user's settings) whether it will ask for those other resources or not.

If the browser requests, say, an image on the page, the server sends it back, too. As far as the server is concerned, each request is totally separate. You can imagine the conversation like this:

Browser: I'd like this URL, please.
Server: Oh, hi, stranger! OK, here's some HTML.
Browser: (Hmmm, there's an image listed.) Server, please send me the image at this URL.
Server: Oh, hi, stranger! OK, here's that image.

Notice that the server doesn't "remember" the browser's previous request. In fact, it might even be a different server responding to the second request. This shows that HTTP is "stateless" - meaning that each request is separate, and doesn't depend on any previous ones. (This gets more complicated when you have logins and cookies and sessions, etc, but at bottom, it's the way HTTP works.)

In our example setup, when you ask for an HTML file, Apache just finds it and sends back the contents "raw" - without changing them. The same thing is true for CSS and Javascript files. But PHP is different.

Adding PHP

In our setup, we configured Apache to treat requests for .php files differently. When it sees that type of request, it asks php.exe to process the request and tell it how to respond.

PHP.exe will look at the file and execute any of the PHP instructions, like including headers from another file, running functions, pulling information from classes or objects, etc.

If the script requires it to, PHP will talk with a database server (in our case, MySQL) to get some of the information it needs to complete the request. For example, maybe it has to ask MySQL for a list of products, then output image tags and product descriptions for a catalog.

When it has finished running the script, PHP.exe gives the complete web page output to Apache, which sends it back to the browser, which can then ask for the images if it pleases.

One more thing to notice: as far as the browser is concerned, PHP and MySQL don't exist. It asks for a URL, and it gets back HTML (or an image, or other resource). It doesn't know or care how that gets put together. When it asks for "http://somesite.com/foo.php", it doesn't "know" that "foo.php" is a PHP script (and in fact, depending on how the server is configured, it might not be). The browser could ask for "foo.jpg" and get back HTML and it wouldn't be surprised, as long as the HTTP headers in the server's response said "what I'm sending you back is HTML."

In our setup, "foo.php" gets run when the browser requests a URL and the URL path is the file path to that script. But this is just a convenience for us, which makes it easy to see how the process works. We could make it more complicated, and as long as the browser could make a request and get a response it understood, it wouldn't know or care.

FTP

So where does Filezilla Server come in? If your server is on localhost, it may not. The basic purpose for Filezilla is to let you move files between the client and server computers. If they're the same computer, you can just copy the files there directly. But if not, you'll have to do it over the network, which is what FTP (File Transfer Protocol) is for.

Looking at the left side of the diagram, you'll see that there's another client-server relationship. This time, Filezilla Server is the server, and an FTP client (which could be Filezilla Client) is the client. If Filezilla Server is configured to use the same folder of files as Apache, you'll be able to change the content of your web site using this FTP conversation. To add or update a script, you'll upload it via FTP. The next time Apache gets a request for it, it will find it, get PHP to process it and send the results to the browser.

Saturday, August 7, 2010

Quick Ruby Experiment

Here's an interesting Ruby experiment:

Class.ancestors # [Class, Module, Object, Kernel]

Module.ancestors # [Module, Object, Kernel]

Object.ancestors # [Object, Kernel]

Kernel.ancestors # uninitialized constant Kernel

Ruby is to Esperanto as PHP is to English

In 1887, L.L. Zamenhof published a book called "Unua Libro," describing a brand-new language that he hoped would change the world. A language anyone could learn to speak. One without political affiliation, without a messy history. And, I have to imagine he thought: finally, one that made sense.

English speakers know that our language is a mess - comedy routines have been built around it.

Here's Brian Regan on learning plurals in school.

So she asked this kid who knew everything. Irwin. “Irwin, what’s the plural for ox?”

“Ox. Oxen. The farmer used his oxen.”

“Brian?”

“What?”

“Brian, what’s the plural for box?”

“Boxen. I bought 2 boxen of doughnuts.”

“No, Brian, no. Let’s try another one. Irwin, what’s the plural for goose?”

“Geese. I saw a flock of geese.”

“Brian?”

[Exasperated laughing]“Wha-a-at?”

“What’s the plural for moose?”

“Moosen! I saw a flock of MOOSEN! There were many of ‘em. Many much moosen. Out in the woods…in the wood-es…in the woodsen. The meese want the food in the woodesen…food is the eatenesen…the meese want the food in the woodesenes…food in the woodesenes.”

We learn English rules, like "to make a past tense, add 'ed' to a verb: sailed, repeated, succeeded, constructed, cleaned." But many common verbs don't follow this. I ate, not eated; I ran, not runned; I went, not goed; I slept, not sleeped.

Other languages are the same. Spanish, for example, has tons of irregular verbs. Like many languages, it also has genders for all its nouns. Mountains are feminine. Trees are masculine. Why? Nobody knows. Mark Twain had hilarious complaints about German, which adds another gender called "neuter," among other complications.

Every noun has a gender, and there is no sense or system in the distribution; so the gender of each must be learned separately and by heart. There is no other way. To do this one has to have a memory like a memorandum-book. In German, a young lady has no sex, while a turnip has. Think what overwrought reverence that shows for the turnip, and what callous disrespect for the girl. See how it looks in print -- I translate this from a conversation in one of the best of the German Sunday-school books:

Gretchen: "Wilhelm, where is the turnip?"
Wilhelm: "She has gone to the kitchen."
Gretchen: "Where is the accomplished and beautiful English maiden?"
Wilhelm: "It has gone to the opera."

The reason for all this messiness is simple: nobody planned this. Languages are organic, evolving things. We forget words, make up new ones, pronounce things lazily, and the mistakes become normal. The rules change, and the exceptions pile up.

But not in Esperanto. Esperanto was designed to make sense.

Esperanto is much easier to learn than other languages because:

The different letters are always pronounced in the same way and every letter in a word is pronounced. Therefore there are no difficulties with spelling and pronunciation, as one knows that the penultimate syllable is always stressed.

The grammar is simple, logical and without exceptions. The grammatical exceptions are often what make it so difficult to learn a new language.

Most of the words in Esperanto are international and are found in languages around the world.

It is easy to make new words with prefixes or suffixes. Thus, if one learns one word, ten or more usually come as part of the package.

And while I'm sure it has its peculiarities, it's fundamentally different from English or Russian or Chinese or Swahili, which basically became what they are by chance.

PHP is like English

PHP is a very useful language for web development. But it's a bit haphazard, like English. Just look at some of its string functions:

count_chars() should have a counterpart called count_words(), right? But it doesn't. The counterpart is str_word_count() .
strip_tags() is named with words separated by underscores. stripslashes() has no underscores.
hebrev() will "convert logical Hebrew text to visual text." I don't know what that means, but really: a whole function for this, in the global namespace? Could something similar be done with Korean or Arabic or Portuguese or Tagalo? If so, would we create korev() and arabiv() and portugv()? Wouldn't it make more sense to have something like langv('languageName') and be done with it? For that matter, why include such a specific function in the base langauge, when 99.9% of users won't need it and those who do could use a library?

This haphazard design reflect's PHP's history: once called "Personal Home Page/Forms Interpreter," it was made with one vision and has been amended and revised and rewritten over and over. I get the impression that somebody said, "hey, I want to add a function for messing with Hebrew." And the PHP team said, "sure, knock yourself out. We'll put it in there."

Now don't get me wrong. I'm not smart enough to design a language myself. And these days, you can write solid, test-driven, object-oriented code in PHP. But fundamentally, it feels like a language that happened, not one that was made.

Ruby is like Esperanto

Ruby, on the other hand, was designed by Yukihiro "Matz" Matsumoto, a Japanese computer scientist who wanted to draw on the strengths of Perl and Python, and above all, to write a language for human beings. As Matz wrote in the foreword to Hal Fulton's The Ruby Way:

...Programming languages are ways to express human thought... Machines do not care whether programs are structured well; they just execute them bit by bit. Structured programming is not for machines, but for humans... So to design a human-oriented langauge, Ruby, I followed the Principle of Least Surprise. I consider that everything that surprises me less is good. As a result I feel a natural feeling, even a kind of joy, when programming in Ruby.

The language feels almost philosophical. It starts with principles: everything is an object. This has deep implications: classes are objects. True and false are objects. "Class" itself is an object, of the class "Class." Odd, yes, but not haphazard. Where PHP seems sloppy, Ruby seems mysterious, like real life. How can light be both a particle and a wave? How can "nil" be an object? We don't know, but we suspect there are satisfying reasons if we dig deep enough.

This logical consistency generally extends to the particulars of the language. Want to know the number of items in an array, or the number of characters in a string? .length is your method, in either case. (PHP employs count() and strlen(), respectively.)

If anything can be converted to a string, you can rely on .to_s to do it; to_a and to_i will likewise convert to an array or integer, if possible. There are some unexpected or duplicate methods: str.to_str is the same as str.to_s, but Array doesn't have .to_str. But on the whole, it seems easier to guess what a method will be called than in PHP.

Ruby, like Esperanto, is quirky. It considers 0 to be true, for example, unlike any other language I know about. But when I encounter its quirks, I assume that they're by design; a necessary condition for some beautiful aspect of the language. In PHP, I think, "somebody didn't think that through."

But perhaps I should stop there. Who am I to judge? I'm still a beginner in this field, with plenty to learn in both PHP and Ruby. Judgment can wait.

For now, I should go back to my studies. I should open my copy of "The Well-Grounded Rubyist" with a curious and open mind. And with the enjoyable expectation that, if I read and think and tinker and practice and ask questions, it will all make sense in the end.

The Sleepless Geek