What Is OO Really All About?
I personally came quite late to learning object-oriented programming. For the first eight years or so of my career I programmed exclusively in procedural languages: C and Oracle PL/SQL. That was not out of choice; it just worked out that way. I was interested in learning OO, but most of what I found to read on the subject began with concepts such as encapsulation and polymorphism, explained that objects interact by sending each other messages, and generally extolled OO programming as a better way to model real-world objects. All of this sounded very good but left me none the wiser about how you would actually go about writing object-oriented code.
The first explanation I found really useful was in Bruce Eckel's book Thinking In C++, in the chapter entitled Data Abstraction, wherein he explains that a class is a struct with functions in it.
Holy cow!
This simple extra bit of knowledge was all I needed for the light to flick on. I had been programming in C for a while, so I was familiar with structs. Why had nobody explained it to me like this before?
There is no need to bore you with the rest. I did not truly begin to learn OO until a project came along that required me to learn it: one day my employer asked me to write a web service when I had never even heard the words "web service" before; thus it proved to be Java and not C++ that provided my entry point into OO programming. Later on I heard about design patterns, I learned some of them, and began to apply them to my own programs where it seemed good. With some knowledge and experience under my belt I reflected on the explanations I had read years ago about OO programming, and how unhelpful they had mostly been.
Recently I read this blog by Uncle Bob about the difference between OO and Functional Programming, and how it is a misconception that they are mutually exclusive. He explains the essence of both, and why they are good things. He explains object orientation like this:
OO imposes discipline on function pointers.
Quite so. He goes into some detail about it in this lecture as well. But without some background knowledge, it may not be obvious why it is so. I wanted to explain that a little more in this blog with an example.
Imagine we have a struct that contains a person's first and last names:
typedef struct {
    char *first_name;
    char *last_name;
} person_t;
I must admit to missing the simplicity of a struct. Java seems so cumbersome when you generate getters and setters for a class that contains no other code, and all you want is a simple data structure. I am sometimes tempted to make my instance variables public for this reason. But I digress. We could write a function to output a greeting like this:
void greet(person_t person) {
    printf("Hello %s %s!\n", person.first_name, person.last_name);
}
We can write another one to populate a person_t struct:
person_t create_person(char *first_name, char *last_name) {
    person_t person;
    person.first_name = first_name;
    person.last_name = last_name;
    return person;
}
Assuming that all the code above lives in the file structured.h, we can string the whole thing together in the program file structured.c, which populates three structs, puts them in an array, loops over the array and calls greet() for each struct:
#include <stdio.h>
#include <stdlib.h>
#include "structured.h"

person_t people_array[3];

person_t *create_people(person_t people[]) {
    people[0] = create_person("Tony", "Blair");
    people[1] = create_person("David", "Cameron");
    people[2] = create_person("Theresa", "May");
    return people;
}

int main(int argc, char *argv[]) {
    person_t *person_ptr = create_people(people_array);
    for (int i=0; i<3; i++) {
        greet(*person_ptr++);
    }
    exit(EXIT_SUCCESS);
}
Compile it, then run it, and the output is what you'd expect:
$ make structured
cc structured.c -o structured
$ ./structured
Hello Tony Blair!
Hello David Cameron!
Hello Theresa May!
Already in this code we see an example of a pointer. A pointer is a variable whose value is a memory address. It is declared by prefixing the variable name with an asterisk, and it is dereferenced (that is, the value at the address is fetched) by prefixing it with an asterisk as well. C lets you specify a type for a pointer, which tells the compiler what type of data to expect at the address the pointer holds. It also permits pointer arithmetic: when you increment or decrement a pointer, the compiler adjusts it by the size in bytes of the data type it points to. Therefore the C expression *person_ptr++ yields the person_t struct that person_ptr currently points to and then advances the pointer to the next person_t struct in memory.
But pointers in C can reference functions as well. If that sounds strange, consider that a function is nothing more than a named subroutine. The compiler turns the function into an assembly language subroutine, the assembler turns that into machine code, and the linker places it at an addressable location in the executable. When you call the function, the calling code places the arguments and the return address where the subroutine expects to find them (traditionally on the stack, although many modern calling conventions pass the first few arguments in registers), and a branch instruction then jumps to the subroutine. You could just as easily make that branch jump to another subroutine at a different location in memory. Provided the second subroutine expects the same arguments in the same places as the first, one can be swapped for the other. A function pointer is simply a variable that holds the address to branch to, so the choice can be made at runtime. This is how polymorphism works.
We can give ourselves polymorphism in C by adding a function pointer to our structure. It would look like this:
typedef struct {
    char *first_name;
    char *last_name;
    void (*greet)(void *self);
} person_t;
The function pointer is called greet and it accepts a single argument, which is a pointer to void called self. Then we can define a number of different functions that have the same signature:
void english_greeting(void *self) {
    person_t *this = (person_t *) self;
    printf("Hello %s %s!\n", this->first_name, this->last_name);
}

void french_greeting(void *self) {
    person_t *this = (person_t *) self;
    printf("Bonjoir %s %s!\n", this->first_name, this->last_name);
}

void german_greeting(void *self) {
    person_t *this = (person_t *) self;
    printf("Guten Tag %s %s!\n", this->first_name, this->last_name);
}
The -> notation is shorthand for accessing a member of a struct through a pointer to it: this->first_name means the same as (*this).first_name. We can then modify the create_person function to accept a pointer to the greeting function we want to use, and store it in the struct:
person_t create_person(char *first_name, char *last_name, void (*greeting_strategy)(void *)) {
    person_t person;
    person.first_name = first_name;
    person.last_name = last_name;
    person.greet = greeting_strategy;
    return person;
}
You might notice that this is, essentially, the strategy pattern. We then modify create_people to pass a function pointer with each call:
person_t *create_people(person_t people[]) {
    people[0] = create_person("Tony", "Blair", &english_greeting);
    people[1] = create_person("David", "Cameron", &french_greeting);
    people[2] = create_person("Theresa", "May", &german_greeting);
    return people;
}
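The & operator performs the opposite action to the * operator: it returns the memory address of its operand, which in this case is a function. (For functions the & is optional, because a function name used in an expression already converts to a pointer to the function, but writing it makes the intent explicit.)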
Finally, we can rewrite our main function so that it calls the greet function via the function pointer held on the struct:
int main(int argc, char *argv[]) {
    person_t *person_ptr = create_people(people_array);
    for (int i=0; i<3; i++) {
        person_ptr->greet(person_ptr);
        person_ptr++;
    }
    exit(EXIT_SUCCESS);
}
Compile and execute the program, and you get this:
$ make objectoriented
cc objectoriented.c -o objectoriented
$ ./objectoriented
Hello Tony Blair!
Bonjoir David Cameron!
Guten Tag Theresa May!
Hurrah! Each person was greeted with their own polymorphic greeting. Would you actually code this way? In the past, people have done exactly that. But it's clearly very cumbersome and there are many ways you could screw up this code. Particular problems are:
- The syntax for managing pointers is very awkward.
- In C there is no concept of this, so we have to implement it ourselves by passing a pointer to the structure as a function argument. If you want to add several functions to your "class," you have to do this for all of them, in addition to whatever other arguments they require.
- With the anonymous struct typedef used here, the function pointer cannot be declared to accept a pointer to person_t, because the name person_t does not exist until the typedef is complete. We are therefore obliged to accept a void pointer and make this ugly cast in every function. (Giving the struct a tag and forward-declaring it avoids the cast, but not the other problems.)
- Because the onus is on you to provide the self reference, you have to provide the correct reference! If you pass a pointer to the wrong person, the compiler cannot catch this mistake for you.
- Indeed, because self is a pointer to void, we could supply a pointer to anything in the self argument, and the compiler would not catch this mistake either!
By comparison, the equivalent code in Java would look like this:
import java.util.function.Function;
import java.util.stream.Stream;

import static java.lang.String.format;

public class Person {
    private String firstName = null;
    private String lastName = null;
    private Function<Person, String> greetingStrategy;

    public Person(String firstName, String lastName, Function<Person, String> greetingStrategy) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.greetingStrategy = greetingStrategy;
    }

    public void greet() {
        System.out.println(greetingStrategy.apply(this));
    }

    static Function<Person, String> englishGreeting = (p) -> format("Hello %s %s!", p.firstName, p.lastName);
    static Function<Person, String> frenchGreeting = (p) -> format("Bonjoir %s %s!", p.firstName, p.lastName);
    static Function<Person, String> germanGreeting = (p) -> format("Guten Tag %s %s!", p.firstName, p.lastName);

    public static void main(String[] args) {
        Stream.of(
                new Person("Tony", "Blair", englishGreeting),
                new Person("David", "Cameron", frenchGreeting),
                new Person("Theresa", "May", germanGreeting))
            .forEach(Person::greet);
    }
}
This is much neater, although I daresay in a language like Clojure it would be neater still. The awkward casts are gone, as is the self argument to the polymorphic greet function. There is nothing in the code above to suggest that function pointers are involved at all. (The only time this reality intrudes in Java is when you encounter a NullPointerException.) The Java compiler hides all this complexity from you and lets you get on with making use of polymorphism.
This is what Bob means when he says that OO imposes discipline on function pointers. The Java compiler imposes the discipline so thoroughly that you may not even be aware it's there! It ensures that the greeting strategy you provide for a Person can only ever be one that accepts a Person argument and returns a String. The C compiler lets you do whatever you want, and if you do the wrong thing then you'll probably only find out when your code blows up at runtime! If your program is very big and complex then it may be very difficult to discover your mistake. OO languages give you features like inheritance that allow you to manage your polymorphic functions in structured ways; ways that make sense to the programmer and enable you to increase the cohesion of your code. They also allow you to invert your dependencies easily so that you can decouple your designs.
You could do all this yourself in C, but in any reasonably complex program you would have to be very disciplined to pull it off successfully. OO languages impose the discipline for you. That is what object orientation is all about.