Jul 17

Java Generics III

Last time I talked about using generics to make getting values out of collections nicer (and a proposal that would obviate their use), so this time I want to talk about the other half: passing items into a collection. This encompasses all methods that take a parameter of the collection's generic type, including those that add items to the collection.

If you look at the byte-codes generated for calling any of these methods, you will see that there is no runtime checking of the objects being passed. This is because, after erasure, the methods of the underlying class are defined as taking Object parameters. The only checking happens at compile time. So as long as the static type is correct everything is fine, but it is easy to override the static type (accidentally or on purpose), so what happens? If you later retrieve the value from the collection and its type is not assignment compatible with the target, you will get a ClassCastException.
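A minimal sketch of the situation described above, assuming a List<String> that is also reachable through a raw List reference (the class and variable names are illustrative). The bad add goes through without complaint; the failure only shows up on a later get that contains no cast at all.

```java
import java.util.ArrayList;
import java.util.List;

class RawInsertDemo {
    // Adds a non-String through a raw alias, then reads it back as a String.
    // Returns a short description of where the failure surfaced.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run() {
        List<String> names = new ArrayList<String>();
        names.add("alice");

        List raw = names;              // same list object, generic type ignored
        raw.add(Integer.valueOf(42));  // compiles (unchecked), no runtime check here

        // The bad element is already stored and nothing has failed yet.
        String result = "stored " + names.size() + " elements";

        try {
            String second = names.get(1); // no cast written on this line...
            result += ", read " + second;
        } catch (ClassCastException e) {
            // ...but the compiler-inserted checkcast fails on retrieval
            result += ", ClassCastException on get";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running main prints "stored 2 elements, ClassCastException on get".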

Let's think about that for a moment. The exception is not thrown when the invalid value is added to the collection (when tracking down the error would have the context of the situation) but when it is removed (or examined). This defeats the basic rule of "fail as early as possible". It also means that a statement with no cast operator can fail with a ClassCastException. This seems counter to the spirit of the language and in fact leads to confusion. You look at the line that the stack trace points to and say to yourself, "there is nothing there that can throw a class cast exception." After this hits you a time or two you will remember, but why should you have to make that effort?

I will grant that the generic definition does make it harder to accidentally put in the wrong thing, but it doesn't eliminate it entirely. If we are going to have a checkcast byte-code on retrieving the value, then why don't we have one on putting the value in as well?
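For comparison, the JDK has shipped an opt-in form of insertion-time checking since 1.5: Collections.checkedList (along with checkedSet, checkedMap, and related methods) wraps a collection in a view that checks each element's class as it is added. A short sketch reusing the same raw-alias trick; the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedInsertDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run() {
        // The checked view verifies each element's class on insertion.
        List<String> names = Collections.checkedList(new ArrayList<String>(), String.class);
        names.add("alice");

        List raw = names; // raw alias, as before
        try {
            raw.add(Integer.valueOf(42)); // rejected here, at the point of the mistake
            return "accepted";
        } catch (ClassCastException e) {
            return "rejected on add, size " + names.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The cost is an explicit String.class argument and a wrapper around every such collection, so you get the insertion-side check only where you remember to ask for it.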

Jul 13

Java Generics II

In Part I, I showed how generics have no effect on the byte-codes generated by the compiler and that, as far as the getters of the collection classes are concerned, the only tangible benefit is that you no longer have to add casts to the get expression. A lookup that used to need an explicit cast to String now needs none.

Now don't get me wrong, I think that this is a good thing. As far as I'm concerned, casts are just wasted typing that exists only for the compiler, so getting rid of them is great. But the problem is that to get this feature you have to use an even uglier syntax (generics) in other places, most visibly in the declaration of the method that contains the lookup.
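A minimal sketch of the before and after, assuming a lookup of the key "abc" (the key used in Part I's example; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class LookupDemo {
    // Old style: the signature says only "a Map", and the body needs a cast.
    @SuppressWarnings("rawtypes")
    static String rawLookup(Map map) {
        return (String) map.get("abc");
    }

    // Generic style: the cast is gone from the body, and the
    // String-to-String typing has moved into the signature.
    static String genericLookup(Map<String, String> map) {
        return map.get("abc");
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("abc", "xyz");
        System.out.println(rawLookup(map) + " " + genericLookup(map)); // prints "xyz xyz"
    }
}
```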

Now I will grant you that the casts used to be in the body, where they obfuscate the code you're trying to read, and now they are in the method definition; and if you use the values more than once, you've traded N casts for one definition. But the flip side is that you have now moved an implementation detail into the public interface. In the old style this method would have said "I take a Map and do something with it," but the new one says "I take a Map of String to String and do something with it." More on this in a later post.
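One concrete consequence of putting the typing in the signature, sketched with hypothetical method names: a caller holding a Map<String, Integer> can still call the raw version, but the generic version refuses it at compile time.

```java
import java.util.HashMap;
import java.util.Map;

class SignatureDemo {
    // Raw signature: promises only "a Map".
    @SuppressWarnings("rawtypes")
    static int countRaw(Map map) { return map.size(); }

    // Generic signature: promises "a Map of String to String".
    static int countStrings(Map<String, String> map) { return map.size(); }

    static int run() {
        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("alice", 30);
        int n = countRaw(ages);  // accepted by the raw signature
        // countStrings(ages);   // does not compile: Map<String, Integer> is not a Map<String, String>
        return n;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 1
    }
}
```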

So I have an idea for a proposed language change that I wish I could have made (and gotten implemented 😉) years ago. It is a simple change, but I think it would have made a major positive difference to the language.

Why not remove the need for casts when downcasting?

If you take the basic assignment x = y, where x is declared as type X and y has static type Y, then X and Y are related in one of four ways (assuming neither is a primitive):

  1. X and Y are the same type.
  2. X is not directly related to Y.
  3. X is a superclass or interface of Y (upcasting).
  4. X is a subclass of Y (downcasting).

In case #1 no cast is necessary. In case #2 the statement is invalid and, when X and Y are unrelated classes, adding a cast will not change that. In case #3 no cast is necessary. In case #4 a cast is needed to compile, and a checkcast byte-code is generated to validate the assignment at runtime. If you look at the casts, it turns out that downcasting is the only case we can change, and it accounts for most of the casts in a pre-generics program.
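The four cases can be sketched in a few lines, using String, CharSequence, Integer, and Object as stand-ins for X and Y (the specific types are illustrative). The case #2 line is commented out because it does not compile, with or without a cast.

```java
class AssignmentCases {
    static String run() {
        String s = "hello";
        Object o = s;              // static type Object, runtime type String

        String same = s;           // case 1: X and Y are the same type, no cast
        // Integer unrelated = s;  // case 2: unrelated classes; (Integer) s is rejected too
        CharSequence up = s;       // case 3: X is an interface of Y (upcast), no cast
        String down = (String) o;  // case 4: X is a subclass of Y (downcast), cast required,
                                   //         and javac emits a checkcast for it

        return same + "," + up + "," + down;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints hello,hello,hello
    }
}
```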

The interesting thing about the downcast case is that the compiler checks whether a cast could work (are the types related?) but it still emits the checkcast byte-code. So my question is: if the compiler already checks that the assignment could be valid and generates a checkcast byte-code anyway, why can't we just tell the compiler "if you see a valid downcast, just emit a checkcast byte-code without requiring the cast"?

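To make this concrete, I want an assignment of an Object-typed expression to a String variable to compile without the (String) cast, with the compiler emitting exactly the same checkcast byte-code it emits today.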
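A minimal sketch of today's rule, with a hypothetical helper asString: the cast is mandatory, and the checkcast it compiles to is what catches a caller passing the wrong type. The proposed cast-free form appears only in a comment, since it is not legal Java.

```java
class DowncastDemo {
    // Today the (String) cast is mandatory and javac compiles it to a checkcast.
    // Under the proposal, writing "return value;" here would be accepted and
    // compile to the same checkcast. That form is NOT legal Java today.
    static String asString(Object value) {
        return (String) value;
    }

    static String run() {
        String result = asString("ok");        // valid downcast: passes the check
        try {
            asString(Integer.valueOf(42));     // a caller passes the wrong type
            result += ", no error";
        } catch (ClassCastException e) {
            result += ", ClassCastException";  // the runtime check catches it
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "ok, ClassCastException"
    }
}
```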

If the compiler allowed this, the system would be no less safe, as the checkcast byte-code would still be emitted to check at runtime, but the code would be cleaner. As far as I can tell, the only reason the cast is required (here I'm putting on my Mind Reading Through Time Helmet) is that someone said something like, "If we tell the programmer that they are downcasting and that this is a dicey operation, they will check their code to verify that this downcast is valid at this point and then add a cast to tell the compiler that they validated the code." This sounds good, but let's be honest: do you really check your code to verify the cast in all cases, or do you mostly just add the cast because the compiler wants it?

I thought so. But even if you check your code now, there is no way to prevent a later change from doing the wrong thing, and if you take a parameter, there is no way to prevent someone else from messing you up. This is why the compiler emits the checkcast byte-code even though you added a cast. It is too easy to make mistakes.

So given that most of the time the cast is just added to shut up the compiler, and that the system still adds a byte-code to prevent mistakes, why not just get rid of the cast requirement? Just think how simple the collection classes would be to use. They already do not need a cast to put an object in; with this change they would not need a cast to get the value out, and the generated code would be identical to the code currently generated (see Part I).

Casts are intrusive, ugly, and unnecessary for understanding the code; they require duplicate typing (the type name appears once in the declaration and again in the cast); and they do not make the code any safer. If this change had been made to the language in any version before 1.5, I believe that there would have been a lot less demand for Generics.

Jul 13

Java Generics I

One of the biggest additions to the Java language in the past few releases was Generics. Nearly everyone asked, begged, or demanded that they be added, pretty much since Java 2. I was one of those who wanted them. I felt that my life would be easier and that I could write better code if I had them. But now that I have them (actually I only have them for my own personal projects, as my work must still support 1.4), I'm not so sure that they are really that important or useful in most situations. In fact, they may actually be a detriment.

For most people the major reason for having Generics is the collection classes. How many times have we had to cast the values retrieved from a collection? All those extra keystrokes, and it seemed extremely silly that a "strongly typed" language would have us convert most of our objects to Object and then cast them back again later. All of that strong typing down the drain.

So Generics, when used with the collection classes, give us two basic features:

  1. Objects do not need casts when removed from a collection. Removing cast expressions from statements makes them easier to read.
  2. Object types are statically checked when added to a collection. Now you can't accidentally put the wrong object in a collection (unless you intentionally try to cheat the system).
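Both features in a few lines (the names are illustrative); the commented-out line is the one the compiler now rejects:

```java
import java.util.ArrayList;
import java.util.List;

class GenericListDemo {
    static String run() {
        List<String> names = new ArrayList<String>();
        names.add("alice");
        // names.add(Integer.valueOf(42)); // feature 2: rejected at compile time
        String first = names.get(0);       // feature 1: no cast needed on retrieval
        return first;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints alice
    }
}
```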

Let's take the casting one first. Removing casts from the code is always a good thing. They don't really convey any useful information to people; they just make the compiler happy. But there is an interesting detail of Generics that others have discussed but that is still not common knowledge: unlike C# and the .NET runtime, there is no support for generics in the JVM's instruction set. There are no generic byte-codes (read: instructions), and the type arguments are erased; the class file keeps generic signatures only as metadata, which the JVM does not use to execute or verify the byte-codes. What this means is that when the compiler sees your generic code, it compiles it to something that could run on a 1.4 JVM.

Here is a simple example to show what I mean. I took a small Java class and compiled it using the 1.6 JDK.
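A class consistent with the disassembly below might look like the following. The class name GenericsExample and the local variable name value are assumptions; the method names, parameter types, and the "abc" key are taken from the javap output, and the main method is added only so the sketch runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; the two methods mirror the javap output.
class GenericsExample {
    // Generic parameter: no cast needed on get().
    public void doGeneric(Map<String, String> map) {
        String value = map.get("abc");
    }

    // Raw parameter: the old style needs an explicit cast.
    @SuppressWarnings("rawtypes")
    public void doRaw(Map map) {
        String value = (String) map.get("abc");
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("abc", "xyz");
        GenericsExample example = new GenericsExample();
        example.doGeneric(map);
        example.doRaw(map);
    }
}
```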

This class has two methods: one takes a single parameter that is a Map from String to String, and the other takes an old-style Map with no type. Note that the generic Map.get() does not require a cast while the old style does. Things get interesting when we disassemble the .class file (I used javap -c).

public void doGeneric(java.util.Map);
0: aload_1
1: ldc #18; //String abc
3: invokeinterface #20, 2; //InterfaceMethod java/util/Map.get:(Ljava/lang/Object;)Ljava/lang/Object;
8: checkcast #26; //class java/lang/String
11: astore_2
12: return

public void doRaw(java.util.Map);
0: aload_1
1: ldc #18; //String abc
3: invokeinterface #20, 2; //InterfaceMethod java/util/Map.get:(Ljava/lang/Object;)Ljava/lang/Object;
8: checkcast #26; //class java/lang/String
11: astore_2
12: return

These are the byte-codes generated for the two methods. As you can see, they are identical. Let that sink in a bit. They aren't just similar; they are exactly the same.

The two lines of importance for this discussion are the checkcast instructions at offset 8 in each method. This byte-code is what a downcast compiles to in Java. It makes sure that the object being cast is of the given type or an assignable type (i.e. a subclass). Even though we have a strongly typed Map using generics, the generated code still has a checkcast byte-code. The system knows that you can subvert the type with a cast, so to guarantee safety it still must check that the type is assignable. So the only thing that the generic definition has bought us is that we don't need the cast to String.
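The checkcast sits exactly where the compiler needs a String. A minimal sketch, assuming a map whose type has been subverted with a raw cast (the class name is illustrative): reading the bad value into an Object variable goes through, because javac has no reason to insert a cast there, while reading it into a String variable hits the inserted checkcast.

```java
import java.util.HashMap;
import java.util.Map;

class SubvertedMapDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run() {
        Map<String, String> map = new HashMap<String, String>();
        ((Map) map).put("abc", Integer.valueOf(42)); // subvert the static type with a raw cast

        Object asObject = map.get("abc"); // target is Object: no checkcast is needed
        String outcome = "Object ok (" + asObject + ")";

        try {
            String asString = map.get("abc"); // compiler-inserted checkcast to String fails
            outcome += ", String ok (" + asString + ")";
        } catch (ClassCastException e) {
            outcome += ", String failed";
        }
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Object ok (42), String failed"
    }
}
```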

This post is getting long, so we will continue in Part II.