Why clone is faster than constructor copy

This post is a follow-up to my previous post about copying objects in Java. After I published that post, Sven Reimers (@SvenNB) asked me why there is such a big performance difference between clone and copying via a constructor. In this post I will try to answer that question.

Code in question

Just to recap what we are looking at: there are two classes implementing the copy() method:

  • Copy via clone():
    package com.vyazelenko.blog.copyobject.primitives.clone;
    import com.vyazelenko.blog.copyobject.primitives.BaseClass;
    public class CloneCopy extends BaseClass implements Cloneable {
        public static final CloneCopy INSTANCE;
        static {
            INSTANCE = new CloneCopy();
            INSTANCE.init();
        }
        @Override
        protected CloneCopy clone() {
            try {
                return (CloneCopy) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new Error(e);
            }
        }
        @Override
        public CloneCopy copy() {
            return clone();
        }
    }


  • Copy via constructor:
    package com.vyazelenko.blog.copyobject.primitives.constructor;
    import com.vyazelenko.blog.copyobject.primitives.BaseClass;
    public class ConstructorCopy extends BaseClass implements Cloneable {
        public static final ConstructorCopy INSTANCE;
        static {
            INSTANCE = new ConstructorCopy();
            INSTANCE.init();
        }
        public ConstructorCopy() {
            super();
        }
        public ConstructorCopy(ConstructorCopy copyFrom) {
            super(copyFrom);
        }
        @Override
        public ConstructorCopy copy() {
            return new ConstructorCopy(this);
        }
    }


  • Both classes inherit from a common base class that defines the state to be copied:

    package com.vyazelenko.blog.copyobject.primitives;
    import com.vyazelenko.blog.copyobject.Copyable;
    import com.vyazelenko.blog.copyobject.HashUtils;
    abstract class Root implements Copyable {
        private int field1;
        private char field2;
        public boolean field6;
        byte abc;
        public long min;
        public long max;
        private double maxExponent;
        public Root() {
        }
        public void init() {
            field1 = 100;
            field2 = '\t';
            field6 = false;
            abc = 100;
            min = Long.MIN_VALUE;
            max = Long.MAX_VALUE;
            maxExponent = Double.MAX_VALUE;
        }
        public Root(Root copyFrom) {
            field1 = copyFrom.field1;
            field2 = copyFrom.field2;
            field6 = copyFrom.field6;
            abc = copyFrom.abc;
            min = copyFrom.min;
            max = copyFrom.max;
            maxExponent = copyFrom.maxExponent;
        }
        @Override
        public boolean equals(Object obj) {
            if (obj == this) {
                return true;
            } else if (!(obj instanceof Root)) {
                return false;
            } else {
                Root tmp = (Root) obj;
                return field1 == tmp.field1 && field2 == tmp.field2 && field6 == tmp.field6
                        && abc == tmp.abc && min == tmp.min && max == tmp.max
                        && Double.compare(maxExponent, tmp.maxExponent) == 0;
            }
        }
        @Override
        public int hashCode() {
            int hash = 17;
            hash += 31 * hash + field1;
            hash += 31 * hash + field2;
            hash += 31 * hash + (field6 ? 1 : 0);
            hash += 31 * hash + abc;
            hash += 31 * hash + HashUtils.longHash(min);
            hash += 31 * hash + HashUtils.longHash(max);
            hash += 31 * hash + HashUtils.doubleHash(maxExponent);
            return hash;
        }
    }
    public abstract class BaseClass extends Root {
        private double anotherField;
        private int field1;
        protected long youCanSeeMe;
        private short m1;
        public short m2;
        public short m3;
        public short m4;
        short m5;
        public short m6;
        public short m7;
        protected short m8;
        public short m9;
        public short m10;
        private char x;
        public BaseClass() {
            super();
        }
        @Override
        public void init() {
            super.init();
            anotherField = 10.5;
            field1 = Integer.MIN_VALUE;
            youCanSeeMe = 1;
            m1 = 10;
            m2 = 20;
            m3 = 30;
            m4 = 40;
            m5 = 50;
            m6 = 60;
            m7 = 70;
            m8 = 80;
            m9 = 90;
            m10 = 100;
            x = '\n';
        }
        public BaseClass(BaseClass copyFrom) {
            super(copyFrom);
            anotherField = copyFrom.anotherField;
            field1 = copyFrom.field1;
            youCanSeeMe = copyFrom.youCanSeeMe;
            m1 = copyFrom.m1;
            m2 = copyFrom.m2;
            m3 = copyFrom.m3;
            m4 = copyFrom.m4;
            m5 = copyFrom.m5;
            m6 = copyFrom.m6;
            m7 = copyFrom.m7;
            m8 = copyFrom.m8;
            m9 = copyFrom.m9;
            m10 = copyFrom.m10;
            x = copyFrom.x;
        }
        @Override
        public boolean equals(Object obj) {
            if (obj == this) {
                return true;
            } else if (!(obj instanceof BaseClass)) {
                return false;
            } else {
                BaseClass tmp = (BaseClass) obj;
                return super.equals(tmp) && Double.compare(anotherField, tmp.anotherField) == 0
                        && field1 == tmp.field1 && youCanSeeMe == tmp.youCanSeeMe
                        && m1 == tmp.m1 && m2 == tmp.m2 && m3 == tmp.m3 && m4 == tmp.m4
                        && m5 == tmp.m5 && m6 == tmp.m6 && m7 == tmp.m7 && m8 == tmp.m8
                        && m9 == tmp.m9 && m10 == tmp.m10 && x == tmp.x;
            }
        }
        @Override
        public int hashCode() {
            int hash = super.hashCode();
            hash = 31 * hash + HashUtils.doubleHash(anotherField);
            hash = 31 * hash + field1;
            hash = 31 * hash + HashUtils.longHash(youCanSeeMe);
            hash = 31 * hash + m1;
            hash = 31 * hash + m2;
            hash = 31 * hash + m3;
            hash = 31 * hash + m4;
            hash = 31 * hash + m5;
            hash = 31 * hash + m6;
            hash = 31 * hash + m7;
            hash = 31 * hash + m8;
            hash = 31 * hash + m9;
            hash = 31 * hash + m10;
            hash = 31 * hash + x;
            return hash;
        }
    }

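To see the two strategies side by side without the blog's package structure, here is a minimal self-contained sketch. The classes Base and Sample and their fields are hypothetical stand-ins for Root/BaseClass and the two copy classes. It shows that both strategies produce an equal but distinct object, and that the clone() variant needs no per-field code at all, because Object.clone() copies every field, including those declared in superclasses, whereas the copy constructor chain has to list each field by hand.

```java
import java.util.Objects;

public class CopyStrategies {

    // Hypothetical stand-in for Root: a superclass with its own state.
    static class Base {
        int a;
        long b;

        Base() {
        }

        // Copy constructor: every inherited field must be copied explicitly.
        Base(Base from) {
            a = from.a;
            b = from.b;
        }
    }

    // Hypothetical stand-in for BaseClass plus CloneCopy/ConstructorCopy.
    static class Sample extends Base implements Cloneable {
        double c;
        char d;

        Sample() {
        }

        Sample(Sample from) {
            super(from); // copies a and b
            c = from.c;
            d = from.d;
        }

        // Strategy 1: copy through the constructor chain.
        Sample copyViaConstructor() {
            return new Sample(this);
        }

        // Strategy 2: copy through Object.clone(); no field is named here.
        Sample copyViaClone() {
            try {
                return (Sample) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Sample is Cloneable
            }
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Sample)) {
                return false;
            }
            Sample s = (Sample) o;
            return a == s.a && b == s.b && Double.compare(c, s.c) == 0 && d == s.d;
        }

        @Override
        public int hashCode() {
            return Objects.hash(a, b, c, d);
        }
    }

    public static void main(String[] args) {
        Sample original = new Sample();
        original.a = 100;
        original.b = Long.MAX_VALUE;
        original.c = 10.5;
        original.d = '\n';

        Sample byCtor = original.copyViaConstructor();
        Sample byClone = original.copyViaClone();

        System.out.println(byCtor.equals(original) && byCtor != original);   // true
        System.out.println(byClone.equals(original) && byClone != original); // true
    }
}
```

The trade-off is the usual one: the copy constructor is explicit and type-safe, while clone() stays correct automatically when a field is added to any class in the hierarchy.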

Clone under the hood

java.lang.Object declares the clone() method as native, which gives the JVM the opportunity to implement it as an intrinsic. And in fact, this is exactly what the OpenJDK (HotSpot) JVM does under the hood:

/*
 * Excerpt from hotspot/src/share/vm/classfile/vmSymbols.hpp
 */
#define VM_SYMBOLS_DO(template, do_alias) \
  /* commonly used class names */ \
  template(java_lang_System, "java/lang/System") \
  template(java_lang_Object, "java/lang/Object") \
  /* ... */

#define VM_INTRINSICS_DO(do_intrinsic, do_class, do_name, do_signature, do_alias) \
  do_intrinsic(_hashCode, java_lang_Object, hashCode_name, void_int_signature, F_R) \
   do_name( hashCode_name, "hashCode") \
  do_intrinsic(_getClass, java_lang_Object, getClass_name, void_class_signature, F_R) \
   do_name( getClass_name, "getClass") \
  do_intrinsic(_clone, java_lang_Object, clone_name, void_object_signature, F_R) \
   do_name( clone_name, "clone") \
  /* ... */


Unfortunately, I was not able to find out exactly what such an intrinsified clone() call looks like. If any of you knows the answer, I would be more than happy to hear about it!
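One consequence of clone() being implemented inside the VM can be checked from plain Java: Object.clone() creates the copy without running any constructor of the class, so none of the constructor chain (BaseClass, Root) executes. The sketch below uses a hypothetical Counted class with a static counter that every constructor increments; cloning leaves the counter unchanged, while the copy constructor bumps it.

```java
public class CloneSkipsConstructors {

    // Counts how many times any constructor of Counted has run.
    static int constructorCalls = 0;

    static class Counted implements Cloneable {
        final int value;

        Counted(int value) {
            constructorCalls++;
            this.value = value;
        }

        // Copy constructor: runs (and is counted) like any other constructor.
        Counted(Counted from) {
            constructorCalls++;
            this.value = from.value;
        }

        @Override
        protected Counted clone() {
            try {
                // Object.clone() copies the field contents without invoking a constructor,
                // which is why even the final field 'value' is populated in the copy.
                return (Counted) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Counted is Cloneable
            }
        }
    }

    public static void main(String[] args) {
        Counted original = new Counted(42);      // constructorCalls == 1
        Counted cloned = original.clone();       // still 1
        Counted copied = new Counted(original);  // now 2

        System.out.println("calls=" + constructorCalls
                + " cloneValue=" + cloned.value
                + " copyValue=" + copied.value);
        // prints "calls=2 cloneValue=42 copyValue=42"
    }
}
```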

Test code and results

This time I won’t be using JMH to run my tests, because all I need is to force the JVM to compile the methods in question. For each case there is a dedicated test class (TestClone.java and TestConstructor.java) that invokes the copy() method 500,000 times during the warmup phase and then another 10,000,000 times during the actual test phase. These numbers are not particularly important; they were chosen to ensure that the JVM compiles the copy methods into native code.
I used JDK version 1.7.0_45.

Here are test classes:

package com.vyazelenko.blog.copyobject;
import com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy;
import java.util.ArrayList;
import java.util.List;
public class TestClone {
    public static List<Copyable> results;
    public static void main(String[] args) {
        runTest();
    }
    public static void runTest() {
        warmup();
        test();
    }
    private static void warmup() {
        doCopy(500_000, "warmup");
    }
    private static void doCopy(int iterations, String message) {
        results = new ArrayList<>(iterations);
        System.out.println("\n\n>>> In " + message);
        for (int i = 0; i < iterations; i++) {
            results.add(callCopy());
        }
        System.out.println("<<< " + message + " completed");
    }
    private int resultsHash() {
        return results.hashCode();
    }
    private static Copyable callCopy() {
        return CloneCopy.INSTANCE.copy();
    }
    private static void test() {
        doCopy(10_000_000, "test");
    }
}


package com.vyazelenko.blog.copyobject;
import com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy;
import java.util.ArrayList;
import java.util.List;
public class TestConstructor {
    public static List<Copyable> results;
    public static void main(String[] args) {
        runTest();
    }
    public static void runTest() {
        warmup();
        test();
    }
    private static void warmup() {
        doCopy(500_000, "warmup");
    }
    private static void doCopy(int iterations, String message) {
        results = new ArrayList<>(iterations);
        System.out.println("\n\n>>> In " + message);
        for (int i = 0; i < iterations; i++) {
            results.add(callCopy());
        }
        System.out.println("<<< " + message + " completed");
    }
    private int resultsHash() {
        return results.hashCode();
    }
    private static Copyable callCopy() {
        return ConstructorCopy.INSTANCE.copy();
    }
    private static void test() {
        doCopy(10_000_000, "test");
    }
}

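The two test classes differ only in callCopy(). A hypothetical way to share the rest of the harness is to pass the copy operation in, as sketched below; the Copier interface and the CopyHarness class are my own names, not part of the original code, and anonymous classes are used so the sketch also compiles on JDK 7. Like the original doCopy(), it keeps every copy in a list, which also means the copies escape and the JIT cannot eliminate the allocations. One design caveat: if both strategies were driven through the same copier.copy() call site in a single JVM, the JIT would see more than one receiver type there and the generated code could differ from the monomorphic case, so it is still sensible to run each strategy in its own JVM, as the separate test classes do.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyHarness {

    // Hypothetical abstraction over the operation being exercised.
    interface Copier<T> {
        T copy();
    }

    public static List<Object> results;

    // Same shape as doCopy() in TestClone/TestConstructor.
    static <T> void doCopy(Copier<T> copier, int iterations, String message) {
        results = new ArrayList<>(iterations);
        System.out.println(">>> In " + message);
        for (int i = 0; i < iterations; i++) {
            results.add(copier.copy());
        }
        System.out.println("<<< " + message + " completed");
    }

    static <T> void runTest(Copier<T> copier, int warmupIterations, int testIterations) {
        doCopy(copier, warmupIterations, "warmup");
        doCopy(copier, testIterations, "test");
    }

    public static void main(String[] args) {
        // Stand-in for CloneCopy.INSTANCE.copy(): cloning an int[] keeps the sketch self-contained.
        final int[] prototype = {1, 2, 3};
        // Test count is smaller than the post's 10,000,000 so it fits a default heap.
        runTest(new Copier<int[]>() {
            @Override
            public int[] copy() {
                return prototype.clone();
            }
        }, 500_000, 1_000_000);
    }
}
```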

I ran both tests with the -XX:+PrintCompilation option and got the following results:

  • Clone:
    java -XX:+PrintCompilation com.vyazelenko.blog.copyobject.TestClone
    59 1 java.lang.String::hashCode (55 bytes)
    61 2 java.lang.String::indexOf (70 bytes)
    69 3 sun.nio.cs.UTF_8$Encoder::encode (361 bytes)
    77 4 java.util.ArrayList::add (29 bytes)
    77 5 java.util.ArrayList::ensureCapacityInternal (23 bytes)
    77 7 n java.lang.Object::clone (native)
    78 6 java.util.ArrayList::ensureExplicitCapacity (26 bytes)
    78 8 com.vyazelenko.blog.copyobject.TestClone::callCopy (7 bytes)
    78 9 com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy (5 bytes)
    79 10 ! com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone (18 bytes)
    79 11 % com.vyazelenko.blog.copyobject.TestClone::doCopy @ 38 (92 bytes)
    105 12 com.vyazelenko.blog.copyobject.TestClone::doCopy (92 bytes)

  • Constructor:
    java -XX:+PrintCompilation com.vyazelenko.blog.copyobject.TestConstructor
    59 1 java.lang.String::hashCode (55 bytes)
    61 2 java.lang.String::indexOf (70 bytes)
    70 3 sun.nio.cs.UTF_8$Encoder::encode (361 bytes)
    77 4 java.lang.Object::<init> (1 bytes)
    80 5 java.util.ArrayList::add (29 bytes)
    80 6 java.util.ArrayList::ensureCapacityInternal (23 bytes)
    81 7 java.util.ArrayList::ensureExplicitCapacity (26 bytes)
    81 8 com.vyazelenko.blog.copyobject.TestConstructor::callCopy (7 bytes)
    81 9 com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy (9 bytes)
    83 10 com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init> (6 bytes)
    83 11 com.vyazelenko.blog.copyobject.primitives.BaseClass::<init> (118 bytes)
    83 12 com.vyazelenko.blog.copyobject.primitives.Root::<init> (61 bytes)
    83 13 % com.vyazelenko.blog.copyobject.TestConstructor::doCopy @ 38 (92 bytes)
    113 14 com.vyazelenko.blog.copyobject.TestConstructor::doCopy (92 bytes)

This by itself does not tell us much, except that in the second (constructor) case two more methods get compiled (namely BaseClass::<init> and Root::<init>).
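Comparing such listings by eye gets tedious, so a small parser can help. The sketch below is my own (PrintCompilationParser and Entry are hypothetical names) and assumes the whitespace-separated layout seen above: timestamp in milliseconds, compilation id, optional one-character flags, the method name, then either "(N bytes)" or "(native)". The flags that appear in these listings are `%` (on-stack replacement compilation, followed by "@ bci"), `!` (the method has exception handlers) and `n` (native method wrapper). Lines with a different layout are rejected rather than guessed at.

```java
public class PrintCompilationParser {

    // One parsed line of -XX:+PrintCompilation output.
    static final class Entry {
        final long timestampMs;
        final int compileId;
        final String flags;      // e.g. "", "n", "!", "%"
        final String method;     // e.g. "java.lang.Object::clone"
        final int bytecodeSize;  // -1 for native methods

        Entry(long timestampMs, int compileId, String flags, String method, int bytecodeSize) {
            this.timestampMs = timestampMs;
            this.compileId = compileId;
            this.flags = flags;
            this.method = method;
            this.bytecodeSize = bytecodeSize;
        }
    }

    static Entry parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        long timestamp = Long.parseLong(tokens[0]);
        int id = Integer.parseInt(tokens[1]);

        // Flag tokens sit between the id and the first token containing "::".
        StringBuilder flags = new StringBuilder();
        int i = 2;
        while (i < tokens.length && !tokens[i].contains("::")) {
            flags.append(tokens[i]);
            i++;
        }
        if (i == tokens.length) {
            throw new IllegalArgumentException("no method name in: " + line);
        }
        String method = tokens[i];

        // Size is written as two tokens, "(N" and "bytes)"; "(native)" has no size.
        int size = -1;
        for (int j = i + 1; j < tokens.length - 1; j++) {
            if (tokens[j].startsWith("(") && tokens[j + 1].equals("bytes)")) {
                size = Integer.parseInt(tokens[j].substring(1));
            }
        }
        return new Entry(timestamp, id, flags.toString(), method, size);
    }

    public static void main(String[] args) {
        String[] lines = {
            "77 7 n java.lang.Object::clone (native)",
            "83 11 com.vyazelenko.blog.copyobject.primitives.BaseClass::<init> (118 bytes)",
            "83 13 % com.vyazelenko.blog.copyobject.TestConstructor::doCopy @ 38 (92 bytes)",
        };
        for (String line : lines) {
            Entry e = parse(line);
            System.out.println(e.compileId + " [" + e.flags + "] " + e.method + " size=" + e.bytecodeSize);
        }
    }
}
```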

The real fun is to look at the generated assembly code (-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly). If you want to know how to dump the assembly, follow the java-print-assembly instructions provided by Nitsan Wakart on his blog. The links to the proper binaries saved me tons of time today. 😉

Below I show only the relevant part of the assembly dumps, namely the compiled callCopy() method of each test class, since that is the method we are interested in:

  • Clone (complete ASM code):
    Decoding compiled method 0x00000001063b9390:
    Code:
    [Entry Point]
    [Verified Entry Point]
    [Constants]
    # {method} 'callCopy' '()Lcom/vyazelenko/blog/copyobject/Copyable;' in 'com/vyazelenko/blog/copyobject/TestClone'
    # [sp+0x20] (sp of caller)
    0x00000001063b94e0: mov %eax,-0x14000(%rsp)
    0x00000001063b94e7: push %rbp
    0x00000001063b94e8: sub $0x10,%rsp
    0x00000001063b94ec: mov 0x60(%r15),%rsi
    0x00000001063b94f0: mov %rsi,%r10
    0x00000001063b94f3: add $0x58,%r10
    0x00000001063b94f7: cmp 0x70(%r15),%r10
    0x00000001063b94fb: jae 0x00000001063b9557
    0x00000001063b94fd: mov %r10,0x60(%r15)
    0x00000001063b9501: prefetchnta 0xc0(%r10)
    0x00000001063b9509: mov $0xd7610ea1,%r11d ; {oop('com/vyazelenko/blog/copyobject/primitives/clone/CloneCopy')}
    0x00000001063b950f: mov 0xb0(%r12,%r11,8),%r10
    0x00000001063b9517: mov %r10,(%rsi)
    0x00000001063b951a: movl $0xd7610ea1,0x8(%rsi) ; {oop('com/vyazelenko/blog/copyobject/primitives/clone/CloneCopy')}
    0x00000001063b9521: mov %rsi,%rbx
    0x00000001063b9524: add $0x8,%rsi
    0x00000001063b9528: mov $0xa,%edx
    0x00000001063b952d: movabs $0x7957a9d90,%rdi ; {oop(a 'com/vyazelenko/blog/copyobject/primitives/clone/CloneCopy')}
    0x00000001063b9537: add $0x8,%rdi
    0x00000001063b953b: movabs $0x106398120,%r10
    0x00000001063b9545: callq *%r10 ;*invokespecial clone
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone@1 (line 16)
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy@1 (line 24)
    ; – com.vyazelenko.blog.copyobject.TestClone::callCopy@3 (line 38)
    0x00000001063b9548: mov %rbx,%rax
    0x00000001063b954b: add $0x10,%rsp
    0x00000001063b954f: pop %rbp
    0x00000001063b9550: test %eax,-0x1bcc556(%rip) # 0x00000001047ed000
    ; {poll_return}
    0x00000001063b9556: retq
    0x00000001063b9557: movabs $0x6bb087508,%rsi ; {oop('com/vyazelenko/blog/copyobject/primitives/clone/CloneCopy')}
    0x00000001063b9561: xchg %ax,%ax
    0x00000001063b9563: callq 0x00000001063b4fe0 ; OopMap{off=136}
    ;*invokespecial clone
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone@1 (line 16)
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy@1 (line 24)
    ; – com.vyazelenko.blog.copyobject.TestClone::callCopy@3 (line 38)
    ; {runtime_call}
    0x00000001063b9568: mov %rax,%rsi
    0x00000001063b956b: jmp 0x00000001063b9521
    0x00000001063b956d: mov 0x8(%rax),%r10d
    0x00000001063b9571: cmp $0xd7610fdb,%r10d ; {oop('java/lang/CloneNotSupportedException')}
    0x00000001063b9578: je 0x00000001063b9587
    0x00000001063b957a: mov %rax,%rsi
    0x00000001063b957d: add $0x10,%rsp
    0x00000001063b9581: pop %rbp
    0x00000001063b9582: jmpq 0x00000001063b7e20 ; {runtime_call}
    0x00000001063b9587: mov %rax,%rbp ;*areturn
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone@7 (line 16)
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy@1 (line 24)
    ; – com.vyazelenko.blog.copyobject.TestClone::callCopy@3 (line 38)
    0x00000001063b958a: mov $0x5,%esi
    0x00000001063b958f: callq 0x000000010638ef20 ; OopMap{rbp=Oop off=180}
    ;*new ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone@9 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy@1 (line 24)
    ; – com.vyazelenko.blog.copyobject.TestClone::callCopy@3 (line 38)
    ; {runtime_call}
    0x00000001063b9594: callq 0x0000000105c165de ;*new
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::clone@9 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.clone.CloneCopy::copy@1 (line 24)
    ; – com.vyazelenko.blog.copyobject.TestClone::callCopy@3 (line 38)
    ; {runtime_call}
    0x00000001063b9599: hlt
    0x00000001063b959a: hlt
    0x00000001063b959b: hlt
    0x00000001063b959c: hlt
    0x00000001063b959d: hlt
    0x00000001063b959e: hlt
    0x00000001063b959f: hlt
    [Exception Handler]
    [Stub Code]
    0x00000001063b95a0: jmpq 0x00000001063b50a0 ; {no_reloc}
    [Deopt Handler Code]
    0x00000001063b95a5: callq 0x00000001063b95aa
    0x00000001063b95aa: subq $0x5,(%rsp)
    0x00000001063b95af: jmpq 0x000000010638eb00 ; {runtime_call}
    0x00000001063b95b4: hlt
    0x00000001063b95b5: hlt
    0x00000001063b95b6: hlt
    0x00000001063b95b7: hlt

  • Constructor (complete ASM code):
    Decoding compiled method 0x000000010c064b50:
    Code:
    [Entry Point]
    [Verified Entry Point]
    [Constants]
    # {method} 'callCopy' '()Lcom/vyazelenko/blog/copyobject/Copyable;' in 'com/vyazelenko/blog/copyobject/TestConstructor'
    # [sp+0x20] (sp of caller)
    0x000000010c064ca0: mov %eax,-0x14000(%rsp)
    0x000000010c064ca7: push %rbp
    0x000000010c064ca8: sub $0x10,%rsp ;*synchronization entry
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@-1 (line 39)
    0x000000010c064cac: mov 0x60(%r15),%r8
    0x000000010c064cb0: mov %r8,%r10
    0x000000010c064cb3: add $0x58,%r10
    0x000000010c064cb7: cmp 0x70(%r15),%r10
    0x000000010c064cbb: jae 0x000000010c064de3
    0x000000010c064cc1: mov %r10,0x60(%r15)
    0x000000010c064cc5: prefetchnta 0xc0(%r10)
    0x000000010c064ccd: mov %r8,%rdi
    0x000000010c064cd0: add $0x10,%rdi
    0x000000010c064cd4: mov $0xd7610e71,%r10d ; {oop('com/vyazelenko/blog/copyobject/primitives/constructor/ConstructorCopy')}
    0x000000010c064cda: mov 0xb0(%r12,%r10,8),%r10
    0x000000010c064ce2: mov %r10,(%r8)
    0x000000010c064ce5: movl $0xd7610e71,0x8(%r8) ; {oop('com/vyazelenko/blog/copyobject/primitives/constructor/ConstructorCopy')}
    0x000000010c064ced: mov %r12d,0xc(%r8)
    0x000000010c064cf1: mov $0x9,%ecx
    0x000000010c064cf6: xor %rax,%rax
    0x000000010c064cf9: shl $0x3,%rcx
    0x000000010c064cfd: rep rex.W stos %al,%es:(%rdi) ;*new
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@0 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d00: mov %r8,%rax ;*areturn
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@6 (line 39)
    0x000000010c064d03: movabs $0x7957aa098,%r9 ; {oop(a 'com/vyazelenko/blog/copyobject/primitives/constructor/ConstructorCopy')}
    0x000000010c064d0d: mov 0xc(%r9),%r11d
    0x000000010c064d11: mov %r11d,0xc(%r8) ;*putfield field1
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@9 (line 29)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d15: movzwl 0x54(%r9),%r10d
    0x000000010c064d1a: mov %r10w,0x54(%r8) ;*putfield x
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@114 (line 120)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d1f: movswl 0x52(%r9),%r11d
    0x000000010c064d24: mov %r11w,0x52(%r8) ;*putfield m10
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@106 (line 119)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d29: movswl 0x50(%r9),%r10d
    0x000000010c064d2e: mov %r10w,0x50(%r8) ;*putfield m9
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@98 (line 118)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d33: movswl 0x4e(%r9),%r11d
    0x000000010c064d38: mov %r11w,0x4e(%r8) ;*putfield m8
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@90 (line 117)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d3d: movswl 0x4c(%r9),%r10d
    0x000000010c064d42: mov %r10w,0x4c(%r8) ;*putfield m7
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@82 (line 116)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d47: movswl 0x4a(%r9),%r11d
    0x000000010c064d4c: mov %r11w,0x4a(%r8) ;*putfield m6
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@74 (line 115)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d51: movswl 0x48(%r9),%r10d
    0x000000010c064d56: mov %r10w,0x48(%r8) ;*putfield m5
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@66 (line 114)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d5b: movswl 0x46(%r9),%r11d
    0x000000010c064d60: mov %r11w,0x46(%r8) ;*putfield m4
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@58 (line 113)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d65: movswl 0x44(%r9),%r10d
    0x000000010c064d6a: mov %r10w,0x44(%r8) ;*putfield m3
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@50 (line 112)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d6f: movswl 0x42(%r9),%r11d
    0x000000010c064d74: mov %r11w,0x42(%r8) ;*putfield m2
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@42 (line 111)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d79: movswl 0x40(%r9),%r10d
    0x000000010c064d7e: mov %r10w,0x40(%r8) ;*putfield m1
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@34 (line 110)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d83: mov 0x38(%r9),%r10
    0x000000010c064d87: mov %r10,0x38(%r8) ;*putfield youCanSeeMe
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@26 (line 109)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d8b: vmovsd 0x30(%r9),%xmm0
    0x000000010c064d91: vmovsd %xmm0,0x30(%r8) ;*putfield anotherField
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@10 (line 107)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d97: mov 0x2c(%r9),%r11d
    0x000000010c064d9b: mov %r11d,0x2c(%r8) ;*putfield field1
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@18 (line 108)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064d9f: movsbl 0x2b(%r9),%r10d
    0x000000010c064da4: mov %r10b,0x2b(%r8) ;*putfield abc
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@33 (line 32)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064da8: movzbl 0x2a(%r9),%r11d
    0x000000010c064dad: mov %r11b,0x2a(%r8) ;*putfield field6
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@25 (line 31)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064db1: movzwl 0x28(%r9),%r10d
    0x000000010c064db6: mov %r10w,0x28(%r8) ;*putfield field2
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@17 (line 30)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064dbb: vmovsd 0x20(%r9),%xmm0
    0x000000010c064dc1: vmovsd %xmm0,0x20(%r8) ;*putfield maxExponent
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@57 (line 35)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064dc7: mov 0x18(%r9),%r10
    0x000000010c064dcb: mov %r10,0x18(%r8) ;*putfield max
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@49 (line 34)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064dcf: mov 0x10(%r9),%r10
    0x000000010c064dd3: mov %r10,0x10(%r8) ;*putfield min
    ; – com.vyazelenko.blog.copyobject.primitives.Root::<init>@41 (line 33)
    ; – com.vyazelenko.blog.copyobject.primitives.BaseClass::<init>@2 (line 106)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::<init>@2 (line 18)
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@5 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064dd7: add $0x10,%rsp
    0x000000010c064ddb: pop %rbp
    0x000000010c064ddc: test %eax,-0x10b4de2(%rip) # 0x000000010afb0000
    ; {poll_return}
    0x000000010c064de2: retq
    0x000000010c064de3: movabs $0x6bb087388,%rsi ; {oop('com/vyazelenko/blog/copyobject/primitives/constructor/ConstructorCopy')}
    0x000000010c064ded: xchg %ax,%ax
    0x000000010c064def: callq 0x000000010c05efe0 ; OopMap{off=340}
    ;*new ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@0 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    ; {runtime_call}
    0x000000010c064df4: mov %rax,%r8
    0x000000010c064df7: jmpq 0x000000010c064d00 ;*new
    ; – com.vyazelenko.blog.copyobject.primitives.constructor.ConstructorCopy::copy@0 (line 23)
    ; – com.vyazelenko.blog.copyobject.TestConstructor::callCopy@3 (line 39)
    0x000000010c064dfc: mov %rax,%rsi
    0x000000010c064dff: add $0x10,%rsp
    0x000000010c064e03: pop %rbp
    0x000000010c064e04: jmpq 0x000000010c061e20 ; {runtime_call}
    0x000000010c064e09: hlt
    0x000000010c064e0a: hlt
    0x000000010c064e0b: hlt
    0x000000010c064e0c: hlt
    0x000000010c064e0d: hlt
    0x000000010c064e0e: hlt
    0x000000010c064e0f: hlt
    0x000000010c064e10: hlt
    0x000000010c064e11: hlt
    0x000000010c064e12: hlt
    0x000000010c064e13: hlt
    0x000000010c064e14: hlt
    0x000000010c064e15: hlt
    0x000000010c064e16: hlt
    0x000000010c064e17: hlt
    0x000000010c064e18: hlt
    0x000000010c064e19: hlt
    0x000000010c064e1a: hlt
    0x000000010c064e1b: hlt
    0x000000010c064e1c: hlt
    0x000000010c064e1d: hlt
    0x000000010c064e1e: hlt
    0x000000010c064e1f: hlt
    [Exception Handler]
    [Stub Code]
    0x000000010c064e20: jmpq 0x000000010c05f0a0 ; {no_reloc}
    [Deopt Handler Code]
    0x000000010c064e25: callq 0x000000010c064e2a
    0x000000010c064e2a: subq $0x5,(%rsp)
    0x000000010c064e2f: jmpq 0x000000010c038b00 ; {runtime_call}
    0x000000010c064e34: hlt
    0x000000010c064e35: hlt
    0x000000010c064e36: hlt
    0x000000010c064e37: hlt

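As you can see, the clone case produces much shorter assembly: after allocating the new object, it essentially boils down to a single call annotated *invokespecial clone. In the constructor case the output is much larger: the freshly allocated object is first zeroed (rep stos), and then every field is copied individually, with one load/store pair annotated *putfield for each of the 21 fields declared in Root and BaseClass.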
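The constants in the two listings can be related to the class layout with a little arithmetic. The sketch below assumes a 12-byte object header (8-byte mark word plus 4-byte compressed class pointer), which is consistent with Root.field1 sitting at offset 0xc in the listing, and 8-byte object alignment; the listing also shows no padding between fields (x ends at 0x56), so summing the field sizes is enough here. Under these assumptions the instance size comes out at 88 bytes, matching the `add $0x58` allocation in both listings. My reading of the clone listing, which is an interpretation and not something the dump states, is that the call after `mov $0xa,%edx` copies 10 eight-byte words starting at offset 8 of source and destination, i.e. everything after the mark word in one bulk copy, whereas the constructor listing zeroes 9 words from offset 0x10 (`mov $0x9,%ecx` followed by `rep stos`) and then performs 21 separate load/store pairs.

```java
public class CopyLayoutArithmetic {

    // Assumption: 12-byte header (8-byte mark word + 4-byte compressed class pointer).
    static final int HEADER_BYTES = 12;
    // Assumption: 8-byte object alignment.
    static final int ALIGNMENT = 8;

    // Field sizes in bytes, from the declarations in Root.
    static final int ROOT_FIELDS =
            4     // int field1
          + 2     // char field2
          + 1     // boolean field6 (1 byte in this layout: 0x2a)
          + 1     // byte abc
          + 8     // long min
          + 8     // long max
          + 8;    // double maxExponent

    // Field sizes in bytes, from the declarations in BaseClass.
    static final int BASE_CLASS_FIELDS =
            8       // double anotherField
          + 4       // int field1
          + 8       // long youCanSeeMe
          + 10 * 2  // short m1..m10
          + 2;      // char x

    // One *putfield per declared field: 7 in Root + 14 in BaseClass.
    static final int FIELD_COUNT = 7 + 14;

    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    static int instanceSize() {
        return alignUp(HEADER_BYTES + ROOT_FIELDS + BASE_CLASS_FIELDS, ALIGNMENT);
    }

    // 8-byte words copied if clone() copies everything after the 8-byte mark word.
    static int cloneWords() {
        return (instanceSize() - 8) / 8;
    }

    public static void main(String[] args) {
        System.out.println("instance size = " + instanceSize()
                + " bytes (0x" + Integer.toHexString(instanceSize()) + ")");
        System.out.println("clone copy    = " + cloneWords() + " x 8-byte words");
        System.out.println("ctor stores   = " + FIELD_COUNT + " individual fields");
        // prints 88 bytes (0x58), 10 words and 21 fields
    }
}
```

In other words, under these assumptions clone moves the same 80 bytes of state in one bulk operation that the constructor path moves in 21 narrow, individually addressed stores after an extra zeroing pass.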

CPU counters

Eventually I managed to compile Intel Performance Counter Monitor 2.5.1 on OS X 10.9.
Here are the results of running the clone and constructor code under PCM (NB: for this run I changed the number of iterations in the test() method to 20,000,000):

  • Clone:
    pcm.x "java -Xms4g -Xmx5g com.vyazelenko.blog.copyobject.TestClone" -nc -ns
    Intel(r) Performance Counter Monitor V2.5.1 (2013-06-25 13:44:03 +0200 ID=76b6d1f)
    Copyright (c) 2009-2012 Intel Corporation
    Num logical cores: 8
    Num sockets: 1
    Threads per core: 2
    Core PMU (perfmon) version: 3
    Number of core PMU generic (programmable) counters: 4
    Width of generic (programmable) counters: 48 bits
    Number of core PMU fixed counters: 3
    Width of fixed counters: 48 bits
    Nominal core frequency: 2700000000 Hz
    Package thermal spec power: 45 Watt; Package minimum power: 36 Watt; Package maximum power: 0 Watt;
    Detected Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz "Intel(r) microarchitecture codename Ivy Bridge"
    Executing "java -Xms4g -Xmx5g com.vyazelenko.blog.copyobject.TestClone" command:
    EXEC : instructions per nominal CPU cycle
    IPC : instructions per CPU cycle
    FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
    AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
    L3MISS: L3 cache misses
    L2MISS: L2 cache misses (including other core's L2 cache *hits*)
    L3HIT : L3 cache hit ratio (0.00-1.00)
    L2HIT : L2 cache hit ratio (0.00-1.00)
    L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
    L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
    READ : bytes read from memory controller (in GBytes)
    WRITE : bytes written to memory controller (in GBytes)
    TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
    Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
    ----------------------------------------------------------------------------------------------------------------
    TOTAL * 0.39 0.65 0.60 1.30 51 M 67 M 0.23 0.30 0.57 0.05 N/A N/A N/A
    Instructions retired: 10 G ; Active cycles: 16 G ; Time (TSC): 3425 Mticks ; C0 (active,non-halted) core residency: 46.24 %
    C1 core residency: 11.19 %; C3 core residency: 0.01 %; C6 core residency: 0.00 %; C7 core residency: 42.57 %
    C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %
    PHYSICAL CORE IPC : 1.29 => corresponds to 32.32 % utilization for cores in active state
    Instructions per nominal CPU cycle: 0.78 => corresponds to 19.43 % core utilization over time interval
    ----------------------------------------------------------------------------------------------


  • Constructor:
    "java -Xms4g -Xmx5g com.vyazelenko.blog.copyobject.TestConstructor" -nc -ns
    Intel(r) Performance Counter Monitor V2.5.1 (2013-06-25 13:44:03 +0200 ID=76b6d1f)
    Copyright (c) 2009-2012 Intel Corporation
    Num logical cores: 8
    Num sockets: 1
    Threads per core: 2
    Core PMU (perfmon) version: 3
    Number of core PMU generic (programmable) counters: 4
    Width of generic (programmable) counters: 48 bits
    Number of core PMU fixed counters: 3
    Width of fixed counters: 48 bits
    Nominal core frequency: 2700000000 Hz
    Package thermal spec power: 45 Watt; Package minimum power: 36 Watt; Package maximum power: 0 Watt;
    Detected Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz "Intel(r) microarchitecture codename Ivy Bridge"
    Executing "java -Xms4g -Xmx5g com.vyazelenko.blog.copyobject.TestConstructor" command:
    EXEC : instructions per nominal CPU cycle
    IPC : instructions per CPU cycle
    FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
    AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
    L3MISS: L3 cache misses
    L2MISS: L2 cache misses (including other core's L2 cache *hits*)
    L3HIT : L3 cache hit ratio (0.00-1.00)
    L2HIT : L2 cache hit ratio (0.00-1.00)
    L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
    L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
    READ : bytes read from memory controller (in GBytes)
    WRITE : bytes written to memory controller (in GBytes)
    TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
    Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
    ----------------------------------------------------------------------------------------------------------------
    TOTAL * 0.35 0.64 0.55 1.31 51 M 68 M 0.26 0.29 0.53 0.06 N/A N/A N/A
    Instructions retired: 11 G ; Active cycles: 17 G ; Time (TSC): 3988 Mticks ; C0 (active,non-halted) core residency: 41.93 %
    C1 core residency: 11.82 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 46.25 %
    C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %
    PHYSICAL CORE IPC : 1.27 => corresponds to 31.76 % utilization for cores in active state
    Instructions per nominal CPU cycle: 0.70 => corresponds to 17.38 % core utilization over time interval
    ----------------------------------------------------------------------------------------------

What this output shows is that the clone case is faster because fewer instructions are executed, i.e. there is less code to run:

  • TestClone:

    Instructions retired: 10 G ; Active cycles: 16 G ; Time (TSC): 3425 Mticks

  • TestConstructor:

    Instructions retired: 11 G ; Active cycles: 17 G ; Time (TSC): 3988 Mticks
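To put a number on the difference, here is the arithmetic on the PCM totals above. PCM rounds instructions and cycles to whole G, so the first two ratios are only approximate:

```latex
\frac{\text{instructions}_{\text{constructor}}}{\text{instructions}_{\text{clone}}} \approx \frac{11\,\mathrm{G}}{10\,\mathrm{G}} = 1.10, \qquad
\frac{\text{cycles}_{\text{constructor}}}{\text{cycles}_{\text{clone}}} \approx \frac{17\,\mathrm{G}}{16\,\mathrm{G}} \approx 1.06, \qquad
\frac{\text{TSC}_{\text{constructor}}}{\text{TSC}_{\text{clone}}} = \frac{3988\,\mathrm{Mticks}}{3425\,\mathrm{Mticks}} \approx 1.16
```

In other words, the constructor run retired roughly 10% more instructions and took roughly 16% more TSC ticks than the clone run.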


Copy object in Java (performance comparison)

This blog post came about because I was looking into a performance issue and profiling showed that a method that copied objects was very slow. (NB: The actual bug was that this method was called millions of times, not the slowness of the copy routine itself.)

The method performed a field-by-field copy, using Class#getDeclaredFields() to obtain all fields of the class and then repeating the copy recursively for all superclasses. The code below shows the entire thing:

import java.lang.reflect.Field;

public static void copyFieldByField(Object src, Object dest) {
    copyFields(src, dest, src.getClass());
}

private static void copyFields(Object src, Object dest, Class<?> klass) {
    // copy the fields declared directly in klass
    Field[] fields = klass.getDeclaredFields();
    for (Field f : fields) {
        f.setAccessible(true);
        copyFieldValue(src, dest, f);
    }
    // then repeat for the superclass, until there is none left
    klass = klass.getSuperclass();
    if (klass != null) {
        copyFields(src, dest, klass);
    }
}

private static void copyFieldValue(Object src, Object dest, Field f) {
    try {
        Object value = f.get(src);
        f.set(dest, value);
    } catch (ReflectiveOperationException e) {
        throw new RuntimeException(e);
    }
}

Profiling showed that this method spent 70% of its time in the Field#copy() method. As it turns out, Class#getDeclaredFields() returns a copy of the Field[] array on every call, ouch! BTW, the JavaDoc of Class#getDeclaredFields() does not mention this copying behavior at all.
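The copying is easy to observe directly. The sketch below (DeclaredFieldsCopyDemo and Sample are hypothetical names) calls Class#getDeclaredFields() twice on the same class: the two arrays are distinct objects, and so are the Field objects inside them, even though Field#equals (same declaring class, name and type) considers the elements equal. Calling it in a hot copy loop therefore allocates a new array and new Field objects on every call.

```java
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical demo class, not part of the benchmark project.
public class DeclaredFieldsCopyDemo {

    // Hypothetical sample class with a couple of instance fields.
    public static class Sample {
        int a;
        String b;
    }

    // True if two consecutive calls hand back distinct array objects.
    public static boolean returnsFreshArray(Class<?> klass) {
        Field[] first = klass.getDeclaredFields();
        Field[] second = klass.getDeclaredFields();
        return first != second;
    }

    // True if even the Field objects inside the arrays are copies (compared by identity).
    public static boolean returnsFreshFieldObjects(Class<?> klass) {
        Field[] first = klass.getDeclaredFields();
        Field[] second = klass.getDeclaredFields();
        return first.length > 0 && first[0] != second[0];
    }

    // True if both calls still describe the same fields according to Field#equals.
    public static boolean sameContents(Class<?> klass) {
        return Arrays.equals(klass.getDeclaredFields(), klass.getDeclaredFields());
    }

    public static void main(String[] args) {
        System.out.println("fresh array: " + returnsFreshArray(Sample.class));
        System.out.println("fresh Field objects: " + returnsFreshFieldObjects(Sample.class));
        System.out.println("equal contents: " + sameContents(Sample.class));
    }
}
```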

List of approaches

Looking at the profiling results I started wondering what a better way to create a shallow copy of an object would be. Hence I needed to test the performance of different approaches. For simplicity I decided to consider only what is possible with JDK 7 alone (no fancy libraries) and came up with the following list:

  1. Clone object – implement Cloneable interface and publish clone() method
  2. Copy with copy constructor – copy fields from the source object directly in the constructor
  3. Copy via reflection (field by field)
    • getDeclaredFields() as in the original code
    • getDeclaredFields() cached, i.e. call it once per Class and remember the result
  4. Serialization
    • Default serialization – just implementing Serializable
    • Custom serialization – implement Serializable and provide readObject() and writeObject() methods that read and write the fields directly
    • Implementing Externalizable interface
  5. MethodHandles
    • Use MethodHandle#invoke(Object... args) method
    • Use MethodHandle#invokeWithArguments(Object... arguments) method
    • Use MethodHandle#invokeExact(Object... args) method
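For the cached-reflection variant in item 3, one possible shape is sketched below. It is not the code from the benchmark project: it assumes java.lang.ClassValue (available since Java 7) as the per-Class cache, skips static fields, and the names CachedFieldCopier, Base and Item are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the benchmark project's CachedFieldsCopy class.
public final class CachedFieldCopier {

    // Hypothetical classes used only to exercise the copier.
    public static class Base {
        public int id;
    }

    public static class Item extends Base {
        public String name;
        public double price;
    }

    // Computed once per Class, then served from the cache on every later copy.
    private static final ClassValue<Field[]> FIELDS = new ClassValue<Field[]>() {
        @Override
        protected Field[] computeValue(Class<?> type) {
            List<Field> result = new ArrayList<>();
            // Walk up the hierarchy, like the recursion in the original copyFields().
            for (Class<?> k = type; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (!Modifier.isStatic(f.getModifiers())) {
                        f.setAccessible(true);
                        result.add(f);
                    }
                }
            }
            return result.toArray(new Field[0]);
        }
    };

    private CachedFieldCopier() {
    }

    // Shallow copy: dest gets the same field values (object references are shared, not cloned).
    // Assumes dest is an instance of the same class as src.
    public static void copy(Object src, Object dest) {
        for (Field f : FIELDS.get(src.getClass())) {
            try {
                f.set(dest, f.get(src));
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```

Skipping static fields is a choice of this sketch: they belong to the class, not to the instance being copied.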

Notes on implementation

Entire project with source code and test results is available on github (copy-object-benchmark.git).

Benchmarks are written using the JMH framework from OpenJDK. For each class tested there is a method in the benchmark class that simply invokes the copy() method on the constant INSTANCE object of that class. For example:

@GenerateMicroBenchmark
public Object copyFieldByFieldGetFieldsEveryTime() {
    return FieldsCopy.INSTANCE.copy();
}

@GenerateMicroBenchmark
public Object copyFieldByFieldUseCacheFields() {
    return CachedFieldsCopy.INSTANCE.copy();
}

@GenerateMicroBenchmark
public Object copyByClone() {
    return CloneCopy.INSTANCE.copy();
}


Every approach from the list above was benchmarked for 2 cases:

  • Primitive fields
  • Object fields

The distinction matters because most of the approaches incur significant boxing/unboxing overhead for primitive fields.

For every kind of copy method there is a class that extends a common superclass (i.e. BaseClass) and implements the copy() method. Since there are 2 use cases, the tests had to be duplicated in 2 different packages.

One last thing worth mentioning before I show the results is the handling of the MethodHandle#invokeExact() case. This method is very special when it comes to invocation. Here is an example:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class Base {
    int x = 13;
}

class Derived extends Base {
    public long m;
}

public class TestMH {
    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findGetter(Base.class, "x", int.class);
        Derived obj = new Derived();
        // mh.invokeExact(obj); => throws WrongMethodTypeException: expected (Base)int but found (Derived)void
        // mh.invokeExact((Base) obj); => throws WrongMethodTypeException: expected (Base)int but found (Base)void
        int value = (int) mh.invokeExact((Base) obj);
    }
}

As you can see, in order to call MethodHandle#invokeExact() it is necessary to match the argument and return types of the method exactly; otherwise the invocation fails at runtime with a WrongMethodTypeException.
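The contrast with MethodHandle#invoke(), the first MethodHandle variant in the list of approaches, can be shown with the same Base/Derived setup. invoke() adapts the call-site type as if by asType(), so it accepts a Derived and boxes the int result, whereas invokeExact() with the same call-site type is rejected. A minimal sketch; InvokeVsInvokeExact is a hypothetical class name, and the field is made public so the lookup needs no special access.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class, mirroring the Base/Derived example from the post.
public class InvokeVsInvokeExact {

    public static class Base {
        public int x = 13;
    }

    public static class Derived extends Base {
        public long m;
    }

    // Getter handle of type (Base)int.
    private static final MethodHandle GETTER;

    static {
        try {
            GETTER = MethodHandles.lookup().findGetter(Base.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // invokeExact with call-site type (Derived)Object does not match (Base)int and fails.
    public static boolean invokeExactRejectsDerived() {
        try {
            Object ignored = (Object) GETTER.invokeExact(new Derived());
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // invoke adapts the same call: Derived is treated as Base and the int is boxed to Integer.
    public static Object readWithInvoke() {
        try {
            return GETTER.invoke(new Derived());
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

That adaptation is exactly what invokeExact() refuses to do implicitly, which is why the generic copy code has to adjust the handle types itself.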

However, it is not possible to know all classes and return types upfront if you want to write a generic copy method. Therefore we need to adjust the MethodHandle so that it can be invoked in a generic manner. This is done by adjusting the MethodType signature of the MethodHandle, as shown below:

private static MethodHandle prepareGetter(MethodHandle mh) {
    return mh.asType(mh.type().changeParameterType(0, Object.class)
            .changeReturnType(Object.class));
}

private static MethodHandle prepareSetter(MethodHandle mh) {
    return mh.asType(mh.type().changeParameterType(0, Object.class)
            .changeParameterType(1, Object.class));
}

Basically this erases the original information about the declaring class and the field type and uses Object for both, which allows calling MethodHandle#invokeExact() with an instance of any class and receiving the result as an Object.
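Putting the two helpers to work, a generic copy loop can look like the sketch below. It is a simplified illustration under assumptions, not the benchmark's implementation: it obtains the handles via Lookup#unreflectGetter/unreflectSetter from the declared fields, copies only the fields declared directly in the source class, and looks the handles up on every call for brevity (caching them per class would be the natural next step). MethodHandleCopySketch and Point are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch class, not part of the benchmark project.
public class MethodHandleCopySketch {

    // Hypothetical sample class used only to exercise the sketch.
    public static class Point {
        public int x;
        public long y;
        public String label;
    }

    private static MethodHandle prepareGetter(MethodHandle mh) {
        return mh.asType(mh.type().changeParameterType(0, Object.class)
                .changeReturnType(Object.class));
    }

    private static MethodHandle prepareSetter(MethodHandle mh) {
        return mh.asType(mh.type().changeParameterType(0, Object.class)
                .changeParameterType(1, Object.class));
    }

    // Copies the non-static fields declared directly by src's class (superclasses ignored for brevity).
    public static void copy(Object src, Object dest) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            for (Field f : src.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                // After asType the handles are (Object)Object and (Object,Object)void;
                // primitives are boxed on the way out and unboxed on the way in.
                MethodHandle getter = prepareGetter(lookup.unreflectGetter(f));
                MethodHandle setter = prepareSetter(lookup.unreflectSetter(f));
                // These call sites use exactly those types, so invokeExact works for any class.
                Object value = (Object) getter.invokeExact(src);
                setter.invokeExact(dest, value);
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```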

Results

I ran the benchmarks on my MacBook Pro laptop (OS X 10.9, 2.7 GHz Intel Core i7, 16 GB of RAM). The tests were performed on the following JDKs:

  • 1.7.0_25 (1.7.0_25-b15)
  • 1.7.0_45 (1.7.0_45-b18)
  • JDK 8 build 112 (1.8.0-ea-b112)

Results are reported as throughput (operations/ms), so higher numbers are better. All charts use a logarithmic scale.

Results:

  • JDK 1.7.0_25:
    Chart: jdk1.7.0_25 results
    Raw results are available here: results-1.7.0_25.txt.

    So in this version of JDK the best 3 methods were:

    1. Clone
    2. Copy constructor
    3. Reflection

    Also, MethodHandles are very slow compared to the other approaches.

  • JDK 1.7.0_45:
    Chart: jdk1.7.0_45 results
    Raw results: results-1.7.0_45.txt.

    In the 1.7.0_45 release we see that MethodHandle#invokeExact() got much faster (~5x), to the point that it made it into the top 3 copy methods (for the primitives case). MethodHandle#invokeWithArguments(), on the other hand, got slower. Otherwise the results remain the same.

  • JDK 8 build 112:
    Chart: JDK 8 build 112 results
    Raw results: results-1.8.0-ea-b112.txt.

    JDK 8 build 112 brings significant performance improvements over JDK 7 results in two areas:

    • Reflection
    • MethodHandles

    The most dramatic change was in the MethodHandle#invoke() case, which is now more than 120x faster! The changes were originally proposed by John Rose on a JDK mailing list and implemented as part of JDK-8024761: JSR 292 improve performance of generic invocation.

    Along the way MethodHandle#invokeWithArguments() got faster as well.

Conclusions

  • New JDK releases bring performance improvements
  • New JDK versions bring new functionality that can be used in place of old approaches (e.g. MethodHandles)
  • The JVM is extremely good at optimizing existing code, e.g. reflection is still faster than MethodHandles (at least for the Object case)
  • Read the JavaDoc, but also step through the code in the debugger: not just your own code but the JDK code as well
  • Write performance tests upfront to know whether selected approach meets performance requirements
  • Run performance tests against new JDK versions to avoid/detect performance regressions

Updates

I’ve published a second blog post, Clone vs copy constructor – a closer look, in which I analyze the performance difference between the clone and copy constructor approaches.


Devoxx 2012 trip report

It was another amazing Devoxx conference: tons of new stuff to learn and a lot of new and familiar faces to meet.

As has become a tradition, each Devoxx brings some innovation, and this year was no exception. The big theme this year was NFC: it was used in the wristbands that every attendee had to wear. With these chips you could get food and also vote for talks after they finished. Voting was done via an amazing piece of machinery with an NFC reader that sent an HTTP POST to a server running on a Raspberry Pi. The server stored the votes in MongoDB, and the whole server cost only 35 EUR (i.e. the cost of the Raspberry Pi itself)! It was so COOL!!!

It always pays off to attend Devoxx for the full 5 days, as during the first 2 University days one can get much deeper insights and have more access to the presenters. This year was no different in this respect, as one could attend some sessions that were not repeated during the conference. For example, “Performance optimization methodology” (Performance Methodology part 1/2 and Performance Methodology part 2/2) by Kirk Pepperdine (@kcpeppe) and Alexei Shipilev (@shipilev), which was simply brilliant. If you want to learn more about performance tuning and how to find performance problems, watch the videos on Parleys.

The conference itself started on Wednesday and, as has become a tradition, the opening keynote by Stephan Janssen had some surprises.
For one thing, he showed a video from the first Devoxx4Kids, an event for kids from 10 to 14 years old who were hacking robots and learning some programming. If you watch the video you’ll see how happy those kids were. Later, during the JUG BOF, it was agreed that JUG leaders will organize such events in their cities, which I think is a great way to bring more young boys and girls into programming.

The second announcement was about Devoxx UK, an event which will take place on March 26-27, 2013 and will be organized by the London JUG. The idea is to bring the Devoxx experience to the UK and to leverage the great Java community of London (they have 3000 JUG members and are very active in the JCP and OpenJDK realms). Devoxx UK will be followed by Devoxx France (28.03-30.03), so speakers and attendees will be able to enjoy a week of Devoxx in 2 different countries and cultures! Just hop on the EuroStar train and in 2 hours you are in Paris! (Just bear in mind that Devoxx France is 75% in French.)

The third announcement was about a complete re-write of Parleys in HTML 5. Flash is gone, and the entire web site will now leverage the latest and greatest technologies. It will retain all existing functionality but add more features; for example, there is a full-blown editor that allows editing a presentation with slides and video. Have you ever tried to edit video clips in HTML? Now you can. This version will be available in March 2013, before Devoxx UK. That is great, because with the new Parleys they will also change how quickly recorded videos get online: videos will be uploaded directly after the sessions, so one can enjoy the conference experience in real time! 😉

The next keynote was handed off to Oracle. Aside from the corporate mantra about billions of devices running Java etc., there were a couple of interesting demos. One was a JavaFX demo of the kiosk that was created for JavaOne and the features it used. Next, Joe Darcy talked briefly about the upcoming JDK 8 with lambdas and the new Date-Time API (JSR-310); besides those there will be plenty of other features, like annotations on types (JSR-308) and repeating annotations (JEP-120). Afterwards we heard about the upcoming Java EE 7 and its features, and also about the plans for Java EE 8.

After Oracle, the stage was given to Neal Ford (@neal4d) and his “When Geek Leaks” presentation (video; unfortunately the full version is available to subscribers only, but it will become available for free before Devoxx 2013). As always, the quality of the presentation was superb. Neal talked about the concept of leaking, or cross-pollination, between different domains and how it can be both good and bad. One of the examples was the Forbes article Now Every Company Is A Software Company, which basically says that any company that wants to succeed in the modern world needs to invest in software (e.g. it should have a strong mobile and web presence to stay ahead of the competition). Neal used it as an example of software geeks leaking into other domains and transforming them.

After the keynote the conference itself officially started and ran for a great 2.5 days. Devoxx did it again: it delivered a great experience and great content. 😉 For the rest, please check out the videos on Parleys. Most of them are already free, and more will be made available for free before Devoxx 2013.

P.S. Here is my list of best sessions (in no particular order):


Where to put tools.jar in JRE 7?

If you distribute a JRE with your application and happen to distribute tools.jar as well, then you have probably copied tools.jar to the jre/lib/ext folder.

This is a viable option, and it has worked for me in the past (i.e. with Java 5 and Java 6). However, it does not work in Java 7: if you do this in Java 7 you won’t get a system compiler instance, for example (i.e. calling javax.tools.ToolProvider.getSystemJavaCompiler() returns null).

What I found out (painfully) is that in Java 7 you have to put tools.jar directly into the jre/lib folder!!!

When you do this you can execute javax.tools.ToolProvider.getSystemJavaCompiler() and get a compiler instance as expected.
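A quick way to check which situation a given runtime is in is to ask ToolProvider for the compiler directly, as in the minimal sketch below (SystemCompilerCheck is a hypothetical class name). Note that the tools.jar placement question applies to JDK 8 and earlier only; from JDK 9 on there is no tools.jar and the compiler comes from the jdk.compiler module.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical diagnostic class; run it with the bundled JRE to see whether tools.jar is picked up.
public class SystemCompilerCheck {

    // Describes whether this runtime exposes a system Java compiler.
    public static String describe() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler == null
                ? "no system compiler found"
                : "system compiler available: " + compiler.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```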


Updated (2012-07-10):
Oracle has accepted the bug report for this problem. You can see it in Oracle’s bug database as 7181951.

Updated (2012-07-14):
Oracle has closed the bug with the following evaluation:

Closing as not a bug. That behavior was neither specified nor intended.
Posted Date : 2012-07-13 15:57:50.0

So I guess it means that the only true place for tools.jar is the jre/lib folder!


Ouch…multi-catch

In my previous post on Java 7 I outlined some problems one might face when converting a project to use it. Here I would like to describe the issues I discovered when I actually migrated code from Java 6 to Java 7.

Contrary to my expectations, the actual upgrade to Java 7 turned out to be bumpy. One of the reasons was converting code to use the new Java 7 features. For that I used NetBeans 7.2 RC1 and its “Inspect and transform” feature. (By the way, IntelliJ IDEA can also automatically convert code to Java 7.)

Overall the tool did a great job, and I was able to convert a big project of several thousand classes in a matter of minutes. However, since the conversion modified more than 1000 classes, it was not feasible to check all of them manually. As a result, I found out about the problems only after the nightly CI build failed. Which brings me to the core of this post: multi-catch conversion issues.

Multi-catch conversion pitfalls

So here is existing code from one of the classes before conversion to Java 7:

And here is the result of the conversion:

As you can clearly see the updated code is not semantically equivalent to the original code!!!

Before the change, the method rollbackTransaction(Throwable) would have been called for any exception (checked or unchecked), but after the change it is called only if an IOException or XMLStreamException is thrown.
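The semantic difference can be reproduced in isolation. The sketch below is a hypothetical illustration of the pattern described, not the original or converted code from my project: narrow() uses a multi-catch of IOException and XMLStreamException, so an unchecked exception escapes without a rollback, while broad() catches Throwable and relies on Java 7's precise rethrow to still declare only the checked exceptions. MultiCatchSketch, Work and the recording rollbackTransaction() are all made up for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLStreamException;

// Hypothetical illustration, not the code from the converted project.
public class MultiCatchSketch {

    // Hypothetical stand-in for rollbackTransaction(Throwable): it just records the call.
    public static final List<Throwable> rollbacks = new ArrayList<>();

    static void rollbackTransaction(Throwable t) {
        rollbacks.add(t);
    }

    // Hypothetical unit of work that may throw checked or unchecked exceptions.
    public interface Work {
        void run() throws IOException, XMLStreamException;
    }

    // Multi-catch covers only the listed types: a RuntimeException skips the rollback entirely.
    public static void narrow(Work work) throws IOException, XMLStreamException {
        try {
            work.run();
        } catch (IOException | XMLStreamException e) {
            rollbackTransaction(e);
            throw e;
        }
    }

    // Catching Throwable rolls back on any exception; precise rethrow (Java 7) lets the
    // method still declare only the checked exceptions the try block can actually throw.
    public static void broad(Work work) throws IOException, XMLStreamException {
        try {
            work.run();
        } catch (Throwable t) {
            rollbackTransaction(t);
            throw t;
        }
    }
}
```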

As a result, after such a change you get leaking transactions (zombies that are never closed). This is the kind of bug one would call a blocker or, if you wish, absolute top priority.

To me this conversion issue looks like a bug in NetBeans, and a scary one. But it should also be a reminder to all of us to check the results of any automagic stuff our tools are doing for us. 😉


Updated (2012-07-14):
I’ve submitted bug report to NetBeans team as Bug 215546. Let’s wait for evaluation on this one.


Recursive generics to the rescue

In this post I will show how something as non-intuitive (read: ugly) as recursive generics can help solve the problem of creating extensible Builders.

But first we need to define what we are trying to achieve. We need to build a set of Builder objects that facilitate the creation of non-trivial domain objects and provide a fluent interface (for the definition of a fluent interface see Martin Fowler’s article) to the user.

So let’s start with defining our 3 builder classes:

  • BaseBuilder
    BaseBuilder class
  • ConstraintBuilder
    ConstraintBuilder class
  • And finally ConcreteBuilder
    ConcreteBuilder class

Now we want to write client code that uses ConcreteBuilder. The code we want to have is the following:
Client code

Notice however that this code has compilation error:

"The method equalTo(int) is undefined for the type BaseBuilder".

The Java compiler tells us that we are trying to invoke a method on the class BaseBuilder, whereas we assumed we were working with the class ConcreteBuilder.

OK, so how are we supposed to fix the code? I know of 2 possible solutions:

  1. Use covariant return types and override methods from base classes
  2. Use recursive generics when defining builder classes

Covariant return types

From wikipedia:

In object-oriented programming, a covariant return type of a method is one that can be replaced by a “narrower” type when the method is overridden in a subclass.

Knowing this, we can fix our builders by overriding the methods from the parent classes and making them return an instance of the leaf class instead. Here is how our classes end up looking:

  • BaseBuilder (not changed)
    BaseBuilder class
  • ConstraintBuilder
    ConstraintBuilder class
  • ConcreteBuilder
    ConcreteBuilder class

With these changes in place our client code now compiles and runs, but in my opinion we had to do too much work. Notice, for instance, that we had to override all methods from the base classes in each and every derived class.

We are forced to do this whenever a subclass adds new methods, because we need to give the end user the flexibility of invoking methods in any order, irrespective of where they are defined.
Also, in this example it was not necessary to do this for the ConcreteBuilder class, but I showed it here for completeness, mainly to stress how involved the first approach is.

Enough of the ugliness; let’s check out the second approach.

Recursive generics

OK, so is there better way to achieve the same goal without repeating same steps for each builder subclass? The answer is yes.

We can use generics while defining our builders to tell Java that return type of methods is not the builder’s class but rather the subclass of the builder, hence recursive generic definition.

Let’s see what it means for our builders:

  • BaseBuilder
    BaseBuilder class
  • ConstraintBuilder
    ConstraintBuilder class
  • ConcreteBuilder
    ConcreteBuilder class

And the client code hasn’t changed at all; it still compiles and runs as before:
Client code

Let’s have a quick recap of what we just did. We changed the definition of the BaseBuilder class to contain a recursive generic parameter E (i.e. BaseBuilder<E extends BaseBuilder>). This allowed us to change the return type of the methods from BaseBuilder to E, which effectively tells the Java compiler that the method returns some subclass of the BaseBuilder class.

Then we did the same for the ConstraintBuilder class (i.e. ConstraintBuilder<E extends ConstraintBuilder>), which again tells javac that the return type is some subclass of the ConstraintBuilder class.

And finally we stopped the recursion for the class ConcreteBuilder by specifying itself as the generic parameter when extending the ConstraintBuilder class (i.e. ConcreteBuilder extends ConstraintBuilder<ConcreteBuilder>). Since this class is at the bottom of our inheritance hierarchy, we could stop using a generic parameter. However, if it had subclasses of its own, we would have to repeat a declaration similar to that of the ConstraintBuilder class.

Notice how much less effort it took to implement the second solution. Despite some minor noise, such as the @SuppressWarnings("unchecked") annotation needed to eliminate compilation warnings and the casts to E, this solution is much cleaner and easier to understand and maintain. There is none of the code duplication we saw in the first solution, and it scales well as the inheritance hierarchy gets bigger and deeper.
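For a self-contained version of the recursive-generics approach, here is a sketch under stated assumptions. As before, name(String), build(), Query and the RecursiveBuilders wrapper are hypothetical. It also deviates from the post in two small ways: the bound is written as E extends BaseBuilder<E> rather than the raw BaseBuilder, which avoids raw-type warnings, and the intermediate builders are abstract so that only ConcreteBuilder can be instantiated. The single unchecked cast lives in a self() helper.

```java
// Hypothetical wrapper class; only the builder names and equalTo(int) come from the post.
public class RecursiveBuilders {

    // Hypothetical domain object; the post's real domain class is not shown.
    public static class Query {
        public final String name;
        public final int value;

        Query(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    // E is the concrete builder type that the fluent methods should return.
    public abstract static class BaseBuilder<E extends BaseBuilder<E>> {
        protected String name;

        // The one unchecked cast, shared by all fluent methods.
        @SuppressWarnings("unchecked")
        protected E self() {
            return (E) this;
        }

        public E name(String name) {
            this.name = name;
            return self();
        }
    }

    public abstract static class ConstraintBuilder<E extends ConstraintBuilder<E>> extends BaseBuilder<E> {
        protected int value;

        public E equalTo(int value) {
            this.value = value;
            return self();
        }
    }

    // Recursion stops here: ConcreteBuilder passes itself as E.
    public static class ConcreteBuilder extends ConstraintBuilder<ConcreteBuilder> {
        public Query build() {
            return new Query(name, value);
        }
    }
}
```

The same client chain new ConcreteBuilder().name("age").equalTo(42).build() compiles without a single override, since E is bound to ConcreteBuilder all the way up.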

To conclude:
Java’s generics can be real pain (especially when wildcards are used), but in some cases clever usage of generics can save your day! 😉


Is your project ready for Java 7?

Java 7 was released last year, but how many of you use it in production or even in a development environment? Have you even tried it at home? Why not?

Well, I’ll leave all these questions to the reader, as in this post I would like to discuss issues that one might encounter when trying to upgrade an existing project to Java 7.

Here is the list of questions I want you to ask yourself about your project:

  • Will it compile?
  • Will it run?
  • Will it execute with the same runtime semantics, i.e. will it run as before?

Some of the questions might sound silly, but they all come from my attempt to upgrade an existing project from Java 6 to Java 7.

Oracle has released an interesting document, Java SE 7 and JDK 7 Compatibility. This document defines the Binary, Source and Behavioral compatibility policies and lists exceptions to the defined compatibility rules. Interestingly, it describes the incompatibilities between Java SE 7 and Java SE 6 as follows:

Java SE 7 is strongly compatible with previous versions of the Java platform. Almost all existing programs should run on Java SE 7 without modification. However, there are some minor potential source and binary incompatibilities in the JRE and JDK that involve rare circumstances and “corner cases” that are documented here for completeness.

Compiling it

Back to our questions: how can a project fail to compile with the Java 7 compiler (here I assume the -source 1.7 -target 1.7 compiler options)? The answer is source incompatibility, which in Java 7 can take surprising forms. For example, consider the following class:

abstract class MessageService {
  abstract String getMessage(String msgKey, Object... args);
  abstract String getMessage(String msgKey, boolean includeKey, Object... args);
}

And its usage by some client class:

  messageService.getMessage("MSG108", false, arg1, arg2, arg3);

Here the client invokes the second method, which lets it control whether or not the message gets the message-key prefix. In this particular example the client explicitly requests only the actual message, without the prefix.

This code is perfectly OK for Java 6, but does not compile on Java 7 (javac reports the call as ambiguous). When I first saw it I went as far as filing a bug report with Oracle (see 7115209), but, as expected, Oracle closed the issue as “not a bug”. They explained that this behavior follows the JLS and pointed me to the Java SE 7 and JDK 7 Compatibility document. And indeed this document contains an issue named “Changes in Most Specific Varargs Method Selection”, which describes in great detail the rationale behind the changes and the use cases.

So how can we fix the code so that it compiles? There are 2 possible solutions:

  1. Fix MessageService class definition so that client code compiles
  2. Change MessageService API and all clients that are using it (for example by creating two different methods getMessage and getMessageWithoutKey)

Which is the best approach depends on whether or not you have access to client code. In my case I didn’t have such luxury so I had to go with solution 1 which resulted in the following definition of the MessageService class:

abstract class MessageService {
  abstract String getMessage(String msgKey, Object... args);
  abstract String getMessage(String msgKey, Boolean includeKey, Object... args);
}

As you can see, the only difference from the original declaration is that the parameter includeKey is declared as Boolean instead of boolean. This is a pretty cheap fix, and it will keep the users of your class happy, as they can continue using the old API!
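To check that the Boolean version really resolves the client call, here is a self-contained sketch. Only the two signatures come from the post; the method bodies are hypothetical stand-ins for the real message lookup, and MessageServiceSketch is a made-up name. With Boolean, javac can pick the second overload as the most specific varargs method because Boolean is a subtype of Object, which is not true of the primitive boolean.

```java
// Hypothetical sketch; only the two getMessage signatures come from the post.
public class MessageServiceSketch {

    // Hypothetical body: delegates with an explicit Boolean (the fixed-arity call picks the second overload).
    public static String getMessage(String msgKey, Object... args) {
        return getMessage(msgKey, Boolean.TRUE, args);
    }

    // Declared with Boolean (not boolean) so that it is the most specific varargs candidate.
    public static String getMessage(String msgKey, Boolean includeKey, Object... args) {
        // Hypothetical stand-in for the real lookup: it only reports the argument count.
        String text = "message with " + args.length + " argument(s)";
        // Unboxing: a null includeKey would throw NullPointerException here.
        return includeKey ? msgKey + ": " + text : text;
    }

    public static void main(String[] args) {
        // The call from the post: compiles and selects the Boolean overload.
        System.out.println(getMessage("MSG108", false, "a", "b", "c"));
        // A call without the flag still reaches the first overload.
        System.out.println(getMessage("MSG108", "a"));
    }
}
```

One trade-off of the wrapper type is visible in the comment: callers can now pass null for includeKey, so the implementation has to decide what that means (or fail on it).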

Running it

OK, so what about second question: “Will it run?”. The answer to this question basically depends on whether or not the libraries used in the project are ready for Java 7. This is especially true for libraries that manipulate bytecode. As described in the document in the section “Verification of Version 51.0 Class Files”:

Classfiles with version number 51 are exclusively verified using the type-checking verifier, and thus the methods must have StackMapTable attributes when appropriate. For classfiles with version 50, the Hotspot JVM would (and continues to) failover to the type-inferencing verifier if the stackmaps in the file were missing or incorrect. This failover behavior does not occur for classfiles with version 51 (the default version for Java SE 7).
Any tool that modifies bytecode in a version 51 classfile must be sure to update the stackmap information to be consistent with the bytecode in order to pass verification.

And that is exactly what I observed with AspectJ version 1.6.11, which failed at startup with “java.lang.VerifyError: Expecting a stackmap frame at branch target 26 in method …”.

Luckily for me, upgrading to version 1.6.12 solved the issue. The moral here is that you should check the libraries used in your project! 😉

Checking results

Last but not least, the question about runtime semantics. Surprisingly, I came across an issue when running unit tests: some of the tests started to fail on Java 7 but were fine on Java 6. The explanation for this behavior can also be found in the document, in the section “Order of Methods returned by Class.get Methods can Vary”:

In JDK 7, build 129, the following reflective operations in java.lang.Class changed the fixed order in which they return the methods and constructors of a class:

getMethods
getDeclaredMethods
getDeclaredConstructors

This may cause issues for code that assumes (contrary to the specification) a particular order for methods or constructors.
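Code that genuinely needs a stable order of reflected methods has to impose one itself instead of relying on what getDeclaredMethods() happens to return on a given JDK build. A minimal sketch, assuming that sorting by name and then by the full Method#toString() signature is an acceptable order; StableMethodOrder and Sample are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class.
public class StableMethodOrder {

    // Returns the declared methods sorted by name, then by full signature to separate overloads.
    public static Method[] sortedDeclaredMethods(Class<?> klass) {
        Method[] methods = klass.getDeclaredMethods();
        Arrays.sort(methods, new Comparator<Method>() {
            @Override
            public int compare(Method a, Method b) {
                int byName = a.getName().compareTo(b.getName());
                return byName != 0 ? byName : a.toString().compareTo(b.toString());
            }
        });
        return methods;
    }

    // Hypothetical class with a few methods to sort.
    public static class Sample {
        public void zeta() {
        }

        public void alpha() {
        }

        public void mid(int x) {
        }

        public void mid() {
        }
    }
}
```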

With this change the order of test execution changed and, as a result, exposed an actual problem with my code: dependencies between tests. This, by the way, is an anti-pattern as defined in Gerard Meszaros’s book xUnit Test Patterns: Refactoring Test Code. Thus I was able to fix the actual problem by looking at the symptoms! 🙂

The moral of this story: be ready to spend some effort when upgrading to Java 7. It is better to start early and prepare the infrastructure before performing the actual update. For instance, I started using Java 7 on the CI server, which revealed all the issues listed above, and now I’m pretty certain that the actual switch from Java 6 to Java 7 will be smooth.

I hope this information will be useful for some of you. And I’m waiting for your feedback!


Updated 2012-07-18:

Beware that JDK 7 uses a new version of JAXB. As of this writing, the latest JDK version from Oracle is 1.7.0_05, which ships with JAXB 2.2.4u1.

This version of JAXB contains a bug, JAXB-871, which breaks serialization of objects (if a base class property is overridden in any single subclass, it is no longer delivered for all the other subclasses of that base class)!!!

The workaround for this issue is to use a standalone version of JAXB that does not have the bug, for example version 2.2.3u2. Just add its jars to the classpath of your application!

Note: In order to find out which version of the JAXB and JAX-WS is in your JDK installation run the following commands:

jdk_home/bin/wsgen -version
jdk_home/bin/xjc -version

Here is sample output:

c:\Program Files\Java\jdk1.7.0_05\bin>wsgen -version
JAX-WS RI 2.2.4-b01

c:\Program Files\Java\jdk1.7.0_05\bin>xjc -version
xjc 2.2.4


Latency Tip Of The Day

"Nothing is more dangerous than an idea when it is the only one you have." (Emile Chartier)
