Friday, December 19, 2008

Java Memory Management

Notes: 
  • Soft Ref:
    • Cleared by the GC when
      • the JVM needs to reclaim more space.
      • At the latest, just before an OutOfMemoryError is thrown.
    • Usage:
      • Memory-sensitive caches.
  • Weak Ref:
    • Cleared by the GC when
      • the referent is no longer strongly or softly reachable by clients.
    • Usage:
      • Containers that take responsibility for cleanup.
  • Phantom Ref:
    • Enqueued by the GC after
      • finalization.
    • Usage:
      • Special cleanup.

  1. Soft references can be deleted from a container if the clients are no longer referencing them and memory is tight.
  2. Weak references are automatically deleted from a container as soon as clients stop referencing them.
  3. Phantom references point to objects that are already dead and have been finalised.
Soft vs Weak vs Phantom References
  • Soft Reference
    • Purpose: Keeps objects alive provided there's enough memory.
    • Use: To keep objects alive even after clients have removed their references (memory-sensitive caches), in case clients start asking for them again by key.
    • When GCed: After a first gc pass, the JVM decides it still needs to reclaim more space.
    • Implementing class: java.lang.ref.SoftReference
  • Weak Reference
    • Purpose: Keeps objects alive only while they're in use (reachable) by clients.
    • Use: Containers that automatically delete objects no longer in use.
    • When GCed: After gc determines the object is only weakly reachable.
    • Implementing class: java.lang.ref.WeakReference, java.util.WeakHashMap
  • Phantom Reference
    • Purpose: Lets you clean up after finalization but before the space is reclaimed (replaces or augments the use of finalize()).
    • Use: Special clean up processing.
    • When GCed: After finalization.
    • Implementing class: java.lang.ref.PhantomReference
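To make the three reference types concrete, here is a minimal sketch that wraps one object in each of them. The class name ReferenceKindsDemo and the helper isAlive are my own for illustration. The last line depends on whether the collector actually runs, so its result is not guaranteed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    /** Returns true while the referent can still be obtained through the weak reference. */
    static boolean isAlive(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        Object payload = new Object();                       // strong reference keeps the object alive
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        SoftReference<Object> soft = new SoftReference<>(payload);
        WeakReference<Object> weak = new WeakReference<>(payload);
        PhantomReference<Object> phantom = new PhantomReference<>(payload, queue);

        System.out.println("soft holds it:  " + (soft.get() == payload));
        System.out.println("weak holds it:  " + (weak.get() == payload));
        System.out.println("phantom.get():  " + phantom.get()); // always null by design

        payload = null;   // drop the only strong reference
        System.gc();      // only a hint, the collector may or may not run now
        System.out.println("weak alive after gc hint: " + isAlive(weak)); // not guaranteed either way
    }
}
```

A phantom reference never hands out its referent; the only way to learn that the object died is to poll the ReferenceQueue it was registered with.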
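Types of Codes
  • Unicode - 16 bit (a Java char is one UTF-16 code unit), aligned with ISO/IEC 10646
  • ASCII Code - 7 bit, normally stored in an 8-bit byte
  • Hashcode - 32 bit (hashCode() returns an int)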
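Memory allocation
Stack -
  • Created per thread.
  • Holds the frames used for method processing: method parameters and local variables.
Heap -
  • Holds the objects created with new, including their instance variables.
  • Exhausting it is the usual reason for an OutOfMemoryError.
Notes:
  • Memory Leak - In Java a leak happens when objects that are no longer needed stay reachable, for example from a static collection or from a thread that never terminates (such as a deadlocked one). Cyclic references alone do not leak under a tracing collector; they only defeat pure reference counting.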


Tuesday, December 16, 2008

JVM Exception Handling

Types of exceptions:
  • Checked Exception (enforced at compile time)
  • Unchecked Exception (RuntimeException, surfaces at run time)
  • Error: environment-related JVM failure
Class structure
  • Throwable
    • Error
    • Exception
      • RuntimeException
  • Coding Guideline:
    • If you have a valid reason to catch everything, catch Throwable instead of Exception, because Error is not a subclass of Exception.
    • This is needed in managed runtime applications that execute external components which may be buggy.
  • Checked Exception: a method declares it in its throws clause, which forces the caller to catch or redeclare it.
  • Unchecked Exception: RuntimeException, Error, and their subclasses. A method can throw them without declaring them, and callers are not forced to handle them.
  • If nothing catches an unchecked exception, it propagates up to the JVM, which terminates the thread.
  • Exception Table: each method that has handlers carries this table in its compiled code. When an exception is thrown, the JVM immediately looks it up to decide what to do next.
    • Start: PC offset where the protected block begins
    • End: PC offset of the end of the block + 1
    • Target: PC offset of the handler code to jump to
    • Type: constant pool index of the exception class that is caught

  • JVM Process of Exception handling
  1. If a matching entry is found in the current method's exception table, the JVM sets the pc to the handler offset and continues execution there. Otherwise it pops the current stack frame and rethrows the same exception.
  2. After the current frame is popped, control returns to the frame of the calling method.
  3. The same exception is thrown again in the caller, which gets the chance to handle it.
  4. The process continues until a handler is found or the thread's stack is empty.
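The unwinding steps above can be observed from plain Java. This is a small sketch with names of my own (UnwindDemo, inner, middle, outer): inner has no handler, middle only has a finally block, and outer finally catches the exception.

```java
import java.util.ArrayList;
import java.util.List;

class UnwindDemo {
    static final List<String> trace = new ArrayList<>();

    static void inner() {
        trace.add("inner: throwing");
        throw new IllegalStateException("boom");   // no matching handler in this frame
    }

    static void middle() {
        try {
            inner();
        } finally {
            // finally is compiled as a catch-all handler that rethrows after running
            trace.add("middle: frame popped");
        }
    }

    static String outer() {
        try {
            middle();
            return "no exception";
        } catch (IllegalStateException e) {          // matching handler found here
            trace.add("outer: caught " + e.getMessage());
            return "handled";
        }
    }

    public static void main(String[] args) {
        System.out.println(outer());
        trace.forEach(System.out::println);
    }
}
```

The trace shows the exception leaving inner, passing through middle, and being handled in outer, in that order.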

JVM Method Invocation

The Java virtual machine has its own model of method invocation, and unless you understand the structure and flow of the VM it is difficult to write correct code. Here I talk about method invocation in the VM.
Java provides two types of methods.
Instance methods
An instance method requires an instance before it can be invoked and uses dynamic (late) binding. When the JVM invokes an instance method, it selects the method to invoke based on the actual class of the object, which may only be known at run time.
invokevirtual
indexbyte1, indexbyte2
pop objectref and args, invoke method at constant pool index

Class methods
A class method does not require an instance and uses static (early) binding. When the JVM invokes a class method, it selects the method to invoke based on the type of the object reference, which is always known at compile time.
invokestatic
indexbyte1, indexbyte2
pop args, invoke static method at constant pool index

Instance method | Class method (static)
Runtime | Compile time
Dynamic/late binding | Static/early binding
Invoked based on actual obj. | Invoked based on type of obj ref.
jvm opcode: invokevirtual | jvm opcode: invokestatic
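The difference between the two bindings shows up as soon as a subclass redefines both kinds of method. The sketch below uses made-up classes (Animal, Dog) to illustrate it: the instance method is overridden, while the static method is only hidden.

```java
class DispatchDemo {
    static class Animal {
        String speak() { return "animal"; }              // instance method, called via invokevirtual
        static String kind() { return "Animal.kind"; }   // class method, called via invokestatic
    }

    static class Dog extends Animal {
        @Override String speak() { return "dog"; }       // overrides: picked by the object's class
        static String kind() { return "Dog.kind"; }      // hides: picked by the class named in the call
    }

    static String viaInstance(Animal a) {
        return a.speak();                                // target decided at run time
    }

    public static void main(String[] args) {
        Animal a = new Dog();
        System.out.println(viaInstance(a));   // prints dog
        System.out.println(Animal.kind());    // prints Animal.kind
        System.out.println(Dog.kind());       // prints Dog.kind
    }
}
```

Even though the variable is declared as Animal, speak() runs the Dog version, while each kind() call is fixed by the compiler.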
 Method Execution:
The JVM is stack driven. Each method invocation gets a dedicated stack frame on the thread's stack: invoking a method pushes a new frame onto the stack, and returning from it pops that frame.
Instance Method: Pushing a Frame 
  1. Pop the objref from the current frame's operand stack.
  2. Pop all arguments from the current frame's operand stack. 
  3. Create a new stack frame on top of the current frame.
  4. Store the objref in local variable 0 of the new frame.
  5. Store the args in the local variables from location 1 onward.
  6. The JVM sets the PC to the first instruction of the new frame's method code.
    Static method: Pushing a Frame 
    1. Pop all arguments from the current frame's operand stack. 
    2. Create a new stack frame on top of the current frame.
    3. Store the args in the local variables from location 0 onward.
    4. The JVM sets the PC to the first instruction of the new frame's method code.
      Memory for stack frames can be contiguous and overlapping. This means the top of the current frame's operand stack is the bottom of the new frame, so the objref and args are shared rather than copied, in the layout specified above.

      For a native method, the calling frame is left as it is and waits for the native call to return.
      Two more Method Invocations:
      • invokespecial: selects the method based on the type of the reference. It is used for
        • invocation of instance initialization (<init>) methods
        • invocation of private methods
        • invocation of methods using the super keyword
      • invokeinterface: is used to invoke an instance method given a reference to an interface.
        Method Table: 
        The JVM creates one for each class to locate the actual bytecode location/offset of every method. The type of the reference affects the speed of a call. For invokevirtual and invokestatic, the lookup resolves to a direct reference and a fixed offset in this table can be kept. For invokeinterface, the offset can differ from class to class, so the JVM may have to search the table on each call.
        All these speed issues are left to the JVM designer to solve, so no performance statement holds unless it is tied to a specific JVM and version.
        Dynamic Linking:
        Java programs are dynamically linked, so references to methods are initially symbolic. All invoke instructions, such as invokevirtual and invokestatic, refer to a constant pool entry that initially contains a symbolic reference. The symbolic reference is a bundle of information that uniquely identifies a method, including the class name, method name, and method descriptor. A method descriptor is the method's return type and the number and types of its arguments. The first time the Java virtual machine encounters a particular invoke instruction, the symbolic reference must be resolved.
        To resolve a symbolic reference, the JVM locates the method being referred to symbolically and replaces the symbolic reference with a direct reference. A direct reference, such as a pointer or offset, allows the virtual machine to invoke the method more quickly if the reference is ever used again in the future.  

        Java Garbage Collection

        The Java specification does not prescribe any garbage collection methodology; it only states the objective of GC, which is to free memory that the program no longer uses. The GC typically runs as a background thread that can become active, with higher priority, at an unpredictable time. Developers can request a collection programmatically, but the call is only a hint and the JVM is free to ignore it.
        System.gc() or Runtime.getRuntime().gc()
          A separate GC frees the program from memory management and lets it focus on sheer execution, but the GC carries the overhead of tracking object life cycles and doing the cleanup behind the scenes without disrupting program execution.

          GC has three major responsibilities:
          1. Object allocation and Cleanup
          2. Preventing Heap fragmentation
          3. Reducing page fault in virtual memory
          The GC's main job is to detect garbage objects and reclaim their heap space by destroying them. In the JVM, an object that is reachable from the roots is live; every other object is garbage. So reachability from the roots is the measure for GC operations.

          For the GC it is all about references: an object that is referenced stays alive, otherwise it gets collected. Three types of references keep an object alive:
          1. Direct Object Ref.
          2. Stack variable obj ref.
          3. Constant pool ref. - string or objs.
          Reference counting is a continuous process: the collector keeps a count for each object and has to update it on every reference change. Here is the first-generation GC algorithm in pseudocode.

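          // Pseudocode: refCount belongs to the object being referenced.
          if (obj = new())      refCount++;   // object created and assigned
          if (obj = objRef)     refCount++;   // existing reference copied
          if (obj = null)       refCount--;   // reference cleared
          if (obj = newObjRef)  refCount--;   // reference overwritten: old target loses one
          if (refCount == 0) { 
              this.GC();                      // reclaim the object
              removeRefThis();                // decrement counts of the objects it referenced
          }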

          This algorithm suits real-time applications where long GC pauses are prohibited. But it has disadvantages: most notably, it fails for cyclic references, whose counts never reach zero.
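The cyclic reference problem is easy to reproduce with a toy model. The following sketch is not how any JVM works internally; Node, addRef, release and link are hypothetical helpers that mimic the counting rules from the pseudocode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RefCountDemo {
    /** A toy heap object with an explicit reference count (hypothetical model). */
    static class Node {
        final String name;
        int refCount = 0;
        boolean freed = false;
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static void addRef(Node n) { n.refCount++; }

    static void release(Node n) {
        if (--n.refCount == 0) {
            n.freed = true;
            for (Node child : n.fields) release(child);  // a dead object drops its outgoing refs
        }
    }

    /** Models holder.field = target. */
    static void link(Node holder, Node target) {
        holder.fields.add(target);
        addRef(target);
    }

    /** Builds a -> b (plus b -> a when cyclic), drops the stack refs, reports what was freed. */
    static boolean[] run(boolean cyclic) {
        Node a = new Node("a"), b = new Node("b");
        addRef(a); addRef(b);          // two stack variables
        link(a, b);
        if (cyclic) link(b, a);
        release(a); release(b);        // the stack variables go out of scope
        return new boolean[] { a.freed, b.freed };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false)));  // prints [true, true]
        System.out.println(Arrays.toString(run(true)));   // prints [false, false]
    }
}
```

In the cyclic case both counts stay at 1 after the stack references are gone, so neither object is ever reclaimed.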
            It was followed by the Mark and Sweep algorithm, which is a tracing collector. Here are its marking and sweeping phases.

            Marking:
            • Traverse the object graph from the roots and mark every reachable object.
            Sweeping:
            • Reclaim the unmarked objects.
            • Run finalize on the unreachable objects first.
            • Keep everything a finalizable object references on the heap until its finalize method completes.
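The marking and sweeping phases can be sketched over a toy object graph. This is only an illustration with names of my own (Obj, mark, sweep, demo); it ignores finalization and works on a list that stands in for the heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MarkSweepDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Marking: walk the graph from the roots and collect every reachable object. */
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (marked.add(o)) pending.addAll(o.refs);   // visit each object once
        }
        return marked;
    }

    /** Sweeping: remove every unmarked object from the heap and return the swept names. */
    static List<String> sweep(List<Obj> heap, Set<Obj> marked) {
        List<String> swept = new ArrayList<>();
        heap.removeIf(o -> {
            if (marked.contains(o)) return false;
            swept.add(o.name);
            return true;
        });
        return swept;
    }

    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                  // a -> b, reachable from the root
        c.refs.add(d); d.refs.add(c);   // c <-> d, an unreachable cycle
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        return sweep(heap, mark(Arrays.asList(a)));
    }

    public static void main(String[] args) {
        System.out.println(demo());     // prints [c, d]
    }
}
```

Unlike reference counting, the cycle between c and d is collected, because nothing reaches it from the root.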
              Mark and sweep brings another issue: heap fragmentation. Compacting collectors solve it by moving the marked objects together over the free space. Moving is easy when the JVM reaches objects through an object reference table rather than directly, because only the table entry has to be updated. A copying collector is a simpler variant: it divides the heap into two halves and allocates from only one of them. When that half fills up, it stops all operations, copies all live objects to the other half, and resumes. It is a simpler way to defragment the heap, but the drawback is that only half of the memory is available for allocation.
              ======================= Old Notes ====================================
              When Java was originally developed, the JDK shipped with a mark-and-sweep garbage collector. A mark-and-sweep garbage collector proceeds in two phases:
              1. Mark: identifies garbage objects
              2. Sweep: reclaims the memory for the garbage objects
              Garbage objects are identified by traversing references from the current application stack frames; unreachable objects are assumed to be garbage.
              Mark and sweep is a "stop-the-world" garbage collection technique; that is, all application threads stop until garbage collection completes, or until a higher-priority thread interrupts the garbage collector. If the garbage collector is interrupted, it must restart, which can lead to application thrashing with little apparent result. The other problem with mark and sweep is that many types of applications can't tolerate its stop-the-world nature. That is especially true of applications that require near real-time behavior or those that service large numbers of transaction-oriented clients.
              Because of these problems, Sun Microsystems' Java HotSpot VM split the heap into three sections and added three garbage collection techniques. Splitting the heap allows different algorithms to be used for newly created objects and for objects that have been around for a while. This technique is based on the observation that most Java objects are small and short-lived. The heap's three sections are:
              1. Permanent space: used for JVM class and method objects
              2. Old object space: used for objects that have been around a while
              3. New (young) object space: used for newly created objects
              The new object space is further subdivided into three parts: Eden, where all newly created objects go, and survivor spaces 1 and 2, where objects go before they become old. The survivor spaces make it easier to use copy-compaction with young objects; more details later.
              The J2SE 1.3 garbage collection techniques are:
              1. Copy-compaction: used for new object space.
              2. Mark-compact: used in old object space. Similar to mark and sweep, mark-compact marks all reachable objects; in the second phase, the surviving objects are compacted. This technique avoids fragmentation problems and works well when the garbage collector runs infrequently.
              3. Incremental garbage collection (optional): Incremental GC creates a new middle section in the heap, which divides into multiple trains. Garbage is reclaimed from each train one at a time. This provides shorter, more frequent pauses for garbage collection, but it can decrease overall application performance. Incremental garbage collection can be enabled with the -Xincgc command-line option.
              All of these techniques are stop-the-world techniques. Though incremental garbage collection makes this effect less obvious, the application threads must still stop. That proves problematic for applications that can't afford to pause for garbage collection.
              Garbage collection is based on live objects; that is, those reachable from the current stack space. Live objects are copied from new object space to survivor space (1 or 2), and then from survivor space to old object space. The amount of time objects spend in survivor space can be controlled with command-line parameters (see Tables 2 and 3 below).
              The garbage collector typically runs in a low-priority thread, attempting to reclaim memory when the application is idle. This is fine for applications that regularly have idle time, such as graphical user interface (GUI)-driven applications. Unfortunately, if there is little or no idle time, the garbage collector may not get a chance to run.
              Garbage collection can also be triggered if the heap's subregions are nearly full. In this case, the garbage collection thread's priority increases, thus increasing the chance that the garbage collection will run to completion. If the new generation is full, a minor collection is triggered; if the old generation is full, a major collection is triggered. The steps in a minor collection are:
              1. Copy objects from Eden to survivor space (1 or 2).
              2. Copy from survivor space 1 to survivor space 2, or vice versa. After a certain number of copies (controllable from the command line), an object becomes tenured, that is, a candidate for old object space.
              3. Tenured objects move from survivor space 1 or 2 to old object space.
              A major collection uses the old generation garbage collector (mark-compact for J2SE 1.3) to reclaim old objects.
              Reference Counting
              • The collection work is spread over program execution in small chunks.
              • Advantage:
                • Doesn't interrupt the program for long.
                • Good for real-time applications.
              • Disadvantage:
                • Overhead of incrementing and decrementing the ref counter.
                • Can't detect cyclic refs.

              Java class lifestyle

              The Java class file lifestyle
              • Variable-length information is prefaced by its size, so the JVM knows how much to read.
              • Info in the class file is stored without any boundary between class components.
              • A JVM has a maximum class file version it can load.
              • A method signature consists of the return type and the argument types.
              Major components of a Class File:
              1. magic - 0xCAFEBABE (4 bytes)
              2. version - minor and major version (4 bytes)
              3. constant pool - 2-byte count starting at the 9th byte, followed by variable-length entries
              4. access flags - 2 bytes
              5. this class - 2 bytes
              6. super class
              7. interfaces
              8. fields
              9. methods
              10. attributes
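The first few components are easy to read back from a real class file. Here is a minimal sketch, with a class name and helper of my own (ClassHeaderDemo, readHeader), that reads the magic number, the version and the constant pool count. It reads the class file of java.lang.Object only because that one is available on every JVM.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassHeaderDemo {
    /** Returns { magic, minor, major, constant_pool_count } of the given class's class file. */
    static int[] readHeader(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) throw new IOException("class file not found: " + resource);
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();                // bytes 1-4
            int minor = in.readUnsignedShort();      // bytes 5-6
            int major = in.readUnsignedShort();      // bytes 7-8
            int poolCount = in.readUnsignedShort();  // bytes 9-10
            return new int[] { magic, minor, major, poolCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = readHeader(Object.class);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h[0], h[2], h[1], h[3]);
    }
}
```

The version and the constant pool count depend on the JDK that compiled the class, so only the magic number is the same everywhere.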
              Constant Pool:
              • constant_pool[n]
              • It is organized in an array of variable-length elements.
              • Class file refers to any constant with index in this array.
              • Indexes start at 1 and go up to n: constant_pool[1] -> constant_pool[n]
              • This array is preceded by its array size.
              • The first byte of each entry is a tag giving the type of constant that follows. E.g. a string entry continues with two bytes holding the length of the string.
              Types:
              • Size: 2 bytes
                • literal strings, final variable values, class names, interface names, variable names and types, and method names and signatures.
              Access Flag:
              • Size: 2 bytes
                • Class level access modifiers.
              • Type of info:
                • a class or an interface, public or abstract, final
              this class:
              • Size: 2 bytes
                • It has index value for constant pool where 'this' class is defined.
              • tag[CONSTANT_Class]/this (this_index) = constant_pool[this_class]
              • tag[string]/class|interface name = Constant_pool[name_index]
              Super Class
              • Size: 2 bytes
                • It is again index into constant pool.
              • tag[CONSTANT_Class]/super class (name_index) = constant_pool[super_class]
              • tag[string]/class|interface name = Constant_pool[name_index]
              Interfaces:
              • Two bytes -
                • Defined the number of interfaces implemented.
                • Followed by an array of index in constant pool.
              • tag[CONSTANT_Class]/interface (name_index) = constant_pool[interface]
              • tag[string]/class|interface name = Constant_pool[name_index]
              Fields:
              • Two Bytes:
                • Indicating number of fields for class/interface.
                • Followed by structure in a variable length array.
              • Field Structure:
                • name, type, and, if the field is final, its constant value
              • A class maintains only its own fields, not those of its parents.
              Methods:
              • Two Bytes:
                • Indicating number of methods for class/interface, not counting inherited ones.
                • Followed by structure in a variable length array.
              • Method Structure:
                • method descriptor (its return type and argument list),
                • the number of stack words required for the method's local variables,
                • the maximum number of stack words required for the method's operand stack,
                • a table of exceptions caught by the method,
                • the bytecode sequence,
                • a line number table.
              Attributes:
              • Two Byte count followed by variable length array.
                • Source File Name: ...

              Java Security

              Java's security architecture:
              • Sandbox restriction on Applet:
                • Reading or writing to the local disk
                • Making a network connection to any host, except the host from which the applet came
                • Creating a new process
                • Loading a new dynamic library and directly calling a native method
              • Fundamental Component for JVM Sandbox Security:
                • JVM security features
                • Class loader Arch
                • Class File Verifier
                • Security Manager.
              • JVM Safety
                • Type-safe reference casting
                • Structured memory access
                • Automatic GC
                • Array bounds checking
                • Checking references for null.
                • Structured error handling using exceptions.
                • Unspecified memory layout in JVM.
              • These prevent two types of security breach.
                • Memory corruption causing a crash.
                • Unrestricted memory access that changes the flow of control, e.g. the class loader could be replaced if code had access to its pointer.
              Facts:
              • Even the bytecode instruction set does not allow direct memory access.
              • Only a native method call can open a gate to direct memory access.
              • The SecurityManager can enforce a rule that forbids loading a DLL.
              • A thrown exception does not crash the program; it only terminates the offending thread.
              • The sandbox can be customized by extending java.lang.SecurityManager and overriding its methods.
              Security and the class loader architecture
              • A class loader is a kind of gatekeeper: it admits classes into the JVM and is responsible for them.
              • There can be many class loaders in one JVM.
              Two Types of Class Loaders:
              • Primordial Class Loader
                • Part of JVM impl
                • Written in C
                • Only one
                • Trusted
              • Class Loader (User Defined)
                • Untrusted
                • User defined
                • Written in Java
                • Loaded and instantiated like any other object.
              Class Loader (User) :
              • It allows you to dynamically extend a Java app at run time.
              • The JVM keeps track of which class loader loaded each class, and loads the classes that class requests through the same class loader.
              • It also restricts the visibility of classes under its umbrella. This way multiple name spaces can be maintained in one JVM.
              • The application can allow communication between classes of two class loaders, but they can't refer to each other directly.
              • It keeps untrusted code from getting access to the scope of a trusted class loader.
              • A class loader should first ask the primordial loader to load a class before attempting to load it itself.
              • A class loader does not let malicious code pass as trusted, which maintains the integrity of the sandbox. Name spaces serve the same purpose.
              Applet Class Loader: for each network domain, the browser JVM maintains a separate class loader.
              Two Main Objectives for Class Loader:
              • It should not allow malicious code to sneak in.
              • It should also protect borders of trusted class library.
              Guidelines for writing a Class Loader
              1. If packages exist that this class loader is not allowed to load from, the class loader checks whether the requested class is in one of those forbidden packages mentioned above. If so, it throws a security exception. If not, it continues on to step two.
              2. The class loader passes the request to the primordial class loader. If the primordial class loader successfully returns the class, the class loader returns that same class. Otherwise it continues on to step three.
              3. If trusted packages exist that this class loader is not allowed to add classes to, the class loader checks whether the requested class is in one of those restricted packages. If so, it throws a security exception. If not, it continues on to step four.
              4. Finally, the class loader attempts to load the class in the custom way, such as by downloading it across a network. If successful, it returns the class. If unsuccessful, it throws a "no class definition found" error.
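The four steps above map quite directly onto a loadClass override. The sketch below is an assumption-laden illustration: GuardedLoader, its two prefix parameters and the describe helper are hypothetical, the parent loader stands in for the primordial loader, and step four is left as a stub that reports the class as not found.

```java
class GuardedLoader extends ClassLoader {
    private final String forbiddenPrefix;   // hypothetical package this loader may not load from
    private final String protectedPrefix;   // hypothetical trusted package it may not add classes to

    GuardedLoader(ClassLoader parent, String forbiddenPrefix, String protectedPrefix) {
        super(parent);
        this.forbiddenPrefix = forbiddenPrefix;
        this.protectedPrefix = protectedPrefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Step 1: refuse packages this loader must never load from.
        if (name.startsWith(forbiddenPrefix)) throw new SecurityException("forbidden: " + name);
        // Step 2: delegate to the parent chain first.
        try {
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException notFoundByParent) {
            // fall through to step 3
        }
        // Step 3: never add new classes to a trusted package.
        if (name.startsWith(protectedPrefix)) throw new SecurityException("protected: " + name);
        // Step 4: custom loading (fetch bytes, then defineClass) would go here.
        throw new ClassNotFoundException(name);
    }

    /** Small helper for the demo: reports the outcome of a load attempt as text. */
    static String describe(ClassLoader loader, String name) {
        try {
            return "loaded " + loader.loadClass(name).getName();
        } catch (SecurityException e) {
            return "rejected";
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        GuardedLoader loader = new GuardedLoader(ClassLoader.getSystemClassLoader(), "forbidden.", "java.");
        System.out.println(describe(loader, "java.lang.String"));     // loaded by the parent chain
        System.out.println(describe(loader, "forbidden.Foo"));        // rejected in step 1
        System.out.println(describe(loader, "java.lang.Nope"));       // rejected in step 3
        System.out.println(describe(loader, "com.example.Missing"));  // not found in step 4
    }
}
```

The order of the checks matters: a class that really exists in a trusted package is returned by the parent in step two, so step three only rejects attempts to add something new to that package.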
              Security and the class verifier

              The class verifier is the part of the JVM that runs before the program executes. It ensures robustness, for example by rejecting illegal jump targets of goto or ifeq.
              Two phases of the class verifier:
              1. Internal Checks
              After the class is loaded, it checks the integrity of the bytecodes and the internal structure of the class file. If a check fails, it rejects the class file with an error.
              • Proper format and internal consistency
              • Looks for the magic number
              • Ensures the length of the class file.
              • Ensures the component structure.
              • Ensures the super class hierarchy.
              • Checks operands for valid values and types.
              • Checks method invocations for the right number and type of parameters.
              2. Byte code verifier
              During execution of bytecodes, it checks for the existence of symbolic references.
              • Checks referenced classes and methods.
              • Combined with the resolution step of the dynamic linking process.
              JVM Process:
              1. Finds the class being referenced.
              2. Replaces the symbolic link with a direct ref.
              Other Concepts:
              • The JVM keeps the direct ref for every class after it is first resolved.
              • Symbolic refs are created by the compiler and contain details like class, method, and declaration.
              • The JVM's job is to enforce them during execution.
              • If violated, it throws NoSuchMethodError.
              • This check is a compiler job as well, but it has to be repeated at run time because classes can be compiled and deployed separately.
              • Binary compatibility: changes to a class file that break a symbolic ref are caught before the reference is executed.
              Java security: How to install the security manager and customize your security policy
              • Security Manager (SM) is a Java class which can be extended.
              • It has lots of "check" methods to check almost everything on a JVM instance.
              The SM has no policies for the following two major events.
              1. Allocating new memory.
              2. Creating new threads.
              Other custom SM features that may be required.
              1. Applet image load
              2. Applet email capability.
              3. Applet speaker access.
              The JVM takes two steps before any guarded action.
              1. Determine if an SM is installed.
              2. If yes, let the SM decide whether the operation is authorized.
              Other facts:
              • The security manager throws a SecurityException, which aborts the operation.
              • One JVM can have only one Security Manager installed.
              • But the JVM allows you to have multiple security policies.
              • It also allows you to take action depending on the class loader.
              • Helps you to have different access policies for network and disk files.
              • Java's open source strategy adds to the overall security of the platform.

              Internals of JVM

              When I started programming Java, the virtual machine concept always fascinated me with its mysterious dealings. Over the years, I have gained some insight into it. Here I am sharing my learnings.

              The JVM is a stack-based execution machine, a design that helps keep the JVM's instruction set and implementation small. This means that all execution happens on the stack, and the registers only keep pointers into the stack.

              Stack Frame
              Each executing thread has its own stack. For every method it executes, the thread allocates a section of the stack, called a Stack Frame, to store the method's arguments and return value. The frame of the currently executing method is the top section of the stack. So a stack is a stack of Stack Frames.
              Each Stack Frame has three sections to store data for an executing method.
              • Operand stack - Workspace for the bytecode instructions from the method area (operands and results).
              • Execution environment - Used to maintain the operations of the stack itself.
              • Local variables - Contains all the local variables being used by the current method invocation.
              Register
              The JVM has four 32-bit registers that hold addresses to control program execution on the stack.
              • PC register (Program Counter): Address of the executing instruction in the method area.
              • Optop register - Points to the top of the operand stack of the SF.
              • Frame register - Points to the execution environment of the SF.
              • Vars register - Points to the local variables in the SF.
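To get a feel for how the operand stack, the local variables and the pc work together, here is a tiny interpreter sketch. The opcodes (PUSH, LOAD, STORE, ADD, MUL, RET) and the class MiniFrame are made up for illustration; real JVM opcodes have different names and encodings.

```java
class MiniFrame {
    // Hypothetical opcodes for this sketch only.
    static final int PUSH = 0, LOAD = 1, STORE = 2, ADD = 3, MUL = 4, RET = 5;

    /** Executes code against one frame: a local variable array plus an operand stack. */
    static int run(int[] code, int[] locals) {
        int[] stack = new int[16];
        int optop = 0;   // next free slot of the operand stack
        int pc = 0;      // index of the next instruction
        while (true) {
            switch (code[pc++]) {
                case PUSH:  stack[optop++] = code[pc++]; break;          // push a constant
                case LOAD:  stack[optop++] = locals[code[pc++]]; break;  // local variable -> stack
                case STORE: locals[code[pc++]] = stack[--optop]; break;  // stack -> local variable
                case ADD: { int b = stack[--optop], a = stack[--optop]; stack[optop++] = a + b; break; }
                case MUL: { int b = stack[--optop], a = stack[--optop]; stack[optop++] = a * b; break; }
                case RET:   return stack[--optop];                       // result is on top of the stack
                default:    throw new IllegalStateException("bad opcode at " + (pc - 1));
            }
        }
    }

    public static void main(String[] args) {
        // Computes (locals[0] + locals[1]) * 2
        int[] code = { LOAD, 0, LOAD, 1, ADD, PUSH, 2, MUL, RET };
        System.out.println(run(code, new int[] { 3, 4 }));   // prints 14
    }
}
```

Every instruction takes its operands from the stack and puts its result back there, which is why the instruction set needs no general-purpose registers.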
              Heap GC Area: 
              This is where all objects created by "new" live. Objects stay here as long as they are referenced, and the GC removes them once they lose their last reference.

              Method Area:
              In Java, all code lives in methods. So all executable bytecode stays in the method area and is executed using the PC register.


              Syncing a Thread

              Threading is one of the most complicated aspects of Java programming. But it becomes much simpler with some insight into how threads are implemented at the JVM level. Here I describe how.

              The JVM assigns one stack to each thread. This stack holds the stack frames with their local variables and object refs. The thread performs all its execution through this stack.

              All threads in a JVM share two main memory areas. 
              • Heap: Contains all the objects
              • Method Area: All the bytecode and static variables of classes
              During execution, a thread needs to acquire a lock before it runs synchronized code on a class or object.
              • Object level lock - synchronized(this) {...}
              • Class level lock - static synchronized int getx() {...} 
              Both of these are object locks: an object level lock is taken on the object instance itself, whereas a class level lock is taken on the class's Class object. (The JVM creates a Class object for every class it loads.) Locking a class means that you are locking its Class object. For an object lock, the synchronized statement takes as its argument the object on which it locks.

              Thread Monitor:
              The JVM has two special opcodes to synchronize a block. 
              • monitorenter: Pop objectref, acquire the lock associated with objectref
              • monitorexit: Pop objectref, release the lock associated with objectref
              These monitors operate at the object level. The JVM uses the monitor to watch execution and manage the lock. A lock can be acquired multiple times on the same object by the thread that already holds it. This means a thread can go from one synchronized block into another synchronized block on the same object without blocking. Each monitor holds a queue of waiting threads.
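Both points, re-entering a lock and locking on the Class object, can be checked with Thread.holdsLock. The class ReentrantDemo below is my own small sketch.

```java
class ReentrantDemo {
    private int depth = 0;

    synchronized int outer() {        // acquires the monitor of this
        depth++;
        return inner();               // the same thread re-enters the same monitor
    }

    synchronized int inner() {
        depth++;
        return Thread.holdsLock(this) ? depth : -1;
    }

    static synchronized boolean classLock() {    // locks ReentrantDemo.class
        return Thread.holdsLock(ReentrantDemo.class);
    }

    public static void main(String[] args) {
        System.out.println(new ReentrantDemo().outer());  // prints 2
        System.out.println(classLock());                  // prints true
    }
}
```

If locks were not reentrant, the call from outer() to inner() would block forever on a lock the thread already owns.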

              Thread Communication: 
              Threads use sleep, wait and notify to coordinate with each other. sleep is a method of Thread; wait, notify and notifyAll are methods of Object.
              • Sleep - Keeps holding the monitor lock while sleeping
              • Wait - Releases the monitor lock while waiting
              • NotifyAll - Wakes up all threads waiting on that object
              • Notify - Wakes up only one, arbitrarily chosen, thread waiting on that object
              Threads synchronize with each other over a shared resource using wait and notify. A thread enters the monitor's queue by calling a synchronized method or by calling wait() on the object. Queued threads get a chance to execute when the running thread returns from the method or calls wait(). notify wakes up waiting threads. Shared state and behavior should be synchronized. If a thread has to wait for a state to change, it should wait inside the class using wait(). A thread should call notify when it has changed the shared state.
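A one-slot mailbox is about the smallest example of this wait/notify protocol. The sketch below uses names of my own (Mailbox, put, take, roundTrip): each side waits inside the class for the state it needs and notifies after changing it.

```java
class Mailbox {
    private String message;   // shared state, guarded by this object's monitor

    synchronized void put(String m) throws InterruptedException {
        while (message != null) wait();   // wait() releases the monitor while blocked
        message = m;
        notifyAll();                      // state changed: wake up waiting threads
    }

    synchronized String take() throws InterruptedException {
        while (message == null) wait();
        String m = message;
        message = null;
        notifyAll();
        return m;
    }

    /** Sends one message to a consumer thread and returns what that thread received. */
    static String roundTrip(String text) {
        Mailbox box = new Mailbox();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            box.put(text);
            consumer.join();   // after join, the consumer's write to received[0] is visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));   // prints hello
    }
}
```

The condition is checked in a while loop rather than an if, because a woken thread has to re-check the state once it gets the lock back.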

              Volatile vs Synchronized
              A thread may work on its own cached copy of a shared variable rather than on the copy in main memory, and write its changes back to main memory later. Two mechanisms build on top of this default behavior:
              • In a synchronized block: Only one thread at a time can hold the object's lock, so other threads cannot execute code guarded by the same lock. Values are re-read from main memory when the lock is acquired and written back when it is released.
              • With a volatile variable: A thread reads and writes the variable directly in main memory, which avoids stale reads without requiring a synchronized block.
              Both protect against stale reads, one at the level of a block and the other at the level of a single variable. Only synchronized provides mutual exclusion; volatile does not make compound operations such as i++ atomic.
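A typical use of volatile is a stop flag that one thread writes and another thread polls. The sketch below is an illustration under that assumption; the class StopFlag and its members are hypothetical names.

```java
// Hypothetical stop-flag example: volatile guarantees the worker sees the write.
final class StopFlag {
    private volatile boolean running = true;   // writes are visible to all threads
    private long iterations;                   // only touched by the worker thread

    void stop() {
        running = false;
    }

    // Starts a worker that spins until stop() is called.
    // Returns true if the worker terminated within the timeout.
    static boolean demo() {
        final StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (flag.running) {
                flag.iterations++;
            }
        });
        worker.start();
        flag.stop();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Without volatile on the flag, the worker would be allowed to keep using a stale value of running. The counter is deliberately confined to one thread, because volatile alone would not make iterations++ safe across threads.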


                JVM Control Flow

                All if, if-else, while and do-while constructs are compiled into conditional branch ("if") instructions in bytecode.
                ifeq branchbyte1, branchbyte2:
                Pop one value from the operand stack.
                Compare it with zero.
                If it is equal to zero, jump to the specified offset.
                Otherwise continue at the next instruction (default PC).
                if_icmpeq branchbyte1, branchbyte2:
                Pop two values from the operand stack.
                Compare them.
                If they are equal, jump to the specified offset.
                Otherwise continue at the next instruction (default PC).
                A switch statement only accepts int-compatible values (byte, short, char, int and, since Java SE 5, enums), because the cases are mapped to jump locations.
                A switch statement compiles into either "tableswitch" or "lookupswitch".
                tableswitch: It stores the lowest and highest case values plus a table of jump offsets, and uses (key - low) as the index into that table to find the offset of the block.
                lookupswitch: It stores pairs of case value and offset, so the JVM searches through the available pairs for a matching value.
                tableswitch is considered more efficient because it needs fewer comparisons.
                The compiler decides between the two based on the resulting instruction length.
                There are many jumps for a switch of short series.
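To see the two switch encodings, one can compile a class like the following and inspect it with javap -c. The class SwitchDemo is a hypothetical example; which instruction is emitted is the compiler's choice, so the comments describe what javac typically produces rather than a guarantee.

```java
// Hypothetical example for inspecting switch bytecode with "javap -c SwitchDemo".
final class SwitchDemo {
    // Dense, consecutive case values: javac typically emits tableswitch.
    static String dense(int key) {
        switch (key) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "other";
        }
    }

    // Sparse case values: javac typically emits lookupswitch.
    static String sparse(int key) {
        switch (key) {
            case 1: return "one";
            case 100: return "hundred";
            case 10000: return "ten thousand";
            default: return "other";
        }
    }
}
```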

                Saturday, December 13, 2008

                Constant, Floating Point, Array and Objects, Random


                About Constant in JVM

                • Java Tool: Java Class Disassembler
                  • javap -c .. class
                • A constant (final) in Java means that the reference held by a variable will not change. But the object pointed to by the reference can still change.
                  • static final Object a = B.b; // a always refers to the same object, but that object's state can still change.

                Random:

                • The compiler performs all rule checking for the given code and generates bytecode accordingly.
                • Interfaces are not initialized until they are referenced from the code.
                • Superclasses are always initialized before object initialization.
                • Dependency Direction
                • It will always be left to right.
                • Expression evaluation happens left to right.
                • The generated bytecode looks almost the same for a call through an interface reference or a class reference; only the invocation opcode differs (invokeinterface vs invokevirtual), and the JVM resolves the target at run time.
                • Class Byte Code:
                • It loads the static code area.
                • Initializes the object.
                • Static methods.

                Primitive data types:

                • word (32 bits)
                • byte (8 bits),
                • short (16 bits),
                • int (32 bits),
                • long (64 bits),
                • float (32 bits),
                • double (64 bits),
                • char (16 bits) Unsigned Unicode
                • Object handle : 32-bit address

                Arrays and Objects:

                • JVM deals with three forms:
                • objects -> garbage-collected heap
                • object references
                • primitive types
                • Java stack as local variables.
                • On the heap as instance variables of objects.
                • In method area as class variables.
                • An object itself cannot be a local variable; a local variable can only hold an object reference. The reference becomes valid only after the JVM has created and initialized the object.
                • Arrays are objects too, and the JVM treats each array as an individual object on the heap. It still maintains a separate set of instructions for arrays.

                Floating Point Arithmetic In JVM


                Floating Number Representation in Decimal:
                • sign * mantissa * radix^exponent
                Normalized Rule:
                • 1/radix <= mantissa < 1
                Non normalized Rule:
                • 0 <= mantissa < 1/radix
                Floating Number Representation in binary:
                • sign * mantissa * 2^exponent
                Bit layout of Java float
                s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm

                • Float: 1 | 8 | 23 | (sign, exponent and mantissa bits)
                • Double: 1 | 11 | 52 |
                • s - sign bit
                • m - mantissa - positive integer
                • e - exponent
                  • All ones = Infinity/NaN
                  • All zeros = denormalized floating number (or zero)
                • The mantissa has one additional, implied bit of precision.
                • If (e == all zeros), the most significant bit of the mantissa is considered to be zero. Otherwise, it is considered to be one.
                • Two's-complement numbers: msb is the sign bit.
                Special float values
                Value        Float bits (sign exponent mantissa)
                +Infinity    0 11111111 00000000000000000000000
                -Infinity    1 11111111 00000000000000000000000
                NaN          1 11111111 10000000000000000000000

                • Unbiased Exponent Float = exponent - 126
                • Unbiased Exponent Double = exponent - 1023

                Normalized float values
                Value                              Float bits (sign exponent mantissa)   Unbiased exponent
                Largest positive (finite) float    0 11111110 11111111111111111111111    128
                Largest negative (finite) float    1 11111110 11111111111111111111111    128
                Smallest normalized float          1 00000001 00000000000000000000000    -125
                Pi                                 0 10000000 10010010000111111011011    2
                Denormalized float values
                Value                                Float bits (sign exponent mantissa)
                Smallest positive (non-zero) float   0 00000000 00000000000000000000001
                Smallest negative (non-zero) float   1 00000000 00000000000000000000001
                Largest denormalized float           1 00000000 11111111111111111111111
                Positive zero                        0 00000000 00000000000000000000000
                Negative zero                        1 00000000 00000000000000000000000
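The bit patterns in these tables can be reproduced from Java itself with Float.floatToIntBits. The helper below is a small sketch; the class name FloatBits and the method layout are hypothetical.

```java
// Hypothetical helper that prints a float in the "s eeeeeeee mmm..." layout.
final class FloatBits {
    // Formats the 32 bits of a float as sign, 8 exponent bits and 23 mantissa bits.
    static String layout(float f) {
        int bits = Float.floatToIntBits(f);
        // Left-pad the binary string to 32 characters with zeros.
        String s = String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
        return s.substring(0, 1) + " " + s.substring(1, 9) + " " + s.substring(9);
    }
}
```

For example, FloatBits.layout((float) Math.PI) gives the same bits as the Pi row above. Note that Float.floatToIntBits collapses every NaN to one canonical bit pattern, so the NaN row cannot be reproduced bit for bit this way.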

                Friday, December 12, 2008

                Bytecodes Theory

                My favorite subject in college was ASM programming. Why? Because it was fun to deal with bits and bytes, and it was the rawest form of programming. But Java has taken us far away from the real byte world to a virtual byte world. Let's see what this virtual byte world is.

                Java compiles into a stream of bytecode, which is a sequence of instructions. Each instruction consists of a one-byte opcode followed by zero or more operands. A 32-bit JVM uses 32-bit memory addresses, which limits it to 4GB of virtual memory.

                Opcodes have human-readable mnemonics like imul, isub, iconst_0 etc. Multi-byte operands are stored in big-endian order. A bytecode's operand is usually an index, either into the local variable section of the stack frame or into the constant pool; a local variable in turn may refer to an object on the heap or to a class variable in the method area.

                Here are few bytecodes for brain walk.
                Basic Stack Operations:
                • iconst_1 - Push the int constant 1 onto the stack.
                • sipush 01 00 - Push short onto stack
                • bipush 01 - Push byte onto stack
                • iconst_m1 = -1
                • iconst_0 to iconst_5
                • fconst_0 to fconst_2
                • lconst_0 - lconst_1
                • dconst_0 - dconst_1 - Two slots on stack
                • aconst_null
                • iload, lload, fload, dload - Loads local variable onto stack
                • ldc, ldc_w, ldc2_w (ldc1, ldc2, ldc2w in older documentation): Load a constant from the class's constant pool onto the stack.

                Memory Access:
                • astore_1 - Pop an object reference and store it in local variable 1. Local variable indices start from 0.
                • aload_1 - Push the object reference in local variable 1 onto the stack.
                • getstatic - access static variable for a class.
                • putstatic - stores static variable for a class.
                • putfield: Saving a member variable for an object.
                • getfield: Getting a member variable for an object.
                Operator:
                • iadd - integer addition of two operands.
                • iconst_0 - Push a zero onto the stack.
                • goto - with operand (16-bit signed offset) to jump from current location.
                Method access:
                • invokespecial - Constructor (<init>), private method and super method calls
                • invokevirtual - Ordinary instance method calls, dispatched on the object's runtime class
                • new and dup - New object creation
                Load variable from local variable section onto operand stack
                • iload_0 - iload_3 - Load int local variable 0 to 3 onto the stack.
                • fload_0 - fload_3 - Load float local variable 0 to 3 onto the stack.
                • lload, dload - Of similar type as above for long and double
                • aload - Object ref
                • iload vIndex - Load the int variable at index vIndex onto the stack.
                • fload vIndex
                • Change load to store - It will do the reverse.
                Wide Management:
                • wide - Extends the operand length of the following instruction from one byte to two.
                • For example, wide iload 01 10 means: load local variable number hex 0110 onto the operand stack.
                • Narrowing conversion to the small types (byte, short, char) is only available from int (i2b, i2s, i2c).
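As a small worked example of how these instructions fit together, consider a static method that adds two ints. The class BytecodeDemo is hypothetical; the listing in the comment is what javac typically emits for it and can be checked with javap -c BytecodeDemo.

```java
// Hypothetical example: a tiny method and the bytecode javac typically emits for it.
final class BytecodeDemo {
    // Disassembly (javap -c):
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b) onto the operand stack
    //   iadd      // pop both ints, push a + b
    //   ireturn   // return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }
}
```

Because the method is static, there is no "this" reference, so the parameters occupy local variable slots 0 and 1.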

                Thursday, December 11, 2008

                Concurrent Collection in Java

                package: java.util.concurrent
                List:
                CopyOnWriteArrayList: 
                List implemented on an array. It provides copy-on-write semantics where each modification of the data structure results in a new internal copy of the data (writes are thus very expensive). Iterators on the data structure always see a snapshot of the data from when the iterator was created.


                Set: 
                CopyOnWriteArraySet Class 
                Set implemented on Array. It provides copy-on-write semantics where each modification of the data structure results in a new internal copy of the data (writes are thus very expensive). Iterators on the data structure always see a snapshot of the data from when the iterator was created.
                ConcurrentSkipListSet Class  
                Set implemented on a skip list. It (added in Java SE 6) provides concurrent access along with sorted set functionality similar to TreeSet. Due to the skip list based implementation, multiple threads can generally read and write within the set without contention as long as they aren’t modifying the same portions of the set.

                  Map:
                  ConcurrentMap (K,V) Interface
                  The Map interface has a concurrent extension called ConcurrentMap, which has atomic methods like
                  • putIfAbsent()
                  • remove()
                  • replace()
                  Two concrete implementations of ConcurrentMap:
                  ConcurrentHashMap Class
                  It provides two levels of internal hashing. The first level chooses an internal segment, and the second level hashes into buckets in the chosen segment. The first level provides concurrency by allowing reads and writes to occur safely on each segment in parallel.
                  ConcurrentSkipListMap Class
                  Map implemented on a skip list. It (added in Java SE 6) provides concurrent access along with sorted map functionality similar to TreeMap. Performance bounds are similar to TreeMap although multiple threads can generally read and write from the map without contention as long as they aren’t modifying the same portion of the map.
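The atomic ConcurrentMap methods can be tried out in a few lines. The sketch below uses only java.util.concurrent classes; the class name ConcurrentMapDemo and the key "home" are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo of the atomic check-and-act methods on ConcurrentMap.
final class ConcurrentMapDemo {
    static String demo() {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<String, Integer>();
        hits.putIfAbsent("home", 1);   // inserted: the key was absent
        hits.putIfAbsent("home", 99);  // ignored: the key is already present
        hits.replace("home", 1, 2);    // succeeds: the current value is 1
        hits.remove("home", 1);        // no-op: the current value is 2, not 1
        return "home=" + hits.get("home");
    }
}
```

Each call performs its check and its update as one atomic step, so no external synchronization is needed around it.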
                    Queue:

                    BlockingQueue:  
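                    Generally, we need a synchronized queue for producer-consumer problems. BlockingQueue offers the solution. The producer puts elements into the queue with put() until the queue reaches its limit, and then waits (stays blocked) until there is room again. On the other side, the consumer uses take() to pull elements out of the queue until it is empty, and then waits (stays blocked). This blocking can be either indefinite (put() and take()) or with a timeout (offer() and poll() with a timeout).
                    import java.util.concurrent.BlockingQueue;
                    import java.util.concurrent.LinkedBlockingQueue;
                    BlockingQueue<String> q = new LinkedBlockingQueue<String>(5);
                    String str = "hello";
                    // t1 (producer): put() blocks while the queue is full
                    Thread t1 = new Thread(() -> {
                       try { q.put(str); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    });
                    // t2 (consumer): take() blocks while the queue is empty
                    Thread t2 = new Thread(() -> {
                       try { q.take(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    });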

                    Example: Producer and consumer working with a queue to fill up strings with delay in consumer or producer.
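A complete version of such an example might look like the following sketch. The class ProducerConsumerDemo, the item names and the delays are all assumptions chosen for illustration; only the BlockingQueue calls come from java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical producer/consumer demo with a bounded queue and a slow consumer.
final class ProducerConsumerDemo {
    static List<String> demo() {
        final BlockingQueue<String> q = new LinkedBlockingQueue<String>(5);
        final List<String> consumed = new ArrayList<String>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    q.put("item-" + i);        // blocks while the queue holds 5 items
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                String s;
                // Timed poll: give up when nothing arrives for one second.
                while ((s = q.poll(1, TimeUnit.SECONDS)) != null) {
                    consumed.add(s);
                    Thread.sleep(10);          // delay in the consumer fills the queue
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();                   // after join, "consumed" is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

The producer uses the indefinitely blocking put(), while the consumer uses the timed poll() so that it can finish once the producer is done.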
                    Queue Implementation: 
                    • PriorityQueue is the only non-concurrent queue implementation and can be used by a single thread to collect items and process them in a sorted order.
                    • ConcurrentLinkedQueue - An unbounded linked list queue implementation and the only concurrent implementation not supporting BlockingQueue.
                    • ArrayBlockingQueue - A bounded blocking queue backed by an array.
                    • LinkedBlockingQueue - An optionally bounded blocking queue backed by a linked list. This is probably the most commonly used Queue implementation.
                    • PriorityBlockingQueue - An unbounded blocking queue backed by a heap. Items are removed from the queue in an order based on the Comparator associated with the queue (instead of FIFO order).
                    • DelayQueue - An unbounded blocking queue of elements, each with a delay value. Elements can only be removed when their delay has passed and are removed in the order of the oldest expired item.
                    • SynchronousQueue - A 0-length queue where the producer and consumer block until the other arrives. When both threads arrive, the value is transferred directly from producer to consumer. Useful when transferring data between threads.
                    Deques: 
                    Added in Java SE 6, to allow add/remove operations on both ends of queue.
                    • Deque is extension of Queue. 
                    • BlockingDeque is extension of BlockingQueue.

                    Java Data Structure - Decision Making

                    Package: java.util.*
                    Java has a fantastic collection of data structures that gives programmers all the advanced tools they need. The Java Collections API is very diverse and exhaustive, which raises the issue of "selection". Selecting the right collection type requires proper decision making, to ensure that the choice meets the requirements but does not carry hidden overheads. Let's go over the process.

                    1. Java Data Structure 
                    • Basic Collection 
                      • List - An ordered collection.
                      • Set - A collection that contains no duplicate elements.
                    • Map -  An object that maps keys to values with no duplicate keys. 
                    • Process Collection 
                      • Queue -  A collection designed for holding elements prior to processing in FIFO order.
                      • Stack - A collection designed for holding elements prior to processing in LIFO order.
                      List:
                      An ordered collection (sometimes called a sequence). Lists can contain duplicate elements. The user of a List generally has precise control over where in the list each element is inserted and can access elements by their integer index (position).
                      Set:
                      A collection that cannot contain duplicate elements
                      Map:
                      An object that maps keys to values. A Map cannot contain duplicate keys; each key can map to at most one value.
                      Hierarchical Structure:
                      Other Collections:
                      • Queue: Queues act as pipes between “producers” and “consumers”. Items are put in one end of the pipe and emerge from the other end in the same “first-in first-out” (FIFO) order. These were designed for a single-threaded model.
                      • Stack: Stacks act as one-ended pipes between “producers” and “consumers”. Items are pushed on top of the stack and popped off the stack in “last-in first-out” (LIFO) order. These were designed for a single-threaded model.
                      2. Choosing from types of collection 



                      Type    Variable Size   Duplication      Ordered   Indexing    CRUD      Traversal
                      List    Yes             Yes              Yes       Index       Any       Index
                      Set     Yes             No               None      None        Any       Iterator
                      Map     No              Yes (of value)   Yes       Key/Value   Any       Key
                      Queue   Yes             Yes              No        Ends        At Ends   Pointer

                      3. Collection - Concurrency and Sorting Capability
                      Collection   Ordering of keys      Non-concurrent   Concurrent
                      List         No particular order   ArrayList        Vector
                      List         Sorted                -                -
                      List         Fixed                 LinkedList       CopyOnWriteArrayList
                      Map          No particular order   HashMap          ConcurrentHashMap
                      Map          Sorted                TreeMap          ConcurrentSkipListMap
                      Map          Fixed                 LinkedHashMap    -
                      Set          No particular order   HashSet          -
                      Set          Sorted                TreeSet          ConcurrentSkipListSet
                      Set          Fixed                 LinkedHashSet    CopyOnWriteArraySet

                      4. Queue - Blocking and Bound Capability
                      Blocking       Other criteria                             Bound                Non-bound
                      Blocking       None                                       ArrayBlockingQueue   LinkedBlockingQueue
                      Blocking       Order - priority-based                     -                    PriorityBlockingQueue
                      Blocking       Delayed                                    -                    DelayQueue
                      Non-blocking   Thread-safe                                -                    ConcurrentLinkedQueue
                      Non-blocking   Non thread-safe                            -                    LinkedList
                      Non-blocking   Non thread-safe, order - priority-based    -                    PriorityQueue

                      Concepts:
                      • Iterator:  
                        • Iterators are fail-fast, which means that if the underlying collection is modified after the iterator is created, other than through the iterator's own API, the iterator will throw ConcurrentModificationException.
                      • Set: 
                        • Duplicates are detected with equals() and hashCode(). Unless a class overrides both, two logically equal objects are treated as different elements.
                      • Hash:  
                        • It needs an initial capacity and a loadFactor - how full the table has to be before it increases its capacity.
                        • Hash - Calculated over object.hashcode()  
                        • Hash function chooses internal bucket to place a given element. 
                      • HashMap 
                        • Hashed value becomes the key to lookup in given map.
                      • Tree 
                        • Binary tree implementation to have sorting  
                      Compare: 
                        • HashMap and Hashtable
                          • HashMap - Not synchronized and allows null key/value
                          • Hashtable - Synchronized and does not allow null key/value
                        • Vector and ArrayList 
                          • Vector - Synchronized, grows dynamically
                          • ArrayList - Not synchronized, also grows dynamically
                        • Iterator and Enumeration 
                          • Iterator can remove elements from the underlying collection 
                          • Enumeration only traverses its underlying collection and exposes it as a read-only traversal. 
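The fail-fast behavior of iterators, and the difference between removing through the collection and removing through the iterator, can be seen in a short sketch. The class FailFastDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo of fail-fast iteration over an ArrayList.
final class FailFastDemo {
    // Modifying the list directly while iterating trips the fail-fast check.
    static boolean directRemovalFails() {
        List<Integer> numbers = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        try {
            for (Integer n : numbers) {
                numbers.remove(n);             // remove(Object), bypasses the iterator
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Removing through the iterator's own remove() is allowed.
    static List<Integer> removeEvens() {
        List<Integer> numbers = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4));
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return numbers;
    }
}
```

The fail-fast check is a best-effort bug detector, so code should not rely on the exception for its normal control flow.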
                        Best Practices:
                        • Always work with the interfaces of the underlying implementation.
                        • Synchronized: To get synchronized behavior from a non-thread-safe collection type, you can put a wrapper around it. But this has overhead, as it locks the entire object during each operation. It is advisable to use collections that are thread-safe by design. 
                          • List<String> l = Collections.synchronizedList(new ArrayList<String>()); 
                        Version 6.0:
                        • ConcurrentSkipListSet - An implementation of NavigableSet, sorted by natural ordering or by a comparator supplied at creation time. Its basic operations have an expected cost of log(n).  
                        • ConcurrentSkipListMap - The corresponding NavigableMap implementation.


                          Tuesday, December 9, 2008

                          Jax India: Architecture Mgmt

                          Architecture Review:
                          • Visualization
                          • Design violation
                          • Style violation
                          • Business logic review
                          • Performance review
                          • Documentation
                          Architecture Management:
                          • Keep the software simple and maintainable from the start.
                          • Keep complexity in control.
                          • Cyclomatic complexity
                            • Graph - Effort vs Time.
                          • Componentization (between UI and Data)
                            • Business
                            • Control
                            • Customer
                            • Users
                            • Commons
                          • Sub-system - Interface dependency
                          • Parameters to judge architecture health 
                            • ACD - Average Component Dependency
                            • rACD = ACD / number of element
                          • NCCD <>
                          • How to measure coupling?
                            • Graph against NCCD and N
                          • Solution
                            • Abstraction
                            • Dependency injection framework
                            • Flexible architecture
                          • Golden rule for Project
                            • Cycle free dependency
                            • Cycle dependency within package
                            • Keep relative
                              • ACD < % for 500 compilation unit
                            • Limit the size of java files
                            • Limit the cyclomatic complexity - 15
                            • Limit the size of package - 50
                          • SonarJ Tool for Eclipse - Architect

                          Monday, December 8, 2008

                          Singleton Traps

                          The Singleton pattern is the easiest pattern to learn, but it can become very complicated once language-level vulnerabilities are taken into account. Here, I talk from the Java language perspective. 

                          Initialization:
                          How the singleton instance is initialized in the class also matters. 
                          • Eager initialization - Initializing the static instance in its declaration - happens when the class loader initializes the class
                          • Lazy initialization - Initialization in getInstance() with a null check - happens when the object is first requested
                            Synchronization: 
                            Multiple threads entering the getInstance() method at the same time can violate the concept by creating more than one instance. The solution is to protect the object creation block with synchronization.
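A minimal sketch of a lazily initialized singleton whose creation block is protected by synchronization could look like this. The class name Registry is hypothetical.

```java
// Hypothetical lazily initialized, thread-safe singleton.
final class Registry {
    private static Registry instance;          // created on first use

    private Registry() {
        // private constructor: no instantiation from outside
    }

    // synchronized: only one thread at a time can run the null check and creation.
    static synchronized Registry getInstance() {
        if (instance == null) {
            instance = new Registry();
        }
        return instance;
    }
}
```

Synchronizing the whole method is the simplest option; it costs a lock acquisition on every call, which is usually acceptable unless getInstance() sits on a hot path.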


                            Inheritance:  
                            If the Singleton class is allowed to be subclassed, there can be an instance for each subclass. The solution is to protect it by making it final.

                            Serialization:
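                            If the singleton class is serializable, the singleton can easily be violated by deserializing it twice from two streams, or from one stream that was written with a reset in between. Prevention is to avoid Serializable for a singleton, or to implement readResolve() so that deserialization returns the existing instance.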
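When a singleton does have to be serializable, a commonly used safeguard is a readResolve() method that hands back the existing instance. The sketch below assumes a hypothetical class Settings; roundTrip is a helper written only for this demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable singleton protected by readResolve().
final class Settings implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Settings INSTANCE = new Settings();

    private Settings() {
    }

    // Called by the serialization runtime after deserialization;
    // returning INSTANCE discards the freshly created copy.
    private Object readResolve() {
        return INSTANCE;
    }

    // Demo helper: serializes and deserializes the given object in memory.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(o);
            out.close();
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without readResolve(), the object returned by roundTrip would be a second, distinct instance.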

                            Multiple JVM: 
                            RMI, applet, servlet and EJB scenarios present a challenge for singletons, because each JVM gets its own instance. It cannot be prevented unless the JVM classloader is modified to check for another instance elsewhere in the system.

                            Classloader:  
                            The meaning of a singleton depends very much on the class loader. Class loaders load classes into their own namespaces, so two class loaders in one JVM can potentially create two singleton classes, each valid within its own class loader namespace. How to prevent it? 

                            Multi-singleton:
                            To create multiple instances from a singleton, the getInstance() method needs to be parametrized, which makes it a factory method.


                            Friday, December 5, 2008

                            Java SE 5 - Tiger Roaring


                            Java keeps trying to reduce developer boilerplate code, and Java SE 5 has been a landmark overhaul of the language since its inception. A few of the changes are listed here. Every Java version comes out with better language features that ease development, reduce code to a certain extent and take abstraction to the next level.

                            Generics: 
                            Provide compile-time (static) type checking, removing the need for most explicit casts. More on it in my generics articles.

                            Metadata: 
                            An annotation (metadata) is a way to tell the compiler or runtime environment to take additional action on the annotated artifacts (class/method/variable). 

                            More in my annotation article.

                            AutoBoxing: 
                            Automatic conversion between primitive types and reference types has been a dream feature for lots of developers. Yes, it finally came. It really helps with the Collections API, where these conversions are most frequent.
                            • Autoboxing = Primitive type to Reference Type (Primitive Wrapper Class)
                            • Unboxing = Reference Type to Primitive Type 
                            Primitive type   Reference type
                            boolean          Boolean
                            byte             Byte
                            double           Double
                            short            Short
                            int              Integer
                            long             Long
                            float            Float
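A short sketch of where the compiler inserts the conversions, using a collection as the motivating case. The class BoxingDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of autoboxing and unboxing around a collection.
final class BoxingDemo {
    static int demo() {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(42);             // autoboxing: int -> Integer
        int first = numbers.get(0);  // unboxing: Integer -> int
        Integer boxed = first + 1;   // int arithmetic, result boxed again
        return boxed;                // unboxed for the int return type
    }
}
```

One thing to watch for: unboxing a null Integer throws NullPointerException, so values read from collections that may contain null need a check first.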

                            Looping with Iterator:
                            Enough of managing an iterator in a for loop; now the for loop itself handles the iterator.
                            for (Integer number : numbers) { System.out.println(number); }
                            There is chatter that it could be termed "foreach", and questions about why Sun can't allow collections to be managed with this. Hope it will come in Java 7.

                            Varargs (...)
                            Another simplification in programming is passing a variable number of inputs to a method. Earlier, an array had to be declared in the signature and the caller had to build and pass that array. Now an ellipsis (...) after the parameter type tells the compiler that a variable number of arguments is accepted, and the caller no longer needs to create the array explicitly; the compiler wraps the arguments into an array. It is like the boxing feature taken to array level for method calls.
                            int sum(Integer... numbers)
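Filled in as a complete method, the signature above could be used as in the following sketch. The enclosing class VarargsDemo is a hypothetical name.

```java
// Hypothetical demo of a varargs method.
final class VarargsDemo {
    // Inside the method, "numbers" is an ordinary Integer[] array.
    static int sum(Integer... numbers) {
        int total = 0;
        for (Integer n : numbers) {
            total += n;              // unboxing: Integer -> int
        }
        return total;
    }
}
```

Callers can pass any number of arguments, for example VarargsDemo.sum(1, 2, 3) or VarargsDemo.sum(), and can still pass an explicit Integer[] array if they already have one.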
                             
                            Java's C style Printf 
                            Another improvement: C-style printf formatting has been brought to Java. 
                            System.out.printf("%d + %d = %d\n", x, y, sum); 
                            
                            Enum is a class!!!
                            Extensive usage of enum-like constants has pushed Java to elevate enum to class level... Applaud!!! :) That is good news. Another C++ goodness has been gifted to Java developers.
                            public enum Color { Red, White, Blue }
                            It has all the goodness of a class.
                            • values() returns an array of all constants - used in for loop
                            • valueOf(), toString()
                            • ordinal() returns index
                            • compareTo() based on index
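The enum methods listed here can be exercised with the Color enum from this section. The helper class EnumDemo and its describe method are hypothetical.

```java
// The enum from this section, plus a hypothetical helper that walks its constants.
enum Color { Red, White, Blue }

final class EnumDemo {
    // Builds a string of "ordinal=name" pairs in declaration order.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (Color c : Color.values()) {
            sb.append(c.ordinal()).append('=').append(c).append(' ');
        }
        return sb.toString().trim();
    }
}
```

Color.valueOf("White") looks a constant up by name, and Color.Red.compareTo(Color.Blue) is negative because Red is declared before Blue.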
                            static import: 
                            Gone are the days when static methods could only be accessed through their class name. Now, just statically import the specific member and call the method directly as if it were local.
                            import static java.lang.Math.ceil;
                            double y = ceil(x);
                            Other Improvements:
                            • New skin-able look n feel, called Synth.
                            • Auto stub(client side) creation for RMI
                            • Java concurrency utilities in java.util.concurrent.
                            • It is also applicable to enumerations.
                            • Compile time detection for generics


                            Wednesday, December 3, 2008

                            Transaction - Concept

                            As the size of an application increases, batch processing of tasks becomes unavoidable. Then data access, failures, locking and error conditions come into play. To solve these problems, transactions are designed into the system. Here are a few of my notes on the same.


                            ACID Principles:
                            When defining a transaction, there are four major guiding principles.

                            • Atomicity: This implies indivisibility; any indivisible operation (one which will either complete fully or not at all) is said to be atomic.
                            • Consistency: A transaction must transition persistent data from one consistent state to another. If a failure occurs during processing, the data must be restored to the state it was in prior to the transaction.
                            • Isolation: Transactions should not affect each other. A transaction in progress, not yet committed or rolled back (these terms are explained at the end of this section), must be isolated from other transactions. Although several transactions may run concurrently, it should appear to each that all the others completed before or after it; all such concurrent transactions must effectively end in sequential order.
                            • Durability: Once a transaction has successfully committed, state changes committed by that transaction must be durable and persistent, despite any failures that occur afterwards.
                            Data Corruptions:
                            Transaction is all about data manipulation. When multiple transactions are working together with overlapped data, there are possibility of data corruption. What are these problems?

                            • Dirty reads: A transaction reads a row in a database table containing uncommitted changes from another transaction.
                            • Non-repeatable reads: A transaction reads a row in a database table, a second transaction changes the same row and the first transaction rereads the row and gets a different value.
                            • Phantom reads: A transaction re-executes a query, returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.
                                Read Type           | Data | T1 - Read | T2 - Write | T2 - Commit | T1 - Read | T1 - Expect
                                Dirty Read          | a    | a         | b          | No          | b         | a
                                Non-Repeatable Read | a    | a         | b          | Yes         | b         | a
                                Phantom Read        | a    | a         | b +/- c    | Yes         | b + c     | a
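The Dirty Read and Non-Repeatable Read sequences above can be replayed with a small in-memory sketch. Nothing here talks to a real database: `Row`, `ReadPhenomenaDemo`, and their methods are hypothetical names made up for illustration, and the "isolation" is only simulated by choosing which value a read returns.

```java
/** Hypothetical single-row store that tracks a committed value and one pending write. */
final class Row {
    private String committed;
    private String pending; // uncommitted value written by another transaction, or null

    Row(String initial) {
        this.committed = initial;
    }

    /** T2 - Write: stage a new value without committing it. */
    void write(String value) {
        pending = value;
    }

    /** T2 - Commit: make the staged value visible to committed reads. */
    void commit() {
        if (pending != null) {
            committed = pending;
            pending = null;
        }
    }

    /** Read that may see uncommitted data. */
    String readUncommitted() {
        return pending != null ? pending : committed;
    }

    /** Read that sees only committed data. */
    String readCommitted() {
        return committed;
    }
}

final class ReadPhenomenaDemo {
    /** T1 reads, T2 writes without committing, T1 reads again and sees the uncommitted value. */
    static boolean dirtyReadOccurs() {
        Row row = new Row("a");
        String first = row.readUncommitted();  // "a"
        row.write("b");                        // T2 has not committed
        String second = row.readUncommitted(); // "b"
        return !first.equals(second);
    }

    /** T1 reads, T2 writes and commits, T1 rereads and gets a different value. */
    static boolean nonRepeatableReadOccurs() {
        Row row = new Row("a");
        String first = row.readCommitted();    // "a"
        row.write("b");
        row.commit();                          // T2 commits
        String second = row.readCommitted();   // "b"
        return !first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println("dirty read: " + dirtyReadOccurs());
        System.out.println("non-repeatable read: " + nonRepeatableReadOccurs());
    }
}
```

In both cases T1 expected to keep seeing "a"; the difference is only whether the value it saw instead had been committed.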

                                Isolation Levels: 
                                The container ensures that data read during a transaction stays in accordance with the isolation level defined for that transaction. There are four isolation levels; in this list, isolation increases as you go down.
                                • Read Uncommitted: Reads both committed and uncommitted data
                                  • Data that have been updated but not yet committed by a transaction may be read by other transactions.
                                • Read Committed: Reads only committed data (the default in many databases)
                                  • Only data that have been committed by a transaction can be read by other transactions.
                                • Repeatable Read: Read Committed, plus rereading the same data within a transaction returns the same result
                                  • Only data that have been committed by a transaction can be read by other transactions, and repeated reads of the same data within a transaction yield the same result, even if another transaction commits a change in between.
                                • Serializable: Transactions behave as if executed serially, with the Read Committed and Repeatable Read properties
                                  • This, the highest possible isolation level, ensures a transaction's exclusive read-write access to data. It includes the conditions of ReadCommitted and RepeatableRead and stipulates that all transactions run serially to achieve maximum data integrity. This yields the slowest performance and least concurrency. The term serializable in this context is absolutely unrelated to Java's object-serialization mechanism and the java.io.Serializable interface. 
                                  Isolation Level  | Dirty Read | Non Repeatable Read | Phantom Read
                                  READ UNCOMMITTED | Yes        | Yes                 | Yes
                                  READ COMMITTED   | No         | Yes                 | Yes
                                  REPEATABLE READ  | No         | No                  | Yes
                                  SERIALIZABLE     | No         | No                  | No
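In JDBC, these four levels correspond to the `TRANSACTION_*` constants on `java.sql.Connection`. The sketch below encodes the table above as three lookup methods. The class name `IsolationLevels` and its methods are hypothetical helpers, not part of any library, and the answers reflect the table, not what a particular database actually permits.

```java
import java.sql.Connection;

/** Hypothetical helper that encodes the isolation-level table as lookups. */
final class IsolationLevels {
    private IsolationLevels() {}

    /** Dirty reads are possible only at READ UNCOMMITTED. */
    static boolean allowsDirtyRead(int level) {
        check(level);
        return level == Connection.TRANSACTION_READ_UNCOMMITTED;
    }

    /** Non-repeatable reads are possible at READ UNCOMMITTED and READ COMMITTED. */
    static boolean allowsNonRepeatableRead(int level) {
        check(level);
        return level == Connection.TRANSACTION_READ_UNCOMMITTED
            || level == Connection.TRANSACTION_READ_COMMITTED;
    }

    /** Phantom reads are possible at every level except SERIALIZABLE. */
    static boolean allowsPhantomRead(int level) {
        check(level);
        return level != Connection.TRANSACTION_SERIALIZABLE;
    }

    /** Rejects values that are not one of the four levels in the table. */
    private static void check(int level) {
        switch (level) {
            case Connection.TRANSACTION_READ_UNCOMMITTED:
            case Connection.TRANSACTION_READ_COMMITTED:
            case Connection.TRANSACTION_REPEATABLE_READ:
            case Connection.TRANSACTION_SERIALIZABLE:
                return;
            default:
                throw new IllegalArgumentException("Unknown isolation level: " + level);
        }
    }
}
```

On a live connection the level would be requested with `connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE)`; whether a given level is honoured depends on the database and driver.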

                                  Attributes: 
                                  A transaction usually has many stages, executed through many methods of different classes. Each method brings its own contractual rules for participating in the transaction: sometimes it can work in the original transaction, other times it needs a new transaction of its own. Transaction attributes exist to accommodate this requirement.

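                                  Attribute    | Client Transaction | Business Method Transaction
                                  Required     | None               | T2
                                  Required     | T1                 | T1
                                  RequiresNew  | None               | T2
                                  RequiresNew  | T1                 | T2
                                  Supports     | None               | None
                                  Supports     | T1                 | T1
                                  NotSupported | None               | None
                                  NotSupported | T1                 | None
                                  Mandatory    | None               | Error
                                  Mandatory    | T1                 | T1
                                  Never        | None               | None
                                  Never        | T1                 | Error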

                                  * Error - RemoteException 
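The attribute table can be read as a single decision function. The sketch below is one way to write it down, assuming transactions are represented by plain strings such as "T1" and "T2" and that `null` means "no transaction". `TxAttribute` and `TxPropagation` are hypothetical names for this illustration (the enum only mirrors the attribute names), and an `IllegalStateException` stands in for the error the container would raise.

```java
/** Hypothetical stand-in for the six transaction attributes in the table. */
enum TxAttribute { REQUIRED, REQUIRES_NEW, SUPPORTS, NOT_SUPPORTED, MANDATORY, NEVER }

final class TxPropagation {
    private TxPropagation() {}

    /**
     * Returns the transaction the business method runs in, or null for none.
     *
     * @param clientTx the caller's transaction, or null if the caller has none
     * @param newTx    the transaction the container would start if one is needed
     */
    static String resolve(TxAttribute attribute, String clientTx, String newTx) {
        switch (attribute) {
            case REQUIRED:
                return clientTx != null ? clientTx : newTx; // join, or start a new one
            case REQUIRES_NEW:
                return newTx;                               // always a fresh transaction
            case SUPPORTS:
                return clientTx;                            // join if present, else none
            case NOT_SUPPORTED:
                return null;                                // always runs without one
            case MANDATORY:
                if (clientTx == null) {
                    throw new IllegalStateException("Mandatory: caller has no transaction");
                }
                return clientTx;
            case NEVER:
                if (clientTx != null) {
                    throw new IllegalStateException("Never: caller has a transaction");
                }
                return null;
            default:
                throw new AssertionError(attribute);
        }
    }
}
```

For example, `TxPropagation.resolve(TxAttribute.REQUIRES_NEW, "T1", "T2")` returns "T2", matching the RequiresNew row where the client already runs in T1.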
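                                  Container Managed or Declarative Transaction:
                                  The container starts and ends the transaction, based on the transaction attributes declared for the bean's methods.

                                  Bean Managed or Programmatic Transaction:
                                  The EJB being operated starts and ends the transaction in its own code.

                                  Client Managed:
                                  The client starts and ends the transaction.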





                                  JTA Main API: 
                                  • begin()
                                  • commit()
                                  • getStatus()
                                  • rollback()
                                  • setRollbackOnly() 
                                  • setTransactionTimeout()
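The usual shape of programmatic demarcation with these calls is begin, do the work, then commit, with a rollback if the work fails. The sketch below shows only that shape. To stay self-contained it does not use the real JTA types: `SimpleTx` is a hypothetical, simplified stand-in with just three of the methods listed above (the real JTA methods also declare checked exceptions), and `RecordingTx` and `TxTemplate` are made-up helpers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a JTA-style transaction handle. */
interface SimpleTx {
    void begin();
    void commit();
    void rollback();
}

/** Stand-in implementation that only records which calls were made. */
final class RecordingTx implements SimpleTx {
    final List<String> calls = new ArrayList<>();

    public void begin() { calls.add("begin"); }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
}

final class TxTemplate {
    private TxTemplate() {}

    /**
     * Runs the work inside a transaction.
     * Commits and returns true if the work succeeds;
     * rolls back and returns false if it throws a RuntimeException.
     */
    static boolean run(SimpleTx tx, Runnable work) {
        tx.begin();
        try {
            work.run();
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();
            return false;
        }
    }
}
```

With a `RecordingTx`, work that completes normally leaves the call log as begin then commit, while work that throws leaves it as begin then rollback.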

                                  Monday, December 1, 2008

                                  JIT vs JVM

                                  The JVM provides the advantage of running code by interpreting bytecode. This bytecode is compiled from Java source and optimized at the compiler level. But while it executes in the JVM, runtime optimization is limited and also depends on the JVM vendor's implementation.

                                  Just In Time (JIT) compilation is an alternative to the JVM's interpretation approach. JIT also deals with bytecode, but instead of interpreting it, it compiles it again. The reason for compiling the bytecode is to optimize it further before it runs on the machine. It does a couple of things:
                                  • Frequently executed bytecode is converted into native machine code.
                                  • Less frequently executed bytecode is left as it is.
                                  To do the above, JIT incurs a startup delay, but the performance benefit follows once the JIT plan is ready. In a client-server architecture, apps are compiled according to need.
                                  • Client App - Less compilation time and lighter optimization, for quick startup
                                  • Server App - More compilation time and heavier optimization, for long-running performance
                                  JIT also considers target machine characteristics (CPU, memory, etc.) to optimize the generated machine code.
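A small experiment can make the "frequently executed bytecode" idea visible. The sketch below calls one method many times so that it becomes hot. `HotLoopDemo` and `sumOfSquares` are hypothetical names, and the call count of 20,000 is an arbitrary choice, not a documented threshold. On HotSpot-based JVMs, running it with `-XX:+PrintCompilation` typically lists the method once it is compiled, and `-Xint` forces pure interpretation for comparison; other JVMs may behave differently.

```java
/** Hypothetical demo: one small method called often enough to become "hot". */
final class HotLoopDemo {
    private HotLoopDemo() {}

    /** Sum of i*i for i in 1..n; cheap enough to call many times. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long result = 0;
        // Repeated calls give the JIT a reason to compile sumOfSquares.
        for (int round = 0; round < 20_000; round++) {
            result = sumOfSquares(1_000);
        }
        System.out.println(result);
    }
}
```

The result is the same whether the method is interpreted or compiled; only the time taken changes.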

                                  JIT retains all the goodness of the JVM's security and cross-platform support. JIT has a certain advantage for production-level systems where performance is important.
