A JVM Does That???

The document discusses the many services provided by a Java Virtual Machine (JVM). It notes that JVMs provide services like garbage collection, just-in-time compilation, enforcing the Java Memory Model, and providing a consistent threading and memory model across different hardware and operating systems. However, some of these services create illusions for developers about what is actually happening at the hardware level. The document explores several illusions that JVMs create and suggests some services could potentially be moved to the operating system level for better performance and consistency.

A JVM Does That???
Cliff Click www.azulsystems.com/blogs

A JVM Does That???

Been a JVM Engineer for over a decade
I'm still amazed at what goes into a JVM
Services have increased over time
Many new services painfully "volunteered" by naive changes in specs

Some JVM Services

High Quality GC
  Parallel, Concurrent, Collection
  Low total allocation cost
High Quality Machine Code Generation
  Two JITs, JIT'd Code Management, Profiling
  Bytecode cost model
Uniform Threading & Memory Model
  Locks (synchronization), volatile, wait, notify
Type Safety

Some JVM Services

Dynamic Code Loading
  Class loading, Deoptimization, re-JIT'ing
Quick high-quality Time Access
  System.currentTimeMillis
Internal introspection services
  Reflection, JNI, JVMTI, JVMDI/JVMPI, Agents
Access to huge pre-built library
Access to OS
  threads, scheduling, priorities, native code

Too Many Services?

Where did all this come from?
Mostly incrementally added over time
The Language, JVM, & Hardware all co-evolved
  e.g. incremental addition of finalizers, JMM, 64-bits
  Support for high core-count machines

Why Did We Add All These Services?

Because Illusions Are Powerful Abstractions

The 'V' in JVM

"Virtual": It's a Great Abstraction
Programmers focus on value-add elsewhere
JVM Provides Services
The selection of Services is ad-hoc
  Grown over time as needed
Some services are unique to Java or the JVM
Many services overlap with existing OS services
  But sometimes have different requirements

Agenda

Introduction (just did that)
Illusions We Have
Illusions We Think We Have or Wish We Had
Sorting Our Illusions Out

Illusion: Infinite Memory

Garbage Collection: The Infinite Heap Illusion
  Just allocate memory via 'new'
  Do not track lifetime, do not 'free'
  GC figures out What's Alive and What's Dead
Vastly easier to use than malloc/free
  Fewer bugs, quicker time-to-market
Enables certain kinds of concurrent algorithms
  Just too hard to track liveness otherwise
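One way to picture the "concurrent algorithms" point is a lock-free stack. The sketch below is an illustration only (the class name `TreiberStack` is made up here, not from the talk): a popped node is never freed by hand, so a node cannot be recycled while another thread still holds a reference to it, which is the liveness tracking that is hard to do with malloc/free.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class: a lock-free (Treiber-style) stack.
final class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead); // just allocate via 'new'
        } while (!head.compareAndSet(oldHead, newHead));
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
        } while (!head.compareAndSet(oldHead, oldHead.next));
        // No 'free': the node becomes garbage once no thread references it.
        return oldHead.value;
    }
}
```

Usage: `push` and `pop` may be called from any number of threads without locks; elements come back in last-in, first-out order.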

Illusion: Infinite Memory

GCs have made huge strides in the last decade
  Production-ready robust, parallel, concurrent
Still major user pain-point
  Too many tuning flags, GC pauses, etc
Major Vendor point of differentiation, active dev
Throughput varies by maybe 30%
Pause-times vary over 6 orders of magnitude
  (Azul GPGC: 100's of Gigs w/ 10 msec)
  (Stock full GC pause: 10's of Gigs w/ 10 sec)
  (IBM Metronome: 100's of Megs w/ 10 microsec)

Illusion: Bytecodes Are Fast

Class files are a lousy way to describe programs
There are better ways to describe semantics than Java bytecodes
  But we're stuck with them for now
Main win: hides CPU details
Programmers rely on them being "fast"
It's a big Illusion: Interpretation is slow
JIT'ing brings back the "expected" cost model

Illusion: Bytecodes Are Fast

JVMs eventually JIT bytecodes
  To make them fast!
Some JITs are high quality optimizing compilers
  Amazingly complex beasties in their own rights
  i.e. JVMs bring "gcc -O2" to the masses
But cannot use "gcc"-style compilers directly:
  Tracking OOPs (ptrs) for GC
  Java Memory Model (volatile reordering & fences)
  New code patterns to optimize

Illusion: Bytecodes Are Fast

JIT'ing requires Profiling
  Because you don't want to JIT everything
Profiling allows focused code-gen
Profiling allows better code-gen
  Inline what's hot
  Loop unrolling, range-check elimination, etc
  Branch prediction, spill-code-gen, scheduling
JVMs bring Profiled code to the masses!

Illusion: virtual calls are fast

C++ avoids virtual calls because they are slow
Java embraces them and makes them fast
  Well, mostly fast
  JITs do Class Hierarchy Analysis
  CHA turns most virtual calls into static calls
  JVM detects new classes loaded, adjusts CHA
    May need to re-JIT
  When CHA fails to make the call static, inline caches
  When ICs fail, virtual calls are back to being slow
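A small sketch of the kind of call site CHA targets. The types `Shape`, `Square` and `ChaDemo` are hypothetical names for illustration. In the bytecode, `shape.area()` is a virtual call; while `Square` is the only loaded subclass of `Shape`, a JIT doing CHA is free to treat it as a direct call and inline it. Whether a particular JVM actually does so is an implementation decision that is not visible in the source.

```java
// Hypothetical types for illustration only.
abstract class Shape {
    abstract double area();
}

final class Square extends Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

final class ChaDemo {
    // shape.area() compiles to a virtual call; with a single loaded
    // implementation there is exactly one possible target.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape shape : shapes) {
            sum += shape.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(1), new Square(2), new Square(3) };
        System.out.println(totalArea(shapes)); // 1 + 4 + 9
    }
}
```

Loading a second `Shape` subclass later would invalidate the single-target assumption, which is where the "may need to re-JIT" step comes in.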

Illusion: Partial Programs Are Fast

JVMs allow late class loading, name binding
  i.e. Class.forName
Partial programs are as fast as whole programs
  Adding new parts in (e.g. Class loading) is "cheap"
  May require: deoptimization, re-profiling, re-JIT
  Deoptimization is a hard problem also
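Late binding by name looks roughly like this from the application side. The class `LateBindingDemo` and its method name are made up for this sketch; `Class.forName` and the reflective constructor call are standard library APIs.

```java
import java.util.List;

// Hypothetical example class.
final class LateBindingDemo {
    // Resolves a List implementation from a string at run time,
    // long after the calling code was compiled (and possibly JIT'd).
    @SuppressWarnings("unchecked")
    static List<String> newListByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (List<String>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        List<String> list = newListByName("java.util.ArrayList");
        list.add("loaded late");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```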

Illusion: Consistent Memory Models

ALL machines have different memory models
  The rules on visibility vary widely across machines
  And even within generations of the same machine
  X86 is very conservative, so is Sparc
  Power, MIPS less so
  IA64 & Azul very aggressive
Program semantics depend on the JMM
  So must match the JMM
  Else program meaning would depend on hardware!

Illusion: Consistent Memory Models

Very different hardware memory models
None match the Java Memory Model
The JVM bridges the gap
  While keeping normal loads & stores fast
  Via combinations of fences, code scheduling, placement of locks & CAS ops
  Requires close cooperation from the JITs
  Requires detailed hardware knowledge
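From the programmer's side, the bridge shows up as guarantees like the one sketched below. The class `VolatilePublish` is a hypothetical name. Under the JMM, a write to a volatile field happens-before a later read of that field that sees the write, so the reader is guaranteed to see the plain store made before it, on any hardware. Which fences (if any) the JIT emits to achieve that is platform-specific and not shown here.

```java
// Hypothetical example class.
final class VolatilePublish {
    private int payload;            // plain field: normal load/store
    private volatile boolean ready; // volatile flag: ordered by the JMM

    void publish(int value) {
        payload = value; // ordinary store
        ready = true;    // volatile store publishes the store above
    }

    // Spins until the flag is seen, then reads the payload.
    int awaitPayload() {
        while (!ready) {
            Thread.onSpinWait();
        }
        return payload;  // guaranteed to observe the published value
    }

    static int demo() {
        VolatilePublish box = new VolatilePublish();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        return box.awaitPayload();
    }
}
```

Without `volatile` on `ready`, the JMM would allow the reader to spin forever or to see a stale `payload`.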

Illusion: Consistent Thread Models

Very different OS thread models
  Linux, Solaris, AIX
  But also cell phones, iPad, etc
  On micro devices to 1000-cpu giant machines
Java just does 'new Thread'
  and synchronized, wait, notify, join, etc, all just work
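A minimal sketch of the portable thread API the slide refers to, using only `new Thread`, `synchronized`, `wait`, `notifyAll` and `join`. The class `HandoffDemo` is a made-up name; the same source is expected to behave the same way on every OS thread model the JVM runs on.

```java
// Hypothetical example class: one thread hands a message to another.
final class HandoffDemo {
    private final Object lock = new Object();
    private String message; // guarded by lock

    void put(String m) {
        synchronized (lock) {
            message = m;
            lock.notifyAll(); // wake any waiting taker
        }
    }

    String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) { // loop guards against spurious wakeups
                lock.wait();
            }
            return message;
        }
    }

    static String demo() {
        HandoffDemo handoff = new HandoffDemo();
        Thread producer = new Thread(() -> handoff.put("hello"));
        producer.start();
        try {
            String got = handoff.take();
            producer.join();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```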

Illusion: Locks are Fast

Contended locks obviously block and must involve the OS
  (Expect fairness from the OS)
Uncontended locks are a dozen nanoseconds or so
  Biased locking: ~2-4 clocks (when it works)
  Very fast user-mode locks otherwise
Highly optimized because synchronized is so common

Illusion: Locks are Fast

People don't know how to program concurrently
  The 'just add locks until it works' mentality
  i.e. Lowest-common-denominator programming
So locks became common
So JVMs optimized them
This enabled a particular concurrent programming style
And we, as an industry, learned a lot about concurrent programming as a result

Illusion: Quick Time Access

System.currentTimeMillis
  Called billions of times/sec in some benchmarks
  Fairly common in all large Java apps
  Real Java programs expect that: if T1's Sys.cTM < T2's Sys.cTM then T1 happens-before T2
But cannot use, e.g. X86's "tsc" register
  Value not coherent across CPUs
  Not consistent, e.g. slow ticking in low-power mode
  Monotonic per CPU but not per-thread

Illusion: Quick Time Access

System.currentTimeMillis
  Switching from fastest linux gettimeofday call
    (mostly-user-mode atomic time struct read)
  To a plain load (updated by background thread)
  Was worth 10% speed boost on key benchmark
gettimeofday gives quality time
Hypervisors like to "idealize" tsc:
  Means: uniform monotonic ticking
  Means: slows access to tsc by 100x?
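The "plain load updated by a background thread" idea can be imitated at the application level. This is only a sketch of the pattern, not the JVM-internal implementation the slide describes: `CoarseClock` is a hypothetical class, the 1 ms tick period is an assumption, and the field is `volatile` here so the example is correct under the JMM (inside a JVM the read can be a truly plain load).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical example class: a cached millisecond clock.
final class CoarseClock implements AutoCloseable {
    // Written only by the ticker thread, read by everyone else.
    private volatile long nowMillis = System.currentTimeMillis();

    private final ScheduledExecutorService ticker =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "coarse-clock-ticker");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });

    CoarseClock() {
        // Refresh the cached word roughly 1000 times per second.
        ticker.scheduleAtFixedRate(
            () -> nowMillis = System.currentTimeMillis(),
            1, 1, TimeUnit.MILLISECONDS);
    }

    // Readers pay for one memory load instead of a time-of-day call.
    long currentTimeMillis() {
        return nowMillis;
    }

    @Override
    public void close() {
        ticker.shutdownNow();
    }
}
```

The trade-off is resolution: a reader can see a value up to one tick (or more, if the ticker thread is starved) behind the real clock.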

Agenda

Introduction (just did that)
Illusions We Have
Illusions We Think We Have or Wish We Had
Sorting Our Illusions Out

Illusions We'd Like To Have

Infinite Stack
  e.g. Tail calls. Useful in some functional languages
Running-code-is-data
  e.g. Closures
'Integer' is as cheap as 'int'
  e.g. Auto-boxing optimizations
'BigInteger' is as cheap as 'int'
  e.g. Tagged integer math, silent overflow to infinite precision integers

Illusions We'd Like To Have

Atomic Multi-Address Update
  e.g. Software Transactional Memory
Fast alternative call bindings
  e.g. invokedynamic

Illusions We Think We Have

This mass of code is maintainable:
  HotSpot is approaching 15yrs old
  Large chunks of code are fragile
    (or very 'fluffy' per line of code)
  Very slow new-feature rate-of-change
Azul Systems has been busy rewriting lots of it
  Many major subsystems are simpler, faster, lighter
  >100K diffs from OpenJDK

Illusions We Think We Have

Thread priorities
  Mostly none on Linux without root permission
  But also relative to entire machine, not JVM
  Means a low-priority JVM with high priority threads
    e.g. Concurrent GC threads trying to keep up
  ...can starve a medium-priority JVM
Write-once-run-anywhere
  Scale matters: programs for very small or very large machines are different

Illusions We Think We Have

Finalizers are Useful
  They suck for reclaiming OS resources
    Because no timeliness guarantees
    Code "eventually" runs, but might be never
  e.g. Tomcat requires an out-of-file-handles situation to trigger a FullGC to run finalizers to recycle OS file handles
  What other out-of-OS-resource situations need to trigger a GC?
  Do we really want to code our programs this way?

Illusions We Think We Have

Soft, Phantom Refs are Useful
  Again using GC to manage a user resource
  e.g. Use GC to manage Caches
    Low memory causes rapid GC cycles
    causes soft refs to flush
    causes caches to empty
    causes more cache misses
    causes more application work
    causes more allocation
    causes rapid GC cycles
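The cache pattern under discussion looks roughly like the sketch below. `SoftCache`, its `loader` function and the `misses` counter are hypothetical names for illustration. Every entry the collector flushes turns into a miss, and every miss runs the loader and allocates again, which is the feedback loop described above.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical example class: a cache whose eviction policy is "whatever the GC decides".
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    int misses; // illustration only; not thread-safe

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {             // never cached, or flushed by the GC
            misses++;
            value = loader.apply(key);   // more application work, more allocation
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

An application-managed cache with an explicit size bound keeps the eviction decision, and therefore the program's behavior, independent of GC timing.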

Agenda

Introduction (just did that)
Illusions We Have
Illusions We Think We Have or Wish We Had
Sorting Our Illusions Out

Services Summary

Services provided by JVM
  GC, JIT'ing, JMM, thread management, fast time
  Hiding CPU details & hardware memory model
Services provided below the JVM (OS)
  Threads, context switching, priorities, I/O, files, virtual memory protection
Services provided above the JVM (App)
  Threadpools & worklists, transactions, crypto, caching, models of concurrent programming
  Alt languages: new dispatch, big ints, alt conc

Move to OS: Fast Quality Time

JVM provides fast quality time
  Fast not quality from X86 'tsc'
  Quality not fast from OS gettimeofday
This should be an OS service
  Tick memory word 1000/sec
    Update with kernel thread or timer
  Read-only process-shared page
  This CTM is coherent across CPUs on a clock-cycle basis

Move to OS: Thread Priorities

OS provides thread priorities at the process level
  Higher priority JVMs can/should starve lower ones
JVM also needs thread priorities within-process
GC threads need cycles before mutator threads
  Or else that concurrent GC won't be concurrent
  And the mutator will block for a GC cycle
JIT threads need cycles
  Or else the 1000-runnable threads will starve the JIT
  And the program will always run interpreted

Move to OS: Thread Priorities

Right now Azul is faking thread priorities
  With duty-cycle style locks & blocks
  Required for a low-pauses concurrent collector
Per-process Thread Priorities belong in the OS
  OS already does process priorities & context switches
  Also, cannot raise thread priorities without 'root'
  Lowering mutator priorities changes behavior wrt non-Java processes

Keep Above JVM: Alternative Concurrency

JVM provides thread management, fast locks
Many new langs have new concurrency ideas
  Actors, Msg-passing, STM, Fork/Join are a few
JVM too big, too slow to move fast here
Should experiment 'above' the JVM
  ...at least until we get some consensus on The Right Way To Do Concurrency
Then JVM maybe provides building blocks
  e.g. park/unpark or a specific kind of STM
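As an example of a building block, `java.util.concurrent.locks.LockSupport` exposes park/unpark, and library code above the JVM can assemble its own synchronizers from it. The sketch below is a hypothetical one-shot signal for a single waiting thread (the class name and the single-waiter restriction are assumptions of this example).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical example class: a one-shot signal supporting one waiter.
final class OneShotSignal {
    private final AtomicBoolean fired = new AtomicBoolean();
    private volatile Thread waiter;

    // Blocks the calling thread until fire() has been called.
    void await() {
        waiter = Thread.currentThread();
        while (!fired.get()) {
            LockSupport.park(this); // may return spuriously, so re-check
        }
    }

    // Releases the waiter, whether it is already parked or arrives later.
    void fire() {
        fired.set(true);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    static boolean demo() {
        OneShotSignal signal = new OneShotSignal();
        Thread firer = new Thread(signal::fire);
        firer.start();
        signal.await();
        return true; // reached only after fire() ran
    }
}
```

The JVM supplies only the blocking primitive; the policy (one-shot, single waiter) lives entirely in library code and can be changed without touching the JVM.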

Move to JVM: Fixnums

Fixnums belong in the JVM, not language impl
JVM provides 'int' & 'long'
Many languages want 'ideal int'
Obvious Java translation to infinite math is inefficient
  Really want some kind of tagged integer
  Requires JIT support to be really efficient
I think "64bits ought to be enough for anybody"
  You (app-level programmer) know if you might need more
  Don't make everybody else pay for it
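One plausible reading of the "obvious Java translation" is sketched below: try 64-bit math, and fall back to `BigInteger` on overflow. `IdealInt` is a hypothetical helper, not something from the talk. Every result is boxed and every operation carries an overflow check plus a type test on its operands, which is the cost a tagged integer with JIT support would avoid.

```java
import java.math.BigInteger;

// Hypothetical example class: 'ideal int' addition done at the language level.
final class IdealInt {
    // Returns a Long when the sum fits in 64 bits, otherwise a BigInteger.
    static Number add(long a, long b) {
        try {
            return Math.addExact(a, b); // throws on overflow; result boxes to Long
        } catch (ArithmeticException overflow) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));              // 3
        System.out.println(add(Long.MAX_VALUE, 1)); // 9223372036854775808
    }
}
```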

Keep in JVM: GC, JIT'ing, JMM, Type Safety

JIT'ing (by itself) belongs above the OS and below the App
  so in the JVM
GC requires deep hooks into the JIT'ing process
  And also makes sense below the App
The JMM requires deep hooks into the JIT also
  And again (mostly) makes sense below the App
  Some alternative concurrency models would expose weaker MMs to the App, would enable faster, cheaper hardware
  but this is still going to require close JIT cooperation

Move Above JVM: OS Resource Lifetime

Move outside-the-JVM resource lifetime control out of Finalizers
  Make the app do e.g. ref-counting or 'arena' or other lifetime management
  Do not burden GC with knowledge that more of resource 'X' can be had by running finalizers
Move weak/soft/phantom refs to the App
  GC should not change application semantics
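A minimal sketch of app-level lifetime management via reference counting. `RefCounted` is a hypothetical wrapper; the resource is closed deterministically when the last holder releases it, with no dependence on when (or whether) a GC cycle runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: ref-counted ownership of an OS-level resource.
final class RefCounted<T extends AutoCloseable> {
    private final T resource;
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one reference

    RefCounted(T resource) {
        this.resource = resource;
    }

    T get() {
        return resource;
    }

    // Takes an additional reference; fails if the resource was already closed.
    RefCounted<T> retain() {
        int n;
        do {
            n = refs.get();
            if (n == 0) throw new IllegalStateException("already released");
        } while (!refs.compareAndSet(n, n + 1));
        return this;
    }

    // Drops one reference; the last release closes the resource immediately.
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed", e);
            }
        }
    }
}
```

Usage: each component that stores the resource calls `retain()` and later `release()`; a file handle wrapped this way is returned to the OS at the final `release()`, not at some future finalizer run.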

Summary
OS: Virtual Memory, Context Switches, Files, I/O, Thread Priorities, Fast Time
JVM: Type-Safe Memory, GC, JIT'ing & Code Management, JMM, Fast Locks, Thread Management, Fixnums, Tail Calls, Closures
Application: Alternative Concurrency, STM / FJ, OS Resource Management


Move To JVM (Azul): Virtual / Physical Mappings

Azul's GPGC does aggressive virtual-memory to physical-memory remappings
  Tbytes/sec remapping rates
  mmap() & friends too slow
Need OS hacks to expose hardware TLB
  Still safe across processes
  But within process can totally screw self up

Move To JVM (Azul): Hardware Perf Counters

JVM is already doing profile-directed compilation
  Natural consumer of HW Perf Counters
JVM can map perf counters to bytecodes
  JIT's code, manages JIT'd code
  "hotcode" mapped back to user's bytecodes
Want quickest & thinnest way to expose HW perf counters to JVM

Summary (Azul)
OS: Virtual / Physical Memory Mapping, HW Perf Counters
Azul's JVM: GPGC, Profiling, Thread Priorities, Fast Time
Application


Summary

There's Work To Do
(full employment contract for JVM engineers)

