HashMap源码分析
HashMap源码分析
1. HashMap介绍
HashMap是一个Hash表,通过key-value来存储数据,并允许使用 null 值和 null 键。HashMap并不保证映射顺序,而是通过Hash算法将key-value保存到对应的索引位置。
还有一点就是HashMap不是线程安全的,但是可以通过Collections类的静态方法synchronizedMap变成线程安全的Map。HashMap之所以线程不安全是因为多个线程修改map时,可能会导致环状结构,形成死循环。
2. HashMap继承的接口
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable
3. HashMap属性
/** * The default initial capacity - MUST be a power of two. * --------------------------------------------------------- * 默认的初始容量,必须是2的次幂 */ static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16 /** * The maximum capacity, used if a higher value is implicitly specified * by either of the constructors with arguments. * MUST be a power of two <= 1<<30. * ----------------------------------------- * 最大的容量值,指定的最大容量值也要小于等于2^29 */ static final int MAXIMUM_CAPACITY = 1 << 30; /** * The load factor used when none specified in constructor. * ------------------------------------------- * 默认的加载因子 */ static final float DEFAULT_LOAD_FACTOR = 0.75f; /** * The bin count threshold for using a tree rather than list for a * bin. Bins are converted to trees when adding an element to a * bin with at least this many nodes. The value must be greater * than 2 and should be at least 8 to mesh with assumptions in * tree removal about conversion back to plain bins upon * shrinkage. * ------------------------------------------- * 树化链表的阈值,如果一个链表长度大于或等于8,扩容或者树化链表 */ static final int TREEIFY_THRESHOLD = 8; /** * The bin count threshold for untreeifying a (split) bin during a * resize operation. Should be less than TREEIFY_THRESHOLD, and at * most 6 to mesh with shrinkage detection under removal. * ------------------------------------------ * 取消树化的阈值 */ static final int UNTREEIFY_THRESHOLD = 6; /** * The smallest table capacity for which bins may be treeified. * (Otherwise the table is resized if too many nodes in a bin.) * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts * between resizing and treeification thresholds. * ----------------------------------------------- * 当某一位置的链表长度到达8时,就会进行树化。 * 树化的过程 当table小于64,则进行扩容 ; 否则进行树化 * 后面会讲解到 */ static final int MIN_TREEIFY_CAPACITY = 64; /** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) * -------------------------------------------------------- * 存储数据的数组 */ transient Node<K,V>[] table; /** * Holds cached entrySet(). Note that AbstractMap fields are used * for keySet() and values(). * -------------------------------------------------------------- * keySet返回的结果集 */ transient Set<Map.Entry<K,V>> entrySet; /** * The number of key-value mappings contained in this map. * ---------------------------------------------------------------- * Hash表中key-value的个数 */ transient int size; /** * The number of times this HashMap has been structurally modified * Structural modifications are those that change the number of mappings in * the HashMap or otherwise modify its internal structure (e.g., * rehash). This field is used to make iterators on Collection-views of * the HashMap fail-fast. (See ConcurrentModificationException). * ---------------------------------------------------------------- * 修改次数 */ transient int modCount; /** * The next size value at which to resize (capacity * load factor). * --------------------------------------------------------- * 下一个要调整的大小值 * @serial */ int threshold; /** * The load factor for the hash table. * ---------------------------------------------- * 装载因子 * @serial */ final float loadFactor;
4. 内部类
/** * Basic hash bin node, used for most entries. (See below for * TreeNode subclass, and in LinkedHashMap for its Entry subclass.) */ static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final K getKey() { return key; } public final V getValue() { return value; } public final String toString() { return key + "=" + value; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } public final V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } //判断key-value是否相等 //判断相等首先要看是否是地址相同,然后判断equals判断key和value是否相同 public final boolean equals(Object o) { if (o == this) return true; if (o instanceof Map.Entry) { Map.Entry<?,?> e = (Map.Entry<?,?>)o; if (Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue())) return true; } return false; } }
//返回key的集合 public Set<K> keySet() { Set<K> ks = keySet; if (ks == null) { ks = new KeySet(); keySet = ks; } return ks; } final class KeySet extends AbstractSet<K> { public final int size() { return size; } public final void clear() { HashMap.this.clear(); } public final Iterator<K> iterator() { return new KeyIterator(); } public final boolean contains(Object o) { return containsKey(o); } public final boolean remove(Object key) { return removeNode(hash(key), key, null, false, true) != null; } public final Spliterator<K> spliterator() { return new KeySpliterator<>(HashMap.this, 0, -1, 0, 0); } public final void forEach(Consumer<? super K> action) { Node<K,V>[] tab; if (action == null) throw new NullPointerException(); if (size > 0 && (tab = table) != null) { int mc = modCount; for (int i = 0; i < tab.length; ++i) { for (Node<K,V> e = tab[i]; e != null; e = e.next) action.accept(e.key); } if (modCount != mc) throw new ConcurrentModificationException(); } } }
//返回value的集合 public Collection<V> values() { Collection<V> vs = values; if (vs == null) { vs = new Values(); values = vs; } return vs; } final class Values extends AbstractCollection<V> { public final int size() { return size; } public final void clear() { HashMap.this.clear(); } public final Iterator<V> iterator() { return new ValueIterator(); } public final boolean contains(Object o) { return containsValue(o); } public final Spliterator<V> spliterator() { return new ValueSpliterator<>(HashMap.this, 0, -1, 0, 0); } public final void forEach(Consumer<? super V> action) { Node<K,V>[] tab; if (action == null) throw new NullPointerException(); if (size > 0 && (tab = table) != null) { int mc = modCount; for (int i = 0; i < tab.length; ++i) { for (Node<K,V> e = tab[i]; e != null; e = e.next) action.accept(e.value); } if (modCount != mc) throw new ConcurrentModificationException(); } } }
/** * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn * extends Node) so can be used as extension of either regular or * linked node. */ static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> { TreeNode<K,V> parent; // red-black tree links TreeNode<K,V> left; TreeNode<K,V> right; TreeNode<K,V> prev; // needed to unlink next upon deletion boolean red; TreeNode(int hash, K key, V val, Node<K,V> next) { super(hash, key, val, next); } //…………一堆方法 }
5. 构造方法
/* * 从此方法可以看出,构造函数传进来{初始容量}和{装载因子},通过修改threshold来在resize()方法中 * 改变table数组的大小。 */ public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); //装载因子 this.loadFactor = loadFactor; //找到大于initialCapacity最小2的次幂数 this.threshold = tableSizeFor(initialCapacity); }
6. 重要方法
1. resize
/** * Initializes or doubles table size. If null, allocates in * accord with initial capacity target held in field threshold. * Otherwise, because we are using power-of-two expansion, the * elements from each bin must either stay at same index, or move * with a power of two offset in the new table. * * @return the table */ final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; //oldCap为旧数组长度 int oldCap = (oldTab == null) ? 0 : oldTab.length; //oldThr为旧下次要调整的大小 int oldThr = threshold; int newCap, newThr = 0; //主要是给newCap和newThr赋值 if (oldCap > 0) { //如果数组超过最大值,则将 if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double threshold } else if (oldThr > 0) // initial capacity was placed in threshold newCap = oldThr; else { // zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab != null) { for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null) { oldTab[j] = null; //只有一个节点 if (e.next == null) newTab[e.hash & (newCap - 1)] = e; //红黑树 else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); //链表 else { // preserve order //将每个链分为两份 Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; //循环,将一个链表分割成两个链表 do { next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null); //loHead链表还在原来位置 if (loTail != null) { loTail.next = null; newTab[j] = loHead; } //hiTail链表迁移到j+oldCap位置 if (hiTail != null) { hiTail.next = null; //非常灵性 //为什么说j+oldCap有灵性呢? //因为计算机都是二进制存储,扩容之后的新的位置的bit位变成1即可 newTab[j + oldCap] = hiHead; } } } } } return newTab; }
总结一下
resize方法是HashMap中table的初始化方法和扩容方法,扩容后的大小为threshold。然后分割红黑树,或者分割链表。最后返回一个新数组。
2. hash
计算key的hash值
static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
3. put
put操作是最最常规用的方法之一
put -> putVal -> 链表尾插入 || putTreeVal ->
/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for the key, the old * value is replaced. * --------------------------------------------------------------------- * 在map里将特殊的key和value联系起来,如果map中包含这个key,则将旧的value替换成新的value */ public V put(K key, V value) { return putVal(hash(key), key, value, false, true); }
put方法调用的是putVal方法
/** * Implements Map.put and related methods * * @param hash hash for key * @param key the key * @param value the value to put * @param onlyIfAbsent if true, don't change existing value * @param evict if false, the table is in creation mode. * @return previous value, or null if none */ final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; //判断数组是否需要初始化 if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; //1.没有hash冲突 if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); //2.有hash冲突,使用拉链法 else { Node<K,V> e; K k; //p是一个链表的首元素(头节点) //1.key已经存在,但是要根据onlyIfAbsent来决定是否更新 if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) e = p; //2. 红黑树 else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); //3. 链表 else { //binCount记录链表长度 for (int binCount = 0; ; ++binCount) { //不存在,插入链表结尾 if ((e = p.next) == null) { //新节点 p.next = newNode(hash, key, value, null); //是否树化 if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } //已存在 if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) break; p = e; } } //存在待更新的值 if (e != null) { // existing mapping for key V oldValue = e.value; if (!onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); //返回替换的旧节点 return oldValue; } } //修改次数+1 ++modCount; //判断是否扩容 if (++size > threshold) resize(); //LinkedHashMap实现的方法,而HashMap中是空方法 afterNodeInsertion(evict); return null; }
再了解一下treeifyBin方法
/** * Replaces all linked nodes in bin at index for given hash unless * table is too small, in which case resizes instead. * ------------------------------------------------------ * 树化方法 */ final void treeifyBin(Node<K,V>[] tab, int hash) { int n, index; Node<K,V> e; //重点 //table长度小于64就会扩容 if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) resize(); //树化 else if ((e = tab[index = (n - 1) & hash]) != null) { TreeNode<K,V> hd = null, tl = null; do { TreeNode<K,V> p = replacementTreeNode(e, null); if (tl == null) hd = p; else { p.prev = tl; tl.next = p; } tl = p; } while ((e = e.next) != null); if ((tab[index] = hd) != null) hd.treeify(tab); } }
总结一下putVal
putVal是HashMap的插入方法,首先table是否初始化。接着判断key值是否出现hash冲突,不冲突直接加入table中;冲突则采用拉链法,一种加入到红黑树,一种加入链表。
- 链表长度大于等于8时,会转成红黑树
- 转化为红黑树的必要条件是table数组长度大于等于64,否则不会转化为红黑树,而是进行扩容
4. get
public V get(Object key) { Node<K,V> e; return (e = getNode(hash(key), key)) == null ? null : e.value; }
get方法调用getNode方法
/** * Implements Map.get and related methods * * @param hash hash for key * @param key the key * @return the node, or null if none */ final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { //判断首元素 if (first.hash == hash && // always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { //树取值 if (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreeNode(hash, key); //链表取值 do { if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } return null; }
5. remove
remove为删除方法
/** * Removes the mapping for the specified key from this map if present. */ public V remove(Object key) { Node<K,V> e; return (e = removeNode(hash(key), key, null, false, true)) == null ? null : e.value; }
public boolean remove(Object key, Object value) { return removeNode(hash(key), key, value, true, true) != null; }
removeNode方法
/** * Implements Map.remove and related methods * * @param hash hash for key * @param key the key * @param value the value to match if matchValue, else ignored * @param matchValue if true only remove if value is equal * matchValue 如果为true equals(value) 则删除 ; 否则不关心value * @param movable if false do not move other nodes while removing * movable 删除后是否移动节点 true 移动 false 不移动 * @return the node, or null if none */ final Node<K,V> removeNode(int hash, Object key, Object value, boolean matchValue, boolean movable) { Node<K,V>[] tab; Node<K,V> p; int n, index; if ((tab = table) != null && (n = tab.length) > 0 && (p = tab[index = (n - 1) & hash]) != null) { Node<K,V> node = null, e; K k; V v; //头节点 if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) node = p; //在链表或红黑树查找 else if ((e = p.next) != null) { //红黑树 if (p instanceof TreeNode) node = ((TreeNode<K,V>)p).getTreeNode(hash, key); //链表 else { do { if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) { node = e; break; } p = e; } while ((e = e.next) != null); } } //开始移除 //node为空 --- matchValue=false --- 不需要关心value的值 // --- matchValue=true --- key对应的v == value if (node != null && (!matchValue || (v = node.value) == value || (value != null && value.equals(v)))) { //红黑树移除 if (node instanceof TreeNode) ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable); //链表头节点移除 else if (node == p) tab[index] = node.next; //链中移除 else p.next = node.next; //修改次数 ++modCount; --size; //LinkedHashMap实现 afterNodeRemoval(node); return node; } } return null; }
6. contains
contains判断map中是否包含key或者value
containsKey
- 如果此映射包含指定键的映射,则返回
true
。
/** * Returns <tt>true</tt> if this map contains a mapping for the * specified key. * * @param key The key whose presence in this map is to be tested * @return <tt>true</tt> if this map contains a mapping for the specified * key. */ public boolean containsKey(Object key) { //前面说到getNode方法 return getNode(hash(key), key) != null; }
containsValue
- 如果此地图将一个或多个键映射到指定值,则返回
true
。
/** * Returns <tt>true</tt> if this map maps one or more keys to the * specified value. * * @param value value whose presence in this map is to be tested * @return <tt>true</tt> if this map maps one or more keys to the * specified value */ public boolean containsValue(Object value) { Node<K,V>[] tab; V v; if ((tab = table) != null && size > 0) { //两层循环判断value是否存在,就像二维数组查找 for (int i = 0; i < tab.length; ++i) { for (Node<K,V> e = tab[i]; e != null; e = e.next) { if ((v = e.value) == value || (value != null && value.equals(v))) return true; } } } return false; }
7. replace
/** * 判断oldVaule和key对应的value是否一样,如果一样则替换,返回true,否则返回false */ public boolean replace(K key, V oldValue, V newValue) { Node<K,V> e; V v; if ((e = getNode(hash(key), key)) != null && ((v = e.value) == oldValue || (v != null && v.equals(oldValue)))) { e.value = newValue; afterNodeAccess(e); return true; } return false; }
/** * 找到key,直接替换,将替换后的value返回 */ public V replace(K key, V value) { Node<K,V> e; if ((e = getNode(hash(key), key)) != null) { V oldValue = e.value; e.value = value; afterNodeAccess(e); return oldValue; } return null; }
7. 总结
仅是看了部分的HashMap的源码,从中学习到了很多知识
- 外部public方法调用内部private方法,防止外部直接调用内部逻辑方法
- ()内赋值再判断,减少代码量,但是可读性变差了,小白可能会崩溃的
- 内部类和外部类的完美应用,相互依靠,相得益彰
- 一个方法的设计要尽可能降低耦合度,提高方法的复用
- 方法中可以提供一些功能,让子类来实现
// Callbacks to allow LinkedHashMap post-actions void afterNodeAccess(Node<K,V> p) { } void afterNodeInsertion(boolean evict) { } void afterNodeRemoval(Node<K,V> p) { }
8. 面试
1. HashMap插入数据的流程
2. HashMap初始化
- 默认table长度16,负载因子0.75。
- 链表长度大于8树化。
- 树节点数小于6,链表化。
3. Hash函数设计
key的hashCode是int型32位的值,然后让hashCode的高16位和低16位进行异或操作。这叫做扰动函数,是为了减少hash碰撞,尽量分散;另一方面每次扩容都要进行再hash,也是为了提高效率。
hash值的范围太大,所以需要对其取模,得到的余数作为访问数组的下标。
4. 为什么长度为2的整数次幂
index = hash & (length - 1),与操作使高位全为0,仅保留底为的值用来作为访问数组的下标。
但是只取后几位碰撞概率也很大,这时候扰动函数就体现出来了价值。高16位和低16位进行异或运算能够保证高位和底为的特征保留下来,也降低了随机性。
5. JDK 1.8 HashMap较1.7做了哪些优化
- 数组+链表 改成了 数组+链表+红黑树
- 链表插入方式从头插入变成了尾插入。1.7是直接插入到头部;1.8要遍历整个数组
- 1.7扩容是需要重新hash再插入,1.8采用了将链表分为索引不变、原数组索引+旧容量大小两个部分。
- 1.7是先判断是否需要扩容,然后再插入;1.8是先插入,在判断需不需要扩容
6. 为什么做这些优化
防止hash冲突,链表长度过长,将时间复杂度O(n)降到O(logn)
头插入会使链表发生反转,多线程情况下可能会产生环。
线程A插入B,线程B也插入,但是需要扩容,头插入会导致形成环状结构
7. 扩容的时候为什么不重新计算hash来定位位置
因为之前说过,扩容的大小为原数组的2倍,原数组的大小是2的次幂。所以扩容后对应的位置是index+oldCapacity。也就是掩码高位多出一个1。
8. HashMap是线程安全的吗?
并不是,1.7的时候会出现环状结构、数据丢失、数据覆盖的问题。1.8会有数据覆盖的问题,两个线程同时操作HashMap,put操作相同位置时,会出现数据覆盖。
9. 如何解决HashMap的线程安全问题
使用HashTable、ConcurrentHashMap、Collections.synchronizedMap来实现HashMap线程安全。
10. HashMap树化和去树化的阀值?
树化的阀值是8,去树化的阀值是6。8都是经过概率计算的;而6是为了防止树和链表一直发生转化。
11. HashMap内部节点是有序的吗?
非也,Node是根据hash计算随机插入的,是无序的。有序的是LinkedHashMap和TreeMap。
12. LinkedHashMap如何实现有序的?
内部维持着一个单链表,Node类不管继承了HashMap中Node属性,还增加了before、after用来标识前后节点。可以实现按顺序插入和访问。
13. TreeMap是如何实现有序的?
TreeMap按照key的自然顺序或者Comparable接口和Comparator接口,底层是红黑树实现的。