雪花算法和代码实现分析

2023-04-06 约 1114 字

最近在 GitHub 上看一个分布式定时器实现，每一个定时任务需要分配一个唯一 ID，项目使用的是雪花算法。于是找资料学习了以下雪花算法。

雪花算法

1
2
3


+--------------------------------------------------------------------------+
| 1 Bit Unused | 41 Bit Timestamp |  10 Bit NodeID  |   12 Bit Sequence ID |
+--------------------------------------------------------------------------+

总共 64 bit，8 Byte。

Unused，1 bit，无用。
Timestamp，41 bit，时间戳，精确到毫秒，可容纳 69 年。指的是相对于起始时间的的毫秒数，可以自定义起始时间。
NodeID，10 bit，工作机器ID（分布式），数量可达 1024。
Sequence ID 序列ID（同一毫秒内），数量可达 4096。

SnowFlake 算法同一毫秒内最多可以生成全局唯一ID的数量：4 百万（1024 X 4096 = 4194304）。

优点

支持分布式，全局唯一性。
递增性，利于 MySQL 索引的插入（前提是把该 ID 作为主键）。

如果因为 MySQL 聚簇索引的数据是根据主键递增的，如果插入数据不是递增的，而是随机的，存在以下缺点：
（1）查找插入位置耗费时间；
（2）如果数据页面已满，插入新数据会导致页面分裂。
高可用性，确保任何时候都能生成正确的 ID。
高性能，高并发的环境下依旧表现良好。

注意：

雪花算法是 64 bit，Go 直接用 uint64 存储即可；
JavaScript 的 Number 的范围为 ±2^53，所以 JavaScript 中雪花算法的 ID 必须用 string 存储。

雪花算法实现

这里是在 GitHub 上找的一个 Go 实现的雪花算法。看完我感觉实现很巧妙，其实这也是我写这一篇文章想分享的原因。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76


// github.com/twitter-archive/snowflake

// 下面是默认的配置
var (
	// Epoch is set to the twitter snowflake epoch of Nov 04 2010 01:42:54 UTC in milliseconds
	// You may customize this to set a different epoch for your application.
    // 起始时间（距离 Nov 04 2010 01:42:54 UTC 的毫秒数）
	Epoch int64 = 1288834974657

	// NodeBits holds the number of bits to use for Node
	// Remember, you have a total 22 bits to share between Node/Step
	NodeBits uint8 = 10

	// StepBits holds the number of bits to use for Step
	// Remember, you have a total 22 bits to share between Node/Step
	StepBits uint8 = 12

	// DEPRECATED: the below four variables will be removed in a future release.
	mu        sync.Mutex
	nodeMax   int64 = -1 ^ (-1 << NodeBits)
	nodeMask        = nodeMax << StepBits
	stepMask  int64 = -1 ^ (-1 << StepBits)
	timeShift       = NodeBits + StepBits
	nodeShift       = StepBits
)

type Node struct {
    // 互斥锁，保证并发安全
	mu    sync.Mutex
	epoch time.Time
    // 这个 time 是精髓，表示上一次获取 id 所处的毫秒
	time  int64
	node  int64
	step  int64
	nodeMax   int64
	nodeMask  int64
	stepMask  int64
	timeShift uint8
	nodeShift uint8
}

// 核心代码其实就下面几行
// 代码值得学习，很巧妙  

func (n *Node) Generate() ID {
	n.mu.Lock()

	now := time.Since(n.epoch).Nanoseconds() / 1000000

    // 判断本次获取 ID 是否和上次处于同一毫秒
	if now == n.time {
        // step 表示本毫秒的下一个 ID
		n.step = (n.step + 1) & n.stepMask
        // 下一个 ID 为 0，说明已经循环了一次，本毫秒内没有 ID 可用了       
		if n.step == 0 {
            // 自旋直到进入新的毫秒
			for now <= n.time {
				now = time.Since(n.epoch).Nanoseconds() / 1000000
			}
		}
	} else {
        // 新毫秒的第一个 ID
		n.step = 0
	}

	n.time = now

    // 通过位运算来得到 ID，把时间和机器 ID 和该毫秒的第几个 ID 拼凑在一块
	r := ID((now)<<n.timeShift |
		(n.node << n.nodeShift) |
		(n.step),
	)

	n.mu.Unlock()
	return r
}

测试

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


import (
    "github.com/bwmarrin/snowflake"
    "time"

)

int main() {
	startTime := "2022-12-20"
	var machineID int64 = 1
	st, _ := time.Parse("2006-01-02", startTime)
	// 修改起始时间
	snowflake.Epoch = st.UnixNano() / 1000000
	node, _ := snowflake.NewNode(machineID)
	// 注意是 64 位
	var ID int64 = node.Generate().Int64()
	print(ID)
}

雪花算法和代码实现分析

雪花算法

雪花算法实现

End