Lesson 1.5: 性能分析与优化

PFinal南丞

Lesson 1.5: 性能分析与优化

学习目标

使用 pprof 进行 CPU/内存剖析
使用 go test -bench 编写和运行基准测试
使用 go tool trace 分析 Goroutine 调度
了解 SIMD 优化在 Go 中的应用

1. pprof 性能剖析

集成方式

go

import _ "net/http/pprof"

func main() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // 应用代码...
}

常用命令

bash

# CPU 剖析（采样 30s）
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# 堆内存剖析
go tool pprof http://localhost:6060/debug/pprof/heap

# goroutine 分析
go tool pprof http://localhost:6060/debug/pprof/goroutine

# 互斥锁分析
go tool pprof http://localhost:6060/debug/pprof/mutex

Web UI 交互

bash

go tool pprof -http=:8080 ~/pprof/pprof.samples.cpu.001.pb.gz

在 Web UI 中可以查看：

Top：消耗最多的函数
Graph：调用关系图
Flame Graph：火焰图（最直观）
Peek：逐行源码分析

2. 基准测试（Benchmark）

go

// 文件: sum_test.go
func BenchmarkSum(b *testing.B) {
    nums := make([]int, 10000)
    for i := range nums {
        nums[i] = i
    }

    b.ResetTimer() // 重置计时器，排除准备时间

    for i := 0; i < b.N; i++ {
        sum(nums)
    }
}

func sum(nums []int) int {
    var total int
    for _, n := range nums {
        total += n
    }
    return total
}

bash

# 运行基准测试
go test -bench=. -benchmem

# 输出示例：
# BenchmarkSum-8          100000         12345 ns/op        0 B/op        0 allocs/op
#    ↑CPU核数     ↑执行次数    ↑每次耗时       ↑每次内存        ↑每次分配次数

对比优化效果

go

// 优化前
func concatStr(a, b string) string {
    return a + b
}

// 优化后：使用 strings.Builder
func concatStrOptimized(a, b string) string {
    var sb strings.Builder
    sb.Grow(len(a) + len(b))
    sb.WriteString(a)
    sb.WriteString(b)
    return sb.String()
}

3. 执行追踪（trace）

go

import "runtime/trace"

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()

    trace.Start(f)
    defer trace.Stop()

    // 应用代码...
}

bash

go run main.go
go tool trace trace.out  # 打开 Web UI

trace 可以分析：

Goroutine 创建和销毁
网络阻塞等待
系统调用
GC 事件
调度器延迟

4. SIMD 优化

从 Go 1.24+ 开始，Go 通过 math/bits 和 crypto 包利用 CPU 的 SIMD 指令：

go

// 使用 SIMD 加速的 bits 操作（自动优化）
import "math/bits"

// 统计二进制中 1 的个数（硬件加速）
count := bits.OnesCount64(0b10101010)

// 前导零计数
leading := bits.LeadingZeros64(0x00FFFF)

// 并行比较（编译器自动向量化）
func sumFloat64(nums []float64) float64 {
    var sum float64
    for _, n := range nums {
        sum += n
    }
    return sum
}

Go 的 SIMD 现状：

版本	SIMD 能力	说明
Go 1.20	基础	部分标准库自动向量化
Go 1.22	改进	循环优化、边界检查消除
Go 1.24+	Enhanced	更好的自动向量化，新的 SIMD 内建函数支持

5. 性能优化工作流

1. 明确目标
   ↓
2. 设置基准（Benchmark）
   ↓
3. 收集数据（pprof / trace）
   ↓
4. 定位瓶颈（火焰图 / top）
   ↓
5. 提出方案（减少分配 / 优化算法 / 并发改造）
   ↓
6. 实施优化
   ↓
7. 验证效果（对比 Benchmark）
   ↓
8. 如果达标则结束，否则回到步骤 3

练习

对以下函数编写 Benchmark 测试，分别用 pprof 分析瓶颈：
- 字符串拼接（大量小字符串用 += vs strings.Builder）
- 数组求和（for range vs 索引访问）
用 go tool trace 分析一个并发 HTTP 服务的执行轨迹，找到 Goroutine 阻塞最严重的地方。

Lesson 1.5: 性能分析与优化 ​

学习目标 ​

1. pprof 性能剖析 ​

集成方式 ​

常用命令 ​

Web UI 交互 ​

2. 基准测试（Benchmark） ​

对比优化效果 ​

3. 执行追踪（trace） ​

4. SIMD 优化 ​

5. 性能优化工作流 ​

推荐阅读 ​

练习 ​

Lesson 1.5: 性能分析与优化

学习目标

1. pprof 性能剖析

集成方式

常用命令

Web UI 交互

2. 基准测试（Benchmark）

对比优化效果

3. 执行追踪（trace）

4. SIMD 优化

5. 性能优化工作流

推荐阅读

练习