From the 百问网 (100ask) Embedded Linux wiki

   ${PERF} \
   trace-cmd \
   blktrace \
PERF = "perf"


Board $> simpleperf --help
Usage: simpleperf [common options] subcommand [args_for_subcommand]
common options:
    -h/--help     Print this help information.
    --log <severity> Set the minimum severity of logging. Possible severities
                       include verbose, debug, warning, info, error, fatal.
                       Default is info.
    --version     Print version of simpleperf.
    debug-unwind        Debug/test offline unwinding.
    dump                dump perf record file
    help                print help information for simpleperf
    kmem                collect kernel memory allocation information
    list                list available event types
    record              record sampling info in
    report              report sampling information in
    report-sample       report raw sample information in
    stat                gather performance counter information


Board $> which perf


usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

The most commonly used perf commands are:
  annotate        Reads (created by perf record) and displays annotated code
  archive         Creates archive with object files with build-ids found in file
  bench           General framework for benchmark suites
  buildid-cache   Manages build-id cache.
  buildid-list    Lists the buildids in a file
  c2c             Shared Data C2C/HITM Analyzer.
  config          Gets and sets variables in a configuration file.
  data            Data file related processing
  diff            Reads files and displays the differential profile
  evlist          Lists the event names in a file
  ftrace          simple wrapper for kernel's ftrace functionality
  inject          Filters to augment the events stream with additional information
  kallsyms        Searches running kernel for symbols
  kmem            Tool to trace/measure kernel memory properties
  kvm             Tool to trace/measure kvm guest os
  list            Lists all symbolic event types
  lock            Analyzes lock events
  mem             Profiles memory accesses
  record          Runs a command and records its profile into
  report          Reads (created by perf record) and displays the profile
  sched           Tool to trace/measure scheduler properties (latencies)
  script          Reads (created by perf record) and displays trace output
  stat            Runs a command and gathers performance counter statistics
  test            Runs sanity tests.
  timechart       Tool to visualize total system behavior during a workload
  top             System profiling tool.
  probe           Defines new dynamic tracepoints

See 'perf COMMAND -h' for more information on a specific command.
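  • perf top (Linux kernel documentation[3]): shows the CPU load by counting cycle events; by default, symbols are sorted by number of samples, in descending order: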
Board $> perf top
 40.62%  [kernel]                           [k] v7_dma_inv_range
 18.65%  [kernel]                           [k] _raw_spin_unlock_irqrestore
 17.01%  [kernel]                           [k] arch_cpu_idle
  8.27%  [kernel]                           [k] v7_dma_clean_range
  5.00%  [kernel]                           [k] rcu_idle_exit
  1.70%  [kernel]                           [k] cpu_startup_entry
  0.52%  [kernel]                           [k] trace_graph_return
  0.48%  [kernel]                           [k] finish_task_switch
  0.48%                       [.] memcpy
  0.47%  [kernel]                           [k] trace_graph_entry
usage: perf top [<options>]
   -s, --sort <key[,key2...]>
              sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer to the main page for the complete list.
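Several of these sort keys can be combined in one -s list. The following is a minimal sketch that assumes a POSIX shell on the board. top_by and PERF_BIN are hypothetical names introduced here and are not part of perf:

```shell
# Sketch only: top_by and PERF_BIN are hypothetical names, not part of perf.
PERF_BIN=${PERF_BIN:-perf}   # perf binary to run; override to use another path

# Live system profile grouped by the sort keys in $1 (e.g. "comm,dso");
# any further arguments are passed through to perf top unchanged.
top_by() {
  keys=$1
  shift
  # -s/--sort chooses the grouping; --stdio prints plain text instead of the TUI.
  "$PERF_BIN" top -s "$keys" --stdio "$@"
}
```

For example, "Board $> top_by comm,dso" groups the live profile by process name and shared object instead of by symbol. Stop it with Ctrl-C.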
  • perf stat (Linux kernel documentation[5]): gathers event counts:
Board $> perf stat hello_world_example
User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0 
User space example: goodbye from STMicroelectronics
Performance counter stats for 'hello_world_example':

         4.328249      task-clock (msec)         #    0.000 CPUs utilized          
               11      context-switches          #    0.003 M/sec                  
                0      cpu-migrations            #    0.000 K/sec                  
               38      page-faults               #    0.009 M/sec                  
          2710036      cycles                    #    0.626 GHz                    
           640856      instructions              #    0.24  insn per cycle         
            75644      branches                  #   17.477 M/sec                  
            21764      branch-misses             #   28.77% of all branches        

     11.109859338 seconds time elapsed
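The figures after "#" are ratios of the raw counts:

- insn per cycle is instructions / cycles (640856 / 2710036 ≈ 0.24).
- The branch-miss share is branch-misses / branches (21764 / 75644 ≈ 28.77%).
- The GHz figure is cycles divided by task-clock time (2710036 / 4.328249 ms ≈ 0.626 GHz).
- "CPUs utilized" is task-clock divided by elapsed time (4.33 ms / 11.1 s, printed as 0.000).

The sketch below recomputes the first three from perf stat's CSV mode. It rests on a few assumptions:

- perf_stat_ratios is a hypothetical helper name.
- The input follows the CSV column order documented for perf stat -x (count, unit, event name, ...).
- Event names carry no modifier suffix such as ":u".

```shell
# Sketch only: perf_stat_ratios is a hypothetical helper, not a perf command.
# Input: a file written by `perf stat -x, -o FILE ...`; assumes the CSV order
# value,unit,event,... (field 1 = count, field 3 = event name).
perf_stat_ratios() {
  awk -F, '
    $3 == "task-clock"                              { msec = $1 }
    $3 == "cycles" || $3 == "cpu-cycles"            { cycles = $1 }
    $3 == "instructions"                            { insn = $1 }
    $3 == "branches" || $3 == "branch-instructions" { br = $1 }
    $3 == "branch-misses"                           { miss = $1 }
    END {
      # cycles per nanosecond of task time = GHz (1 msec = 1e6 ns)
      if (msec > 0)   printf "clock: %.3f GHz\n", cycles / (msec * 1e6)
      # instructions retired per cycle
      if (cycles > 0) printf "IPC: %.2f\n", insn / cycles
      # share of branches that were mispredicted
      if (br > 0)     printf "branch misses: %.2f%% of all branches\n", 100 * miss / br
    }
  ' "$1"
}
```

Run it as "Board $> perf stat -x, -o stat.csv hello_world_example" followed by "Board $> perf_stat_ratios stat.csv". With the counts shown above, it prints 0.626 GHz, IPC 0.24 and 28.77%, which match perf's own annotations.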
  • perf list (Linux kernel documentation[7]): lists the supported symbolic event types:
Board $> perf list
 branch-instructions OR branches                    [Hardware event]
 branch-misses                                      [Hardware event]
 bus-cycles                                         [Hardware event]
 cache-misses                                       [Hardware event]
 cache-references                                   [Hardware event]
 cpu-cycles OR cycles                               [Hardware event]
 instructions                                       [Hardware event]
 alignment-faults                                   [Software event]
 bpf-output                                         [Software event]
 context-switches OR cs                             [Software event]
 cpu-clock                                          [Software event]
 cpu-migrations OR migrations                       [Software event]
 dummy                                              [Software event]
 emulation-faults                                   [Software event]
 major-faults                                       [Software event]
 minor-faults                                       [Software event]
 page-faults OR faults                              [Software event]
 task-clock                                         [Software event]
 L1-dcache-load-misses                              [Hardware cache event]
 L1-dcache-loads                                    [Hardware cache event]
 L1-dcache-store-misses                             [Hardware cache event]
 L1-dcache-stores                                   [Hardware cache event]
 L1-icache-load-misses                              [Hardware cache event]
 L1-icache-loads                                    [Hardware cache event]
 LLC-load-misses                                    [Hardware cache event]
 LLC-loads                                          [Hardware cache event]
 LLC-store-misses                                   [Hardware cache event]
 LLC-stores                                         [Hardware cache event]
 branch-load-misses                                 [Hardware cache event]
 branch-loads                                       [Hardware cache event]
 dTLB-load-misses                                   [Hardware cache event]
 dTLB-store-misses                                  [Hardware cache event]
 iTLB-load-misses                                   [Hardware cache event]
 armv7_cortex_a7/br_immed_retired/                  [Kernel PMU event]
 armv7_cortex_a7/br_mis_pred/                       [Kernel PMU event]
 armv7_cortex_a7/br_pred/                           [Kernel PMU event]
 armv7_cortex_a7/br_return_retired/                 [Kernel PMU event]
 armv7_cortex_a7/bus_access/                        [Kernel PMU event]
 armv7_cortex_a7/bus_cycles/                        [Kernel PMU event]
 armv7_cortex_a7/cid_write_retired/                 [Kernel PMU event]
 armv7_cortex_a7/cpu_cycles/                        [Kernel PMU event]
 armv7_cortex_a7/exc_return/                        [Kernel PMU event]
 armv7_cortex_a7/exc_taken/                         [Kernel PMU event]
 armv7_cortex_a7/inst_retired/                      [Kernel PMU event]
 armv7_cortex_a7/inst_spec/                         [Kernel PMU event]
 armv7_cortex_a7/l1d_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l1d_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l1d_cache_wb/                      [Kernel PMU event]
 armv7_cortex_a7/l1d_tlb_refill/                    [Kernel PMU event]
 armv7_cortex_a7/l1i_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l1i_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l1i_tlb_refill/                    [Kernel PMU event]
 armv7_cortex_a7/l2d_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l2d_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l2d_cache_wb/                      [Kernel PMU event]
 armv7_cortex_a7/ld_retired/                        [Kernel PMU event]
 armv7_cortex_a7/mem_access/                        [Kernel PMU event]
 armv7_cortex_a7/memory_error/                      [Kernel PMU event]
 armv7_cortex_a7/pc_write_retired/                  [Kernel PMU event]
 armv7_cortex_a7/st_retired/                        [Kernel PMU event]
 armv7_cortex_a7/sw_incr/                           [Kernel PMU event]
 armv7_cortex_a7/ttbr_write_retired/                [Kernel PMU event]
 armv7_cortex_a7/unaligned_ldst_retired/            [Kernel PMU event]
 rNNN                                               [Raw hardware event descriptor]
 cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
 mem:<addr>[/len][:access]                          [Hardware breakpoint]
 alarmtimer:alarmtimer_cancel                       [Tracepoint event]
 alarmtimer:alarmtimer_fired                        [Tracepoint event]
 alarmtimer:alarmtimer_start                        [Tracepoint event]
 alarmtimer:alarmtimer_suspend                      [Tracepoint event]
 asoc:snd_soc_bias_level_done                       [Tracepoint event]
 asoc:snd_soc_bias_level_start                      [Tracepoint event]
 asoc:snd_soc_dapm_connected                        [Tracepoint event]
 asoc:snd_soc_dapm_done                             [Tracepoint event]
 asoc:snd_soc_dapm_path                             [Tracepoint event]
 asoc:snd_soc_dapm_start                            [Tracepoint event]
 asoc:snd_soc_dapm_walk_done                        [Tracepoint event]
 asoc:snd_soc_dapm_widget_event_done                [Tracepoint event]
 asoc:snd_soc_dapm_widget_event_start               [Tracepoint event]
 xhci-hcd:xhci_inc_enq                              [Tracepoint event]
 xhci-hcd:xhci_queue_trb                            [Tracepoint event]
 xhci-hcd:xhci_ring_alloc                           [Tracepoint event]
 xhci-hcd:xhci_ring_expansion                       [Tracepoint event]
 xhci-hcd:xhci_ring_free                            [Tracepoint event]
 xhci-hcd:xhci_setup_addressable_virt_device        [Tracepoint event]
 xhci-hcd:xhci_setup_device                         [Tracepoint event]
 xhci-hcd:xhci_setup_device_slot                    [Tracepoint event]
 xhci-hcd:xhci_stop_device                          [Tracepoint event]
 xhci-hcd:xhci_urb_dequeue                          [Tracepoint event]
 xhci-hcd:xhci_urb_enqueue                          [Tracepoint event]
 xhci-hcd:xhci_urb_giveback                         [Tracepoint event]
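Any name in this list can be passed to perf stat or perf record with -e. The sketch below collects the names of one category into the comma-separated form that -e accepts. events_of_type is a hypothetical helper. It assumes the one-event-per-line layout shown above, with the category in square brackets at the end of each line:

```shell
# Sketch only: events_of_type is a hypothetical helper, not a perf command.
# Reads `perf list` output on stdin and prints the names of every event whose
# category (the text in square brackets) is exactly "$1", joined by commas.
events_of_type() {
  awk -v tag="[$1]" '
    # index() is a plain substring search, so "[" and "]" need no escaping.
    # Only field 1 is kept, so "cpu-cycles OR cycles" yields "cpu-cycles".
    index($0, tag) > 0 { names = names sep $1; sep = "," }
    END { print names }
  '
}
```

For example: "Board $> perf stat -e "$(perf list | events_of_type 'Hardware cache event')" hello_world_example". If you request more events than the PMU has hardware counters, perf time-multiplexes them and scales the counts, so the results become estimates.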
  • perf record (Linux kernel documentation[8]): records events for later reporting:
Board $> perf record hello_world_example 

User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0 
User space example: goodbye from STMicroelectronics
[ perf record: Woken up 1 time to write data ]
[ perf record: Captured and wrote 0.004 MB (28 samples) ]
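Events can be filtered with the -e option, using the event names provided by the perf list command. For more information, options and examples, see [9].
By default, events are recorded in the perf.data file. To use another output file name, add the -o, --output <file> option.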
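Event filtering and a custom output file can be combined in one command. This is a minimal sketch that assumes a POSIX shell. record_events and PERF_BIN are hypothetical names, and the event list can be any comma-separated selection of names printed by perf list:

```shell
# Sketch only: record_events and PERF_BIN are hypothetical names.
PERF_BIN=${PERF_BIN:-perf}   # perf binary to run; override to use another path

# Usage: record_events OUTPUT_FILE EVENT[,EVENT...] COMMAND [ARGS...]
record_events() {
  out=$1
  events=$2
  shift 2
  # -e keeps only the listed events (names as printed by perf list),
  # -o writes to OUTPUT_FILE instead of ./perf.data, and "--" ends perf's
  # option parsing so that COMMAND can take options of its own.
  "$PERF_BIN" record -e "$events" -o "$out" -- "$@"
}
```

For example: "Board $> record_events /tmp/hello.data cycles,branch-misses hello_world_example".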
  • perf report (Linux kernel documentation[10]): breaks down events by process, function, etc.:
Example after previous command "perf record hello_world_example"
Board $> perf report
Samples: 28  of event 'cycles:ppp', Event count (approx.):2737925                
Overhead  Command          Shared Object Symbol                   
  12.66%  hello_world_exa         [.] _dl_relocate_object
  11.71%  hello_world_exa  [kernel.kallsyms]  [k] filemap_map_pages
  10.65%  hello_world_exa  [kernel.kallsyms]  [k] n_tty_write
   6.43%  hello_world_exa  [kernel.kallsyms]  [k] percpu_counter_add_batch
   6.43%  hello_world_exa         [.] sbrk
   6.24%  hello_world_exa  [kernel.kallsyms]  [k] cpu_v7_set_pte_ext
   5.56%  hello_world_exa  [kernel.kallsyms]  [k] alloc_set_pte
   5.56%  hello_world_exa       [.] __sbrk
   5.37%  hello_world_exa  [kernel.kallsyms]  [k] __vma_link_file
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] __fput
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] ldsem_up_read
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] unmap_page_range
   5.32%  hello_world_exa       [.] printf
   5.24%  hello_world_exa  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   2.23%  hello_world_exa  [kernel.kallsyms]  [k] perf_event_mmap
   0.48%  hello_world_exa  [kernel.kallsyms]  [k] perf_output_begin
   0.13%  perf             [kernel.kallsyms]  [k] perf_event_exec

By default, perf report reads perf.data as its input file. To use another input file name, add the -i, --input <file> option. For more information and examples, see [11].
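The read side can be sketched the same way. report_from and PERF_BIN are hypothetical names. The function reads a chosen data file and prints a plain-text report grouped by the given sort keys (for example comm, dso or symbol):

```shell
# Sketch only: report_from and PERF_BIN are hypothetical names.
PERF_BIN=${PERF_BIN:-perf}   # perf binary to run; override to use another path

# Usage: report_from INPUT_FILE SORT_KEYS   (e.g. SORT_KEYS="dso,symbol")
report_from() {
  # -i reads INPUT_FILE instead of ./perf.data, --stdio prints plain text
  # instead of opening the interactive browser, --sort picks the grouping.
  "$PERF_BIN" report -i "$1" --stdio --sort "$2"
}
```

For example: "Board $> report_from /tmp/hello.data dso,symbol".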

  • perf bench (Linux kernel documentation[12]): runs various kernel microbenchmarks:
# List of all available benchmark collections:

        sched: Scheduler and IPC benchmarks
          mem: Memory access benchmarks
        futex: Futex stressing benchmarks
          all: All benchmarks
Example: running the memcpy benchmark on 100MB:
Board $> perf bench mem memcpy --size 100MB
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 100MB bytes ...

       1.426138 GB/sec
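To compare several buffer sizes, you can run the benchmark in a loop. This is a sketch with hypothetical names (memcpy_sweep, PERF_BIN). It assumes that the throughput line is the only output line containing "/sec", as in the output above:

```shell
# Sketch only: memcpy_sweep and PERF_BIN are hypothetical names.
PERF_BIN=${PERF_BIN:-perf}   # perf binary to run; override to use another path

# Usage: memcpy_sweep SIZE...   (e.g. memcpy_sweep 1MB 10MB 100MB)
memcpy_sweep() {
  for size in "$@"; do
    # Keep only the throughput line (it contains "/sec", e.g. "1.426138 GB/sec")
    # and prefix it with the buffer size that produced it.
    "$PERF_BIN" bench mem memcpy --size "$size" |
      awk -v size="$size" '/\/sec/ { print size ": " $1 " " $2 }'
  done
}
```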


Traces recorded by perf can be visualized as Flame Graphs[14].

  • Install the Flame Graph tool suite on the host PC.
PC $> cd <your_local_path>
PC $> git clone
PC $> cd FlameGraph
  • Generate a flame graph from the perf tool output:


- Run the perf record command on the board:
Board $> perf record -a -g top
Board $> perf script > perf_top.out

- Copy perf_top.out to the host PC (i.e., into the FlameGraph directory).
- On the host PC, fold the stacks with the stackcollapse-perf.pl script:
PC $> ./stackcollapse-perf.pl perf_top.out > out.top_folded

- Render the SVG (Scalable Vector Graphics) file with flamegraph.pl:
PC $> ./flamegraph.pl out.top_folded > top.svg

- View the SVG, for example in a web browser:
PC $> firefox top.svg
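The folding step turns each multi-line stack in the perf script output into one line of the "folded" format that flamegraph.pl reads. Each line holds the frames from outermost to innermost joined by ";", then a space and the number of samples with that stack.

The awk sketch below illustrates only the idea. fold_perf_script is a hypothetical name. It assumes perf script's callchain layout:

- a header line that starts with the command name,
- one indented "address symbol (dso)" line per frame, innermost first,
- a blank line between samples.

It ignores many cases that stackcollapse-perf.pl handles, such as command or symbol names containing spaces and inlined frames. Keep using stackcollapse-perf.pl for real data.

```shell
# Simplified sketch of stack folding, NOT a replacement for stackcollapse-perf.pl.
# fold_perf_script is a hypothetical name. Reads `perf script` output (recorded
# with -g) on stdin and prints one "comm;outermost;...;innermost COUNT" line per
# distinct stack, sorted.
fold_perf_script() {
  awk '
    # Turn the frames collected for one sample into a folded stack and count it.
    function flush(   i, stack) {
      if (n > 0) {
        stack = comm
        # perf script prints the innermost frame first, so walk backwards.
        for (i = n; i >= 1; i--) stack = stack ";" frame[i]
        count[stack]++
      }
      n = 0
    }
    # A blank line ends the current sample.
    /^[ \t]*$/ { flush(); next }
    # A line starting in column 1 is a sample header; field 1 is the command.
    /^[^ \t]/  { flush(); comm = $1; next }
    # Otherwise it is a frame "address symbol (dso)": keep the symbol and drop
    # any "+0x..." offset so that samples in the same function are merged.
    NF >= 2    { sym = $2; sub(/[+]0x[0-9a-f]+$/, "", sym); frame[++n] = sym }
    END {
      flush()
      for (s in count) print s, count[s]
    }
  ' | sort
}
```

For example, "PC $> fold_perf_script < perf_top.out > out.top_folded" produces a file in the same format that flamegraph.pl renders.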