[Repost] iOS Crash Governance: Fixing the VisionKitCore Crash in Taobao

Original article


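By reverse-engineering the system and reading the assembly step by step until reaching the source code, this article pins down a system bug in WKWebView on iOS 16.0 up to (but not including) iOS 16.2. Although Apple has fixed the bug in newer versions, the huge base of users still on the affected versions kept producing 1200+ crashes (PV) from 1000+ users (UV) per day. The bug was finally worked around by hooking a system behavior. In the Taobao Double 11 release it is fully fixed and the crash count has dropped to zero.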


Background

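The crash rate (Crash + Abort) of the Taobao app had hovered around x% for a year or two. This year the organization set a higher bar and asked us to push crashes down further.
I took part in that effort, took on a few hard cases, and was lucky enough to locate and resolve some operating-system bugs. This article records the investigation and the workaround for the system bug that crashes in VisionKitCore.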


Crash information

Stack signature:

[Screenshot]

Notable Addresses signature:

[Screenshot]

Extra information (all observed cases were on image-and-text detail pages):

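P.S. The screenshot carries a watermark and cannot be shown. The extra information is the current-page info attached by our customized KSCrash.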

Version signature:

[Screenshot]

Crash share: #3 among crashes that have a stack trace.

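This basic information is already enough to suggest that this is very likely an operating-system bug.
And because Nian Ji (念纪) had already fixed many app-side stack issues earlier, this hard case had climbed into the Top 3 of the crash ranking (crashes with a stack), so it had to be tackled.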


Investigation

I first searched the Apple developer forums for this crash stack and, sure enough, someone had reported it.

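A forum post from last year said the bug was triggered by the logic that copies an image on long press in a web view, and one user reported that disabling WKWebView's long-press gesture avoids the crash (in fact it does not). Based on this I ran tests, taking from the platform an image-and-text detail page that a user had visited, to try to reproduce the stack.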

WKWebView *webview = [[WKWebView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:webview];
[webview loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://url"]] ];

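The forum user said that disabling long press stops the crash, but in my tests disabling long press only stops WKWebView from creating the selection UI; it still runs the image-creation logic. Moreover, Taobao's WebView container already disables the default long-press selection and only implements a save-image feature, so the workaround from that thread cannot fix Taobao's crash.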

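Having studied Arm64 assembly systematically this year, I used this as a chance to put it into practice and hunt for the bug from the bottom up.
The crash report in brief: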


Incident Identifier: 9DAC8C95-D65D-4AA2-BF12-D36DC1A7F3B8
CrashReporter Key: KSCrash2
Hardware Model: iPhone14,2
Process: Taobao4iPhone [20565]
Path: /private/var/containers/Bundle/Application/36FBCF28-38AA-40B3-8234-EDAE1B3D6611/Taobao4iPhone.app/Taobao4iPhone
Identifier: TBXDetailViewController|com.taobao.taobao4iphone
Version: 31863389 (10.27.40)
Code Type: ARM-64
Parent Process: ? [1]

Date/Time: 2023-09-11 21:32:18 +0800
Launch Time: 2023-09-11 21:27:01 +0800
OS Version: iOS 16.1.1 (20B101)
Report Version: 104

Exception Type: EXC_BAD_ACCESS
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000173fc8000
Exception Subtype: SIGSEGV
Triggered by Thread: 99

Thread 99 Crashed:
0 libsystem_platform.dylib 0x000000021350e930 0x21350e000 + 2352 _platform_memmove :96 (in libsystem_platform.dylib)
1 CoreGraphics 0x00000001c8159988 0x1c80f3000 + 420232 _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2 CoreGraphics 0x00000001c8142648 0x1c80f3000 + 325192 _CGBitmapContextCreateImage :216 (in CoreGraphics)
3 VisionKitCore 0x0000000208405ad0 0x2083fa000 + 47824 -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4 VisionKitCore 0x0000000208405880 0x2083fa000 + 47232 -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5 VisionKitCore 0x000000020849da98 0x2083fa000 + 670360 __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)
6 VisionKitCore 0x0000000208473a5c 0x2083fa000 + 498268 __63-[VKCRemoveBackgroundRequestHandler performRequest:completion:]_block_invoke.5 :436 (in VisionKitCore)
7 MediaAnalysisServices 0x0000000209847968 0x209840000 + 31080 __92-[MADService performRequests:onPixelBuffer:withOrientation:andIdentifier:completionHandler:]_block_invoke.38 :400 (in MediaAnalysisServices)
8 CoreFoundation 0x00000001c65b8704 0x1c6544000 + 476932 __invoking___ :148 (in CoreFoundation)
9 CoreFoundation 0x00000001c6564b6c 0x1c6544000 + 133996 -[NSInvocation invoke] :428 (in CoreFoundation)
10 Foundation 0x00000001c09c5b08 0x1c0924000 + 662280 __NSXPCCONNECTION_IS_CALLING_OUT_TO_REPLY_BLOCK__ :16 (in Foundation)
11 Foundation 0x00000001c0996ef0 0x1c0924000 + 470768 -[NSXPCConnection _decodeAndInvokeReplyBlockWithEvent:sequence:replyInfo:] :520 (in Foundation)
12 Foundation 0x00000001c0f702e4 0x1c0924000 + 6603492 __88-[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:]_block_invoke_5 :188 (in Foundation)
13 libxpc.dylib 0x0000000213604f1c 0x2135e7000 + 122652 _xpc_connection_reply_callout :124 (in libxpc.dylib)
14 libxpc.dylib 0x00000002135f7fb4 0x2135e7000 + 69556 _xpc_connection_call_reply_async :88 (in libxpc.dylib)
15 libdispatch.dylib 0x00000001cdb1e05c 0x1cdb1a000 + 16476 _dispatch_client_callout3 :20 (in libdispatch.dylib)
16 libdispatch.dylib 0x00000001cdb3bf58 0x1cdb1a000 + 139096 _dispatch_mach_msg_async_reply_invoke :344 (in libdispatch.dylib)
17 libdispatch.dylib 0x00000001cdb2556c 0x1cdb1a000 + 46444 _dispatch_lane_serial_drain :376 (in libdispatch.dylib)
18 libdispatch.dylib 0x00000001cdb26214 0x1cdb1a000 + 49684 _dispatch_lane_invoke :436 (in libdispatch.dylib)
19 libdispatch.dylib 0x00000001cdb30e10 0x1cdb1a000 + 93712 _dispatch_workloop_worker_thread :652 (in libdispatch.dylib)
20 libsystem_pthread.dylib 0x00000002135a3df8 0x2135a3000 + 3576 _pthread_wqthread :288 (in libsystem_pthread.dylib)
21 libsystem_pthread.dylib 0x00000002135a3b98 0x2135a3000 + 2968 _start_wqthread :8 (in libsystem_pthread.dylib)
Thread State:
x8:0xe361afd13768009c x9:0xe361afd13768009c lr:0x00000001c8155de0 fp:0x000000016bcae040
x10:0x0000000000000090 x12:0x0000000115204f70 x11:0x000000021cbc2268 x14:0x000000021dfff180
x13:0x000000021dfff160 x16:0x000000021350e8d0 x15:0x00000000e781489a sp:0x000000016bcadfb0
x18:0x0000000000000000 x17:0x000000021e003320 x19:0x0000000000143c00 cpsr:0x0000000020001000
pc:0x000000021350e930 x21:0x0000000000148000 x20:0x0000000173e846c8 x0:0x000000016fe8c6c8
x23:0x000000016fe8c000 x1:0x0000000173fc8000 x22:0x000000016fe8c6c8 x2:0x00000000000002a8
x25:0x0000000000000020 x3:0x000000016ffd0000 x24:0x000000021cbae570 x4:0x0000000003ff8000
x27:0x0000000000000000 x5:0x0000000000000018 x26:0x0000000000000008 x6:0x000000000000002c
x7:0x0000000000000000 x28:0x000000028040e180
Binary Images:
0x0000000104638000 - 0x000000010c70bfff Taobao4iPhone arm64 <23be6181e1c43ce9a6b37d61de01bab3> /private/var/containers/Bundle/Application/36FBCF28-38AA-40B3-8234-EDAE1B3D6611/Taobao4iPhone.app/Taobao4iPhone
0x000000021350e000 - 0x0000000213514ff3 libsystem_platform.dylib arm64e <29a26364acef38c28b0ddb0dfca0bb65> /usr/lib/system/libsystem_platform.dylib
0x00000001c80f3000 - 0x00000001c8700ff3 CoreGraphics arm64e <ffb3f1e74e3b3ff79d00be32c9d8133c> /System/Library/Frameworks/CoreGraphics.framework/CoreGraphics
0x00000002083fa000 - 0x0000000208500fff VisionKitCore arm64e <ce997b5ba4b03818bba22d7f057bc3a2> /System/Library/PrivateFrameworks/VisionKitCore.framework/VisionKitCore
0x0000000209840000 - 0x000000020985dfff MediaAnalysisServices arm64e <0c75ee56f3343b8ca96080651906e0dd> /System/Library/PrivateFrameworks/MediaAnalysisServices.framework/MediaAnalysisServices
0x00000001c6544000 - 0x00000001c6929fff CoreFoundation arm64e <5cdc5d9ae5063740b64ebb30867b4f1b> /System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
0x00000001c0924000 - 0x00000001c126dfff Foundation arm64e <c431acb6fe043d28b6774de6e1c7d81f> /System/Library/Frameworks/Foundation.framework/Foundation


Notable Addresses:
memory near x0:
0x000000016fe8c678: 0000000000000000 0000000000000000 ................
0x000000016fe8c688: 0000000000000000 0000000000000000 ................
0x000000016fe8c698: 0000000000000000 0000000000000000 ................
0x000000016fe8c6a8: 0000000000000000 0000000000000000 ................
0x000000016fe8c6b8: 0000000000000000 0000000000000000 ................
->0x000000016fe8c6c8: 878787a3aaaaaac5 a9a9a9c5b5b5b5d2 ................
0x000000016fe8c6d8: cbcbcbe7d5d5d5f0 d6d6d6f1d4d4d4f1 ................
0x000000016fe8c6e8: d4d4d4f1d6d6d6f4 d8d8d8f7d8d8d8f8 ................
[0xf8d8d8d8f7d8d8d8: [objc_object: NSString()]]
0x000000016fe8c6f8: d9d9d9fadadadafb dbdbdbfbdcdcdcfc ................
0x000000016fe8c708: dededefddfdfdffe dfdfdffedfdfdffe ................
0x000000016fe8c718: dfdfdffedfdfdffe dfdfdfffe0e0e0ff ................
0x000000016fe8c728: e0e0e0ffe0e0e0ff e0e0e0ffdfdfdfff ................
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
[0xffdfdfdfffe0e0e0: [objc_object: NSString(Hh-e2cRJ)]]
0x000000016fe8c738: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
0x000000016fe8c748: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
0x000000016fe8c758: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
[0xffe0e0e0ffe0e0e0: [objc_object: NSString(NATY2cRJ)]]
memory near x1:
0x0000000173fc7fb0: 0000000000000000 0000000000000000 ................
0x0000000173fc7fc0: 0000000000000000 0000000000000000 ................
0x0000000173fc7fd0: 0000000000000000 0000000000000000 ................
0x0000000173fc7fe0: 0000000000000000 0000000000000000 ................
0x0000000173fc7ff0: 0000000000000000 0000000000000000 ................
memory near x3:
0x000000016ffcffb0: 0000000000000000 0000000000000000 ................
0x000000016ffcffc0: 0000000000000000 0000000000000000 ................
0x000000016ffcffd0: 0000000000000000 0000000000000000 ................
0x000000016ffcffe0: 0000000000000000 0000000000000000 ................
0x000000016ffcfff0: 0000000000000000 0000000000000000 ................
->0x000000016ffd0000: 0000000000000000 0000000000000000 ................
0x000000016ffd0010: 0000000000000000 0000000000000000 ................
0x000000016ffd0020: 0000000000000000 0000000000000000 ................
0x000000016ffd0030: 0000000000000000 0000000000000000 ................
0x000000016ffd0040: 0000000000000000 0000000000000000 ................
0x000000016ffd0050: 0000000000000000 0000000000000000 ................
0x000000016ffd0060: 0000000000000000 0000000000000000 ................
0x000000016ffd0070: 0000000000000000 0000000000000000 ................
0x000000016ffd0080: 0000000000000000 0000000000000000 ................
0x000000016ffd0090: 0000000000000000 0000000000000000 ................

URL of the image-and-text detail page:
https://xxxx.xx.com

▐  Analyzing the assembly of the key functions

The call stack is:


0 libsystem_platform.dylib 0x00000001fb27a930 _platform_memmove :96 (in libsystem_platform.dylib)
1 CoreGraphics 0x00000001afec1988 _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2 CoreGraphics 0x00000001afeaa648 _CGBitmapContextCreateImage :216 (in CoreGraphics)
3 VisionKitCore 0x00000001f0171ad0 -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4 VisionKitCore 0x00000001f0171880 -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5 VisionKitCore 0x00000001f0209a98 __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)

Background knowledge:

Arm64 calling convention and argument passing

For this article you only need to know the following:

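  1. x0-x7 are the general-purpose registers used to pass arguments: they carry the 1st through 8th integer/pointer (scalar) arguments.

  2. v0-v7 are 128-bit SIMD/floating-point registers; d0-d7 name their low 8 bytes and carry the 1st through 8th floating-point arguments (a CGRect argument, for example, arrives in d0-d3).

  3. x29 is the frame pointer (fp); it points at the current function's frame record (the saved fp/lr pair) at the base of its stack frame.

  4. x30 is the link register (lr); it holds the return address of the current call.

  5. Symbolication must use exactly the same OS version as the crashing device. Luckily Wan Yu (万瑜) had a phone running iOS 16.1.1.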

Analyzing _platform_memmove in libsystem_platform


__platform_memmove: (x0: dest, x1: src, x2: count)
00000001d3d628d0 sub x3, x0, x1 ; x3 = x0 - x1
00000001d3d628d4 cmp x3, x2 ; x3 < x2?
00000001d3d628d8 b.lo 0x1d3d62aa0; unsigned: does dest start inside [src, src + count), i.e. overlap the tail of src? Not taken here
00000001d3d628dc mov x3, x0; x3 = x0
00000001d3d628e0 cmp x2, #0x40; x2 - 0x40
00000001d3d628e4 b.lo 0x1d3d62a7c ; count < 0x40? Not taken here
00000001d3d628e8 sub x4, x1, x0 ; x4 = x1 - x0
00000001d3d628ec cmp x4, x2; x4 - x2 ; does src start inside [dest, dest + count), i.e. overlap the tail of dest?
00000001d3d628f0 b.lo 0x1d3d629b4; not taken either
00000001d3d628f4 cmp x2, #0x4, lsl #12; compare count with 0x4000
00000001d3d628f8 b.lo 0x1d3d62958; count < 0x4000? Not taken either
00000001d3d628fc add x3, x3, #0x20
00000001d3d62900 and x3, x3, #0xffffffffffffffe0 ; x3 = first 32-byte aligned address above dest


00000001d3d62904 ldnp q2, q3, [x1]
00000001d3d62908 sub x5, x3, x0
00000001d3d6290c add x1, x1, x5
00000001d3d62910 ldnp q0, q1, [x1]
00000001d3d62914 add x1, x1, #0x20

00000001d3d62918 sub x2, x2, x5
00000001d3d6291c stnp q2, q3, [x0]
00000001d3d62920 subs x2, x2, #0x40
00000001d3d62924 b.ls 0x1d3d62940
00000001d3d62928 stnp q0, q1, [x3]
00000001d3d6292c add x3, x3, #0x20
00000001d3d62930 ldnp q0, q1, [x1]
; faulting instruction (_platform_memmove +96, frame 0 of the report): x1 = 0x0000000173fc8000

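In the memory dump, everything in front of x0 and right up to x1 is 0000; the page at x1 itself is unreadable (KERN_INVALID_ADDRESS).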

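The analysis shows that __platform_memmove is a fairly typical memmove/memcpy implementation with some head/tail overlap checks. At the moment of the crash, the address in x1 (the source cursor) points at memory that cannot be read. Let's keep going.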

  • _CGDataProviderCreateWithCopyOfData

Here we find that _CGDataProviderCreateWithCopyOfData branches to _create_protected_copy. (⊙o⊙)…

Strangely, this function does not appear in the crash stack at all, and _create_protected_copy contains no b, br or bl to _platform_memmove either (it only calls memcpy). Is the stack wrong?

[Screenshot]

CoreGraphics`create_protected_copy:
-> 0x19c8a1cd0 <+0>: pacibsp
0x19c8a1cd4 <+4>: sub sp, sp, #0xa0
0x19c8a1cd8 <+8>: stp x24, x23, [sp, #0x60]
0x19c8a1cdc <+12>: stp x22, x21, [sp, #0x70]
0x19c8a1ce0 <+16>: stp x20, x19, [sp, #0x80]
0x19c8a1ce4 <+20>: stp x29, x30, [sp, #0x90]
0x19c8a1ce8 <+24>: add x29, sp, #0x90
0x19c8a1cec <+28>: mov x21, #0x0
0x19c8a1cf0 <+32>: cbz x0, 0x19c8a1e5c ; <+396>
0x19c8a1cf4 <+36>: mov x19, x1
0x19c8a1cf8 <+40>: cbz x1, 0x19c8a1e5c ; <+396>
0x19c8a1cfc <+44>: mov x20, x0
0x19c8a1d00 <+48>: adrp x8, 341894
0x19c8a1d04 <+52>: ldr x8, [x8, #0xaa0]
0x19c8a1d08 <+56>: ldr x8, [x8]
0x19c8a1d0c <+60>: cmp x8, x19
0x19c8a1d10 <+64>: b.ls 0x19c8a1d48 ; <+120>
0x19c8a1d14 <+68>: mov x0, #0x0
0x19c8a1d18 <+72>: mov x1, x20
0x19c8a1d1c <+76>: mov x2, x19
0x19c8a1d20 <+80>: ldp x29, x30, [sp, #0x90]
0x19c8a1d24 <+84>: ldp x20, x19, [sp, #0x80]
0x19c8a1d28 <+88>: ldp x22, x21, [sp, #0x70]
0x19c8a1d2c <+92>: ldp x24, x23, [sp, #0x60]
0x19c8a1d30 <+96>: add sp, sp, #0xa0
0x19c8a1d34 <+100>: autibsp
0x19c8a1d38 <+104>: eor x16, x30, x30, lsl #1
0x19c8a1d3c <+108>: tbz x16, #0x3e, 0x19c8a1d44 ; <+116>
0x19c8a1d40 <+112>: brk #0xc471
0x19c8a1d44 <+116>: b 0x1a076cd80
0x19c8a1d48 <+120>: neg x9, x8
0x19c8a1d4c <+124>: and x22, x9, x20
0x19c8a1d50 <+128>: add x10, x19, x20
0x19c8a1d54 <+132>: add x8, x10, x8
0x19c8a1d58 <+136>: sub x8, x8, #0x1
0x19c8a1d5c <+140>: and x8, x8, x9
0x19c8a1d60 <+144>: sub x21, x8, x22
0x19c8a1d64 <+148>: mov x0, #0x0
0x19c8a1d68 <+152>: mov x1, x21
0x19c8a1d6c <+156>: mov w2, #0x3
0x19c8a1d70 <+160>: mov w3, #0x1002
0x19c8a1d74 <+164>: mov w4, #0x36000000
0x19c8a1d78 <+168>: mov x5, #0x0
0x19c8a1d7c <+172>: bl 0x1a076e5c0
0x19c8a1d80 <+176>: cmn x0, #0x1
0x19c8a1d84 <+180>: b.eq 0x19c8a1e58 ; <+392>
0x19c8a1d88 <+184>: mov x23, x0
0x19c8a1d8c <+188>: adrp x24, 341894
0x19c8a1d90 <+192>: ldr x24, [x24, #0xa80]
0x19c8a1d94 <+196>: ldr w0, [x24]
0x19c8a1d98 <+200>: mov x1, x22
0x19c8a1d9c <+204>: mov x2, x21
0x19c8a1da0 <+208>: mov x3, x23
0x19c8a1da4 <+212>: bl 0x1a076ecd0
0x19c8a1da8 <+216>: sub x8, x20, x22
0x19c8a1dac <+220>: add x22, x8, x23
0x19c8a1db0 <+224>: cbz w0, 0x19c8a1de0 ; <+272>
0x19c8a1db4 <+228>: adrp x8, 1405
0x19c8a1db8 <+232>: add x8, x8, #0xed9 ; "copy_read_only"
0x19c8a1dbc <+236>: stp x8, x0, [sp]
0x19c8a1dc0 <+240>: adrp x1, 1405
0x19c8a1dc4 <+244>: add x1, x1, #0xeba ; "%s: vm_copy failed: status %d."
0x19c8a1dc8 <+248>: mov w0, #0x0
0x19c8a1dcc <+252>: bl 0x19cb1ffcc ; CGLog
0x19c8a1dd0 <+256>: mov x0, x22
0x19c8a1dd4 <+260>: mov x1, x20
0x19c8a1dd8 <+264>: mov x2, x19
0x19c8a1ddc <+268>: bl 0x19cc48f80 ; symbol stub for: memcpy
0x19c8a1de0 <+272>: ldr w0, [x24]
0x19c8a1de4 <+276>: mov x1, x22
0x19c8a1de8 <+280>: mov x2, x19
0x19c8a1dec <+284>: mov w3, #0x1
0x19c8a1df0 <+288>: mov w4, #0x1
0x19c8a1df4 <+292>: bl 0x1a076ecf0
0x19c8a1df8 <+296>: cbz x22, 0x19c8a1e58 ; <+392>
0x19c8a1dfc <+300>: cmp x22, x20
0x19c8a1e00 <+304>: b.eq 0x19c8a1d14 ; <+68>
0x19c8a1e04 <+308>: movi.2d v0, #0000000000000000
0x19c8a1e08 <+312>: stp q0, q0, [sp, #0x30]
0x19c8a1e0c <+316>: stp q0, q0, [sp, #0x10]
0x19c8a1e10 <+320>: str x21, [sp, #0x18]
0x19c8a1e14 <+324>: adrp x16, -67
0x19c8a1e18 <+328>: add x16, x16, #0x544 ; vm_allocator_deallocate
0x19c8a1e1c <+332>: paciza x16
0x19c8a1e20 <+336>: stp x16, xzr, [sp, #0x48]
0x19c8a1e24 <+340>: add x1, sp, #0x10
0x19c8a1e28 <+344>: mov x0, #0x0
0x19c8a1e2c <+348>: bl 0x1a076cb00
0x19c8a1e30 <+352>: mov x20, x0
0x19c8a1e34 <+356>: mov x0, #0x0
0x19c8a1e38 <+360>: mov x1, x22
0x19c8a1e3c <+364>: mov x2, x19
0x19c8a1e40 <+368>: mov x3, x20
0x19c8a1e44 <+372>: bl 0x1a076cdb0
0x19c8a1e48 <+376>: mov x21, x0
0x19c8a1e4c <+380>: mov x0, x20
0x19c8a1e50 <+384>: bl 0x1a076d200
0x19c8a1e54 <+388>: b 0x19c8a1e5c ; <+396>
0x19c8a1e58 <+392>: mov x21, #0x0
0x19c8a1e5c <+396>: mov x0, x21
0x19c8a1e60 <+400>: ldp x29, x30, [sp, #0x90]
0x19c8a1e64 <+404>: ldp x20, x19, [sp, #0x80]
0x19c8a1e68 <+408>: ldp x22, x21, [sp, #0x70]
0x19c8a1e6c <+412>: ldp x24, x23, [sp, #0x60]
0x19c8a1e70 <+416>: add sp, sp, #0xa0
0x19c8a1e74 <+420>: retab
  • Unwinding Arm64 crash stacks

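After consulting an experienced colleague, I filled a gap in my knowledge: on x86_64 the call instruction always pushes the address of the next instruction (the return address) onto the stack, so the call chain can be recovered simply by walking the stack.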

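Arm64, however, keeps the return address in the LR register. A function that itself calls other functions must therefore save lr in its prologue, which is the boilerplate commonly seen at the start and end of a function: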


WKCopy`+[ViewController load]:
0x100b0c000 <+0>: sub sp, sp, #0x20 // grow the stack
0x100b0c004 <+4>: stp x29, x30, [sp, #0x10] // save the caller's fp and our lr
0x100b0c008 <+8>: add x29, sp, #0x10 // fp points at the new frame record
// ... function body ...
0x100b0c020 <+32>: ldp x29, x30, [sp, #0x10] // restore the old fp and lr
0x100b0c024 <+36>: add sp, sp, #0x20 // shrink the stack
0x100b0c028 <+40>: ret // return to the caller through lr

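Not every function uses the stack, though; such functions are called frameless. memset, memmove and memcpy are typical: they load a chunk from the source address into registers, store it to the destination, and repeat until the requested length has been copied.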

Arm64 also has branch instructions that do not record a return address, such as b/br, commonly used in stubs.

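In tail-call situations, where a function has no need to come back after calling the next one, the compiler also emits a plain b to skip the unnecessary return. The best-known example is objc_msgSend, which is both tail-call optimized and frameless.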

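When the process crashes, KSCrash walks the call stack back. Frameless functions need some special handling; specifically:

  1. For the crashing function, take the pc directly; it gives the innermost frame.
  2. For each caller, load the fp/lr pair that the callee's prologue saved (the stp x29, x30 shown above) and use the saved lr as the next frame.
  3. Repeat step 2 until lr is 0, which means the thread's entry function has been reached.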

See KSStackCursor in KSCrash for the code.

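This breaks down when a frameless function reaches another frameless function through b and then crashes: frames appear to be missing. In this article two frames are lost, because:

  1. memcpy reaches memmove through a tail call (b), so the tail-calling function itself disappears from the stack. This is expected.

  2. _platform_memmove is frameless and saves nothing on the stack, so the first saved lr found on the stack belongs to _create_protected_copy's frame record and points back into its caller; _create_protected_copy itself appears only in the live lr register. When you meet this pattern, look up the function at the address in lr.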

So the real call stack in this case is:

0   libsystem_platform.dylib        0x00000001fb27a930 _platform_memmove :96 (in libsystem_platform.dylib)
(missing frame 2) _memcpy
(missing frame 1) _create_protected_copy
1 CoreGraphics 0x00000001afec1988 _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2 CoreGraphics 0x00000001afeaa648 _CGBitmapContextCreateImage :216 (in CoreGraphics)
3 VisionKitCore 0x00000001f0171ad0 -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4 VisionKitCore 0x00000001f0171880 -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5 VisionKitCore 0x00000001f0209a98 __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)

Moving on:


VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]:
0x1dcb51974 <+0>: cbz x0, 0x1dcb51b98 ; <+548>
0x1dcb51978 <+4>: pacibsp
0x1dcb5197c <+8>: sub sp, sp, #0x90
0x1dcb51980 <+12>: stp d11, d10, [sp, #0x10]
0x1dcb51984 <+16>: stp d9, d8, [sp, #0x20]
0x1dcb51988 <+20>: stp x28, x27, [sp, #0x30]
0x1dcb5198c <+24>: stp x26, x25, [sp, #0x40]
0x1dcb51990 <+28>: stp x24, x23, [sp, #0x50]
0x1dcb51994 <+32>: stp x22, x21, [sp, #0x60]
0x1dcb51998 <+36>: stp x20, x19, [sp, #0x70]
0x1dcb5199c <+40>: stp x29, x30, [sp, #0x80]
0x1dcb519a0 <+44>: add x29, sp, #0x80
0x1dcb519a4 <+48>: fmov d8, d3
0x1dcb519a8 <+52>: fmov d9, d2
0x1dcb519ac <+56>: fmov d10, d1
0x1dcb519b0 <+60>: fmov d11, d0
0x1dcb519b4 <+64>: mov x19, x2
0x1dcb519b8 <+68>: mov x0, x2
0x1dcb519bc <+72>: mov w1, #0x1
0x1dcb519c0 <+76>: bl 0x1dda62ce0 ; CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly)
0x1dcb519c4 <+80>: mov x0, x19
0x1dcb519c8 <+84>: bl 0x1dda62c90 ; CVPixelBufferGetBaseAddress(pixelBuffer)
0x1dcb519cc <+88>: mov x21, x0 ; x21 = baseAddress
0x1dcb519d0 <+92>: mov x0, x19
0x1dcb519d4 <+96>: bl 0x1dda62cc0 ; CVPixelBufferGetPixelFormatType(pixelBuffer)
0x1dcb519d8 <+100>: mov x1, x0 ; x1 = pixelFormatType
0x1dcb519dc <+104>: mov x0, #0x0
0x1dcb519e0 <+108>: bl 0x1dda62d20 ; CVPixelFormatDescriptionCreateWithPixelFormatType(NULL /* allocator */, pixelFormatType)
0x1dcb519e4 <+112>: bl 0x1dda632f0 ; autorelease
0x1dcb519e8 <+116>: mov x20, x0 ; x20 = formatDesc (__NSFrozenDictionaryM)
0x1dcb519ec <+120>: cbz x0, 0x1dcb51b20 ; <+428> formatDesc == nil -> failed assertion
0x1dcb519f0 <+124>: cbz x21, 0x1dcb51b5c ; <+488> baseAddress == nil -> failed assertion
0x1dcb519f4 <+128>: fcvtmu x22, d9 ; float -> unsigned, rounding down: x22 = d9 = d2 = cropRect.size.width
0x1dcb519f8 <+132>: fcvtmu x23, d8 ; x23 = d8 = d3 = cropRect.size.height
0x1dcb519fc <+136>: adrp x8, 64224
0x1dcb51a00 <+140>: ldr x8, [x8, #0x958]
0x1dcb51a04 <+144>: ldr x2, [x8] ; key: BitsPerBlock
0x1dcb51a08 <+148>: mov x0, x20
0x1dcb51a0c <+152>: bl 0x1dcc41360 ; objc_msgSend$objectForKeyedSubscript: -> formatDesc[BitsPerBlock], 32 here
0x1dcb51a10 <+156>: bl 0x1dda632f0 ; autorelease
0x1dcb51a14 <+160>: mov x24, x0
0x1dcb51a18 <+164>: bl 0x1dcc3ee00 ; objc_msgSend$integerValue
0x1dcb51a1c <+168>: mov x25, x0 ; x25 = bitsPerBlock = 32
0x1dcb51a20 <+172>: bl 0x1dda63450 ; release x0
0x1dcb51a24 <+176>: adrp x2, 102753
0x1dcb51a28 <+180>: add x2, x2, #0x9e0 ; @"BitsPerComponent"
0x1dcb51a2c <+184>: mov x0, x20
0x1dcb51a30 <+188>: bl 0x1dcc41360 ; objc_msgSend$objectForKeyedSubscript: -> formatDesc[@"BitsPerComponent"]
0x1dcb51a34 <+192>: bl 0x1dda632f0 ; autorelease
0x1dcb51a38 <+196>: mov x24, x0
0x1dcb51a3c <+200>: bl 0x1dcc3ee00 ; objc_msgSend$integerValue
0x1dcb51a40 <+204>: mov x26, x0 ; x26 = bitsPerComponent = 8
0x1dcb51a44 <+208>: bl 0x1dda63450 ; release x0
0x1dcb51a48 <+212>: fmov d0, d11
0x1dcb51a4c <+216>: fmov d1, d10
0x1dcb51a50 <+220>: fmov d2, d9
0x1dcb51a54 <+224>: fmov d3, d8
0x1dcb51a58 <+228>: bl 0x1dda62af0 ; CGRectGetMinX(cropRect)
0x1dcb51a5c <+232>: fcvtmu x24, d0 ; x24 = minX
0x1dcb51a60 <+236>: fmov d0, d11
0x1dcb51a64 <+240>: fmov d1, d10
0x1dcb51a68 <+244>: fmov d2, d9
0x1dcb51a6c <+248>: fmov d3, d8
0x1dcb51a70 <+252>: bl 0x1dda62b00 ; CGRectGetMinY(cropRect)
0x1dcb51a74 <+256>: fcvtmu x27, d0 ; x27 = minY
0x1dcb51a78 <+260>: lsr x8, x25, #3 ; x8 = bitsPerBlock >> 3 = 4 bytes per pixel
0x1dcb51a7c <+264>: madd x21, x8, x24, x21 ; x21 = 4 * minX + baseAddress
0x1dcb51a80 <+268>: mov x0, x19
0x1dcb51a84 <+272>: bl 0x1dda62ca0 ; CVPixelBufferGetBytesPerRow(pixelBuffer)
0x1dcb51a88 <+276>: madd x21, x0, x27, x21 ; x21 = bytesPerRow * minY + x21
0x1dcb51a8c <+280>: adrp x8, 64219
0x1dcb51a90 <+284>: ldr x8, [x8, #0xb60]
0x1dcb51a94 <+288>: ldr x0, [x8]
0x1dcb51a98 <+292>: bl 0x1dda627e0 ; CGColorSpaceCreateWithName(kCGColorSpaceSRGB)
0x1dcb51a9c <+296>: mov x24, x0 ; x24 = <CGColorSpace 0x28121fe40> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)
0x1dcb51aa0 <+300>: mov x0, x19
0x1dcb51aa4 <+304>: bl 0x1dda62ca0 ; CVPixelBufferGetBytesPerRow(pixelBuffer)
0x1dcb51aa8 <+308>: mov x4, x0 ; x4 = bytesPerRow
0x1dcb51aac <+312>: mov x0, x21
0x1dcb51ab0 <+316>: mov x1, x22
0x1dcb51ab4 <+320>: mov x2, x23
0x1dcb51ab8 <+324>: mov x3, x26
0x1dcb51abc <+328>: mov x5, x24
0x1dcb51ac0 <+332>: mov w6, #0x2002 ; kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst (BGRA)
0x1dcb51ac4 <+336>: bl 0x1dda62700 ; CGBitmapContextCreate(croppedBase, width, height, bitsPerComponent (8), bytesPerRow, colorSpace, 0x2002)
0x1dcb51ac8 <+340>: mov x21, x0 ; x21 = bitmap context
0x1dcb51acc <+344>: bl 0x1dda62710 ; CGBitmapContextCreateImage(context), frame 2 of the crash stack

0x1dcb51ad0 <+348>: mov x22, x0
0x1dcb51ad4 <+352>: mov x0, x21
0x1dcb51ad8 <+356>: bl 0x1dda62850
0x1dcb51adc <+360>: mov x0, x24
0x1dcb51ae0 <+364>: bl 0x1dda62810
0x1dcb51ae4 <+368>: mov x0, x19
0x1dcb51ae8 <+372>: mov w1, #0x1
0x1dcb51aec <+376>: bl 0x1dda62d10
0x1dcb51af0 <+380>: bl 0x1dda63410
0x1dcb51af4 <+384>: mov x0, x22
0x1dcb51af8 <+388>: ldp x29, x30, [sp, #0x80]
0x1dcb51afc <+392>: ldp x20, x19, [sp, #0x70]
0x1dcb51b00 <+396>: ldp x22, x21, [sp, #0x60]
0x1dcb51b04 <+400>: ldp x24, x23, [sp, #0x50]
0x1dcb51b08 <+404>: ldp x26, x25, [sp, #0x40]
0x1dcb51b0c <+408>: ldp x28, x27, [sp, #0x30]
0x1dcb51b10 <+412>: ldp d9, d8, [sp, #0x20]
0x1dcb51b14 <+416>: ldp d11, d10, [sp, #0x10]
0x1dcb51b18 <+420>: add sp, sp, #0x90
0x1dcb51b1c <+424>: retab
0x1dcb51b20 <+428>: adrp x8, 76725
0x1dcb51b24 <+432>: ldr x0, [x8, #0xae8]
0x1dcb51b28 <+436>: adrp x8, 176
0x1dcb51b2c <+440>: add x8, x8, #0xc38 ; "pixelFormatDict"
0x1dcb51b30 <+444>: str x8, [sp]
0x1dcb51b34 <+448>: adrp x2, 176
0x1dcb51b38 <+452>: add x2, x2, #0xbb4 ; "((pixelFormatDict) != nil)"
0x1dcb51b3c <+456>: adrp x3, 176
0x1dcb51b40 <+460>: add x3, x3, #0xbcf ; "-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]"
0x1dcb51b44 <+464>: adrp x6, 102753
0x1dcb51b48 <+468>: add x6, x6, #0x9c0 ; @"Expected non-nil value for '%s'"
0x1dcb51b4c <+472>: mov w4, #0x0
0x1dcb51b50 <+476>: mov w5, #0x0
0x1dcb51b54 <+480>: bl 0x1dcc3d080 ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
0x1dcb51b58 <+484>: cbnz x21, 0x1dcb519f4 ; <+128>
0x1dcb51b5c <+488>: adrp x8, 76725
0x1dcb51b60 <+492>: ldr x0, [x8, #0xae8]
0x1dcb51b64 <+496>: adrp x8, 176
0x1dcb51b68 <+500>: add x8, x8, #0xc65 ; "bufferBaseAddress"
0x1dcb51b6c <+504>: str x8, [sp]
0x1dcb51b70 <+508>: adrp x2, 176
0x1dcb51b74 <+512>: add x2, x2, #0xc48 ; "((bufferBaseAddress) != nil)"
0x1dcb51b78 <+516>: adrp x3, 176
0x1dcb51b7c <+520>: add x3, x3, #0xbcf ; "-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]"
0x1dcb51b80 <+524>: adrp x6, 102753
0x1dcb51b84 <+528>: add x6, x6, #0x9c0 ; @"Expected non-nil value for '%s'"
0x1dcb51b88 <+532>: mov w4, #0x0
0x1dcb51b8c <+536>: mov w5, #0x0
0x1dcb51b90 <+540>: bl 0x1dcc3d080 ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
0x1dcb51b94 <+544>: b 0x1dcb519f4 ; <+128>
0x1dcb51b98 <+548>: ret

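Reconstructed as Objective-C (inferred from the disassembly above; not Apple's source code, and the three release/unlock calls at +356..+376 are presumed from their arguments):

// Reconstruction of -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] on iOS 16.1.1.
static CGImageRef CreateCGImageFromBGRAPixelBuffer(CVPixelBufferRef pixelBuffer, CGRect cropRect) {
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    uint8_t *baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer);
    OSType formatType = CVPixelBufferGetPixelFormatType(pixelBuffer);
    NSDictionary *formatDesc = CFBridgingRelease(CVPixelFormatDescriptionCreateWithPixelFormatType(NULL, formatType));
    // A nil formatDesc or baseAddress only triggers a non-fatal assertion
    // (simulateCrash:NO); execution then continues with the code below.

    size_t width = (size_t)floor(cropRect.size.width);   // fcvtmu x22, d9
    size_t height = (size_t)floor(cropRect.size.height); // fcvtmu x23, d8
    size_t bitsPerBlock = [formatDesc[(__bridge NSString *)kCVPixelFormatBitsPerBlock] integerValue]; // 32
    size_t bitsPerComponent = [formatDesc[@"BitsPerComponent"] integerValue];                          // 8

    size_t minX = (size_t)floor(CGRectGetMinX(cropRect));
    size_t minY = (size_t)floor(CGRectGetMinY(cropRect));
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
    // Move the base pointer to the crop origin: minY whole rows plus minX pixels.
    uint8_t *cropBase = baseAddress + (bitsPerBlock / 8) * minX + bytesPerRow * minY;

    CGColorSpaceRef colorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);
    // Crop-sized width and height, but the stride of the whole buffer.
    CGContextRef context = CGBitmapContextCreate(cropBase, width, height, bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGBitmapByteOrder32Little | (CGBitmapInfo)kCGImageAlphaPremultipliedFirst); // 0x2002
    CGImageRef image = CGBitmapContextCreateImage(context); // the crash happens inside this call (frame 2)
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    return image;
}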

Its caller, -[VKCRemoveBackgroundResult createCGImage], in pseudocode (accessor names approximate):

cvpixbuffer = [VKCRemoveBackgroundResult pixelBuffer]
cropRect = [VKCRemoveBackgroundResult cropRect]
if cvpixbuffer == nil || !VKMRectHasArea(cropRect) {
    go other logic
}
CVPixelBufferRetain(cvpixbuffer)
[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cvpixbuffer cropRect:cropRect]

From this logic we can see how the long press inside WKWebView is implemented:

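WKWebView reaches across processes, cuts an image out of a bitmap and hands it to VisionKitCore; VisionKit then takes the buffer for that region directly and creates an image from it for further processing. Why exactly it crashes is hard to determine at this point: the bitmap object was created much earlier and only blows up here, when it is consumed. It could be a premature release, a dangling pointer, or an out-of-bounds access. So I looked for clues elsewhere.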


Comparing OS versions

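Since production data shows no crashes on iOS 16.2 and later, this may really be a system bug that was quietly fixed. So I found a few devices on newer versions to experiment with.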

**▐  iOS 16.2**

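After a long press in the webview,
the x1 passed into __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke is nil, and symbolic breakpoints on every VKCRemoveBackgroundResult symbol are never hit during the long press. The behavior is completely different from the iOS 16.1.1 device.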

[Screenshot]

**▐  iOS 17**

On iOS 17 it changes again: -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] in VisionKitCore now simply calls vk_cgImageFromPixelBuffer in VisionKit to create the image.

[Screenshots]

VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]:
-> 0x204396388 <+0>: cbz x0, 0x2043963f8 ; <+112>
0x20439638c <+4>: pacibsp
0x204396390 <+8>: stp d11, d10, [sp, #-0x40]!
0x204396394 <+12>: stp d9, d8, [sp, #0x10]
0x204396398 <+16>: stp x20, x19, [sp, #0x20]
0x20439639c <+20>: stp x29, x30, [sp, #0x30]
0x2043963a0 <+24>: add x29, sp, #0x30
0x2043963a4 <+28>: fmov d8, d3
0x2043963a8 <+32>: fmov d9, d2
0x2043963ac <+36>: fmov d10, d1
0x2043963b0 <+40>: fmov d11, d0
0x2043963b4 <+44>: mov x0, x1
0x2043963b8 <+48>: bl 0x20444d4e8 ; vk_cgImageFromPixelBuffer
0x2043963bc <+52>: mov x19, x0
0x2043963c0 <+56>: fmov d0, d11
0x2043963c4 <+60>: fmov d1, d10
0x2043963c8 <+64>: fmov d2, d9
0x2043963cc <+68>: fmov d3, d8
0x2043963d0 <+72>: bl 0x206acc070
0x2043963d4 <+76>: mov x20, x0
0x2043963d8 <+80>: mov x0, x19
0x2043963dc <+84>: bl 0x206acc110
0x2043963e0 <+88>: mov x0, x20
0x2043963e4 <+92>: ldp x29, x30, [sp, #0x30]
0x2043963e8 <+96>: ldp x20, x19, [sp, #0x20]
0x2043963ec <+100>: ldp d9, d8, [sp, #0x10]
0x2043963f0 <+104>: ldp d11, d10, [sp], #0x40
0x2043963f4 <+108>: retab
0x2043963f8 <+112>: ret

**▐  iOS 16.1.1**

[Screenshot]

Here, when the block is invoked, x1 always holds a value, so the image-creation path runs.

So what is this image actually used for?

[Screenshots]

It appears to draw a low-resolution thumbnail; its purpose is not obvious.

Continuing:

[Screenshot]

It seems to call back into WebKit. WebKit is open source, so let's keep digging.

Find the WebKit version on the affected device:

[Screenshot]

The code is in ImageAnalysisUtilities.mm.

It appears to do image recognition, but I was not sure yet, so I searched for its callers. GitHub can now search for symbols directly.

[Screenshot]

This basically confirms that it performs object recognition on images, with an additional guard: if there is no image, it returns early.

WebContextMenuProxyMac.mm

[Screenshots]

Solutions

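From the above we reach some preliminary conclusions: this is a feature added in iOS 16, namely image analysis. In iOS 16 you can long-press a photo in the Photos app to lift its subject, and the system adds the same feature to every image inside WKWebView.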

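  1. Every version from iOS 16.0 up to (but not including) 16.2 carries this latent bug; it is not caused by app developers.

  2. _platform_memmove is an extremely low-level, heavily used API and is very unlikely to be at fault.

  3. The cause most likely lies in how WKWebView uses the feature, or in a bug in VisionKit's subject lifting; with several asynchronous hops plus XPC scheduling in between, it is hard to tell which.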

▐  Solution 1

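It occurred to me that since this is default behavior, removing the behavior might be enough. The call stack above also shows that when -[VKCRemoveBackgroundResult createCGImage] creates the image for analysis, the system checks the result for nil and does not crash in that case, so I only need to make it return NULL.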

So I wrote a demo to test hooking this behavior, using an earlier photo of a cat at home.


#import <UIKit/UIKit.h>
#import <WebKit/WebKit.h>
#import <objc/runtime.h>

@interface ViewController : UIViewController
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    WKWebViewConfiguration *config = [[WKWebViewConfiguration alloc] init];
    WKWebView *webview = [[WKWebView alloc] initWithFrame:self.view.bounds configuration:config];
    [self.view addSubview:webview];
    [webview loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://www.valiantcat.cn/dsn.html"]]];

    UIButton *button = [UIButton buttonWithType:UIButtonTypeCustom];
    button.frame = CGRectMake(0, 0, 300, 200);
    [button setTitle:@"Tap to hook" forState:UIControlStateNormal];
    [button setTitleColor:UIColor.redColor forState:UIControlStateNormal];
    [self.view addSubview:button];
    button.center = self.view.center;
    [button addTarget:self action:@selector(hook) forControlEvents:UIControlEventTouchUpInside];
}

- (void)hook {
    Class cls = objc_getClass("VKCRemoveBackgroundResult");
    SEL selector = sel_registerName("createCGImage");
    Method m = class_getInstanceMethod(cls, selector);
    if (cls == Nil || m == NULL) {
        return; // private class or method not present on this OS version
    }
    const char *type = method_getTypeEncoding(m);
    // The block receives self only (no _cmd), followed by the method's arguments.
    IMP newImp = imp_implementationWithBlock(^CGImageRef(id receiver) {
        return NULL; // no image: the caller's nil check skips subject lifting
    });
    IMP oldImp = class_replaceMethod(cls, selector, newImp, type);
    NSLog(@"replaced createCGImage, old IMP %p", oldImp);
}

@end

After the hook, long-pressing an image no longer offers subject lifting.

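All things considered, the plan looked feasible. I checked with the detail-page and WebView-container teams: they had not customized WKWebView's default behavior, so the change would hardly affect Taobao's business. So I prepared to ship it.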

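Just before release, however, I realized that Taobao's Scan and Pailitao (拍立淘, photo search) features use VisionKit, which made the hook risky, and I was stuck again.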

▐  What the diff reveals

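Then it struck me: since the code is open source and the bug only exists from iOS 16.0 up to (but not including) iOS 16.2, why not look at how the system quietly fixed it? Sure enough, there were clues: in several places that copy images, the fix changes an image length/size value. (When I forcibly modified this function's arguments at a symbolic breakpoint, however, I could not reproduce the same crash.) Still, this diff makes it much more likely that the bug lies in WKWebView rather than in VisionKit.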

Diff link

[Screenshot]

▐  Solution 2

I continued investigating from the WKWebView side, looking for useful information in the stack that a long press triggers.

[Screenshots]

Reading the code shows that this is a feature new in iOS 16, and the source also shows how the gesture is added:

[Screenshot]

It turns out that before iOS 16, WKWebView had only one such gesture: a long press brings up the save-image menu.

Starting with iOS 16, WKWebView adds two gestures that compete for the user's long press.

[Screenshot]

  • Verifying the timeout path

Add a symbolic breakpoint on -[WKContentView imageAnalysisGestureDidBegin:] with the breakpoint command thread return, so the method returns immediately. Sure enough, the timeout path is hit.

[Screenshot]

The code shows that the menu built on the timeout path has no copySubject item.

[Screenshot]

  • The non-timeout path

WKContentViewInteraction.mm

When subject lifting succeeds, the menu includes the Copy Subject item.

[Screenshots]

So the new plan is to hook the image-analysis part of WKWebView's long-press gesture.

#import <UIKit/UIKit.h>
#import <objc/runtime.h>

static void hook2(void) {
    Class cls = objc_getClass("WKContentView");
    SEL selector = sel_registerName("imageAnalysisGestureDidBegin:");
    Method m = class_getInstanceMethod(cls, selector);
    if (cls == Nil || m == NULL) {
        return; // private class or selector missing: leave WebKit untouched
    }
    const char *type = method_getTypeEncoding(m);
    // The block receives self first (no _cmd), then the method's arguments.
    IMP newImp = imp_implementationWithBlock(^void(id receiver, UILongPressGestureRecognizer *gesture) {
        // do nothing: image analysis never starts for this long press
    });
    class_replaceMethod(cls, selector, newImp, type);
}

void hookStart(void) {
    // Only hook the affected range: iOS 16.0 <= version < 16.2.
    if (@available(iOS 16.2, *)) {
        return;
    }
    if (@available(iOS 16.0, *)) {
        hook2();
    }
}

▐  Observing in production

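Because hooking the long-press gesture disables WKWebView's built-in subject lifting and text OCR, we were worried about user complaints. We therefore implemented the hook in Taobao's crash-protection ("safety cushion", 安全气垫) SDK and rolled it out gradually. During the gradual rollout in 10.28.11, crashes dropped from 500+ to 67 (the hook takes effect on cold launch, so there is a lag), which confirmed that the fix works, and there were no user complaints. After full rollout, crashes in Taobao builds with the hook dropped to essentially zero and the bug was fixed for good: 1200+ fewer crashes per day, across 1000+ affected devices.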

[Screenshot]

Summary

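Stability work is a long-term effort. Thanks to my colleagues' earlier work, most app-side crashes had been resolved, so some operating-system bugs gradually surfaced and climbed the rankings. At first I was not confident I could fix a system
bug, but by applying what I had learned and peeling the problem apart layer by layer, I located it step by step, and I am no longer afraid of system crashes. Thanks also to my colleagues for sharing their experience and guidance during the investigation.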

If you have questions about the analysis or spot any errors, discussion and corrections are welcome.


References

  1. iOS app crashed on iOS 16
  2. The-ABI-for-ARM-64-bit-Architecture
  3. WebKit