codeql_zero_to_hero Part 4 study notes

henry Lv4

Reference:

https://github.blog/security/vulnerability-research/codeql-zero-to-hero-part-4-gradio-framework-case-study/

As usual, if you read English comfortably, I recommend going straight to the original post for its excellent explanation.

Gradio framework case study

Gradio

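Gradio is a Python library for building interactive machine learning demos and user interfaces. It lets developers quickly create a web-based interface so that other people can interact with a machine learning model, or any other function, from a browser.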

Gradio interface

Calling gradio's Interface class is enough to create an interface quickly:

import gradio as gr

def greet(name, intensity):
    return "Hello, " + name + "!" * int(intensity)

demo = gr.Interface(
    fn=greet,
    inputs=[gr.Textbox(), gr.Slider()],
    outputs=[gr.Textbox()])

demo.launch()

This produces the following interface:

[Screenshot: Snipaste_2024-12-17_17-25-06]

Gradio Blocks

The Gradio documentation describes the Blocks class as follows:

Blocks offers more flexibility and control over:

(1) the layout of components

(2) the events that trigger the execution of functions

(3) data flows (for example, inputs can trigger outputs, which can trigger the next level of outputs).

So the Blocks class is more powerful than Interface.

Below is a usage example that creates a slider, a dropdown, a checkbox group, radio buttons, and a checkbox:

import gradio as gr

def sentence_builder(quantity, animal, countries, place, morning):
    return f"""The {quantity} {animal}s from {" and ".join(countries)} went to the {place} in the {"morning" if morning else "night"}"""

with gr.Blocks() as demo:

    gr.Markdown("Choose the options and then click **Run** to see the output.")
    with gr.Row():
        quantity = gr.Slider(2, 20, value=4, label="Count", info="Choose between 2 and 20")
        animal = gr.Dropdown(["cat", "dog", "bird"], label="Animal", info="Will add more animals later!")
        countries = gr.CheckboxGroup(["USA", "Japan", "Pakistan"], label="Countries", info="Where are they from?")
        place = gr.Radio(["park", "zoo", "road"], label="Location", info="Where did they go?")
        morning = gr.Checkbox(label="Morning", info="Did they do it in the morning?")

    btn = gr.Button("Run")
    btn.click(
        fn=sentence_builder,
        inputs=[quantity, animal, countries, place, morning],
        outputs=gr.Textbox(label="Output")
    )

if __name__ == "__main__":
    demo.launch(debug=True)

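Input component types and what they do

• Slider: a slider for choosing a quantity within a range (2 to 20), with a default of 4.

• Dropdown: a dropdown list from which the user picks an animal out of a preset list.

• CheckboxGroup: a group of checkboxes that lets the user select several countries.

• Radio: radio buttons for choosing a single location (park, zoo, or road).

• Checkbox: a single checkbox that yields a boolean (whether it happened in the morning).

Shared properties

• label: the title shown on the component.

• info: help text that gives the user extra guidance.

[Screenshot: Snipaste_2024-12-17_17-39-14]

Note that gr.Button creates a button labeled "Run", and btn.click binds an event to it with the following arguments:

• fn: the function called when the button is clicked, here sentence_builder.

• inputs: the input components passed to the function, in order: quantity, animal, countries, place, morning.

• outputs: where the function's return value is displayed, here a Textbox.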
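The positional mapping between inputs and the parameters of fn can be sketched without starting the UI. This is only an illustration: the list `submitted` is a made-up stand-in for the values the five components would send, and the function body is rewritten slightly for readability.

```python
def sentence_builder(quantity, animal, countries, place, morning):
    # Same sentence as the handler above, split into steps for readability.
    time_of_day = "morning" if morning else "night"
    origin = " and ".join(countries)
    return f"The {quantity} {animal}s from {origin} went to the {place} in the {time_of_day}"

# Hypothetical values, listed in the same order as the `inputs` list
# given to btn.click: quantity, animal, countries, place, morning.
submitted = [4, "cat", ["USA", "Japan"], "park", True]

# Each value lands in the parameter at the same position.
result = sentence_builder(*submitted)
print(result)
```

Because the mapping is purely positional, every parameter of fn receives data that originated in the browser, which is why these parameters matter later as sources.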

Identifying attack surface in Gradio

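To find sources, start from the inputs. A simple first idea is to submit values that do not conform to what a component is meant to accept, for example a value outside a Dropdown's preset list.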
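A minimal sketch of why such values matter, assuming the handler receives whatever string the client sends regardless of what the browser UI offered. `ALLOWED_ANIMALS` and `pick_animal` are hypothetical names for this illustration; whether a given Gradio version rejects the value before it reaches the handler is not covered in these notes.

```python
ALLOWED_ANIMALS = {"cat", "dog", "bird"}  # mirrors the Dropdown choices shown earlier

def pick_animal(animal):
    # Hypothetical handler: check the value on the server side instead of
    # trusting the UI to restrict it.
    if animal not in ALLOWED_ANIMALS:
        raise ValueError(f"unexpected animal: {animal!r}")
    return animal

print(pick_animal("cat"))

try:
    # A value the dropdown never offered, as a crafted request might send.
    pick_animal("cat; rm -rf /")
except ValueError as err:
    print("rejected:", err)
```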

Modeling Gradio with CodeQL

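The original author gives two Gradio examples that contain a command injection. In both, folder and logs are the parameters of execute_cmd, which interpolates them into a command string and executes it.

Using Interface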

import gradio as gr
import os

def execute_cmd(folder, logs):
    cmd = f"python caption.py --dir={folder} --logs={logs}"
    os.system(cmd)
    return f"Command: {cmd}"


folder = gr.Textbox(placeholder="Directory to caption")
logs = gr.Checkbox(label="Save verbose logs")
output = gr.Textbox()

demo = gr.Interface(
    fn=execute_cmd,
    inputs=[folder, logs],
    outputs=output)

if __name__ == "__main__":
    demo.launch(debug=True)

Using Blocks

import gradio as gr
import os

def execute_cmd(folder, logs):
    cmd = f"python caption.py --dir={folder} --logs={logs}"
    os.system(cmd)


with gr.Blocks() as demo:
    gr.Markdown("Create caption files for images in a directory")
    with gr.Row():
        folder = gr.Textbox(placeholder="Directory to caption")
        logs = gr.Checkbox(label="Add verbose logs")

    btn = gr.Button("Run")
    btn.click(fn=execute_cmd, inputs=[folder, logs])


if __name__ == "__main__":
    demo.launch(debug=True)

Next, use the CodeQL CLI to build a CodeQL database for the 4/vulnerable-code-snippets directory of codeql-zero-to-hero. Creating a database was already covered in https://henrymartin262.github.io/2024/12/11/codeql_note/

gh codeql database create gradio-codeql-db --language=python

In the two examples above, folder and logs are the sources.
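To see why these two parameters are dangerous, the sketch below builds the same command string as execute_cmd without running it, then contrasts it with an argument list. `build_cmd_unsafe`, `build_argv_safer`, and the payload are hypothetical names and values for this illustration, not part of the original examples.

```python
import shlex

def build_cmd_unsafe(folder, logs):
    # Same string construction as execute_cmd above, but nothing is executed.
    return f"python caption.py --dir={folder} --logs={logs}"

def build_argv_safer(folder, logs):
    # Hypothetical alternative: an argument list that could be passed to
    # subprocess.run(argv) with the default shell=False, so no shell parses it.
    return ["python", "caption.py", f"--dir={folder}", f"--logs={logs}"]

payload = "images; echo INJECTED"

unsafe = build_cmd_unsafe(payload, True)
print(unsafe)
# A POSIX shell would split this string at ';' and run
# `echo INJECTED --logs=True` as a second command.

print(build_argv_safer(payload, True))
# Here the whole payload stays inside a single argv element.

# shlex.quote is another option when a shell string cannot be avoided.
print(f"python caption.py --dir={shlex.quote(payload)}")
```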

Modeling gr.Interface

/**
 * @kind problem
 * @severity error
 * @id githubsecuritylab/4-1
 */

import python
import semmle.python.ApiGraphs

from API::CallNode node
where node =
  API::moduleImport("gradio").getMember("Interface").getACall()
select node, "call to interface"

The query above finds every call to gr.Interface.

[Screenshot: Snipaste_2024-12-17_20-21-27]
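As a rough analogy for what this query matches (not how CodeQL works internally), the same pattern can be sketched with Python's standard ast module. `SOURCE` and `find_interface_calls` are names invented for this illustration, and the sketch only handles the plain `import gradio as <alias>` form, whereas API graphs cover many more import styles.

```python
import ast

SOURCE = '''
import gradio as gr

def execute_cmd(folder, logs):
    return folder

demo = gr.Interface(fn=execute_cmd, inputs=[], outputs=[])
'''

def find_interface_calls(source):
    """Return the line numbers of calls shaped like <gradio alias>.Interface(...)."""
    tree = ast.parse(source)

    # Step 1: collect local names bound to the gradio module, e.g. `gr`.
    aliases = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                if item.name == "gradio":
                    aliases.add(item.asname or item.name)

    # Step 2: find calls whose callee is <alias>.Interface.
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "Interface"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in aliases):
            hits.append(node.lineno)
    return hits

print(find_interface_calls(SOURCE))
```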

The next query goes one step further and finds the two parameters folder and logs:

/**
 * @kind problem
 * @severity error
 * @id githubsecuritylab/4-1
 */

import python
import semmle.python.ApiGraphs

from API::CallNode node
where node =
  API::moduleImport("gradio").getMember("Interface").getACall()
select node.getParameter(0, "fn").getParameter(_), "Gradio sources"

[Screenshot: Snipaste_2024-12-17_20-30-44]
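The step `getParameter(0, "fn").getParameter(_)` can be read as "take whatever is passed as fn, positionally or by keyword, and then take every parameter of that function". A rough ast-based sketch of that reading follows; `handler_parameters` and `SOURCE` are hypothetical names, and the sketch only resolves functions defined at the top level of the same file.

```python
import ast

SOURCE = '''
import gradio as gr

def execute_cmd(folder, logs):
    return folder

demo = gr.Interface(fn=execute_cmd, inputs=[], outputs=[])
'''

def handler_parameters(source, callee="Interface"):
    """Parameter names of functions passed as fn to <x>.<callee>(...)."""
    tree = ast.parse(source)
    # Map top-level function names to their definitions.
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}

    params = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == callee):
            continue
        # fn may be the first positional argument or the keyword `fn`,
        # mirroring getParameter(0, "fn") in the query.
        fn_expr = node.args[0] if node.args else None
        for kw in node.keywords:
            if kw.arg == "fn":
                fn_expr = kw.value
        # Resolve the name to its definition and list its parameters.
        if isinstance(fn_expr, ast.Name) and fn_expr.id in functions:
            params += [a.arg for a in functions[fn_expr.id].args.args]
    return params

print(handler_parameters(SOURCE))
```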

This can be wrapped in the following class:

/**
 * @kind problem
 * @severity error
 * @id githubsecuritylab/4-1
 */

import python
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.RemoteFlowSources

class GradioInterface extends RemoteFlowSource::Range {
  GradioInterface() {
    exists(API::CallNode n |
      n = API::moduleImport("gradio").getMember("Interface").getACall() and
      this = n.getParameter(0, "fn").getParameter(_).asSource())
  }

  override string getSourceType() { result = "Gradio vuln input" }
}

from GradioInterface src
select src, "Gradio sources"

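[Screenshot: Snipaste_2024-12-17_20-48-05]

Note the use of the abstract class RemoteFlowSource::Range. Put simply, extending it registers our nodes as RemoteFlowSource instances, so they join the many sources CodeQL already models. A query over RemoteFlowSource should therefore include the three sources found above, which is easy to verify: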

/**
 * @kind problem
 * @severity error
 * @id githubsecuritylab/4-1
 */

import python
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.RemoteFlowSources

from RemoteFlowSource rfs
select rfs, "Gradio sources"

[Screenshot: Snipaste_2024-12-17_20-52-11]

And that is indeed the case.

Modeling gr.Button

/**
 * @kind problem
 * @severity error
 * @id githubsecuritylab/4-1
 */

import python
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.RemoteFlowSources

class GradioInterface extends RemoteFlowSource::Range {
  GradioInterface() {
    exists(API::CallNode n |
      n = API::moduleImport("gradio").getMember("Button").getReturn().getMember("click").getACall() and
      this = n.getParameter(0, "fn").getParameter(_).asSource())
  }

  override string getSourceType() { result = "Gradio Blocks vuln input" }
}

from GradioInterface src
select src, "Gradio sources"

A small change to the Interface query is enough to get the corresponding sources for the Blocks class.

[Screenshot: Snipaste_2024-12-17_20-58-43]
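The chain `getMember("Button").getReturn().getMember("click")` says that the receiver of click must be a value returned by gr.Button. A rough ast-based sketch of that constraint is below. `button_click_calls` and `SOURCE` are hypothetical names, and the sketch only tracks direct assignments such as `btn = gr.Button(...)`, which is far less general than what API graphs follow.

```python
import ast

SOURCE = '''
import gradio as gr

def execute_cmd(folder, logs):
    pass

with gr.Blocks() as demo:
    btn = gr.Button("Run")
    btn.click(fn=execute_cmd, inputs=[])
    other.click(fn=execute_cmd)
'''

def button_click_calls(source):
    """Line numbers of <name>.click(...) where <name> was assigned from <x>.Button(...)."""
    tree = ast.parse(source)

    # Step 1: names assigned directly from a call to <x>.Button(...).
    buttons = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Attribute)
                and node.value.func.attr == "Button"):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    buttons.add(target.id)

    # Step 2: click calls whose receiver is one of those names.
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "click"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in buttons):
            lines.append(node.lineno)
    return lines

print(button_click_calls(SOURCE))
```

The call on `other` is ignored because nothing shows that `other` came from gr.Button, which mirrors why the query does not simply match every method named click.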

With the two classes written above as sources, we only need to define a sink to run a data flow analysis. The full query:

/**
 * @id codeql-zero-to-hero/4-7
 * @severity error
 * @kind path-problem
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.RemoteFlowSources
import MyFlow::PathGraph

class GradioButton extends RemoteFlowSource::Range {
  GradioButton() {
    exists(API::CallNode n |
      n = API::moduleImport("gradio").getMember("Button").getReturn()
          .getMember("click").getACall() |
      this = n.getParameter(0, "fn").getParameter(_).asSource())
  }

  override string getSourceType() { result = "Gradio untrusted input" }
}

class GradioInterface extends RemoteFlowSource::Range {
  GradioInterface() {
    exists(API::CallNode n |
      n = API::moduleImport("gradio").getMember("Interface").getACall() |
      this = n.getParameter(0, "fn").getParameter(_).asSource())
  }

  override string getSourceType() { result = "Gradio untrusted input" }
}

class OsSystemSink extends API::CallNode {
  OsSystemSink() {
    this = API::moduleImport("os").getMember("system").getACall()
  }
}

private module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof GradioButton
    or
    source instanceof GradioInterface
  }

  predicate isSink(DataFlow::Node sink) {
    exists(OsSystemSink call |
      sink = call.getArg(0)
    )
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to `os.system`"

[Screenshot: Snipaste_2024-12-17_21-04-39]
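The source, propagation, and sink idea behind this query can be imitated by a toy checker written with Python's ast module. This is only a sketch under heavy assumptions: it treats every function parameter as a source, follows only simple top-level assignments inside one function, and knows a single sink, while CodeQL's global taint tracking is far more general. `tainted_os_system_calls`, `names_in`, and `SOURCE` are names made up for this illustration.

```python
import ast

SOURCE = '''
import os

def execute_cmd(folder, logs):
    cmd = f"python caption.py --dir={folder} --logs={logs}"
    os.system(cmd)

def harmless(folder):
    os.system("ls")
'''

def names_in(expr):
    """All variable names that appear anywhere inside an expression."""
    return {n.id for n in ast.walk(expr) if isinstance(n, ast.Name)}

def tainted_os_system_calls(source):
    """Line numbers of os.system(arg) calls where arg depends on a parameter."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Sources: every parameter of the function starts out tainted.
        tainted = {a.arg for a in func.args.args}
        for stmt in func.body:
            # Propagation: `x = <expression using a tainted name>` taints x.
            if isinstance(stmt, ast.Assign) and names_in(stmt.value) & tainted:
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
            # Sink: the first argument of os.system(...).
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and node.func.attr == "system"
                        and isinstance(node.func.value, ast.Name)
                        and node.func.value.id == "os"
                        and node.args
                        and names_in(node.args[0]) & tainted):
                    findings.append(node.lineno)
    return findings

print(tainted_os_system_calls(SOURCE))
```

The call in `harmless` is not reported because its argument is a constant, which corresponds to the query reporting only paths that actually start at a source.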

  • Title: codeql_zero_to_hero Part 4 study notes
  • Author: henry
  • Created at : 2024-12-17 21:12:04
  • Updated at : 2024-12-17 21:14:19
  • Link: https://henrymartin262.github.io/2024/12/17/codeql_note_p4/
  • License: This work is licensed under CC BY-NC-SA 4.0.