CodeQL zero to hero, part 3: study notes

Consider the sql-injection-flask.py code from challenge 2:

from django.db import connection, models
from django.db.models.expressions import RawSQL
from flask import Flask, request
app = Flask(__name__)

class User(models.Model):
    pass

@app.route("/users/<username>")
def show_user():
    username = request.args.get("username")
    with connection.cursor() as cursor:
        # GOOD -- Using parameters
        cursor.execute("SELECT * FROM users WHERE username = %s", username)
        User.objects.raw("SELECT * FROM users WHERE username = %s", (username,))

        # BAD -- Using string formatting
        cursor.execute("SELECT * FROM users WHERE username = '%s'" % username)

        # BAD -- other ways of executing raw SQL code with string interpolation
        User.objects.annotate(RawSQL("insert into names_file ('name') values ('%s')" % username))
        User.objects.raw("insert into names_file ('name') values ('%s')" % username)
        User.objects.extra("insert into names_file ('name') values ('%s')" % username)
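
To see why the GOOD and BAD comments above matter, here is a minimal sketch using the standard-library sqlite3 module instead of Django. The table layout, the helper names (make_db, find_user_unsafe, find_user_safe) and the payload are assumptions made for illustration, and sqlite3 uses ? as its placeholder rather than %s.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, username):
    # BAD: the input is spliced into the SQL text itself.
    query = "SELECT name FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # GOOD: the input is passed separately as a bound parameter.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # the payload rewrites the WHERE clause
safe_rows = find_user_safe(conn, payload)      # the payload is treated as a plain name
print(len(unsafe_rows), len(safe_rows))
```

With string formatting the payload becomes part of the SQL statement and matches every row, while the bound parameter is only ever compared as data.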

Write the following QL query:

/**
 * @id codeql-zero-to-hero/3-1
 * @severity error
 * @kind problem
 */

import python
import semmle.python.ApiGraphs

from API::CallNode node
where node = API::moduleImport("django").getMember("db").getMember("connection").getMember("cursor").getReturn().getMember("execute").getACall()
    and
    node.getLocation().getFile().getRelativePath().regexpMatch("2/challenge-1/.*")

select node, "Call to django.db execute"

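First, the @kind problem annotation in the query metadata tells CodeQL to report each result as an alert at its source location.

Second, from API::CallNode node restricts node to call nodes in the API graph. For example, the program above uses the django library, so calls to methods from that library fall within the scope of API::CallNode.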

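In the where clause, API::moduleImport("django") selects nodes that come from the django library. Then .getMember("db").getMember("connection").getMember("cursor") finds references to cursor.

This matches django.db.connection.cursor. Because execute is called on the cursor object, we first need the getReturn() predicate to get the node representing the result of creating the cursor object, which gives django.db.connection.cursor() (note the trailing parentheses). Finally, getMember("execute") gets the node representing the execute method, and getACall() gets the actual calls to the method that the execute node represents.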
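
For contrast, a purely syntactic search for the same calls can be sketched with Python's ast module. This is not how CodeQL works: the sketch below only looks at attribute names and does not resolve imports, so it would also match an unrelated object that happens to have an execute method, which is exactly what the API graph chain avoids. SAMPLE and find_method_calls are hypothetical names, and ast.unparse assumes Python 3.9 or later.

```python
import ast

# Hypothetical sample input; it is only parsed, never executed.
SAMPLE = '''
from django.db import connection
import os

def show_user(username):
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM users WHERE username = %s", [username])
        cursor.execute("SELECT * FROM users WHERE username = '%s'" % username)
    os.system("ls")
'''


def find_method_calls(source, method_name):
    """Return (line, receiver expression) for every call shaped like <expr>.<method_name>(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr == method_name:
                hits.append((node.lineno, ast.unparse(node.func.value)))
    return sorted(hits)


execute_calls = find_method_calls(SAMPLE, "execute")
system_calls = find_method_calls(SAMPLE, "system")
print(execute_calls, system_calls)
```

The receiver is reported as plain text (cursor, os); nothing here proves that cursor came from django.db.connection.cursor(), which is the extra guarantee the getMember/getReturn chain provides.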

Challenge 1

Find all method calls that are called ‘execute’ and come from the django.db library

The QL query above is the answer to Challenge 1.

Challenge 2

Write a query to find all os.system method call

Find all os.system calls. The query follows the same pattern as before:

/**
 * @id codeql-zero-to-hero/3-1
 * @severity error
 * @kind problem
 */

 import python
 import semmle.python.ApiGraphs
 
 from API::CallNode node
 where node =
     API::moduleImport("os").getMember("system").getACall()
     and
     node.getLocation().getFile().getRelativePath().regexpMatch("2/challenge-1/.*")
 
 select node, "Call to os.system"

Challenge 3

Write a query to find all Flask requests

/**
 * @id codeql-zero-to-hero/3-1
 * @severity error
 * @kind problem
 */

 import python
 import semmle.python.ApiGraphs
 
select API::moduleImport("flask").getMember("request").getMember("args").asSource(), "find request"

Challenge 4

Run the query with getAQlClass predicate

/**
 * @id codeql-zero-to-hero/3-4
 * @severity error
 * @kind problem
 */
import python
import semmle.python.ApiGraphs

from API::CallNode node
where node = API::moduleImport("django").getMember("db").getMember("connection").getMember("cursor").getReturn().getMember("execute").getACall()
select node, "The node has type " + node.getAQlClass()

Taint analysis in CodeQL

Local data flow

As the name suggests, local data flow only tracks data flow locally, for example within a single function.

/**
 * @id codeql-zero-to-hero/3-4
 * @severity error
 * @kind problem
 */
import python
import semmle.python.ApiGraphs

class ExecuteNode extends DataFlow::CallCfgNode{
    ExecuteNode(){
        this = API::moduleImport("django").getMember("db").getMember("connection").getMember("cursor").getReturn().getMember("execute").getACall()
    }
}

predicate isEvalExcute(DataFlow::CallCfgNode call) {
    exists(DataFlow::ExprNode expr | // declare an expression node to act as the source; in this example it is username
            call instanceof ExecuteNode // restrict call to the calls matched by the class above
            and expr instanceof DataFlow::LocalSourceNode // treat expr as a source; here it is limited to local sources
            and DataFlow::localFlow(expr, call.getArg(0)) // localFlow(source, sink): data flows from the source to the sink
            and not expr.getNode().isLiteral() // the source must not be a literal, e.g. username must not be a fixed string
            //and not call.getArg(0).asCfgNode().isLiteral() 
            )
}
from DataFlow::CallCfgNode call
where call instanceof ExecuteNode
    and isEvalExcute(call)
select call, "The execute point is vuln! "

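This query finds all execute nodes (the sinks) whose argument is not a fixed string literal.

The results show the nodes that satisfy these conditions, as expected.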
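
The idea behind this local check can be approximated in a few lines of Python. The sketch below is a toy, not CodeQL's data flow library: it only follows simple name = value assignments at the top level of one function and flags execute calls whose first argument does not trace back to a constant. SAMPLE, resolve and non_literal_execute_calls are hypothetical names.

```python
import ast

# Hypothetical sample input; it is only parsed, never executed.
SAMPLE = '''
def run(cursor, username):
    cursor.execute("SELECT 1")
    query = "SELECT * FROM users WHERE username = '%s'" % username
    cursor.execute(query)
    fixed = "SELECT 2"
    cursor.execute(fixed)
'''


def resolve(expr, assigned):
    # Follow variable names back to the last expression assigned to them.
    seen = set()
    while isinstance(expr, ast.Name) and expr.id in assigned and expr.id not in seen:
        seen.add(expr.id)
        expr = assigned[expr.id]
    return expr


def non_literal_execute_calls(source):
    """Return line numbers of execute(...) calls whose first argument is not a constant."""
    flagged = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        assigned = {}  # variable name -> last expression assigned (local to this function)
        for stmt in func.body:  # top-level statements only, in source order
            if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)):
                assigned[stmt.targets[0].id] = stmt.value
            elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
                call = stmt.value
                if (isinstance(call.func, ast.Attribute)
                        and call.func.attr == "execute" and call.args):
                    origin = resolve(call.args[0], assigned)
                    if not isinstance(origin, ast.Constant):
                        flagged.append(call.lineno)
    return flagged


flagged_lines = non_literal_execute_calls(SAMPLE)
print(flagged_lines)
```

Only the call that receives the formatted query is flagged; the two calls whose argument is, or resolves to, a string constant are ignored, mirroring the not isLiteral() condition.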

Global data flow

This approach tracks data flow across the whole codebase, and it is the one used in most cases.

Challenge 5

Run the local data flow query to find execute calls that do not take a string literal

This is the local data flow query shown above.

New taint tracking API

Below is the configuration template for the new taint tracking API:

/**
 * @kind path-problem
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.ApiGraphs
import MyFlow::PathGraph

private module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    // Define your source nodes here. 
  }

  predicate isSink(DataFlow::Node sink) {
    // Define your sink nodes here.
  }
}

module MyFlow = TaintTracking::Global<MyConfig>; // or DataFlow::Global<..>

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Sample TaintTracking query"

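Note that setting @kind path-problem lets you see the path between the source and the sink. Next, we use the new taint tracking API to track data flow from the Flask request source of Challenge 3: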

/**
 * @kind path-problem
 */

 import python
 import semmle.python.dataflow.new.DataFlow
 import semmle.python.dataflow.new.TaintTracking
 import semmle.python.ApiGraphs
 import MyFlow::PathGraph
 

class ExecuteCall extends DataFlow::CallCfgNode {
    ExecuteCall() {
    this = API::moduleImport("django").getMember("db").getMember("connection").getMember("cursor").getReturn().getMember("execute").getACall()
    }
}

 private module MyConfig implements DataFlow::ConfigSig {
   predicate isSource(DataFlow::Node source) {
     // Define your source nodes here. 
     source = API::moduleImport("flask").getMember("request").getMember("args").asSource()
   }
 
   predicate isSink(DataFlow::Node sink) {
     // Define your sink nodes here.
     exists(ExecuteCall call |  
            sink = call.getArg(0))
   }
 }
 
 module MyFlow = TaintTracking::Global<MyConfig>; // or DataFlow::Global<..>
 
 from MyFlow::PathNode source, MyFlow::PathNode sink
 where MyFlow::flowPath(source, sink)
 select sink.getNode(), source, sink, "Sample TaintTracking query"

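A brief explanation of API::moduleImport("flask").getMember("request").getMember("args").asSource(): it first uses the API graph to select the target node, and asSource() then gives the data flow node where that value enters the code, which we use as a source in the data flow graph we want to analyze.

The sink definition is even simpler: exists(ExecuteCall call | sink = call.getArg(0)) marks the first argument of every cursor.execute call as a sink.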
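
What a path-problem query reports can be pictured as a path search over a graph of data flow steps. The following is a rough sketch under that assumption, using a breadth-first search over a hand-written edge list; the node labels and the find_flow_path helper are made up for illustration and do not correspond to CodeQL's internal representation.

```python
from collections import deque


def find_flow_path(edges, sources, sinks):
    """Breadth-first search; return the first source-to-sink path found, or None."""
    queue = deque([s] for s in sources)
    visited = set(sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            return path
        for nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical flow steps for the show_user example.
edges = {
    "request.args": ["request.args.get(...)"],
    "request.args.get(...)": ["username"],
    "username": ["format_string % username"],
    "format_string % username": ["execute arg 0"],
}

path = find_flow_path(edges, ["request.args"], {"execute arg 0"})
no_path = find_flow_path(edges, ["constant"], {"execute arg 0"})
print(path, no_path)
```

Here isSource corresponds to the sources list, isSink to the sinks set, and the returned list plays the role of the path shown between source and sink in the results view.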

Challenge 6

Run the taint tracking query to find flows from a Flask request to a django.db’s execute sink

The query above is the answer to Challenge 6.

Variant analysis

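Put simply, variant analysis means starting from one known vulnerability and trying to find other, similar ones. For example, after finding a vulnerability through manual auditing, you can model it as a CodeQL query and may then discover similar vulnerabilities elsewhere in the codebase.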

Source and sink models in CodeQL

sources

CodeQL needs to be able to tell whether a node is a source, which requires modeling in advance. For example, the flask module we used earlier is already modeled in CodeQL:

API::Node request() { result = 
API::moduleImport("flask").getMember("request") }

There are predefined sources whose type is RemoteFlowSource, such as flask.request below:

private class FlaskRequestSource extends RemoteFlowSource::Range {
  FlaskRequestSource() { this = request().asSource() }

  override string getSourceType() { result = "flask.request" }
}

sinks

CodeQL also models sinks for each vulnerability type, for example cursor.execute:

private class ExecuteMethodCall extends SqlExecution::Range, API::CallNode {
  ExecuteMethodCall() {
    exists(API::Node start |
      start instanceof DatabaseCursor or start instanceof DatabaseConnection
    |
      this = start.getMember(getExecuteMethodName()).getACall()
    )
  }

  override DataFlow::Node getSql() {
    result in [this.getArg(0), this.getArgByName(getSqlKwargName()),]
  }
}

Security research methodology with CodeQL—approaching a new target

Step 1: Quick look with code scanning

Just start by running a scan. GitHubSecurityLab/CodeQL-Community-Packs contains some queries used by the CodeQL team that you can use for reference.

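Step 2: Run specific queries or parts of queries

The <language>/ql/src/Security/ folder contains queries for various vulnerability types, so you can start by testing for one specific vulnerability type.

Because the maximum number of queries that can be run at once is limited to 20, you need to change the codeQL.runningQueries.maxQueries setting to raise the limit:

1. Open the CodeQL settings:

• In VS Code, press Ctrl + , (open Settings).

• Search for CodeQL Running Queries MaxQueries.

2. Raise the query limit, for example to 50.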

Challenge 7

Completing the steps above is all that is needed.

Step 3: Find all sources with the RemoteFlowSource type

Find as many sources as possible, as follows:

/**
 * @kind problem
 * @problem.severity error
 * @id githubsecuritylab/3-8
 */
import python
import semmle.python.dataflow.new.RemoteFlowSources

from RemoteFlowSource rfs
select rfs, "A remote flow source"

This finds a number of sources. To narrow the scope of the source scan, restrict the query by getLocation as in Challenge 1.
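
One detail worth keeping in mind when writing such a location filter: as far as I understand it, QL's regexpMatch has to match the whole string, not just a part of it. The sketch below mimics that behaviour in Python with re.fullmatch, reusing the pattern from Challenge 1; the in_scope helper and the example paths are assumptions for illustration.

```python
import re

# Same pattern as the regexpMatch call in the Challenge 1 query.
PATTERN = "2/challenge-1/.*"


def in_scope(relative_path):
    # re.fullmatch requires the whole path to match, like a whole-string regex match.
    return re.fullmatch(PATTERN, relative_path) is not None


print(in_scope("2/challenge-1/sql-injection-flask.py"))  # inside the folder
print(in_scope("src/2/challenge-1/app.py"))              # extra prefix, rejected
print(in_scope("2/challenge-10/app.py"))                 # different folder, rejected
```

Under this whole-string assumption, a pattern that should match anywhere in the path needs a leading .* as well.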

Challenge 8

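The content above is Challenge 8.

Step 4: Find all sinks for a specific vulnerability type

You can use the Quick Evaluation feature. For example, to find SQL injection sinks, open python/ql/src/Security/CWE-089/SqlInjection.ql and go to the definition of SqlInjectionQuery.

Click Quick Evaluation above isSink to evaluate the predicate and list the possible sinks.

The sinks are found successfully here as well.

There is another way, similar to how we got the sources: query all sinks of type SqlExecution.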

/**
 * @kind problem
 * @problem.severity error
 * @id githubsecuritylab/3-9
 */

import python
import semmle.python.Concepts

from SqlExecution sink
select sink, "Potential SQL injection sink"

Step 5: Find calls to all external APIs (untrusted functionality)

The CWE-20 Untrusted APIs query detects whether external APIs are called with data from untrusted sources. It is located at the path below; just run it:

vscode-codeql-starter/ql/python/ql/src/Security/CWE-020-ExternalAPIs/UntrustedDataToExternalAPI.ql

Challenge 10

Run CWE-20 Untrusted APIs query

See above.

Challenge 11

Run MRVA using one of the security queries

MRVA (multi-repository variant analysis) lets a query run against many projects at once, which greatly improves analysis efficiency.

https://github.com/GitHubSecurityLab/codeql-zero-to-hero/blob/main/3/11/instructions.md

The projects to scan can be configured here.

I have no need for large-scale scanning for now, so I will leave this for later.

Community research

https://github.blog/security/vulnerability-research/codeql-zero-to-hero-part-3-security-research-with-codeql/#community-research-with-codeql

Below are some C/C++ related references:

Bug Hunting with CodeQL in Rsyslog
Beginner-friendly step by step process of finding vulnerabilities using CodeQL in Rsyslog by @agustingianni .

Hunting bugs in Accel-PPP with CodeQL by Chloe Ong and Removing False Positives in CodeQL by Kar Wei Loh.
The two researchers worked together to find memory corruption bugs in Accel-PPP. The first article is an in-depth writeup about looking for the bugs and the thought process, writing CodeQL and the challenges they’ve encountered. The second article shows how they refined the CodeQL query to provide more precise results.

Finding Gadgets for CPU Side-Channels with Static Analysis Tools
Research by @pwningsystems and @fkaasan into finding useful gadgets in CPU side-channel exploitation.

Reference

https://github.blog/security/vulnerability-research/codeql-zero-to-hero-part-3-security-research-with-codeql/