Automating CSRF Detection in WordPress Plugins with Semgrep

Brandon Roldan
Nov 1, 2023

Introduction

This month, I discovered Semgrep, a powerful static analysis tool. Semgrep supports many languages, including PHP, and it is easier to learn than CodeQL. You can read a good comparison of the two on spaceraccoon’s blog.

Semgrep

Introduction to Semgrep

Semgrep is a static code analysis tool that lets us write our own rules. These “rules” are the patterns we want Semgrep to find. Take this rule as an example:

We matched the function using this rule. In Semgrep, three dots (...) mean “match anything”, and identifiers that start with $ followed by uppercase letters are called metavariables; they work like variables for Semgrep. To learn more about Semgrep, read the documentation.
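
To make those two notations concrete, here is a toy Python approximation that translates a pattern into a regular expression: `...` becomes a lazy wildcard, and each `$UPPERCASE` name becomes a capture group, with repeated names forced to bind the same text. This is only an illustration under simplifying assumptions: real Semgrep matches syntax trees, not raw text, and `pattern_to_regex` / `toy_match` are hypothetical helpers, not part of Semgrep.

```python
import re

# Toy model: a metavariable is "$" + an uppercase letter, then uppercase letters, digits or "_".
METAVAR = re.compile(r"\$([A-Z][A-Z0-9_]*)")

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Semgrep-like pattern into a regex (text-based, NOT how Semgrep works)."""
    parts = []
    seen = set()
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(r".*?")  # "..." matches anything, as little as possible
            i += 3
            continue
        m = METAVAR.match(pattern, i)
        if m:
            name = m.group(1)
            if name in seen:
                # A repeated metavariable must bind the same text as its first occurrence.
                parts.append(f"(?P={name})")
            else:
                parts.append(rf"(?P<{name}>[A-Za-z_][A-Za-z0-9_]*)")
                seen.add(name)
            i = m.end()
            continue
        parts.append(re.escape(pattern[i]))  # everything else is matched literally
        i += 1
    return re.compile("".join(parts))

def toy_match(pattern: str, code: str):
    """Return the metavariable bindings if the whole snippet matches, else None."""
    m = pattern_to_regex(pattern).fullmatch(code)
    return m.groupdict() if m else None

print(toy_match("$FUNC(...)", "eval(user_input)"))  # {'FUNC': 'eval'}
print(toy_match("$X == $X", "a == a"))              # {'X': 'a'}
print(toy_match("$X == $X", "a == b"))              # None
```

The last two calls show why metavariables are more than wildcards: once $X is bound, every other $X in the pattern has to match the same value.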

Join Mode

At the time of writing, only a few languages in Semgrep support cross-file analysis. However, Semgrep has an experimental feature called “join mode”. Join mode lets us use multiple rules and compare the metavariables of those rules, and the matches for these rules can come from different files. Let me show you a simple example.

For this example, we will try to find a simple reflected XSS in Django. I made two files: a Python file that calls render, and an HTML file that serves as the template to be rendered.

And here is the Semgrep rule I made for it:

rules:
  - id: demo
    mode: join
    join:
      rules:
        - id: call-to-render
          languages: [python]
          pattern: "render(request, $TEMPLATE, {..., '$PARAMETER': ..., ...})"

        - id: variable-marked-with-safe
          languages: [generic]
          pattern: "{{ $VARIABLE | safe }}"

      on:
        - 'call-to-render.$PARAMETER == variable-marked-with-safe.$VARIABLE'
        - 'call-to-render.$TEMPLATE > variable-marked-with-safe.path'
    message: "TEST"
    severity: INFO

Let’s break it down. The first rule, “call-to-render”, matches every render call in the Python files. It has two metavariables: $TEMPLATE, which is the HTML template file, and $PARAMETER, which is the name of the parameter passed to the template.

The second rule, “variable-marked-with-safe”, matches the pattern {{ $VARIABLE | safe }}. It has a metavariable called $VARIABLE, which is the name of the variable that is rendered as safe (that is, without HTML escaping).

The “on” section is where the magic of join mode happens: this is where we put the comparisons we want to make between the rules. The first condition checks whether the $PARAMETER metavariable of “call-to-render” is equal to the $VARIABLE metavariable of “variable-marked-with-safe”. The second condition uses the “>” operator, which matches if the $TEMPLATE metavariable of “call-to-render” is a substring of the file path of the “variable-marked-with-safe” match.
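
Conceptually, join mode runs each sub-rule on its own and then keeps only the combinations of matches that satisfy every condition in “on”. The Python sketch below mimics that for the Django example using made-up match data. It is a simplified model of the semantics described above, not Semgrep’s implementation; the `Match` class, the file names and the metavariable values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Match:
    """One finding: the file it came from and its metavariable bindings."""
    path: str
    metavars: dict = field(default_factory=dict)

# Hypothetical results of the two sub-rules (file names and values are invented).
call_to_render = [
    Match("app/views.py", {"$TEMPLATE": "search.html", "$PARAMETER": "query"}),
    Match("app/views.py", {"$TEMPLATE": "about.html", "$PARAMETER": "title"}),
]
variable_marked_with_safe = [
    Match("app/templates/search.html", {"$VARIABLE": "query"}),
    Match("app/templates/about.html", {"$VARIABLE": "footer"}),
]

def join(left, right):
    """Keep every (left, right) pair that satisfies both conditions of the 'on' block."""
    results = []
    for call in left:
        for safe in right:
            # Condition 1: the render() parameter name equals the |safe variable name.
            same_name = call.metavars["$PARAMETER"] == safe.metavars["$VARIABLE"]
            # Condition 2: the template name is a substring of the template file's path.
            template_in_path = call.metavars["$TEMPLATE"] in safe.path
            if same_name and template_in_path:
                results.append((call, safe))
    return results

for call, safe in join(call_to_render, variable_marked_with_safe):
    print(f"{call.path}: '{call.metavars['$PARAMETER']}' rendered unescaped in {safe.path}")
# app/views.py: 'query' rendered unescaped in app/templates/search.html
```

Only one of the four possible pairs survives: the “about” render passes title, but the about.html template marks a different variable (footer) as safe, so the first condition rejects it.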

Running this gave us the expected result. Learn more about join mode in the documentation.

Recursive Join

Semgrep also has something called recursive join. It’s hard to explain, but basically, it matches a rule recursively and generates a table from the results. To use it, we use the join operator -->. Take this example from Semgrep:

If we used a recursive join here, it would produce a table like this:
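
One way to picture such a table is as the transitive closure of the caller/callee pairs: if A calls B and B calls C, the table also gets the row (A, C). The Python sketch below builds that kind of table from hypothetical call edges. It models the idea only and makes no claim about how Semgrep computes recursive joins internally; `recursive_join` is an illustrative name.

```python
def recursive_join(edges):
    """Return every (caller, callee) pair reachable through one or more calls."""
    table = set(edges)
    changed = True
    while changed:
        changed = False
        # Extend each known row by one more direct call until no new rows appear.
        for caller, mid in list(table):
            for src, callee in edges:
                if src == mid and (caller, callee) not in table:
                    table.add((caller, callee))
                    changed = True
    return table

# Hypothetical direct calls: a() calls b(), and b() calls c().
edges = {("a", "b"), ("b", "c")}
for row in sorted(recursive_join(edges)):
    print(row)
# ('a', 'b')
# ('a', 'c')
# ('b', 'c')
```

Because rows are only ever added to a finite set, the loop also terminates on recursive code where functions call each other in a cycle.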

Using this, we can check whether a function will somehow end up calling another function. Let’s look at a simple example.
This is the Python file:

def function1():
    dangerous()

def demo(request):
    function1()

And this is the Semgrep rule:

rules:
  - id: demo
    mode: join
    join:
      rules:
        - id: callgraph
          languages: [python]
          patterns:
            - pattern: $CALLEE(...)
            - pattern-inside: |
                def $CALLER(...):
                    ...

        - id: dangerous-function
          languages: [python]
          patterns:
            - pattern: "$DANGEROUSFUNC(...)"
            - metavariable-regex:
                metavariable: $DANGEROUSFUNC
                regex: dangerous

      on:
        - 'callgraph.$CALLER --> callgraph.$CALLEE'
        - 'callgraph.$CALLEE == dangerous-function.$DANGEROUSFUNC'
    message: "TEST"
    severity: INFO

In summary, this rule checks whether a function calls the dangerous function, either directly or through other functions. This is what its call graph would look like:

We can see that the demo function reaches the dangerous function through function1, which means it is also matched by this rule. You can read more about recursive join in the documentation.
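
Put together, the two conditions in the “on” block ask: which callers can reach a call whose name matches the regex “dangerous”, directly or through other functions? The Python sketch below answers that for the example file with a plain depth-first search. It is an illustrative model rather than Semgrep’s engine: the `calls` list is written out by hand instead of coming from a scan, and matching the name with Python’s `re.match` is my own simplifying choice.

```python
import re
from collections import defaultdict

# Direct calls from the example file, written out by hand (what "callgraph" would match).
calls = [("function1", "dangerous"), ("demo", "function1")]

def reachable(start, graph):
    """All functions reachable from `start` through one or more calls (iterative DFS)."""
    seen, stack = set(), list(graph[start])
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph[fn])
    return seen

def callers_reaching(calls, name_regex):
    """Callers from which some function whose name matches `name_regex` is reachable."""
    graph = defaultdict(list)
    for caller, callee in calls:
        graph[caller].append(callee)
    callers = {caller for caller, _ in calls}
    return {c for c in callers if any(re.match(name_regex, fn) for fn in reachable(c, graph))}

print(sorted(callers_reaching(calls, r"dangerous")))  # ['demo', 'function1']
```

Both callers are reported: function1 calls dangerous() directly, and demo reaches it through function1.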

Patching Semgrep

Join mode is nice, but we have a problem. Recursive join is good for finding which functions call another function, but CSRF, on the other hand, exists when a function doesn’t call a protection function. Right now, Semgrep has no way to check that a certain function never calls another function, so I made some dumb changes to the code to make it work the way I need.

You can see the changes I made here. I won’t explain them in detail since they’re not well done, but reading them should give you an idea. In summary, I added another operator that works only on call graphs, !!, which checks whether a caller will never call a certain function.
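
Based only on the description above, the !! check can be modeled as the complement of the --> table: a caller is reported when the closure table contains no row pairing it with a protection function. The Python sketch below illustrates that idea with hypothetical handler names. It is not the code from the patch, and the patched operator may differ in details (for example, in exactly which operand it walks from).

```python
PROTECTIONS = {"check_ajax_referer", "check_admin_referer", "wp_verify_nonce"}

def closure(edges):
    """Transitive closure of (caller, callee) pairs, a model of the --> table."""
    table = set(edges)
    while True:
        new_rows = {(a, d) for (a, b) in table for (c, d) in edges if b == c} - table
        if not new_rows:
            return table
        table |= new_rows

def never_reaches(edges, targets):
    """Callers with no (caller, target) row in the closure table: the idea behind '!!'."""
    table = closure(edges)
    callers = {caller for caller, _ in edges}
    reached = {caller for caller, callee in table if callee in targets}
    return callers - reached

# Hypothetical handlers and their direct calls.
edges = [
    ("save_settings", "check_ajax_referer"),  # protected directly
    ("save_settings", "update_option"),
    ("delete_item", "verify_request"),        # protected through a helper
    ("verify_request", "wp_verify_nonce"),
    ("export_data", "update_option"),         # never reaches a protection call
]
print(never_reaches(edges, PROTECTIONS))  # {'export_data'}
```

Working on the closure rather than on direct calls matters here: delete_item never calls wp_verify_nonce itself, but it is still treated as protected because its helper does.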

Rule for Finding CSRF

With that, I ended up with this rule:

rules:
  - id: wp-wpajax-csrf
    mode: join
    join:
      rules:
        - id: wpajax-actions
          languages: [php]
          patterns:
            - pattern-either:
                - pattern: add_action('$ACTION', [$UNIMPORTANT, '$HOOKFUNC']); # Some devs do this; the first element is the class
                - pattern: add_action('$ACTION', '$HOOKFUNC');
            - metavariable-regex:
                metavariable: $ACTION
                regex: (wp_ajax_).*

        - id: callgraph
          languages: [php]
          patterns:
            - pattern: $CALLEE(...)
            - pattern-inside: |
                function $CALLER(...){
                  ...
                }

        - id: csrf-protection
          languages: [php]
          patterns:
            - pattern: $CSRFPROTECTFUNC(...) # Find calls to CSRF protection functions
            - metavariable-regex:
                metavariable: $CSRFPROTECTFUNC
                regex: (check_ajax_referer|check_admin_referer|wp_verify_nonce)

      on:
        # - 'callgraph.path == wpajax-actions.path' # Experimental; keep it commented out for more results
        # Use !! to check whether a function will never reach a certain call. The first operand should
        # always be callgraph.$CALLEE; the second is the function we want to check is never reached.
        # Good for finding CSRF.
        - 'callgraph.$CALLEE !! csrf-protection.$CSRFPROTECTFUNC'
        - 'callgraph.$CALLER --> callgraph.$CALLEE'
        - 'callgraph.$CALLER == wpajax-actions.$HOOKFUNC'

    message: "This AJAX action doesn't call any CSRF protection function"
    severity: INFO
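
To see how the pieces of this rule fit together, the Python sketch below replays its logic on hand-written data: keep add_action registrations whose action matches the wp_ajax_ regex, then report the hook functions from which no call matching the protection regex is reachable. The hook and function names are hypothetical, and using Python’s `re.match` (anchored at the start of the string) is my own assumption for the regex filters, not a statement about Semgrep’s regex engine.

```python
import re
from collections import defaultdict

ACTION_RE = re.compile(r"(wp_ajax_).*")  # same regex as the wpajax-actions sub-rule
PROTECTION_RE = re.compile(r"(check_ajax_referer|check_admin_referer|wp_verify_nonce)")

def find_unprotected_ajax_hooks(hooks, calls):
    """hooks: (action, hook_function) pairs from add_action(); calls: (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)

    def is_protected(fn):
        # Depth-first search for any reachable call whose name matches PROTECTION_RE.
        seen, stack = set(), list(graph[fn])
        while stack:
            current = stack.pop()
            if PROTECTION_RE.match(current):
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(graph[current])
        return False

    return sorted(
        func for action, func in hooks
        if ACTION_RE.match(action) and not is_protected(func)
    )

hooks = [
    ("wp_ajax_save_settings", "save_settings"),
    ("wp_ajax_nopriv_export", "export_data"),
    ("init", "register_types"),  # not an AJAX hook, so it is ignored
]
calls = [
    ("save_settings", "check_ajax_referer"),
    ("export_data", "get_option"),
    ("register_types", "register_post_type"),
]
print(find_unprotected_ajax_hooks(hooks, calls))  # ['export_data']
```

Note that in this sketch the action regex also covers wp_ajax_nopriv_ hooks, since those names start with wp_ajax_ as well.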

Result

All the bugs I found were reported through Patchstack. What’s nice about Patchstack is that they handle disclosing the bugs to the vendor. They also have monthly leaderboards, where you can receive monetary rewards depending on your points.

I ran the Semgrep rules across all WordPress plugins with at least 10k installs. I used ProjectDiscovery’s notify to receive notifications on my Discord server whenever the rules found matches. I began scanning on October 24 and finished on October 29.

These are the results of this little project:

Scanned Plugins: 2524
Plugins with findings: 510
Total Reports: 55
Accepted: 43
Rejected: 12
False Positives: 455
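
These counts can be turned into a few rates. The short Python calculation below assumes, because 455 + 55 = 510, that the false-positive count refers to flagged plugins and that each reported plugin corresponds to one report; under that reading, roughly one in ten flagged plugins led to a report.

```python
scanned = 2524
with_findings = 510
reports, accepted, rejected = 55, 43, 12
false_positives = 455

# Consistency checks under the stated assumption (one report per reported plugin).
assert reports + false_positives == with_findings
assert accepted + rejected == reports

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(pct(with_findings, scanned))   # 20.2  (% of scanned plugins with findings)
print(pct(reports, with_findings))   # 10.8  (% of flagged plugins that were reported)
print(pct(accepted, with_findings))  # 8.4   (% of flagged plugins with an accepted report)
print(pct(accepted, reports))        # 78.2  (% of reports accepted)
```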

The bugs I found vary in severity, from disabling specific plugin functionality to RCE! As you can see, there are a lot of false positives, so I had to spend a lot of time going through them one by one.

As for the leaderboard, I managed to reach #2. Apparently, CSRF reports have a 0.25x points multiplier, so I only got 240 points. Without that multiplier, I would have had around 900 points, but there is nothing we can do.

Outro

I feel like a lot of people in the infosec community are sleeping on Semgrep. CodeQL is more popular, but for me, Semgrep is just as good, if not better. Thank you, Semgrep, for the amazing tool, and thank you, Patchstack, for working with me on my reports. Thank you for reading.

My previous Twitter account was deactivated, so if you want, please follow me on my new account: @tomorrowisnew__
