Automating CSRF Detection in WordPress Plugins with Semgrep
Introduction
This month I discovered Semgrep, a powerful static analysis tool that supports many languages, including PHP. Semgrep is also easier to learn than CodeQL. You can read a good comparison between the two on spaceraccoon’s blog.
Semgrep
Introduction to Semgrep
Semgrep is a static code analysis tool for which we can write our own rules. These “rules” are the patterns we want Semgrep to find. Take this rule as an example:
We matched the function using this rule. In Semgrep, three dots (...) mean “match anything”, and any name that starts with $ followed only by uppercase letters, digits, or underscores is called a metavariable. Metavariables work like variables for Semgrep. To learn more about Semgrep, read the documentation.
Join Mode
At the time of writing, only a few languages in Semgrep support cross-file analysis. However, Semgrep has an experimental feature called “join mode”. Join mode lets us use multiple rules and compare the metavariables of those rules. The matches for these rules can come from different files. Here is a simple example.
For this example, we will try to find a simple reflected XSS in Django. I made two files: a Python file that calls render, and an HTML file that serves as the template to be rendered.
And here is the Semgrep rule I made for it:
rules:
- id: demo
  mode: join
  join:
    rules:
      - id: call-to-render
        languages: [python]
        pattern: "render(request, $TEMPLATE, {..., '$PARAMETER': ..., ...})"
      - id: variable-marked-with-safe
        languages: [generic]
        pattern: "{{ $VARIABLE | safe }}"
    on:
      - 'call-to-render.$PARAMETER == variable-marked-with-safe.$VARIABLE'
      - 'call-to-render.$TEMPLATE > variable-marked-with-safe.path'
  message: "TEST"
  severity: INFO
Let's break it down. The first rule, “call-to-render”, matches every render call in the Python files. It has two metavariables: $TEMPLATE, which is the HTML template file, and $PARAMETER, which is the name of the parameter passed to the template.
The second rule, “variable-marked-with-safe”, matches the pattern {{ $VARIABLE | safe }}. It has a metavariable called $VARIABLE, which is the name of the variable that is rendered as safe.
The “on” part is where the magic of join mode happens. This is where we put the comparisons we want to make between the rules. The first line checks whether the $PARAMETER metavariable of the “call-to-render” rule is equal to the $VARIABLE metavariable of the “variable-marked-with-safe” rule. The second line uses the “>” operator, which matches if the $TEMPLATE metavariable of “call-to-render” is a substring of the file path of the “variable-marked-with-safe” match.
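To make the two conditions concrete, here is a minimal Python sketch of the comparison they describe. It is not how Semgrep implements join mode. The match dictionaries, the file names, and the join_matches helper are all hypothetical, and the substring behaviour of the “>” operator simply follows the description above.

```python
# Hypothetical match results, shaped loosely like what each sub-rule might report.
render_matches = [
    {"path": "views.py", "$TEMPLATE": "profile.html", "$PARAMETER": "name"},
    {"path": "views.py", "$TEMPLATE": "about.html", "$PARAMETER": "title"},
]
safe_matches = [
    {"path": "templates/profile.html", "$VARIABLE": "name"},
    {"path": "templates/contact.html", "$VARIABLE": "title"},
]


def join_matches(renders, safes):
    """Pair up matches that satisfy both join conditions."""
    joined = []
    for r in renders:
        for s in safes:
            same_name = r["$PARAMETER"] == s["$VARIABLE"]   # the == condition
            template_in_path = r["$TEMPLATE"] in s["path"]  # the > condition
            if same_name and template_in_path:
                joined.append((r, s))
    return joined


pairs = join_matches(render_matches, safe_matches)
for r, s in pairs:
    print(f"{r['path']} passes {r['$PARAMETER']!r} to {s['path']}, where it is marked safe")
```

Only the profile.html pair survives: the “title” variable has the same name on both sides, but it is marked safe in a template that the second render call never uses, so the path condition filters it out.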
Running this gave us the expected result. Learn more about join rules in the documentation.
Recursive Join
Semgrep also has something called recursive join. It is hard to explain, but basically it recursively matches a rule and generates a table from the results. To use it, we use the join operator -->. Take this example from Semgrep:
If we use a recursive join here, it will produce a table like this:
Using this, we can check whether a function will eventually call another function. Here is a simple example.
This is the Python file:
def function1():
    dangerous()

def demo(request):
    function1()
And this is the Semgrep rule:
rules:
- id: demo
  mode: join
  join:
    rules:
      - id: callgraph
        languages: [python]
        patterns:
          - pattern: $CALLEE(...)
          - pattern-inside: |
              def $CALLER(...):
                ...
      - id: dangerous-function
        languages: [python]
        patterns:
          - pattern: "$DANGEROUSFUNC(...)"
          - metavariable-regex:
              metavariable: $DANGEROUSFUNC
              regex: dangerous
    on:
      - 'callgraph.$CALLER --> callgraph.$CALLEE'
      - 'callgraph.$CALLEE == dangerous-function.$DANGEROUSFUNC'
  message: "TEST"
  severity: INFO
In summary, this rule checks whether a function ends up calling the dangerous function, either directly or through other functions. This is what its call graph would look like:
We can see that the demo function reaches the dangerous function through function1, so it is also matched by this rule. You can read more about recursive join in the documentation.
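The idea behind the recursive join can be sketched in plain Python as a transitive closure over direct calls. This only illustrates the concept and is not Semgrep's implementation. The reachable_calls helper is a made-up name, and the edges are written out by hand from the example file above.

```python
from collections import defaultdict

# Direct (caller, callee) pairs, as the callgraph rule would match them
# in the example file above.
direct_calls = [("function1", "dangerous"), ("demo", "function1")]


def reachable_calls(edges):
    """Return {caller: set of functions it calls directly or indirectly}."""
    graph = defaultdict(set)
    for caller, callee in edges:
        graph[caller].add(callee)

    closure = {}
    for start in list(graph):
        seen = set()
        stack = list(graph[start])
        while stack:
            func = stack.pop()
            if func in seen:
                continue
            seen.add(func)
            # .get() avoids creating empty entries for functions with no calls
            stack.extend(graph.get(func, ()))
        closure[start] = seen
    return closure


closure = reachable_calls(direct_calls)
callers_of_dangerous = sorted(c for c, callees in closure.items() if "dangerous" in callees)
print(callers_of_dangerous)  # ['demo', 'function1']
```

The closure plays the role of the generated table: demo never calls dangerous directly, yet it appears in the result because function1 does.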
Patching Semgrep
Join mode is nice, but we have a problem. Recursive join is good for finding functions that call another function, but CSRF exists when a function does not call a protection function. Right now Semgrep has no way to check that a function never calls another function, so I made some quick-and-dirty changes to the code to make it work the way I need.
You can see the changes I made here. I won't explain them, as they are not well done, but reading them should give you an idea. In summary, I added another operator, specific to call graphs: the !! operator, which checks that a caller never reaches a call to a certain function.
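Conceptually, the !! check is the inverse of the reachability check: flag a handler only if none of the functions it reaches is a protection function. The Python sketch below illustrates that idea and is not the patch itself. The handler names and call edges are invented, and the three protection functions are the ones used in the rule in the next section.

```python
PROTECTION_FUNCS = {"check_ajax_referer", "check_admin_referer", "wp_verify_nonce"}

# Hypothetical direct call edges extracted from a plugin.
direct_calls = {
    "save_settings": {"update_option"},
    "delete_item": {"verify_request", "wp_delete_post"},
    "verify_request": {"check_ajax_referer"},
}


def reaches_any(func, targets, graph):
    """True if `func` calls any function in `targets`, directly or indirectly."""
    seen, stack = set(), [func]
    while stack:
        current = stack.pop()
        for callee in graph.get(current, ()):
            if callee in targets:
                return True
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return False


handlers = ["save_settings", "delete_item"]
unprotected = [h for h in handlers if not reaches_any(h, PROTECTION_FUNCS, direct_calls)]
print(unprotected)  # ['save_settings']
```

Here delete_item is not flagged because it reaches check_ajax_referer through verify_request, while save_settings never reaches any protection function.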
Rule for Finding CSRF
With those, I ended up with this rule:
rules:
- id: wp-wpajax-csrf
  mode: join
  join:
    rules:
      - id: wpajax-actions
        languages: [php]
        patterns:
          - pattern-either:
              - pattern: add_action('$ACTION', [$UNIMPORTANT, '$HOOKFUNC']); # Some devs do this. The first array element is the class.
              - pattern: add_action('$ACTION', '$HOOKFUNC');
          - metavariable-regex:
              metavariable: $ACTION
              regex: (wp_ajax_).*
      - id: callgraph
        languages: [php]
        patterns:
          - pattern: $CALLEE(...)
          - pattern-inside: |
              function $CALLER(...){
                ...
              }
      - id: csrf-protection
        languages: [php]
        patterns:
          - pattern: $CSRFPROTECTFUNC(...) # Find calls to CSRF protections
          - metavariable-regex:
              metavariable: $CSRFPROTECTFUNC
              regex: (check_ajax_referer|check_admin_referer|wp_verify_nonce)
    on:
      #- 'callgraph.path == wpajax-actions.path' # Experimental. Remove this line for more results
      # Use !! to check that a function never reaches a certain call.
      # The first parameter should always be the callgraph $CALLEE; the second
      # is the function we want to confirm is not reached. Good for finding CSRF.
      - 'callgraph.$CALLEE !! csrf-protection.$CSRFPROTECTFUNC'
      - 'callgraph.$CALLER --> callgraph.$CALLEE'
      - 'callgraph.$CALLER == wpajax-actions.$HOOKFUNC'
  message: "This ajax action does not call CSRF protection functions"
  severity: INFO
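To get a feel for which hooks the wpajax-actions regex selects, here is a small Python sketch. It assumes the regex is matched from the start of the metavariable's value, which is my understanding of how metavariable-regex behaves. Python's re.match stands in for Semgrep's regex engine, and the hook names are made up.

```python
import re

ACTION_REGEX = re.compile(r"(wp_ajax_).*")  # regex from the wpajax-actions rule

# Hypothetical hook names a plugin might register with add_action().
hooks = ["wp_ajax_save_settings", "wp_ajax_nopriv_get_items", "admin_post_export", "init"]

# re.match anchors at the start of the string, mirroring left-anchored matching.
ajax_hooks = [h for h in hooks if ACTION_REGEX.match(h)]
print(ajax_hooks)  # ['wp_ajax_save_settings', 'wp_ajax_nopriv_get_items']
```

Note that the regex also selects wp_ajax_nopriv_ hooks, which WordPress fires for logged-out visitors, so handlers registered that way go through the same check.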
Result
All the bugs I found were reported through Patchstack. What's nice about Patchstack is that they handle disclosing the bugs to the vendor. They also have monthly leaderboards, through which you can receive monetary rewards depending on your points.
I ran the Semgrep rules across all WordPress plugins with at least 10k installs. I used ProjectDiscovery’s notify to receive a notification on my Discord server whenever it found matches. I began scanning on October 24 and finished on October 29.
These are the results of this little project:
Scanned Plugins: 2524
Plugins with findings: 510
Total Reports: 55
Accepted: 43
Rejected: 12
False Positive: 455
The bugs found vary in severity, from disabling specific plugin functionality to RCE! As you can see, there are a lot of false positives, so I had to spend a lot of time going through each of them one by one.
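As a quick sanity check on the numbers above, the short Python sketch below recomputes the false-positive count and a few ratios. It assumes each report corresponds to one flagged plugin, which the post does not state explicitly.

```python
scanned = 2524
with_findings = 510
reported = 55
accepted = 43

# Assumption: one report per flagged plugin, so the rest are false positives.
false_positives = with_findings - reported
hit_rate = with_findings / scanned      # share of scanned plugins that were flagged
report_rate = reported / with_findings  # share of flagged plugins that led to a report
acceptance_rate = accepted / reported   # share of reports that were accepted

print(f"false positives: {false_positives}")
print(f"plugins flagged: {hit_rate:.1%}")
print(f"flagged plugins reported: {report_rate:.1%}")
print(f"reports accepted: {acceptance_rate:.1%}")
```

Under that assumption, roughly one flagged plugin in nine turned into a report, which is why the manual triage took so long.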
As for the leaderboard, I managed to reach #2. Apparently, CSRF reports have a 0.25x multiplier on points, so I only managed to get 240 points. Without that multiplier, I would be at around 900 points, but there is nothing we can do.
Outro
I feel like a lot of people in the infosec community are sleeping on Semgrep. CodeQL is more popular, but for me Semgrep is just as good, if not better. Thank you, Semgrep, for the amazing tool, and thank you, Patchstack, for working with me on my reports. Thank you for reading.
My previous Twitter account got deactivated, so if you want, please follow me on my new account: @tomorrowisnew__