Hunting CVEs in WordPress Plugins using Claude + Semgrep
Press enter or click to view image in full sizeFor the last couple of months, I’ve been working on b 2026-5-19 08:59:32 Author: infosecwriteups.com(查看原文) 阅读量:5 收藏

Muhan Luo

Press enter or click to view image in full size

For the last couple of months, I’ve been working on building a workflow to help me find vulnerabilities in WordPress plugins. Thus far, my system has helped me find 7 vulnerabilities. 2 have been publicly disclosed, 1 more is waiting to be disclosed, and 4 are waiting to be triaged.

As of writing, here are the two vulnerabilities that have been publicly disclosed:

  • (CVE-2026–40786) A low-level user can perform admin-level actions like like resetting the database and deleting all customer points.
  • (CVE-2026–40796) A low-level user can access sensitive PII through an IDOR vulnerability.

Additionally, my vulnerability hunting workflow also helped me discover many unpatched vulnerabilities. These vulnerabilities were already reported and publicly disclosed but had not been fixed as of writing. Here are 2 notable examples:

This article will focus on how I built this workflow and how it could’ve been improved.

Background

Press enter or click to view image in full size

List of CVEs currently on my WordFence profile

Since early 2025, I have been trying to improve my secure code-review skills. I liked the code review exercises on PentesterLab, but I realized I would learn a lot more if I practiced these skills on actual software. WordPress plugins are a great option because they’re free, the code is publicly accessible, and there are tens of thousands of plugins available to review.

At the time, I had very little experience with WordPress, so I spent a lot of time learning WordPress security through online blogs. I found WordFence’s Beginner Series and list of common WordPress vulnerabilities to be especially useful. While there was a decent learning curve, my effort eventually paid off when I found a stored XSS vulnerability in a plugin with 20,000+ active installs (CVE-2025–4406). You can read my writeup here.

Later that year, I also became interested in learning how to use static analysis security testing (SAST) tools like CodeQL and Semgrep to help expedite code review. What these tools allow you to do is create rules to detect patterns in code. The screenshot below shows a Semgrep rule which detects usages of eval() where the first argument includes a variable.

Press enter or click to view image in full size

This Semgrep rule detects usages of the eval() function in JS where the first argument includes a variable.

I decided to combine my interest in SAST tools with WordPress by creating Semgrep rules to detect common vulnerable patterns in WordPress plugins (available on my GitHub). My plan was to use these rules to scan the top 10,000 WordPress plugins to find security vulnerabilities and earn bug bounties.

Unfortunately, my initial attempts to find vulnerabilities were unsuccessful. The problem is that Semgrep produced thousands of findings, but almost all of them were false positives. It was incredibly draining to spend hours triaging these findings.

Press enter or click to view image in full size

Semgrep would often produce 1000s of findings, almost all of which were false positive

What inspired me to start this project was this blog post by GitHub Security titled AI-supported vulnerability triage with the GitHub Security Lab Taskflow Agent. The article describes how the GitHub security team was getting overwhelmed by the huge numbers of false positives coming from their static analysis tools and were able to fix this by offloading work to an LLM. The LLM reviewed the findings and filtered out false positives, which saved the team a lot of time. I realized this approach would fit perfectly into my Semgrep workflow and decided to implement something similar.

Methodology

I didn’t want to spend too much money on tokens, so I decided the best approach would be to focus on just one class of vulnerabilities. The idea I settled on was to look for missing authorization vulnerabilities in AJAX hooks. In WordPress, AJAX hooks are basically API endpoints that can be called by all users, including low-privilege ones. When an AJAX hook is called by a user, the AJAX hook invokes its callback function.

In the example below, the wp_ajax_get_user_data AJAX hook is registered using add_action().

<?php
// Register AJAX hook
add_action( 'wp_ajax_get_user_data', 'handle_get_user_data' );

function handle_get_user_data() {
// Verify nonce
check_ajax_referer( 'get_user_data_nonce', 'nonce' );

$user_id = isset( $_POST['user_id'] ) ? absint( $_POST['user_id'] ) : 0;

if ( ! $user_id ) {
wp_send_json_error( 'Invalid user ID', 400 );
}

$user = get_userdata( $user_id );
wp_send_json_success( [ 'username' => $user->user_login ] );
}

The handle_get_user_data() function is registered as the callback to the wp_ajax_get_user_data AJAX hook. When the wp_ajax_get_user_dataAJAX hook is triggered, its associated callback function, handle_get_user_data() is invoked.

Similar to many API frameworks, AJAX hooks do not implement any authorization checks by default. If a developer fails to verify the user’s permissions using a function such as current_user_can() in the body of the callback, a low-privileged user can potentially perform actions which they shouldn’t be allowed to.

My plan was this:

Press enter or click to view image in full size

  1. I would download the top 10,000 most installed WordPress plugins.
  2. For each plugin, I would use Semgrep to detect AJAX hooks whose callback function did not include any authorization checks.
  3. Claude would analyze the Semgrep output and score each finding from 1–5 (1 being a false-positive, 5 a high-severity issue).
  4. I would manually review Claude’s analysis.
  5. If Claude’s analysis seemed correct, I would perform dynamic testing.

Step 1 - Downloading WordPress plugins

Press enter or click to view image in full size

I thought downloading the top 10,000 plugins would be a chore, but thankfully WordPress implemented an API for their plugin directory. This allowed me to write a simple Python script to perform this task. You can find this script on my GitHub.

Step 2: Semgrep Scan

Next, I needed to create a Semgrep rule to identify AJAX hook callbacks without authorization checks implemented. I found this article by a Brandon Roldan, who found many CSRF vulnerabilities in WordPress plugins using Semgrep, to be very helpful.

Here is the rule I ended up writing (You can find it on my GitHub). Note that you have to login into Semgrep to use this rule since join mode is not enabled for unauthenticated users.

rules:
- id: wp-ajax-hook-missing-auth
# Metavariable Focus
mode: join
join:
rules:
- id: add-action-results
languages: [php]
patterns:
- pattern-either:
- pattern: add_action('$ACTION', [..., '$HOOK_FUNC'], ...);
- pattern: add_action('$ACTION', '$HOOK_FUNC', ...);
- metavariable-regex:
metavariable: $ACTION
regex: (wp_ajax_.*|admin_post_.*|admin_action_.*)

message: Detects usages of add-action
severity: INFO

- id: no-auth-functions-results
languages:
- php
patterns:
- pattern: |
function $CALLBACK(...) {
...
}
- pattern-not:
patterns:
- pattern: |
function $CALLBACK(...) {
...
if (<... $WP_VERIFY(...) ...>)
{
...
}
...
}
- metavariable-regex:
metavariable: $WP_VERIFY
regex: (current_user_can|wp_verify_nonce|check_ajax_referer|check_admin_referer)
- pattern-not: function $CALLBACK(...) { ... check_ajax_referer(...); ... }
- pattern-not: function $CALLBACK(...) { ... check_admin_referer(...); ... }

message: These functions did not have a function like "current_user_can" or "check_ajax_referer" in their body.
severity: INFO

on:
- 'add-action-results.path == no-auth-functions-results.path' # We need a check to ensure that "add_action()" and the callback called by "add_action()" are in the same file.
- 'add-action-results.$HOOK_FUNC == no-auth-functions-results.$CALLBACK' # Only return functions which are called by "add_action()"

message: These hooks did not have a corresponding current_user_can() for their
callback, which might indicate a missing authorization vulnerability. Only
hooks usually associated with high-severity access control vulnerabilities
are marked.
severity: INFO

This Semgrep rule matches an AJAX hook’s callback function if the following conditions are met:

  1. add_action() is registering an AJAX hook. So usages of add_action($ACTION, ...) where $ACTION matches the following regex: (wp_ajax_.*|admin_post_.*|admin_action_.*)
  2. The callback does not contain any functions in its body which performs authorization checks such as current_user_can(), wp_verify_nonce() , check_ajax_referer() , and check_admin_referer()

Examples of the Rule in Action

✅Example 1: Semgrep Matches update_dismiss_status_ajax()

add_action( 'wp_ajax_update_dismiss_status', 'update_dismiss_status_ajax' );

function update_dismiss_status_ajax() {
$isdismiss = isset( $_POST['isdismiss'] ) ? $_POST['isdismiss'] : false;
update_option( 'tho-klaviyo-isdismiss', $isdismiss );

wp_die();
}

Explanation: There’s no function which performs authorization checks in update_dismiss_status_ajax()

❌Example 2: No Match

add_action('wp_ajax_bmi_gdrive_banner', [&$this, 'handle_banner_action']);

public function handle_banner_action() {
if (!current_user_can('manage_options')) return wp_send_json_error();

$mode = sanitize_text_field($_POST['mode']);
if ($mode == 'dismiss') update_option('bmi_gdrive_banner_dismissed', true);

}

Explanation: handle_banner_action() is not matched because it includes current_user_can() in its body.

Get Muhan Luo’s stories in your inbox

Join Medium for free to get updates from this writer.

Remember me for faster sign in

❌Example 3: No Match

add_action( 'wp_ajax_goodbye_form', array( $this, 'goodbye_form_callback' ) );

public function goodbye_form_callback() {
check_ajax_referer( 'wisdom_goodbye_form', 'security' );

if ( isset( $_POST['values'] ) ) {
$values = wp_json_encode( array_map( 'sanitize_text_field', wp_unslash( $_POST['values'] ) ) );
update_option( 'wisdom_deactivation_reason_' . $this->plugin_name, $values );
}

if ( isset( $_POST['details'] ) ) {
$details = sanitize_text_field( wp_unslash( $_POST['details'] ) );
update_option( 'wisdom_deactivation_details_' . $this->plugin_name, $details );
}

$this->do_tracking(); // Run this straightaway.
echo 'success';
wp_die();
}

Explanation: goodbye_form_callback is not matched because it includes check_ajax_referer() in its body.

Step 3: Creating the Prompt

After running Semgrep on each plugin’s codebase, I used Claude to sort through the false positives. For instance, in example 1, even though update_dismiss_status_ajax() was matched by Semgrep , it isn’t actually a security issue. All the callback does is set an option to dismiss a pop-up, so it doesn’t really matter if a low-level user can call it.

Press enter or click to view image in full size

A screenshot of me experimenting with my system prompt on the Claude Console

Here was the prompt I decided to use to verify whether whether the callback function matched by Semgrep was actually a vulnerability. Basically, I asked Claude to produce a score from 1–5 of how likely this is a security issue based on the following criteria:

  1. Does this callback function seem to implement some sort of authorization check? If so, downgrade the score.
  2. Is it actually a problem if a low-level user can call this function? If the function is not a mutating/business-critical operation, downgrade the score.

You are an expert application security engineer reviewing WordPress PHP code. You will be provided with a PHP function snippet (let’s call this $VULN_FUNC) which was found to not include capability checks or nonce verification according to Semgrep. This was determined by looking for functions which did not contain any of these functions in their body: current_user_can(), wp_verify_nonce(), check_ajax_referer(), and check_admin_referer().

Your goal is to determine if it would be a security risk if $VULN_FUNC could be called by a Subscriber-level user in WordPress. You will return a risk score from 1–5 ( 1 meaning very low security risk and 5 very high security risk).

Score 5: High likelihood + High impact (e.g., arbitrary file deletion, SQL injection)
Score 4: High likelihood + Medium impact OR Medium likelihood + High impact
Score 3: Medium likelihood + Medium impact
Score 2: Low likelihood OR low impact
Score 1: Minimal risk (e.g., reading non-sensitive public data)

Your methodology for determining the score is as follows:

- First, determine the likelihood of exploitation. We are only looking functions with no-auth and which do not perform nonce checks. If $VULN_FUNC’s body contains any function whose name strongly suggests that it performs nonce verification or authorization checks, significantly lower the score. (Ex: check_nonce(), verify_auth(), but obviously not limited to these examples) Important note that the is_admin() function in WordPress does not actually check for authorization. If uncertain, err towards lower scores.
- Next, determine the impact. If $VULN_FUNC could be called by a Subscriber-level user, would this be a security issue? For example, downgrade the score if $VULN_FUNC doesn’t appear to take user input and/or doesn’t perform any sensitive or business critical mutating operations. Again, err towards lower scores if uncertain.

Your return format should be a string that follows this format:

score, explanation

”score” should be an integer value 1–5 representing the risk score determined. The second part ”explanation” should contain a very brief (150 words or less) explanation of how the score was determined.

You can find the script I used to automate the Semgrep Scan + Claude Analysis on my GitHub.

Step 4: Reviewing the Output

Once Semgrep and Claude both finished running, I would review the output files using a VS Code extension called SARIF Explorer. This tool was developed by Trail of Bits, and I’ve found it to be extremely useful when reviewing the output of static analysis tools. It has many features, such as the ability to mark findings as false positives or bugs, leave comments, and filter findings by keywords which I haven’t found in other tools.

Press enter or click to view image in full size

Interface for SARIF Explorer

Press enter or click to view image in full size

Another great SARIF explorer feature is that it opens up the file and highlights the exact snippet of code matched by a rule.

The search feature was especially useful, as it allowed me to focus on findings that were more likely to be serious issues.

Press enter or click to view image in full size

Step 5: Dynamic Testing

Press enter or click to view image in full size

Screenshot of my locally-hosted instance of WordPress

After reviewing the code, I would perform dynamic testing to produce a proof-of-concept (POC) if I thought the vulnerability was actually exploitable. I ran a locally-hosted instance of WordPress on Docker using the “Wordfence Docker WordPress Research Lab” as my pre-configured environment.

I personally liked this setup because all you need to do to start WordPress is run docker-compose up (You can find more detailed instructions here). It’s also nice because a debugger is directly integrated with WordPress, which makes it easy to step through the code. This saved me lots of time when producing a POC.

Press enter or click to view image in full size

Screenshot of me using the XDebug, a PHP debugger, in VSCode to produce a POC for CVE-2026–40786

Finally, after discovering a vulnerability, I would report my finding to either Patchstack or Wordfence’s bug bounty platform. WordFence has a more strict scope for bug bounties (100,000+ active installations for most vulnerabilities), but allows you to submit out-of-scope (OOS) reports, while Patchstack has a more lax scope (1,000+ active installations for most vulnerabilities), but does not allow OOS reports.

Conclusion

What Went Well

  • The entire process was pretty cheap. Reviewing ~11,000 Semgrep findings with Claude Opus 4.6 only cost around $120.
  • Claude was very good at filtering out obvious false positives. I took a quick look at about 100 of the findings Claude scored as either 1 or 2 (i.e. unlikely to be security issues). All of them were clearly non-issues. They either had an authorization check that wasn’t detected by Semgrep and/or the callback function performed actions which had no security impact (like upvoting a post). This reduced the amount of findings to review from ~11,000 → ~1,400.

Press enter or click to view image in full size

Finding which Claude correctly marked as a false positive

Press enter or click to view image in full size

Snippet of code which Claude correctly marked as a false positive
  • AI helped me discover vulnerabilities I otherwise wouldn’t have. For instance, I had absolutely almost no understanding of anything WooCommerce-related before I started. In the end, the majority of the vulnerabilities I discovered ended up being related to WooCommerce.

What Could Have Been Improved

  • While Claude was very good at filtering out most false positives, It definitely wasn’t perfect, especially when it came to understanding WordPress-specific security knowledge. For example, there were several instances where Claude thought that a function was vulnerable to SQL injection, when it actually wasn’t. Claude didn’t seem to be aware that WordPress automatically escaped quotes in superglobals like $_GET or $_POST.² Similarly, Claude did not understand that file upload functions in WordPress like media_handle_upload() or wp_handle_upload() were secure against webshell upload attacks. Next time, I think I should create a Claude skill containing WordPress specific-knowledge to further reduce the false positive rate.
<?php

/*
Example: This code is NOT vulnerable to SQL injection in WordPress because of wp_magic_quotes()
*/

$post_id = $_GET['post_id']; // Automatically escaped by wp_magic_quotes()
$query = "SELECT * FROM wp_posts WHERE ID = '" . $post_id . "'";

  • I also think I should’ve been more specific to Claude as to what counts as a business-critical operation. While the ability to deface the site layout, perform denial of service, and view private posts are all technically security issues, none of these vulnerabilities are in-scope for either Patchstack or Wordfence’s bug bounty programs.
  • The Semgrep rule I wrote took longer to run than other rules, especially on large code bases. I’m not sure why, but join-mode in Semgrep seems to use a lot of memory. I actually crashed my computer initially by trying to run the rule on every single plugin codebase at once. I thought it might’ve be an issue with my computer, so I tried it on a 32 GB RAM EC2 instance, but it also crashed about 1/4 of the way through. This is what caused me to eventually write a script to scan each plugin codebase individually. I’m wondering if there’s a way to significantly improve the performance of this rule.

Footnotes

  1. Technically, wp_verify_nonce() doesn’t actually verify the user’s role, it just verifies that the user’s CSRF token (nonce) is valid. However, in practice, I’ve found that this function prevents a lot of attacks because getting a valid nonce requires access to certain admin pages which low-level users can’t access.
  2. You can find a really in-depth guide on how to find SQL Injection in WordPress plugins on WordFence’s blog here.

文章来源: https://infosecwriteups.com/hunting-cves-in-wordpress-plugins-using-claude-semgrep-1f0c82453356?source=rss----7b722bfd1b8d---4
如有侵权请联系:admin#unsafe.sh