Building Your Own Web Directory Scanner

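0x01 Preface

When playing CTFs, hidden sensitive files such as .swp and .bak files can be a real nuisance. The FileSensor tool already exists for this, but it is not very smart in practice, so I wondered whether directory scanning and sensitive-file leak detection could be combined. The result of that investigation is hiddenSensor.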

0x02 Analyzing dirsearch

Entry point

class Program(object):
    def __init__(self):
        self.script_path = (os.path.dirname(os.path.realpath(__file__)))
        self.arguments = ArgumentParser(self.script_path)
        self.output = CLIOutput()
        self.controller = Controller(self.script_path, self.arguments, self.output)

ArgumentParser and CLIOutput are not very interesting, so go straight to Controller:

self.fuzzer = Fuzzer(self.requester, self.dictionary, testFailPath=self.arguments.testFailPath, threads=self.arguments.threadsCount, matchCallbacks=matchCallbacks, notFoundCallbacks=notFoundCallbacks, errorCallbacks=errorCallbacks)

Following it into Fuzzer:

def setupScanners(self):
   if len(self.scanners) != 0:
       self.scanners = {}
   self.defaultScanner = Scanner(self.requester, self.testFailPath, "")
   self.scanners['/'] = Scanner(self.requester, self.testFailPath, "/")
   for extension in self.dictionary.extensions:
       self.scanners[extension] = Scanner(
           self.requester, self.testFailPath, "." + extension)

Fuzzer calls Scanner, so the core logic appears to live in Scanner:

import re
from difflib import SequenceMatcher

from lib.utils import RandomUtils
from thirdparty.sqlmap import DynamicContentParser


class ScannerException(Exception):
    pass


class Scanner(object):
    def __init__(self, requester, testPath=None, suffix=None):
        if testPath is None or testPath is "":
            self.testPath = RandomUtils.randString()
        else:
            self.testPath = testPath
        self.suffix = suffix if suffix is not None else ""
        self.requester = requester
        self.tester = None
        self.redirectRegExp = None
        self.invalidStatus = None
        self.dynamicParser = None
        self.ratio = 0.98
        self.redirectStatusCodes = [301, 302, 307]
        self.setup()

    def setup(self):
        firstPath = self.testPath + self.suffix
        firstResponse = self.requester.request(firstPath)
        self.invalidStatus = firstResponse.status
        if self.invalidStatus == 404:
            # Using the response status code is enough :-}
            return

        # look for redirects
        secondPath = RandomUtils.randString(omit=self.testPath) + self.suffix
        secondResponse = self.requester.request(secondPath)
        if firstResponse.status in self.redirectStatusCodes and firstResponse.redirect and secondResponse.redirect:
            self.redirectRegExp = self.generateRedirectRegExp(firstResponse.redirect, secondResponse.redirect)

        # Analyze response bodies
        self.dynamicParser = DynamicContentParser(self.requester, firstPath, firstResponse.body, secondResponse.body)
        baseRatio = float("{0:.2f}".format(self.dynamicParser.comparisonRatio))  # Rounding to 2 decimals
        # If response length is small, adjust ratio
        if len(firstResponse) < 2000:
            baseRatio -= 0.1
        if baseRatio < self.ratio:
            self.ratio = baseRatio

    def generateRedirectRegExp(self, firstLocation, secondLocation):
        if firstLocation is None or secondLocation is None:
            return None
        sm = SequenceMatcher(None, firstLocation, secondLocation)
        marks = []
        for blocks in sm.get_matching_blocks():
            i = blocks[0]
            n = blocks[2]
            # empty block
            if n == 0:
                continue
            mark = firstLocation[i:i + n]
            marks.append(mark)
        regexp = "^.*{0}.*$".format(".*".join(map(re.escape, marks)))
        return regexp

    def scan(self, path, response):
        if self.invalidStatus == 404 and response.status == 404:
            return False
        if self.invalidStatus != response.status:
            return True

        redirectToInvalid = False
        if self.redirectRegExp is not None  and response.redirect is not None:
            redirectToInvalid = re.match(self.redirectRegExp, response.redirect) is not None
            # If redirection doesn't match the rule, mark as found
            if not redirectToInvalid:
                return True

        ratio = self.dynamicParser.compareTo(response.body)
        if ratio >= self.ratio:
            return False
        elif redirectToInvalid and ratio >= (self.ratio - 0.15):
            return False
        return True

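Let's walk through this code with one question in mind: how do you decide whether a file exists?
You might think of the following:

1. Take a random string and append it to the URL. If the response is a 404, the status code alone can serve as the criterion.

2. What if the response is a 301 or 302? For the sake of user experience, large sites such as Taobao and Baidu do not return a 404 directly; they redirect to an error page instead.

3. To handle the second case, that is, to tell whether a 301/302 is a redirect for a real file or a redirect to an error page, dirsearch uses the Location response header together with page similarity.

4. The flow

Request two non-existent pages and take the Location of each, then build a regular expression for the Location with generateRedirectRegExp(). If the Location of a later response does not match this regular expression, the page is considered to exist; in other words, the status alone is enough to judge this page.
If the Location does match, self.dynamicParser.compareTo(response.body) compares the content. If the similarity falls within a certain range, the page is considered non-existent; otherwise it is considered to exist.

5. Advantage: checking the Location first reduces the number of content-similarity comparisons and speeds up the program.

6. Drawbacks:

If error pages differ by suffix, for example 1xxxx.php and 2xxxx.php return error.html while 1xxxx.jsp and 2xxxx.jsp return 404.html, many false positives result, because the author only uses one suffix. If you don't believe it, scan Baidu with dirsearch: you get a pile of 301/302 false positives.
You may also have noticed that dirsearch sets allow_redirects=False in requests, that is, it does not follow 301/302 redirects. This raises a question: the similarity comparison does not follow redirects either, so isn't it comparing the bodies of the 301 or 302 responses themselves? Without following the redirect, a 301 or 302 body is an empty page in the vast majority of cases, which makes the similarity comparison useless. So I suspect the author was just handing in homework? (hehe) Just kidding: dirsearch is still very powerful, and its pause, reconnect and output features in particular are excellent, which is why I chose to operate on dirsearch to build my own hiddenSensor.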
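The Location-regex step in point 4 and the empty-body problem in point 6 can be sketched in isolation. This is a minimal illustration that uses only the standard library and is not dirsearch's actual code path: the function name build_redirect_regexp and the example.com URLs are made up for the example.

```python
import re
from difflib import SequenceMatcher


def build_redirect_regexp(first_location, second_location):
    """Build a regex from the parts shared by two soft-404 Location values."""
    matcher = SequenceMatcher(None, first_location, second_location)
    marks = [
        first_location[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size > 0  # the final block is always an empty sentinel
    ]
    return "^.*{0}.*$".format(".*".join(map(re.escape, marks)))


# Hypothetical Location headers returned for two random, non-existent paths.
loc_a = "http://example.com/error.html?from=/aaaa1111.php"
loc_b = "http://example.com/error.html?from=/bbbb2222.php"
pattern = build_redirect_regexp(loc_a, loc_b)

# A redirect to the error page matches the pattern: treat it as "not found".
soft_404 = re.match(pattern, "http://example.com/error.html?from=/admin.php") is not None
# A redirect somewhere else does not match: treat the path as existing.
real_redirect = re.match(pattern, "http://example.com/login/") is not None

# When redirects are not followed, a 301/302 body is often empty, and two
# empty strings compare as identical, so the ratio carries no information.
empty_ratio = SequenceMatcher(None, "", "").ratio()

print(soft_404, real_redirect, empty_ratio)
```

The last value shows why comparing unfollowed redirect bodies is a weak signal: if both bodies are empty, every redirect looks exactly like the baseline.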

0x03 hiddenSensor

1. Fixing the problems in point 6

Use several suffixes; here I chose the common php|jsp|asp
Follow redirects directly
Key code


import sys
sys.path.append('../../')


from difflib import SequenceMatcher
from thirdparty.sqlmap import DynamicContentParser
import re
import random
import string
import urllib.parse

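import requests  # used by the demo under __main__
# Requester is the project's own request wrapper; the demo under __main__ also needs it importable.
#from .Requester import Requester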


class Fuzzer(object):
    def __init__(self, requester, path=None):
        self.requester = requester
        self.path = path
        self.suffix = ['php', 'jsp', 'asp']
        self.redirection_code = ['301', '302', '303', '307']
        self.base_ratio = 0.98
        self.flag = False
        self.redirection_regexp = []
        self.setup()

    def getRandomPath(self):
        letters = string.ascii_letters + string.digits
        return ''.join(random.choice(letters) for i in range(8))

    def generateRedirectRegExp(self, firstLocation, secondLocation):
        if firstLocation is None or secondLocation is None:
            return None
        sm = SequenceMatcher(None, firstLocation, secondLocation)
        marks = []
        for blocks in sm.get_matching_blocks():
            i = blocks[0]
            n = blocks[2]
            # empty block
            if n == 0:
                continue
            mark = firstLocation[i:i + n]
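            # Keep only blocks that start with the scheme (http or https).
            if mark.startswith('http'):
                marks.append(mark)
        regexp = "^.*{0}.*$".format(".*".join(map(re.escape, marks)))
        # Accept either scheme; a plain str.replace would turn 'https' into '(https|http)s'.
        regexp = re.sub(r'https?', 'https?', regexp)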
        return regexp

    def getDmain(self, url):
        url_parser = urllib.parse.urlparse(url)
        return url_parser.scheme + '://' + url_parser.netloc

    def getHistory(self, history):
        history = re.findall(r'\d+', history)
        history = history[0] if len(history) >= 1 else []
        return str(history)

    def setup(self):
        if self.path is None or self.path == '':
            self.path = self.getRandomPath()

        firstpath_php = self.path + '.' + self.suffix[0]
        res1_php = self.requester.request(firstpath_php, True)
        secondpath_php = self.getRandomPath() + '.' + self.suffix[0]
        res2_php = self.requester.request(secondpath_php, True)

        firstpath_jsp = self.path + '.' + self.suffix[1]
        res1_jsp = self.requester.request(firstpath_jsp, True)
        secondpath_jsp = self.getRandomPath() + '.' + self.suffix[1]
        res2_jsp = self.requester.request(secondpath_jsp, True)

        firstpath_asp = self.path + '.' + self.suffix[2]
        res1_asp = self.requester.request(firstpath_asp, True)
        secondpath_asp = self.getRandomPath() + '.' + self.suffix[2]
        res2_asp = self.requester.request(secondpath_asp, True)

        if res1_asp.status_code == 404 and res1_php.status_code == 404 and res1_jsp.status_code == 404:
            self.flag = True
        else:

            if self.getHistory(str(res1_php.history)) in self.redirection_code and self.getHistory(str(res2_php.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_php.url, res2_php.url)
                self.redirection_regexp.append(
                    regExp) if regExp not in self.redirection_regexp else 0

            if self.getHistory(str(res1_jsp.history)) in self.redirection_code and self.getHistory(str(res2_jsp.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_jsp.url, res2_jsp.url)
                self.redirection_regexp.append(
                    regExp) if regExp not in self.redirection_regexp else 0

            if self.getHistory(str(res1_asp.history)) in self.redirection_code and self.getHistory(str(res2_asp.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_asp.url, res2_asp.url)
                self.redirection_regexp.append(
                    regExp) if regExp not in self.redirection_regexp else 0

            if res1_asp.status_code == 404 and res1_php.status_code == 404 and res1_jsp.status_code == 404:
                self.flag = True

            self.dynamic_php = DynamicContentParser(
                self.requester, firstpath_php, res1_php.text, res2_php.text)
            if self.dynamic_php is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_php.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

            self.dynamic_jsp = DynamicContentParser(
                self.requester, firstpath_jsp, res1_jsp.text, res2_jsp.text)
            if self.dynamic_jsp is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_jsp.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

            self.dynamic_asp = DynamicContentParser(
                self.requester, firstpath_asp, res1_asp.text, res2_asp.text)
            if self.dynamic_asp is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_asp.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

    def fuzz(self, cmp_page):
        if self.flag == True:
            if cmp_page.status_code == 404:
                return False
            else:
                return True
        else:
            if cmp_page.status_code == 404:
                return False
            redirectToInvalid = []
            for express in self.redirection_regexp:
                if express is not None:
                    redirectToInvalid.append(
                        re.match(express, cmp_page.url) is not None)
            if not any(redirectToInvalid):
                return True
            ratio_php = self.dynamic_php.compareTo(cmp_page.text)
            ratio_jsp = self.dynamic_jsp.compareTo(cmp_page.text)
            ratio_asp = self.dynamic_asp.compareTo(cmp_page.text)
            if self.base_ratio <= ratio_php or self.base_ratio <= ratio_jsp or self.base_ratio <= ratio_asp:
                return False
            elif any(redirectToInvalid) and ((self.base_ratio - 0.15) <= ratio_php or (self.base_ratio - 0.15) <= ratio_jsp or (self.base_ratio - 0.15) <= ratio_asp):
                return False
            return True


if __name__ == '__main__':
    req = Requester('https://www.baidu.com/')
    fuzzer = Fuzzer(req)
    print(fuzzer.fuzz(requests.get('https://www.baidu.com/hello.php')))
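
The idea of keeping one "not found" baseline per suffix can also be tried without any network access. The following is a rough sketch under stated assumptions, not the hiddenSensor implementation: fake_fetch is a hypothetical stand-in for the requester, plain difflib.SequenceMatcher replaces sqlmap's DynamicContentParser, and all function names are invented for the example.

```python
import random
import string
from difflib import SequenceMatcher

SUFFIXES = ("php", "jsp", "asp")


def random_name(length=8):
    letters = string.ascii_letters + string.digits
    return "".join(random.choice(letters) for _ in range(length))


def fake_fetch(path):
    """Hypothetical stand-in for an HTTP request: returns (status, body)."""
    if path == "index.php":
        return 200, "<html>welcome to the real home page</html>"
    if path.endswith(".php"):
        return 200, "<html>error.html: page not found</html>"
    if path.endswith(".jsp"):
        return 200, "<html>404.html: nothing here, sorry</html>"
    return 404, ""


def build_baselines(fetch, suffixes=SUFFIXES):
    """Probe two random paths per suffix and keep one baseline for each."""
    baselines = {}
    for suffix in suffixes:
        status_1, body_1 = fetch(random_name() + "." + suffix)
        _, body_2 = fetch(random_name() + "." + suffix)
        ratio = SequenceMatcher(None, body_1, body_2).ratio()
        baselines[suffix] = {
            "status": status_1,
            "body": body_1,
            "ratio": min(0.98, round(ratio, 2)),  # never stricter than 0.98
        }
    return baselines


def looks_existing(status, body, baselines):
    """A response counts as found only if it matches no suffix baseline."""
    if status == 404:
        return False
    for base in baselines.values():
        if status != base["status"]:
            continue
        if SequenceMatcher(None, body, base["body"]).ratio() >= base["ratio"]:
            return False
    return True


baselines = build_baselines(fake_fetch)
print(looks_existing(*fake_fetch("index.php"), baselines))
print(looks_existing(*fake_fetch("nothere.jsp"), baselines))
```

Because every candidate is compared against all three baselines, an error page served only for .jsp paths no longer shows up as a hit, which is the false-positive case described in point 6.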

2. The --ctf option targets those annoying bak, swp and similar files

3. If you like it, give it a star (:

4. Source: https://github.com/youncyb/hiddenSensor
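
For the --ctf option in item 2, the general idea of deriving backup and swap file names from a discovered path might look like the sketch below. The .bak and .swp patterns come from the tool's help text (xxx.php.bak, .xxx.php.swp); the `~` and .swo variants and the function name backup_candidates are assumptions added for illustration, and the real option may use a different list.

```python
import posixpath


def backup_candidates(path):
    """Return guessed backup/swap file names for one discovered path."""
    directory, name = posixpath.split(path)
    guesses = [
        name + ".bak",        # index.php -> index.php.bak
        name + "~",           # editor backup copy (assumed pattern)
        "." + name + ".swp",  # vim swap file: .index.php.swp
        "." + name + ".swo",  # second vim swap file (assumed pattern)
    ]
    return [posixpath.join(directory, guess) for guess in guesses]


for candidate in backup_candidates("admin/index.php"):
    print(candidate)
```

Each candidate would then go through the same existence check as any other path.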

0x04 hiddenSensor

1. Supported platforms

macOS|Linux|Windows
python3

2. Usage

usage: hiddenSensor.py [-h] [-u URL] [-L URLLIST] [-e EXTENSION] [-H HEADERS]
                       [--user-agent USER_AGENT] [--random-agent] [-c COOKIES]
                       [-r RECURSIVE] [--proxy PROXY] [-s DELAY]
                       [--timeout TIMEOUT] [-m MAX_RETRIES] [-t THREADS_COUNT]
                       [-404 PATH_404] [--lowercase] [--uppercase]
                       [--dicts-path WORDLIST] [--ctf]

optional arguments:
  -h, --help            show this help message and exit

madatory settings:
  -u URL, --url URL     target
  -L URLLIST, --urlList URLLIST
                        url file path
  -e EXTENSION, --extension EXTENSION
                        the extension of website type (default : "php")

connection settings:
  -H HEADERS, --headers HEADERS
                        set headers
  --user-agent USER_AGENT
                        user-agent you want to specify
  --random-agent        random-agent (default: False)
  -c COOKIES, --cookie COOKIES
                        cookie you want to specify (example: -c
                        "domain=xxx;path=xxx")
  -r RECURSIVE, --recursive RECURSIVE
                        Recursive blasting subdir (default: 0 layers)
  --proxy PROXY         set proxy (http proxy,example:--proxy
                        http://127.0.0.1:1090)
  -s DELAY, --delay DELAY
                        time.sleep(delay) every request (default: 0)
  --timeout TIMEOUT     max time every request is waiting (default: 30 s)
  -m MAX_RETRIES, --max-retries MAX_RETRIES
                        max retries when meeting network problem (default: 5)

other settings:
  -t THREADS_COUNT, --thread THREADS_COUNT
                        max thread count you want to specify (default: 10)
  -404 PATH_404, --404-page PATH_404
                        the 404 page you want to specify (example: if
                        error.php -404 "error")
  --lowercase           force to be lowercase
  --uppercase           force to be uppercase
  --dicts-path WORDLIST
                        other dictionary you want to specify
  --ctf                 if it's specified, process will find sensor file
                        (xxx.php.bak, .xxx.php.swp ...)

example:python3 hiddenSensor.py -u http://www.xxx.com/ -e php --ctf

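3. Features

Multithreading
Custom HTTP headers
Scanning multiple URLs
Pause (ctrl+c) and resume
Custom dictionaries, although the ones in db should be enough
Custom delay and maximum retry count
HTTP proxy
Custom 404 path
Configurable recursion depth
Scanning for .bak|.swp and similar files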

4. Thanks to `dirsearch`