Building Your Own Web Directory Scanner

youncyb, published 2019-05-25, 2922 views, Tools


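0x01 Preface

When playing CTFs, hidden sensitive files such as swp and bak leftovers are a recurring annoyance. The FileSensor tool already exists for this, but it is not very smart in practice, so I wondered whether directory scanning and sensitive-file-leak detection could be combined. The result of that research is hiddenSensor.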

0x02 Analyzing dirsearch

  1. Entry point
class Program(object):
    def __init__(self):
        self.script_path = (os.path.dirname(os.path.realpath(__file__)))
        self.arguments = ArgumentParser(self.script_path)
        self.output = CLIOutput()
        self.controller = Controller(self.script_path, self.arguments, self.output)

There is nothing interesting in ArgumentParser or CLIOutput, so go straight to Controller:

 self.fuzzer = Fuzzer(self.requester, self.dictionary, testFailPath=self.arguments.testFailPath, threads=self.arguments.threadsCount, matchCallbacks=matchCallbacks, notFoundCallbacks=notFoundCallbacks, errorCallbacks=errorCallbacks)

Next, follow the Fuzzer:

def setupScanners(self):
   if len(self.scanners) != 0:
       self.scanners = {}
   self.defaultScanner = Scanner(self.requester, self.testFailPath, "")
   self.scanners['/'] = Scanner(self.requester, self.testFailPath, "/")
   for extension in self.dictionary.extensions:
       self.scanners[extension] = Scanner(
           self.requester, self.testFailPath, "." + extension)

The Fuzzer instantiates Scanner, so the core logic evidently lives in Scanner:

import re
from difflib import SequenceMatcher

from lib.utils import RandomUtils
from thirdparty.sqlmap import DynamicContentParser


class ScannerException(Exception):
    pass


class Scanner(object):
    def __init__(self, requester, testPath=None, suffix=None):
        if testPath is None or testPath is "":
            self.testPath = RandomUtils.randString()
        else:
            self.testPath = testPath
        self.suffix = suffix if suffix is not None else ""
        self.requester = requester
        self.tester = None
        self.redirectRegExp = None
        self.invalidStatus = None
        self.dynamicParser = None
        self.ratio = 0.98
        self.redirectStatusCodes = [301, 302, 307]
        self.setup()

    def setup(self):
        firstPath = self.testPath + self.suffix
        firstResponse = self.requester.request(firstPath)
        self.invalidStatus = firstResponse.status
        if self.invalidStatus == 404:
            # Using the response status code is enough :-}
            return

        # look for redirects
        secondPath = RandomUtils.randString(omit=self.testPath) + self.suffix
        secondResponse = self.requester.request(secondPath)
        if firstResponse.status in self.redirectStatusCodes and firstResponse.redirect and secondResponse.redirect:
            self.redirectRegExp = self.generateRedirectRegExp(firstResponse.redirect, secondResponse.redirect)

        # Analyze response bodies
        self.dynamicParser = DynamicContentParser(self.requester, firstPath, firstResponse.body, secondResponse.body)
        baseRatio = float("{0:.2f}".format(self.dynamicParser.comparisonRatio))  # Rounding to 2 decimals
        # If response length is small, adjust ratio
        if len(firstResponse) < 2000:
            baseRatio -= 0.1
        if baseRatio < self.ratio:
            self.ratio = baseRatio

    def generateRedirectRegExp(self, firstLocation, secondLocation):
        if firstLocation is None or secondLocation is None:
            return None
        sm = SequenceMatcher(None, firstLocation, secondLocation)
        marks = []
        for blocks in sm.get_matching_blocks():
            i = blocks[0]
            n = blocks[2]
            # empty block
            if n == 0:
                continue
            mark = firstLocation[i:i + n]
            marks.append(mark)
        regexp = "^.*{0}.*$".format(".*".join(map(re.escape, marks)))
        return regexp

    def scan(self, path, response):
        if self.invalidStatus == 404 and response.status == 404:
            return False
        if self.invalidStatus != response.status:
            return True

        redirectToInvalid = False
        if self.redirectRegExp is not None  and response.redirect is not None:
            redirectToInvalid = re.match(self.redirectRegExp, response.redirect) is not None
            # If redirection doesn't match the rule, mark as found
            if not redirectToInvalid:
                return True

        ratio = self.dynamicParser.compareTo(response.body)
        if ratio >= self.ratio:
            return False
        elif redirectToInvalid and ratio >= (self.ratio - 0.15):
            return False
        return True

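Let's walk through this code by asking one question: how do you decide whether a file exists?
You might think of the following:

1. Take a random string and append it to the URL. If the response is 404, the status code alone is enough to tell existing paths from non-existing ones.

2. What if the response is a 302 or 301? Large sites such as Taobao and Baidu, for the sake of user experience, do not return a bare 404; they redirect to an error page instead.

3. To handle the second case, that is, to decide whether a 302|301 is a redirect from a real file or a redirect to an error page, dirsearch uses the Location response header together with page similarity.

4. The flow:

  1. Request two non-existent pages, take the Location of each, and derive a regular expression for Location with generateRedirectRegExp(). If a later response's Location does not match this expression, the page is taken to exist; in other words, the status can be trusted for that page.
  2. If the Location does match the expression, self.dynamicParser.compareTo(response.body) compares the content. If the similarity falls within a certain range, the page is considered non-existent; otherwise it is considered to exist.

5. Advantage: checking Location first reduces the number of content-similarity comparisons and speeds the program up.

6. Drawbacks:

  1. If random paths with one extension, e.g. 1xxxx.php and 2xxxx.php, return error.html, while 1xxxx.jsp and 2xxxx.jsp return 404.html, many false positives result, because the author calibrates with only one extension. If you doubt it, scan Baidu with dirsearch: you get a pile of misjudged 302|301 results.
  2. You may have noticed that dirsearch issues its requests with allow_redirects=False, i.e. it does not follow 302|301. The similarity comparison therefore also runs on the unfollowed response, that is, on the body of the 301 or 302 itself. In the vast majority of cases that body is empty, which makes the similarity comparison useless. So I suspect the author was just handing in homework? (hehe) Just kidding: dirsearch is still very powerful, and its pause, reconnect, and output handling in particular are excellent, which is why I chose to operate on dirsearch to build my own hiddenSensor.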
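
To make the Location check and drawback 6.2 more concrete, here is a minimal sketch using only the standard library. It rebuilds the same kind of regular expression that generateRedirectRegExp() produces and then compares two empty bodies. The URLs and the helper name redirect_regexp are assumptions made up for illustration; they are not taken from dirsearch.

```python
import re
from difflib import SequenceMatcher


def redirect_regexp(first_location, second_location):
    """Build a regexp from the substrings two Location values have in common."""
    matcher = SequenceMatcher(None, first_location, second_location)
    # Keep every non-empty common block, in order of appearance.
    marks = [first_location[b.a:b.a + b.size]
             for b in matcher.get_matching_blocks() if b.size > 0]
    return "^.*{0}.*$".format(".*".join(map(re.escape, marks)))


# Hypothetical Location headers returned for two random, non-existent paths.
loc_a = "http://example.com/error.html?from=/aaaa1111.php"
loc_b = "http://example.com/error.html?from=/zzzz9999.php"
pattern = redirect_regexp(loc_a, loc_b)

# A redirect to the same error page matches the pattern: "not found".
is_error = re.match(pattern, "http://example.com/error.html?from=/admin.php") is not None
# A redirect to somewhere else does not match: treated as a real resource.
is_real = re.match(pattern, "http://example.com/admin/login") is None

# Two empty bodies, as is typical for unfollowed 301/302 responses,
# are trivially "identical", so the ratio cannot discriminate anything.
empty_ratio = SequenceMatcher(None, "", "").ratio()
```

With empty bodies the similarity is always at its maximum, so every redirect that matches the pattern looks like the error page regardless of where it would eventually lead.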

0x03 hiddenSensor

1. Fixing the problems from point 6

  1. Calibrate with several extensions; here I chose the common php|jsp|asp
  2. Follow redirects directly
  3. Key code

import sys
sys.path.append('../../')


from difflib import SequenceMatcher
from thirdparty.sqlmap import DynamicContentParser
import re
import random
import string
import urllib.parse

#import requests
#from .Requester import Requester


class Fuzzer(object):
    def __init__(self, requester, path=None):
        self.requester = requester
        self.path = path
        self.suffix = ['php', 'jsp', 'asp']
        self.redirection_code = ['301', '302', '303', '307']
        self.base_ratio = 0.98
        self.flag = False
        self.redirection_regexp = []
        self.setup()

    def getRandomPath(self):
        letters = string.ascii_letters + string.digits
        return ''.join(random.choice(letters) for i in range(8))

    def generateRedirectRegExp(self, firstLocation, secondLocation):
        if firstLocation is None or secondLocation is None:
            return None
        sm = SequenceMatcher(None, firstLocation, secondLocation)
        marks = []
        for blocks in sm.get_matching_blocks():
            i = blocks[0]
            n = blocks[2]
            # empty block
            if n == 0:
                continue
            mark = firstLocation[i:i + n]
            if mark.startswith('http') or mark.startswith('https'):
                marks.append(mark)
        regexp = "^.*{0}.*$".format(".*".join(map(re.escape, marks))
                                    ).replace('http', '(https|http)')
        return regexp

    def getDmain(self, url):
        url_parser = urllib.parse.urlparse(url)
        return url_parser.scheme + '://' + url_parser.netloc

    def getHistory(self, history):
        history = re.findall(r'\d+', history)
        history = history[0] if len(history) >= 1 else []
        return str(history)

    def setup(self):
        if not self.path:
            self.path = self.getRandomPath()

        firstpath_php = self.path + '.' + self.suffix[0]
        res1_php = self.requester.request(firstpath_php, True)
        secondpath_php = self.getRandomPath() + '.' + self.suffix[0]
        res2_php = self.requester.request(secondpath_php, True)

        firstpath_jsp = self.path + '.' + self.suffix[1]
        res1_jsp = self.requester.request(firstpath_jsp, True)
        secondpath_jsp = self.getRandomPath() + '.' + self.suffix[1]
        res2_jsp = self.requester.request(secondpath_jsp, True)

        firstpath_asp = self.path + '.' + self.suffix[2]
        res1_asp = self.requester.request(firstpath_asp, True)
        secondpath_asp = self.getRandomPath() + '.' + self.suffix[2]
        res2_asp = self.requester.request(secondpath_asp, True)

        if res1_asp.status_code == 404 and res1_php.status_code == 404 and res1_jsp.status_code == 404:
            self.flag = True
        else:

            if self.getHistory(str(res1_php.history)) in self.redirection_code and self.getHistory(str(res2_php.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_php.url, res2_php.url)
                if regExp not in self.redirection_regexp:
                    self.redirection_regexp.append(regExp)

            if self.getHistory(str(res1_jsp.history)) in self.redirection_code and self.getHistory(str(res2_jsp.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_jsp.url, res2_jsp.url)
                if regExp not in self.redirection_regexp:
                    self.redirection_regexp.append(regExp)

            if self.getHistory(str(res1_asp.history)) in self.redirection_code and self.getHistory(str(res2_asp.history)) in self.redirection_code:
                regExp = self.generateRedirectRegExp(
                    res1_asp.url, res2_asp.url)
                if regExp not in self.redirection_regexp:
                    self.redirection_regexp.append(regExp)

            if res1_asp.status_code == 404 and res1_php.status_code == 404 and res1_jsp.status_code == 404:
                self.flag = True

            self.dynamic_php = DynamicContentParser(
                self.requester, firstpath_php, res1_php.text, res2_php.text)
            if self.dynamic_php is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_php.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

            self.dynamic_jsp = DynamicContentParser(
                self.requester, firstpath_jsp, res1_jsp.text, res2_jsp.text)
            if self.dynamic_jsp is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_jsp.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

            self.dynamic_asp = DynamicContentParser(
                self.requester, firstpath_asp, res1_asp.text, res2_asp.text)
            if self.dynamic_asp is not None:
                ratio = float('{0:.2f}'.format(
                    self.dynamic_asp.comparisonRatio))
                if self.base_ratio > ratio:
                    self.base_ratio = ratio

    def fuzz(self, cmp_page):
        if self.flag == True:
            if cmp_page.status_code == 404:
                return False
            else:
                return True
        else:
            if cmp_page.status_code == 404:
                return False
            redirectToInvalid = []
            for express in self.redirection_regexp:
                if express is not None:
                    redirectToInvalid.append(
                        re.match(express, cmp_page.url) is not None)
            if not any(redirectToInvalid):
                return True
            ratio_php = self.dynamic_php.compareTo(cmp_page.text)
            ratio_jsp = self.dynamic_jsp.compareTo(cmp_page.text)
            ratio_asp = self.dynamic_asp.compareTo(cmp_page.text)
            if self.base_ratio <= ratio_php or self.base_ratio <= ratio_jsp or self.base_ratio <= ratio_asp:
                return False
            elif any(redirectToInvalid) and ((self.base_ratio - 0.15) <= ratio_php or (self.base_ratio - 0.15) <= ratio_jsp or (self.base_ratio - 0.15) <= ratio_asp):
                return False
            return True


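if __name__ == '__main__':
    # Demo only: it needs the `requests` and `Requester` imports that are
    # commented out near the top of this file.
    req = Requester('https://www.baidu.com/')
    fuzzer = Fuzzer(req)
    print(fuzzer.fuzz(requests.get('https://www.baidu.com/hello.php')))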
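
A side note on the redirect check in setup(): the listing converts res.history to a string and pulls the first number out with a regular expression. Assuming the object returned by the requester behaves like a requests Response, whose history attribute is a list of the intermediate responses, the same information can be read directly. The sketch below is an alternative under that assumption, not the code hiddenSensor ships; the helper names and the stub objects are hypothetical.

```python
from types import SimpleNamespace

REDIRECT_CODES = {301, 302, 303, 307}


def first_redirect_status(response):
    """Return the status code of the first hop, or None if nothing was followed."""
    if response.history:
        return response.history[0].status_code
    return None


def was_redirected(response):
    """True if the first hop in the redirect chain used a redirect status."""
    return first_redirect_status(response) in REDIRECT_CODES


# Stub objects standing in for followed and direct responses.
followed = SimpleNamespace(status_code=200,
                           history=[SimpleNamespace(status_code=302)])
direct = SimpleNamespace(status_code=200, history=[])

followed_flag = was_redirected(followed)
direct_flag = was_redirected(direct)
```

Comparing integers also avoids the mismatch risk of matching digits inside the textual representation of the history list.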

2. The --ctf option targets those annoying bak, swp, and similar files

3. If you like it, give it a star (:

4. Source: https://github.com/youncyb/hiddenSensor
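
As a rough illustration of what a --ctf style pass might do, the sketch below derives backup and editor-leftover names from a path that was found. The first two patterns follow the examples in the help text (xxx.php.bak, .xxx.php.swp); the trailing-tilde pattern and the helper name backup_candidates are my own assumptions, and the real list used by hiddenSensor may differ.

```python
import posixpath


def backup_candidates(path):
    """Return likely leftover file names for a URL path (illustrative list only)."""
    directory, name = posixpath.split(path)
    names = [
        name + ".bak",        # e.g. index.php.bak
        "." + name + ".swp",  # vim swap file, e.g. .index.php.swp
        name + "~",           # editor backup (assumed pattern, not from the source)
    ]
    # Re-attach the directory part so the candidates stay next to the original file.
    return [posixpath.join(directory, n) for n in names]


candidates = backup_candidates("admin/index.php")
```

Each candidate would then be requested and judged with the same existence check used for ordinary wordlist entries.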

0x04 hiddenSensor

1. Supported platforms

macOS|Linux|Windows
python3

2. Usage
usage: hiddenSensor.py [-h] [-u URL] [-L URLLIST] [-e EXTENSION] [-H HEADERS]
                       [--user-agent USER_AGENT] [--random-agent] [-c COOKIES]
                       [-r RECURSIVE] [--proxy PROXY] [-s DELAY]
                       [--timeout TIMEOUT] [-m MAX_RETRIES] [-t THREADS_COUNT]
                       [-404 PATH_404] [--lowercase] [--uppercase]
                       [--dicts-path WORDLIST] [--ctf]

optional arguments:
  -h, --help            show this help message and exit

madatory settings:
  -u URL, --url URL     target
  -L URLLIST, --urlList URLLIST
                        url file path
  -e EXTENSION, --extension EXTENSION
                        the extension of website type (default : "php")

connection settings:
  -H HEADERS, --headers HEADERS
                        set headers
  --user-agent USER_AGENT
                        user-agent you want to specify
  --random-agent        random-agent (default: False)
  -c COOKIES, --cookie COOKIES
                        cookie you want to specify (example: -c
                        "domain=xxx;path=xxx")
  -r RECURSIVE, --recursive RECURSIVE
                        Recursive blasting subdir (default: 0 layers)
  --proxy PROXY         set proxy (http proxy,example:--proxy
                        http://127.0.0.1:1090)
  -s DELAY, --delay DELAY
                        time.sleep(delay) every request (default: 0)
  --timeout TIMEOUT     max time every request is waiting (default: 30 s)
  -m MAX_RETRIES, --max-retries MAX_RETRIES
                        max retries when meeting network problem (default: 5)

other settings:
  -t THREADS_COUNT, --thread THREADS_COUNT
                        max thread count you want to specify (default: 10)
  -404 PATH_404, --404-page PATH_404
                        the 404 page you want to specify (example: if
                        error.php -404 "error")
  --lowercase           force to be lowercase
  --uppercase           force to be uppercase
  --dicts-path WORDLIST
                        other dictionary you want to specify
  --ctf                 if it's specified, process will find sensor file
                        (xxx.php.bak, .xxx.php.swp ...)

example: python3 hiddenSensor.py -u http://www.xxx.com/ -e php --ctf

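3. Features
  1. Multithreading
  2. Custom HTTP headers
  3. Scanning multiple URLs
  4. Pause (ctrl+c) and resume
  5. Custom wordlists, though the ones in db should be enough
  6. Custom delay and maximum retry count
  7. HTTP proxy
  8. Custom 404 path
  9. Configurable recursion depth
  10. Scanning for .bak|.swp and similar files
4. Thanks to dirsearch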