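1 Preface
Many people stop at knowing how to use a technology, so they can use it but cannot use it flexibly. As the saying goes, curiosity killed the cat; otherwise I would not have decided to write this post at 1:48 in the morning. Enough rambling.

This picks up a question I was left with after the previous article ("Understanding the essence of web frameworks in one article & hand-writing a web framework", https://www.cnblogs.com/xiaoyuanqujing/articles/11641028.html):

Q: How does Django receive data at the network socket layer and hand the request on to Django's urls layer?

Some people answer straight away:

It goes through WSGI (Web Server Gateway Interface).

Django fully follows the WSGI protocol. Underneath it is built on socket, socketserver and the select network model, and it can take advantage of operating-system features such as non-blocking I/O and threading.

Django ships its own WSGI server written in pure Python, and its concurrency is very low (the figure people usually quote is a default of 6).

In production, a Django project is normally deployed behind uWSGI, which is implemented in C.

None of that is wrong, but nobody needed to be told it! Let egon walk you through the source code instead; that should settle everything.

PS: background knowledge

Network programming: https://www.cnblogs.com/linhaifeng/articles/6129246.html

Make sure you read the socketserver part, because I assume you already know it.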
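
To make the WSGI contract mentioned above concrete, here is a minimal sketch that needs neither Django nor a socket. The names hello_app and call_wsgi_app are illustrative assumptions, not anything from Django; the only library call is wsgiref.util.setup_testing_defaults from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A WSGI application is just a callable taking (environ, start_response)."""
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of bytes


def call_wsgi_app(app, path="/"):
    """Play the server's part: build environ, call the app, collect the result."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other keys WSGI requires
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_wsgi_app(hello_app, "/index"))
```

Everything the rest of this post traces is, in the end, the machinery that builds environ from a socket and makes this one call.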
2 Following the trail
The first step in reading source code is finding the program's entry point. That is easy here: how do we start Django?
python3.6 manage.py runserver 8088
So manage.py it is. Let's see what it does (you don't have to read every line of the source; finding the key lines is enough).
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Foo.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)  # key line: hand the CLI arguments to Django


if __name__ == '__main__':
    main()
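manage.py calls execute_from_command_line from django.core.management. That function is the key line, so it is the only thing we need to follow.

Before that, one remark: django.core.management is a package, so importing it runs its __init__.py. That file contains not only the execute_from_command_line we are about to read but also does quite a few other things.
Here is execute_from_command_line in django/core/management/__init__.py:
def execute_from_command_line(argv=None):
    """Run a ManagementUtility."""
    utility = ManagementUtility(argv)  # instantiate ManagementUtility, defined just above this function in the same file
    utility.execute()  # call ManagementUtility.execute
The key line is utility.execute(). You can find it in the ManagementUtility class; the method is long, so I won't list it. Its chain of if conditions just checks whether the arguments are valid.

The line that matters is self.fetch_command(subcommand).run_from_argv(self.argv), a chained call, so let's take it one piece at a time.

Start with fetch_command(subcommand), that is fetch_command('runserver'). Scroll up inside ManagementUtility and you will find it; I have marked the key lines in the comments.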
def fetch_command(self, subcommand):
    """
    fetch_command uses Django's built-in command machinery to match a
    subcommand to a concrete module. self.fetch_command(subcommand) here is
    self.fetch_command('runserver'), and with the staticfiles app installed
    it ends up at
    django.contrib.staticfiles.management.commands.runserver.Command.
    Django organises its commands with a strategy-plus-interface pattern:
    the directory django/core/management/commands holds one module per
    command, and each module exposes a Command class. Matching 'runserver'
    uses the runserver module's Command; matching 'migrate' uses the migrate
    module's Command.
    """
    commands = get_commands()  # key line 1
    try:
        app_name = commands[subcommand]  # key line 2
    except KeyError:
        if os.environ.get('DJANGO_SETTINGS_MODULE'):
            # Touching the setting re-raises a configuration error, if any.
            settings.INSTALLED_APPS
        else:
            sys.stderr.write("No Django settings specified.\n")
        possible_matches = get_close_matches(subcommand, commands)
        sys.stderr.write('Unknown command: %r' % subcommand)
        if possible_matches:
            sys.stderr.write('. Did you mean %s?' % possible_matches[0])
        sys.stderr.write("\nType '%s help' for usage.\n" % self.prog_name)
        sys.exit(1)
    if isinstance(app_name, BaseCommand):
        # The command is already loaded, so use it directly.
        klass = app_name
    else:
        klass = load_command_class(app_name, subcommand)  # key line 3
    return klass  # key line 4
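Key line 1: get_commands() returns a dictionary that maps each command name to the app that provides it.

Key line 2: app_name = commands[subcommand] is a plain dictionary lookup. For a core command the value is 'django.core'; when django.contrib.staticfiles is in INSTALLED_APPS (the default), its own runserver takes precedence and the value is 'django.contrib.staticfiles'.

Key line 3: klass = load_command_class(app_name, subcommand), for example load_command_class('django.core', 'runserver'). It is easy to read for yourself: it imports <app_name>.management.commands.runserver and returns an instance of that module's Command class. The staticfiles Command subclasses django.core.management.commands.runserver.Command, which is the class we read below.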
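
The lookup-then-import step can be sketched without Django installed. This is only an illustration of the pattern under stated assumptions: demo_core, demo_staticfiles, install_fake_command, build_registry and fetch are hypothetical names, and the fake modules are placed in sys.modules by hand so that importlib.import_module can find them.

```python
import sys
import types
from importlib import import_module


def install_fake_command(app_name, command_name):
    """Register a hypothetical <app>.management.commands.<name> module."""
    module_name = "%s.management.commands.%s" % (app_name, command_name)

    class Command:
        def describe(self):
            return "%s provided by %s" % (command_name, app_name)

    module = types.ModuleType(module_name)
    module.Command = Command
    sys.modules[module_name] = module  # import_module checks this cache first


def build_registry():
    """Map command names to providing apps; later apps override core ones."""
    install_fake_command("demo_core", "runserver")
    install_fake_command("demo_core", "migrate")
    install_fake_command("demo_staticfiles", "runserver")
    commands = {"runserver": "demo_core", "migrate": "demo_core"}
    commands.update({"runserver": "demo_staticfiles"})
    return commands


def fetch(commands, subcommand):
    """Look up the app, import its command module, return a Command instance."""
    app_name = commands[subcommand]
    module = import_module("%s.management.commands.%s" % (app_name, subcommand))
    return module.Command()


if __name__ == "__main__":
    registry = build_registry()
    print(fetch(registry, "runserver").describe())
    print(fetch(registry, "migrate").describe())
```

The design choice worth noticing is that the registry stores only strings, so no command module is imported until someone actually asks for that command.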
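
So self.fetch_command(subcommand) gives us a Command object. This is where a lot of people get stuck: the next link in the chain is run_from_argv(self.argv), but it is nowhere to be found in the Command class. The answer is to look in Command's parent class, BaseCommand. (This is the usual mindset when reading source code: you get lost, lose your temper a few times, and give up.)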
class BaseCommand:
    def run_from_argv(self, argv):
        """
        run_from_argv initialises middleware and starts the service, that is,
        it brings up WSGI (not directly: a lot of later code does the actual
        work). At first glance it looks like a method of the runserver.Command
        object, but the real picture is slightly more involved. The related
        code is not listed here, so it is explained with the next code block.
        """
        self._called_from_command_line = True
        parser = self.create_parser(argv[0], argv[1])

        options = parser.parse_args(argv[2:])
        cmd_options = vars(options)
        args = cmd_options.pop('args', ())
        handle_default_options(options)
        try:
            self.execute(*args, **cmd_options)  # key line
        except Exception as e:
            if options.traceback or not isinstance(e, CommandError):
                raise
            if isinstance(e, SystemCheckError):
                self.stderr.write(str(e), lambda x: x)
            else:
                self.stderr.write('%s: %s' % (e.__class__.__name__, e))
            sys.exit(1)
        finally:
            try:
                connections.close_all()
            except ImproperlyConfigured:
                # Connections may not be set up yet (e.g. no settings).
                pass

The key line is self.execute(*args, **cmd_options). Careful: look for this execute in the Command class first, because self is a Command object. So back to the Command class we go, to find execute.
class Command(BaseCommand):
    # ......
    def execute(self, *args, **options):
        if options['no_color']:
            # The environment is the only way to reach WSGIRequestHandler.
            os.environ["DJANGO_COLORS"] = "nocolor"
        super().execute(*args, **options)  # key line 1

    def get_handler(self, *args, **options):
        """Return the default WSGI handler for the runner."""
        return get_internal_wsgi_application()

    def handle(self, *args, **options):
        if not settings.DEBUG and not settings.ALLOWED_HOSTS:
            raise CommandError('You must set settings.ALLOWED_HOSTS if DEBUG is False.')

        self.use_ipv6 = options['use_ipv6']
        if self.use_ipv6 and not socket.has_ipv6:
            raise CommandError('Your Python does not support IPv6.')
        self._raw_ipv6 = False
        if not options['addrport']:
            self.addr = ''
            self.port = self.default_port
        else:
            m = re.match(naiveip_re, options['addrport'])
            if m is None:
                raise CommandError('"%s" is not a valid port number '
                                   'or address:port pair.' % options['addrport'])
            self.addr, _ipv4, _ipv6, _fqdn, self.port = m.groups()
            if not self.port.isdigit():
                raise CommandError("%r is not a valid port number." % self.port)
            if self.addr:
                if _ipv6:
                    self.addr = self.addr[1:-1]
                    self.use_ipv6 = True
                    self._raw_ipv6 = True
                elif self.use_ipv6 and not _fqdn:
                    raise CommandError('"%s" is not a valid IPv6 address.' % self.addr)
        if not self.addr:
            self.addr = self.default_addr_ipv6 if self.use_ipv6 else self.default_addr
            self._raw_ipv6 = self.use_ipv6
        self.run(**options)  # key line 2

    def run(self, **options):
        use_reloader = options['use_reloader']

        if use_reloader:
            autoreload.run_with_reloader(self.inner_run, **options)  # key line 3
        else:
            self.inner_run(None, **options)  # key line 3

    def inner_run(self, *args, **options):
        autoreload.raise_last_exception()

        threading = options['use_threading']
        shutdown_message = options.get('shutdown_message', '')
        quit_command = 'CTRL-BREAK' if sys.platform == 'win32' else 'CONTROL-C'

        self.stdout.write("Performing system checks...\n\n")
        self.check(display_num_errors=True)
        self.check_migrations()
        now = datetime.now().strftime('%B %d, %Y - %X')
        self.stdout.write(now)
        self.stdout.write((
            "Django version %(version)s, using settings %(settings)r\n"
            "Starting development server at %(protocol)s://%(addr)s:%(port)s/\n"
            "Quit the server with %(quit_command)s.\n"
        ) % {
            "version": self.get_version(),
            "settings": settings.SETTINGS_MODULE,
            "protocol": self.protocol,
            "addr": '[%s]' % self.addr if self._raw_ipv6 else self.addr,
            "port": self.port,
            "quit_command": quit_command,
        })

        try:
            handler = self.get_handler(*args, **options)
            run(self.addr, int(self.port), handler,
                ipv6=self.use_ipv6, threading=threading, server_cls=self.server_cls)  # key line 4
        except socket.error as e:
            # Map common errno values to readable messages.
            ERRORS = {
                errno.EACCES: "You don't have permission to access that port.",
                errno.EADDRINUSE: "That port is already in use.",
                errno.EADDRNOTAVAIL: "That IP address can't be assigned to.",
            }
            try:
                error_text = ERRORS[e.errno]
            except KeyError:
                error_text = e
            self.stderr.write("Error: %s" % error_text)
            os._exit(1)
        except KeyboardInterrupt:
            if shutdown_message:
                self.stdout.write(shutdown_message)
            sys.exit(0)

Key line 1: super().execute(*args, **options) finds the execute method in the parent class BaseCommand. The key line in that method is output = self.handle(*args, **options). Here self is a Command object, so we go back to the Command class to find handle.
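
The bouncing between BaseCommand and Command described above is ordinary method resolution (a template-method arrangement). Here is a stripped-down sketch; the two classes are stand-ins that reuse Django's names for readability and are assumptions of mine, not Django's real implementations.

```python
class BaseCommand:
    """Stand-in parent: owns the overall flow."""

    def run_from_argv(self, argv):
        self.trace = ["BaseCommand.run_from_argv"]
        return self.execute(*argv[2:])  # resolved on type(self) first

    def execute(self, *args):
        self.trace.append("BaseCommand.execute")
        return self.handle(*args)  # again resolved on type(self) first

    def handle(self, *args):
        raise NotImplementedError("subclasses must provide handle()")


class Command(BaseCommand):
    """Stand-in child: overrides execute and handle, inherits run_from_argv."""

    def execute(self, *args):
        self.trace.append("Command.execute")
        return super().execute(*args)

    def handle(self, *args):
        self.trace.append("Command.handle")
        return "serving on port %s" % (args[0] if args else "8000")


if __name__ == "__main__":
    cmd = Command()
    print(cmd.run_from_argv(["manage.py", "runserver", "8088"]))
    print(" -> ".join(cmd.trace))
```

The trace shows the same zig-zag as the source walk: parent, child, parent, child.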
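
Key line 2 (self.run(**options) in handle) -> key line 3 (self.inner_run, reached from run) -> key line 4 (the run(...) call in inner_run) leads to a function named run, which is imported at the top of this file: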
from django.core.servers.basehttp import (
    WSGIServer, get_internal_wsgi_application, run,
)
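Everything up to here is really just initialisation in service of 'runserver'. I have not listed all of the code, but it does real work: argument parsing, port validation, IPv4 and IPv6 checks, detecting a port that is already in use, thread options, and so on.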
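
As an illustration of the argument parsing mentioned above, here is a sketch of how a value such as 8088 or 0.0.0.0:8000 can be split into an address and a port. The handle method shown earlier matches options['addrport'] against naiveip_re; the pattern below is my own approximation of such a regex and may differ from Django's, and parse_addrport with its defaults is a hypothetical helper.

```python
import re

# Assumed pattern: optional "<address>:" prefix followed by a numeric port.
ADDRPORT_RE = re.compile(r"""^(?:
(?P<addr>
    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |          # IPv4 address
    (?P<ipv6>\[[a-fA-F0-9:]+\]) |                # bracketed IPv6 address
    (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)  # host name
):)?(?P<port>\d+)$""", re.X)


def parse_addrport(addrport, default_addr="127.0.0.1", default_port=8000):
    """Return (addr, port) for inputs like '8088' or '0.0.0.0:8000'."""
    if not addrport:
        return default_addr, default_port
    match = ADDRPORT_RE.match(addrport)
    if match is None:
        raise ValueError(
            "%r is not a valid port number or address:port pair." % addrport
        )
    addr = match.group("addr") or default_addr
    if match.group("ipv6"):
        addr = addr[1:-1]  # strip the surrounding brackets
    return addr, int(match.group("port"))


if __name__ == "__main__":
    for value in ("8088", "0.0.0.0:8000", "[::1]:9000", "localhost:80"):
        print(value, "->", parse_addrport(value))
```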
Next, let's turn to the run function in django.core.servers.basehttp:
def run(addr, port, wsgi_handler, ipv6=False, threading=False, server_cls=WSGIServer):  # wsgi_handler is a StaticFilesHandler here
    """Tell the objects involved to start the WSGI service."""
    server_address = (addr, port)
    if threading:
        httpd_cls = type('WSGIServer', (socketserver.ThreadingMixIn, server_cls), {})  # key line 1
    else:
        httpd_cls = server_cls
    httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6)  # key line 2
    if threading:
        httpd.daemon_threads = True
    httpd.set_app(wsgi_handler)  # key line 3
    httpd.serve_forever()  # key line 4
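Key line 1: the built-in metaclass type creates a class named WSGIServer whose bases are (socketserver.ThreadingMixIn, WSGIServer). Django's own WSGIServer only inherits from wsgiref.simple_server.WSGIServer (and, through it, object). Re-creating it with type forcibly gives it an extra parent, socketserver.ThreadingMixIn, so that every incoming request is handled in its own thread. That covers the first base class; here is the full inheritance family including the second base, WSGIServer:
WSGIServer (the class created by type)
    socketserver.ThreadingMixIn
    django.core.servers.basehttp.WSGIServer
        wsgiref.simple_server.WSGIServer
            http.server.HTTPServer
                socketserver.TCPServer
                    socketserver.BaseServer
                        object
Once httpd_cls is defined, all that inheritance means it no longer belongs purely to Django: it is a WSGI server class in the traditional sense.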
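
The effect of that type(...) call can be checked with the standard library alone. In this sketch wsgiref.simple_server.WSGIServer stands in for Django's subclass (an assumption made so the snippet runs without Django), and make_threaded_server_class and mro_names are hypothetical helper names.

```python
import socketserver
from wsgiref.simple_server import WSGIServer


def make_threaded_server_class(server_cls=WSGIServer):
    """Build a subclass on the fly with ThreadingMixIn as the first base."""
    return type("WSGIServer", (socketserver.ThreadingMixIn, server_cls), {})


def mro_names(cls):
    """Class names in method-resolution order."""
    return [c.__name__ for c in cls.__mro__]


if __name__ == "__main__":
    httpd_cls = make_threaded_server_class()
    print(mro_names(httpd_cls))
    # process_request now comes from the mixin, which hands each request
    # to a new thread instead of handling it inline.
    print(httpd_cls.process_request is socketserver.ThreadingMixIn.process_request)
```

Because the mixin comes first in the bases tuple, its process_request wins the lookup over the one defined further down the chain in socketserver.BaseServer.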
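Key line 2: httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6) is very important, because it is the one junction through which the WSGI server and Django talk to each other. When the WSGI server object receives a socket request, it passes the request to Django's WSGIRequestHandler (the next section shows how WSGIRequestHandler works).
Key line 3: httpd.set_app(wsgi_handler) hands django.contrib.staticfiles.handlers.StaticFilesHandler to WSGIServer as its application. When WSGIServer receives a network request it dispatches the data to django.core.servers.basehttp.WSGIRequestHandler, which in turn passes it to the application (that is, django.contrib.staticfiles.handlers.StaticFilesHandler).
Key line 4: httpd.serve_forever() starts the network listening loop, which polls the listening socket instead of blocking in accept().
Summary: everything above is preparation for starting the Django service: resolve the runserver command, parse and validate the address and port, build the server class, register the request handler and the application, then start listening.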
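
The four key lines can be exercised end to end with the standard library. This is a sketch under stated assumptions, not Django's code: demo_app, QuietHandler and fetch_once are hypothetical names, wsgiref's WSGIServer and WSGIRequestHandler stand in for Django's subclasses, and the server binds to an ephemeral port on 127.0.0.1.

```python
import http.client
import socketserver
import threading
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


def demo_app(environ, start_response):
    """Hypothetical application, playing the role of wsgi_handler."""
    body = ("path=" + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demo output clean


def fetch_once(path="/hello"):
    """Start a threaded WSGI server, make one request, shut it down."""
    httpd_cls = type("WSGIServer", (socketserver.ThreadingMixIn, WSGIServer), {})
    httpd = httpd_cls(("127.0.0.1", 0), QuietHandler)  # port 0: let the OS pick
    httpd.daemon_threads = True
    httpd.set_app(demo_app)
    worker = threading.Thread(target=httpd.serve_forever, daemon=True)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        httpd.shutdown()      # ask serve_forever to exit its loop
        httpd.server_close()  # release the listening socket


if __name__ == "__main__":
    print(fetch_once("/hello"))
```

Running serve_forever in a background thread is only there so the same script can act as the client; runserver calls it in the foreground.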
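3 What happens after httpd.serve_forever()
Picking up from httpd.serve_forever in the previous section: the call resolves to socketserver.BaseServer.serve_forever (the socketserver internals are covered in a separate source walkthrough; here I only describe the flow and won't repeat the theory).
serve_forever runs a while loop that listens for socket requests indefinitely. When a request arrives it is passed down, layer by layer, to django.core.servers.basehttp.WSGIRequestHandler, whose code is: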
class WSGIRequestHandler(simple_server.WSGIRequestHandler):
    protocol_version = 'HTTP/1.1'

    def address_string(self):
        return self.client_address[0]

    def log_message(self, format, *args):
        extra = {
            'request': self.request,
            'server_time': self.log_date_time_string(),
        }
        if args[1][0] == '4':
            if args[0].startswith('\x16\x03'):
                extra['status_code'] = 500
                logger.error(
                    "You're accessing the development server over HTTPS, but "
                    "it only supports HTTP.\n", extra=extra,
                )
                return

        if args[1].isdigit() and len(args[1]) == 3:
            status_code = int(args[1])
            extra['status_code'] = status_code

            if status_code >= 500:
                level = logger.error
            elif status_code >= 400:
                level = logger.warning
            else:
                level = logger.info
        else:
            level = logger.info

        level(format, *args, extra=extra)

    def get_environ(self):
        for k in self.headers:
            if '_' in k:
                del self.headers[k]

        return super().get_environ()

    def handle(self):
        self.close_connection = True
        self.handle_one_request()
        while not self.close_connection:
            self.handle_one_request()
        try:
            self.connection.shutdown(socket.SHUT_WR)
        except (socket.error, AttributeError):
            pass

    def handle_one_request(self):
        """Copy of WSGIRequestHandler.handle() but with different ServerHandler"""
        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(414)
            return

        if not self.parse_request():
            return

        handler = ServerHandler(
            self.rfile, self.wfile, self.get_stderr(), self.get_environ()
        )
        handler.request_handler = self
        handler.run(self.server.get_app())

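The key method is handle. To see how it gets called, start from the instantiation of WSGIRequestHandler. When socketserver executes self.RequestHandlerClass(request, client_address, self), that is the same as executing django.core.servers.basehttp.WSGIRequestHandler(request, client_address, self). The parent classes of WSGIRequestHandler are, in order: wsgiref.simple_server.WSGIRequestHandler, http.server.BaseHTTPRequestHandler, socketserver.StreamRequestHandler, socketserver.BaseRequestHandler, object.
WSGIRequestHandler defines neither __init__ nor __call__, so instantiation goes looking in the parent classes and finally finds __init__ in socketserver.BaseRequestHandler, which calls self.handle(). Note that self.handle is not BaseRequestHandler's own handle: attribute lookup starts from the object's class, so it finds the handle defined in django.core.servers.basehttp.WSGIRequestHandler. In effect handle is a callback. Here it is again:
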
def handle(self):
    self.close_connection = True
    self.handle_one_request()
    while not self.close_connection:
        self.handle_one_request()
    try:
        self.connection.shutdown(socket.SHUT_WR)
    except (socket.error, AttributeError):
        pass
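
That instantiating the handler class is what triggers handle can be seen without any socket. In this sketch RecordingHandler, FakeServer and simulate_finish_request are hypothetical names; the only real API used is socketserver.BaseRequestHandler, whose __init__ stores its three arguments and then calls setup(), handle() and finish().

```python
import socketserver


class RecordingHandler(socketserver.BaseRequestHandler):
    """No __init__ here: the inherited one calls our handle()."""

    def handle(self):
        self.server.events.append("handle called for %s:%d" % self.client_address)


class FakeServer:
    """Minimal stand-in for the server object passed as the third argument."""

    def __init__(self):
        self.events = []


def simulate_finish_request(handler_cls, client_address):
    """Do what the server does per request: instantiate the handler class."""
    server = FakeServer()
    handler_cls(None, client_address, server)  # request, client_address, server
    return server.events


if __name__ == "__main__":
    print(simulate_finish_request(RecordingHandler, ("127.0.0.1", 54321)))
```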

The key line is self.handle_one_request(), which is defined in the same class:

def handle_one_request(self):
    """Copy of WSGIRequestHandler.handle() but with different ServerHandler"""
    self.raw_requestline = self.rfile.readline(65537)
    if len(self.raw_requestline) > 65536:
        self.requestline = ''
        self.request_version = ''
        self.command = ''
        self.send_error(414)
        return

    if not self.parse_request():
        return

    handler = ServerHandler(
        self.rfile, self.wfile, self.get_stderr(), self.get_environ()
    )

    handler.request_handler = self
    handler.run(self.server.get_app())

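Key line 1: handler = ServerHandler(...) instantiates a ServerHandler object.

Key line 2: handler.run(self.server.get_app()) hands django.contrib.staticfiles.handlers.StaticFilesHandler to the ServerHandler to run. ServerHandler has no run method of its own, so look in its parent classes; it finally turns up in wsgiref.handlers.BaseHandler: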
class BaseHandler:
    # ......
    def run(self, application):
        try:
            self.setup_environ()
            self.result = application(self.environ, self.start_response)
            self.finish_response()
        except:
            try:
                self.handle_error()
            except:
                self.close()
                raise

The key line is application(self.environ, self.start_response). The application is a StaticFilesHandler instance, so this amounts to calling django.contrib.staticfiles.handlers.StaticFilesHandler.__call__ with self.environ and self.start_response.
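
The run flow above can be tried without a socket by driving wsgiref.handlers.SimpleHandler, a standard-library subclass of BaseHandler, with in-memory streams. This is a sketch under assumptions: demo_app and run_in_memory are hypothetical names, and SimpleHandler stands in for the ServerHandler used in the post.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Hypothetical application used only for this illustration."""
    body = b"hello from the application\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_in_memory(app):
    """Call handler.run(app) and return the raw bytes written to 'the wire'."""
    environ = {}
    setup_testing_defaults(environ)  # minimal, valid request environ
    stdin, stdout, stderr = io.BytesIO(), io.BytesIO(), io.StringIO()
    handler = SimpleHandler(stdin, stdout, stderr, environ)
    handler.run(app)  # setup_environ -> application(...) -> finish_response
    return stdout.getvalue()


if __name__ == "__main__":
    print(run_in_memory(demo_app).decode("iso-8859-1"))
```

The bytes that come back are a complete HTTP response: status line, headers, a blank line, then whatever the application returned.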
class StaticFilesHandler(WSGIHandler):  # Django's class dedicated to serving static files
    """
    WSGI middleware that intercepts calls to the static files directory, as
    defined by the STATIC_URL setting, and serves those files.
    """

    handles_files = True

    def __init__(self, application):
        self.application = application
        self.base_url = urlparse(self.get_base_url())
        super().__init__()

    def load_middleware(self):
        # Middleware are already loaded for self.application; no need to
        # reload them for self.
        pass

    def get_base_url(self):
        utils.check_settings()
        return settings.STATIC_URL

    def _should_handle(self, path):
        """
        Check if the path should be handled. Ignore the path if:
        * the host is provided as part of the base_url
        * the request's path isn't under the media path (or equal)
        """
        return path.startswith(self.base_url[2]) and not self.base_url[1]

    def file_path(self, url):
        """
        Return the relative path to the media file on disk for the given URL.
        """
        relative_url = url[len(self.base_url[2]):]
        return url2pathname(relative_url)

    def serve(self, request):
        """Serve the request path."""
        return serve(request, self.file_path(request.path), insecure=True)

    def get_response(self, request):
        from django.http import Http404

        if self._should_handle(request.path):
            try:
                return self.serve(request)
            except Http404 as e:
                return response_for_exception(request, e)
        return super().get_response(request)

    def __call__(self, environ, start_response):
        if not self._should_handle(get_path_info(environ)):
            return self.application(environ, start_response)  # key line 1
        return super().__call__(environ, start_response)

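Key line 1: self.application(environ, start_response). First, what is self.application? The class's __init__ runs self.application = application, so what value is that, exactly?

Here is a trick. Having read this far, don't turn back. The knack of reading source code is to take notes as you go and, when you meet a variable you don't understand, print it and look at the value. Avoid going back over the same ground; it only makes you dizzier. For example, as an administrative user (editing the installed Django source needs permission), add a print line to StaticFilesHandler in django/contrib/staticfiles/handlers.py:
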
def __init__(self, application):
    self.application = application
    print('printed from the Django source ---> self.application is', self.application)
    self.base_url = urlparse(self.get_base_url())
    super().__init__()

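Restart Django and you will see that self.application is a django.core.handlers.wsgi.WSGIHandler object. Looking at how django.core.handlers.wsgi.WSGIHandler is instantiated, you find that it loads the middleware with self.load_middleware(). That completes the analysis of how a URL request travels from the WSGI server into Django. What remains is Django's internal flow, which we can dissect another time.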
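
The relationship between StaticFilesHandler and the application it wraps is plain WSGI middleware: an object that is itself a WSGI application and holds a reference to another one. A minimal sketch follows; PrefixMiddleware, inner_app and call are hypothetical names of mine, and the "static file" response is a placeholder rather than real file serving.

```python
class PrefixMiddleware:
    """Handle paths under a prefix itself; pass everything else through."""

    def __init__(self, application, prefix="/static/"):
        self.application = application  # the wrapped WSGI application
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.prefix):
            return self.application(environ, start_response)
        body = ("static file: " + path[len(self.prefix):]).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]


def inner_app(environ, start_response):
    """Stands in for the wrapped application (the WSGIHandler in the post)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by the wrapped application"]


def call(app, path):
    """Invoke a WSGI app with a bare-bones environ; return (status, body)."""
    statuses = []
    chunks = app({"PATH_INFO": path}, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks)


if __name__ == "__main__":
    app = PrefixMiddleware(inner_app)
    print(call(app, "/static/site.css"))
    print(call(app, "/blog/"))
```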
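
One more note: you can inspect the environ variable the same way. It is very important, because all of the HTTP request information is put into environ. In the flow we traced, WSGIServer mainly handles socket requests and hands off to WSGIRequestHandler; WSGIRequestHandler mainly prepares environ and hands off to ServerHandler; and ServerHandler mainly runs the application and returns the response to WSGIServer. OK, that's all from me. To heck with source-code analysis, it's no use at all.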