解决Ubuntu下,/usr/bin/pycompile无法找到模块ConfigParser

Python网站管理员 Published the article • 0 comments • 360 views • 2019-04-17 14:52 • 来自相关话题

pycompile出现异常,找不到模块ConfigParser,起初以为是自己没有安装,后来尝试用pip安装,发现其实已经安装了

```
Traceback (most recent call last):
  File "/usr/bin/pycompile", line 35, in <module>
    from debpython.version import SUPPORTED, debsorted, vrepr, \
  File "/usr/share/python/debpython/version.py", line 24, in <module>
    from ConfigParser import SafeConfigParser
ImportError: No module named 'ConfigParser'
```

ConfigParser是python2.x里的一个配置解析模块,但是在python3.x中已经改成了小写的```configparser```,加上自己的linux环境主要用的是python3.5,所以断定这个pycompile用的还是python2.x。
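
顺带一提,如果自己的脚本需要同时兼容python2和python3,可以按下面的方式导入(一个小示例,仅作演示):

```python
try:
    # Python 3:模块名是小写的configparser
    import configparser
except ImportError:
    # Python 2:模块名是首字母大写的ConfigParser
    import ConfigParser as configparser

parser = configparser.ConfigParser()
```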

```bash
whereis pycompile
# /usr/bin/pycompile
mv /usr/bin/pycompile /usr/bin/pycompile.backup
ln -s /usr/bin/py3compile /usr/bin/pycompile
```

用命令```whereis```查看了一下pycompile的路径,然后在该目录下找到了3.x版本的,果断备份,添加新的软链。

How to deal with the CryptographyDeprecationWarning in python fabric?

Python网站管理员 Published the article • 0 comments • 3757 views • 2019-01-28 15:57 • 来自相关话题

```text
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fut
ure version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
m.add_string(self.Q_C.public_numbers().encode_point())
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a fu
ture version. Please use EllipticCurvePublicKey.from_encoded_point
self.curve, Q_S_bytes
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fu
ture version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
hm.add_string(self.Q_C.public_numbers().encode_point())
```

Searching for the keyword in the source code, you will find the class CryptographyDeprecationWarning, which is raised by the function ```_verify_openssl_version```.

```
"OpenSSL version 1.0.1 is no longer supported by the OpenSSL "
"project, please upgrade. A future version of cryptography will "
"drop support for it.",
```
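
If you just want to silence these warnings while staying on the old OpenSSL for now, one common workaround is to install a warning filter before importing paramiko/fabric. A minimal sketch, assuming the warning class can be imported from ```cryptography.utils``` (true for cryptography 2.x):

```python
import warnings

# CryptographyDeprecationWarning lives in cryptography.utils in cryptography 2.x
from cryptography.utils import CryptographyDeprecationWarning

warnings.filterwarnings('ignore', category=CryptographyDeprecationWarning)

# import paramiko/fabric only after the filter is installed
import paramiko  # noqa: E402
```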

python解决唱吧歌词解密的问题?

Python网站管理员 Published the article • 0 comments • 279 views • 2018-11-22 16:14 • 来自相关话题

做唱吧歌词解密的时候选择了python语言,对字节解码的时候用到了chr函数,但是chr函数的参数限制在0 ~ 0xff(255),如果传给chr的值出现负数怎么办呢?我记得php用chr函数的时候支持负数,于是翻阅了一下php的源码,发现php做了一次按位与操作 c &= 0xff,照着做一次掩码之后就不会再出现错误:

```python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(-165)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(256)
>>> chr(-165 & 0xff)
'['
>>>
```

python解决唱吧歌词解密的问题?

```python
# -*- coding: utf-8 -*-
import re
import os


class ChangBaDecrypt(object):
    # 16字节的异或密钥(其中的负数在异或后用 & 0xff 归一到0~255)
    encrypt_key = [-50, -45, 110, 105, 64, 90, 97, 119, 94, 50, 116, 71, 81, 54, -91, -68, ]

    def decrypt(self, content):
        # 注意:这里按Python 3处理,bytes按下标取出来的是int
        decrypt_content = bytearray()
        for i in range(0, len(content)):
            # 逐字节与密钥异或,再做掩码保证结果落在0~255
            var = content[i] ^ self.encrypt_key[i % 16]
            decrypt_content.append(var & 0xff)
        return decrypt_content.decode('utf-8')

    def decrypt_by_file(self, filename):
        # with语句会自动关闭文件,不需要再手动f.close()
        with open(filename, 'rb') as f:
            content = f.read()
        decrypt = self.decrypt(content)
        # 解密后的歌词以形如 [数字,数字] 的内容开头,用正则简单校验一下
        if re.match(r'\[\d+,\d+\]', decrypt):
            return decrypt


changba = ChangBaDecrypt()
decrypt = changba.decrypt_by_file(os.path.join(os.path.curdir, '../tests/data/a89f8523a6724a915c6b2038c928b342.zrce'))
print(decrypt)
```

python解决Windows进程powershell.exe占用cpu过高的问题

Python网站管理员 Published the article • 0 comments • 3460 views • 2018-11-13 18:38 • 来自相关话题

### 背景

Windows的进程powershell.exe的cpu使用率过高,时常导致计算机卡顿,于是写了一个python脚本来监测这个进程,当cpu使用率到达40%以上时,直接终结这个进程。

### 依赖
这里用的库是```psutil```,这是一个跨平台的进程和系统信息库,能够方便我们对cpu和内存等进行监测。

```bash
pip install pipenv
pipenv install psutil
```

### 解决方法

脚本为循环任务,每轮间隔为5s

```python
import psutil
import time

while True:
    for proc in psutil.process_iter(attrs=['pid', 'name']):
        if proc.name() == 'powershell.exe':
            cpu_percent = proc.cpu_percent()
            print('current cpu percent: %s' % str(cpu_percent))
            # cpu占用超过40%就直接终结该进程
            if cpu_percent > 40:
                proc.terminate()
                print('powershell.exe has been terminated')

    time.sleep(5)
```
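
另外要注意,```cpu_percent()```不带参数时第一次调用通常返回0.0;如果想要更准确的瞬时占用率,可以传入interval参数做阻塞采样,下面是一个小示意:

```python
import psutil

for proc in psutil.process_iter(attrs=['pid', 'name']):
    if proc.info['name'] == 'powershell.exe':
        # interval=1 会阻塞1秒,在这段时间内采样计算cpu占用,结果更准确
        print(proc.pid, proc.cpu_percent(interval=1))
```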

Python基础知识:快速了解字典的增删改查以及自定义不可变字典

Python网站管理员 Published the article • 0 comments • 341 views • 2018-11-06 10:08 • 来自相关话题

字典在很多的高级语言中是很常见的,java中的hashmap,php中的键值对的数组,python中的是dict,它是一个可变的容器模型,可以存储任意的数据结构,但是容器中的每个元素都是以键值对的形式存在的,形如key => value,python中是用冒号分隔,每个键值对用逗号分隔,所有的键值对用大括号包围起来。当然字典中还可以包含其他的字典,层级深度可以任意。有点儿像json,如果不了解python中的字典和json之间的转换可以看看这篇文章。

```python
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'};
>>> dict['Name']
'Zara'
>>> dict['notExist']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'notExist'
>>> del dict['Age']
>>> dict
{'Name': 'Zara', 'Class': 'First'}
>>> dict.clear()
>>> dict
{}
>>> if 'Age' in dict:
...     print True
... else:
...     print False
...
False
```

那么如何遍历一个字典对象呢?这里有几种方法:第一种是直接for循环字典,拿到key,再用key去索引字典取到对应的值;第二种是调用字典的items方法,遍历它的返回值可以同时拿到key和value。千万要注意,直接对字典做key, value解包遍历是不行的,它不会返回key,value这样的二元组,下面第一段就是这种错误写法。

```python
>>> for key, value in dict:
...     print('%s => %s' % (key, value))
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
>>> for key, value in dict.items():
...     print('%s => %s' % (key, value))
...
Beth => 9102
Alice => 2341
Cecil => 3258
>>> for key in dict:
...     print('%s => %s' % (key, dict[key]))
...
Beth => 9102
Alice => 2341
Cecil => 3258
```

要注意一点,如果这个key不在字典内,直接取值的话会出现KeyError异常,所以需要预先判断这个key是否在字典内。有两种方法:has_key(key)和key in dict,不过第二种写法更容易接受一点,因为它更接近自然语言。del dict[key]会把这个键值对从字典中直接删除,调用clear()方法可以直接把字典清空,之后就只剩下一个空的字典{}。

有的时候我们需要一个不可变的字典,只能在实例化的时候初始化内容,之后不能添加、更新和删除。这个时候可以继承collections.MutableMapping(Python 3中是collections.abc.MutableMapping)或者直接继承dict,改写它的方法,把__setitem__、__delitem__这类修改操作直接抛出异常,这样我们就得到了一个不可变的字典了。
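
下面给出一个不可变字典的最小示意(这里直接继承dict,把修改类的方法统一改成抛异常,类名和实现只是演示):

```python
class ImmutableDict(dict):
    """初始化之后不允许再修改的字典(演示用)。"""

    def _readonly(self, *args, **kwargs):
        raise TypeError('this dict is immutable')

    # 把所有会修改内容的方法都指向_readonly
    __setitem__ = _readonly
    __delitem__ = _readonly
    clear = _readonly
    update = _readonly
    pop = _readonly
    popitem = _readonly
    setdefault = _readonly


d = ImmutableDict(Name='Zara', Age=7)
print(d['Name'])   # 正常读取,输出 Zara
d['Age'] = 8       # 抛出 TypeError
```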

如果你还在用xrange的话,可能low爆了,我来介绍一下itertools

Python网站管理员 Published the article • 0 comments • 254 views • 2018-10-26 16:20 • 来自相关话题

itertools是python内置的高效好用的迭代模块,迭代器有一个特色就是惰性求值,也就是只有当这个值被使用的时候才会计算,不需要预先在内存中生成这些数据,所以在遍历大文件、或者无限集合数组的时候,它的优势就格外的突出。

itertools的迭代器函数有三种类型

1. 无限迭代器:可以生成一个无限的序列,比如等差、等比、自然数都可以。
2. 有限迭代器:可以接收一个或者多个序列,然后组合、过滤、或者分组。
3. 组合生成器:序列的排列、组合,求序列的笛卡儿积等。

无限迭代器有三个函数,count(firstval=0, step=1)、cycle(iterable)、repeat(object [,times])。
```bash
>>> import itertools
>>>
>>> dir(itertools)
['__doc__', '__file__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee']
```

count的使用方法:firstval是开始数值,step(默认为1)是步长。比如开始值为5,step为2,那么生成的无限序列就是:5,7,9,11...,是一个首项为5、公差为2的等差无限序列。你不需要担心没有足够的内存来存储这些数据,它们只有在被使用的时候才会计算;如果你只需要5~20之间的数值,用if判断一下,然后break就可以了,见下面的小例子。
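
下面补一个count配合break只取5~20之间数值的小例子(示意):

```python
from __future__ import print_function
import itertools

# count(5, 2) 会生成 5, 7, 9, 11, ... 的无限等差序列
for n in itertools.count(5, 2):
    if n > 20:
        # 超过20就停止,否则会一直循环下去
        break
    print(n, end=' ')
# 输出: 5 7 9 11 13 15 17 19
```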

cycle则是把一个可迭代对象里的值反复循环输出

```python
from __future__ import print_function
import itertools

cycle_strings = itertools.cycle('ABC')
i = 1
for string in cycle_strings:
    if i == 10:
        break
    print('%d => %s' % (i, string), end=' ')
    i += 1
```

### repeat反复生成一个对象
```python
from __future__ import print_function
import itertools

for item in itertools.repeat('hello world', 3):
    print(item)
```

### itertools chain
```bash
>>> from __future__ import print_function
>>> import itertools
>>> for item in itertools.chain([1, 2, 3], ['a', 'b', 'c']):
...     print(item, end=' ')
...
1 2 3 a b c
```
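
### 组合生成器
排列、组合、笛卡儿积分别对应permutations、combinations、product三个函数,下面是一个简单的交互示例(输出为笔者补充):

```bash
>>> import itertools
>>> list(itertools.permutations('AB'))
[('A', 'B'), ('B', 'A')]
>>> list(itertools.combinations([1, 2, 3], 2))
[(1, 2), (1, 3), (2, 3)]
>>> list(itertools.product('AB', [1, 2]))
[('A', 1), ('A', 2), ('B', 1), ('B', 2)]
```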


### 答疑:
为什么需要from __future__ import print_function?
- 由于我们需要不换行打印数据内容,所以需要在文件第一行引入这个hack,然后给print添加一个参数end,使其等于空格,完成不换行、空格相间的输出。

scrapy爬虫的正确使用姿势,从入门安装开始(一)

Python网站管理员 Published the article • 0 comments • 692 views • 2018-08-24 17:40 • 来自相关话题

### scrapy 官方介绍

```
An open source and collaborative framework for extracting the data you need from websites.
In a fast, simple, yet extensible way.
```

1. 这是一个开源协作的框架,用于从网站中提取你需要的数据,并且,快速,简单,可扩展。
2. 众所周知,这是一个很强大的爬虫框架,能够帮助我们从众多的网页中提取出我们需要的数据,且快速、易于扩展。
3. 它是由```python```编写的,所以完成编码后,我们可以把它运行在```Linux```、```Windows```、```Mac``` 和 ```BSD```上。

#### 官方数据

- 24k stars, 6k forks and 1.6k watchers on GitHub
- 4.0k followers on Twitter
- 8.7k questions on StackOverflow

#### 版本要求

- Python 2.7 or Python 3.4+
- Works on Linux, Windows, Mac OSX, BSD

#### 快速安装

如果不知道怎么安装```pip```,可以查看这篇文章[《如何快速了解pip编译安装python的包管理工具pip?》](http://www.sourcedev.cc/article/131)

```
pip install scrapy
```

### 创建项目

```bash
root@ubuntu:/# scrapy startproject -h
Usage
=====
scrapy startproject <project_name> [project_dir]

Create new project

Options
=======
--help, -h show this help message and exit

Global Options
--------------
--logfile=FILE log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
log level (default: DEBUG)
--nolog disable logging completely
--profile=FILE write python cProfile stats to FILE
--pidfile=FILE write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
set/override setting (may be repeated)
--pdb enable pdb on failure
root@ubuntu:/# scrapy startproject helloDemo
New Scrapy project 'helloDemo', using template directory '/usr/local/lib/python3.5/dist-packages/scrapy/templates/project', created in:
/helloDemo

You can start your first spider with:
cd helloDemo
scrapy genspider example example.com
root@ubuntu:/# cd helloDemo/
root@ubuntu:/helloDemo# ls
helloDemo scrapy.cfg

root@ubuntu:/helloDemo# scrapy crawl spider baidu
2018-08-24 17:33:00 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: helloDemo)
2018-08-24 17:33:00 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_MODULES': ['helloDemo.spiders'], 'NEWSPIDER_MODULE': 'helloDemo.spiders', 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'helloDemo'}
Usage
=====
scrapy crawl [options] <spider>

crawl: error: running 'scrapy crawl' with more than one spider is no longer supported
root@ubuntu:/helloDemo# scrapy crawl baidu
2018-08-24 17:33:06 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: helloDemo)
2018-08-24 17:33:06 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'helloDemo', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'helloDemo.spiders', 'SPIDER_MODULES': ['helloDemo.spiders']}
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-24 17:33:07 [scrapy.core.engine] INFO: Spider opened
2018-08-24 17:33:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-24 17:33:07 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-08-24 17:33:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.baidu.com/robots.txt> from <GET http://wwww.baidu.com/robots.txt>
2018-08-24 17:33:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.baidu.com/robots.txt> (referer: None)
2018-08-24 17:33:07 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET http://wwww.baidu.com/>
2018-08-24 17:33:08 [scrapy.core.engine] INFO: Closing spider (finished)
2018-08-24 17:33:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1,
'downloader/request_bytes': 443,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 1125,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 8, 24, 9, 33, 8, 11376),
'log_count/DEBUG': 4,
'log_count/INFO': 7,
'memusage/max': 52117504,
'memusage/startup': 52117504,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2018, 8, 24, 9, 33, 7, 430751)}
2018-08-24 17:33:08 [scrapy.core.engine] INFO: Spider closed (finished)
```

#### 代码目录结构
```bash
root@ubuntu:/helloDemo# tree
.
├── helloDemo
│   ├── __init__.py
│   ├── items.py        # 实体,数据结构
│   ├── middlewares.py  # 爬虫的中间件
│   ├── pipelines.py    # 管道,数据的存储
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   └── settings.cpython-35.pyc
│   ├── settings.py     # 全局设置
│   └── spiders         # 爬虫蜘蛛项目
│       ├── baidu.py    # 上面创建的baidu爬虫
│       ├── __init__.py
│       └── __pycache__
│           └── __init__.cpython-35.pyc
└── scrapy.cfg
```

spiders/baidu.py是我们处理数据的地方,parse方法收到的response是抓取后返回的响应对象,里面包含整个html DOM结构
```python
# -*- coding: utf-8 -*-
import scrapy


class BaiduSpider(scrapy.Spider):
    name = 'baidu'
    allowed_domains = ['wwww.baidu.com']
    start_urls = ['http://wwww.baidu.com/']

    def parse(self, response):
        pass
```
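
在parse方法里如何取数据?下面给出一个用css选择器提取网页标题的最小示意(爬虫名、起始地址和字段名只是举例):

```python
# -*- coding: utf-8 -*-
import scrapy


class TitleSpider(scrapy.Spider):
    # 演示用的爬虫,名字和起始地址都是示例
    name = 'title_demo'
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # 用css选择器取出<title>标签里的文本
        title = response.css('title::text').extract_first()
        # 以字典形式yield出去,后续可以交给pipeline处理
        yield {'title': title}
```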

### 后面的文章我会继续介绍scrapy的用法


### 参考资源

1. [scrapy官网地址](https://scrapy.org)
2. [scrapy官方文档](https://docs.scrapy.org/en/latest/)
3. [github开源项目源码](https://github.com/scrapy/scrapy)
4. [pip的安装](http://www.sourcedev.cc/article/131)

如何快速了解pip编译安装python的包管理工具pip?

Python网站管理员 Published the article • 0 comments • 345 views • 2018-08-24 17:30 • 来自相关话题

### 背景介绍

```pip```是python的一个包管理工具,如果我们想为我们的python项目导入一些三方的库,其实是很方便的

### 安装

```bash
pip install SomePackage

// 例如 for example
pip install SomePackage-1.0-py2.py3-none-any.whl
```
没错,就是这么方便,只需要```pip install```即可。

### python pypi管理

官方的搜索入口形如 ```https://pypi.org/search/?q=docker```。在这之前,我们想要查找三方的项目或者依赖是比较麻烦的,因为这个搜索站点是官方最近才推出来的。搜索姿势很简单,在搜索表单里提交你要搜索的关键词即可,项目详情页列出了安装方式、简介、如何使用、文档,以及作者、开源项目地址,还有关注热度。所以开发python的项目变得越来越方便,可以搜索到更多的项目,以及学习到更多优秀的项目。

### 安装pip

获取源码
```bash
wget https://bootstrap.pypa.io/get-pip.py

#or

curl https://bootstrap.pypa.io/get-pip.py -o setup-pip.py

# install
python setup-pip.py

# check install and version
root@ubuntu:/# pip -V
pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)
```
没错安装就是这么的方便

### pip常用命令介绍
```
root@ubuntu:/# pip -h

Usage:
pip <command> [options]

Commands:
install Install packages.
download Download packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
check Verify installed packages have compatible dependencies.
config Manage local and global configuration.
search Search PyPI for packages.
wheel Build wheels from your requirements.
hash Compute hashes of package archives.
completion A helper command used for command completion.
help Show help for commands.

```

- pip install 安装包
- pip download 下载包
- pip uninstall 卸载包
- pip list 查看已经下载安装的包
- pip config 查看配置
- pip search 搜索资源包
- pip help 查看命令

### pip国内镜像资源,提升你的下载和安装速度

pip安装资源是一件比较频繁的操作,但是这些资源如果需要从国外下载下来的话十分的缓慢,而且极容易出错,所以我们需要配置为国内的镜像地址

#### 国内资源

新版ubuntu要求使用https源,要注意。
- 清华:https://pypi.tuna.tsinghua.edu.cn/simple
- 阿里云:http://mirrors.aliyun.com/pypi/simple/
- 中国科技大学 https://pypi.mirrors.ustc.edu.cn/simple/
- 华中理工大学:http://pypi.hustunique.com/
- 山东理工大学:http://pypi.sdutlinux.org/
- 豆瓣:http://pypi.douban.com/simple/

##### 临时安装

```bash
pip install -i http://pypi.douban.com/simple scrapy
```

#### 修改配置,永久安装

```bash
vim ~/.pip/pip.conf

[global]
timeout = 60
index-url = https://pypi.douban.com/simple
```
修改index-url即可

### requirement.txt的使用

```requirement.txt```里面放置了这个项目所依赖的所有三方库,一个依赖占一行。
如:
scrapy
pymysql
...
那么pip如何执行批量安装,也就是如何把requirement.txt里的依赖一次装完呢?
```bash
pip install -r requirement.txt

# usage
pip install -h

# -r, --requirement Install from the given requirements file. This option can be used multiple times.
#
```
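
配合```pip freeze```可以把当前环境里已安装的依赖连同版本号导出成requirement.txt,再在新环境里批量安装(示意):

```bash
# 导出当前环境已安装的包及版本号
pip freeze > requirement.txt

# 在新的环境中按该文件批量安装
pip install -r requirement.txt
```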

### 总结

- 随着项目越来越大,我们就需要一个统一的工具来管理我们的项目模块和三方库,```pip```是不二之选
- 善用工具,善于查找,```google```, linux 命令加```-h or help```,或者```man```命令,都可以帮我们很快找到我们想要的说明
- 当然前提是我们的英语至少可以阅读理解

# 参考网址

1. [pip官方地址](https://pip.pypa.io/en/stable/installing/)
2. [pypi](https://pypi.org/search)
3. [国内镜像资源](http://www.cnblogs.com/microman/p/6107879.html)

scrapy+php爬虫网站的配置

PHP网站管理员 Published the article • 0 comments • 319 views • 2018-02-23 19:58 • 来自相关话题

### 环境配置需求

1. 需要python配置的环境,python3.6
下载地址:
`https://www.python.org/ftp/pyt ... 4.zip`
安装好后,可以需要配置系统的环境:
`https://www.cnblogs.com/dangeal/p/5455005.html`
2. php网站,thinkphp
可参考:
`https://jingyan.baidu.com/arti ... .html`
3. python依赖先安装pip:
`http://blog.csdn.net/nomey_mr/ ... 95984`

#### pip 是python的依赖管理工具,可以安装python所需要的功能扩展

#### cmd命令行执行如下
```
pip install scrapy
pip install pymysql
```

#### 爬虫启动
```
cd SpiderArticle的目录
python run.py
```

### 数据库的配置,爬虫的配置

> SpiderArticle/settings.py 最下面几行

```python
MYSQL_HOST = '127.0.0.1'
MYSQL_DBNAME = 'simple_cms'
MYSQL_USER = 'root'
MYSQL_PASSWD = '123123'
```

4. 导入数据库,phpmyadmin等工具导入
doc/database.sql

5. php网站的数据库配置

application/database.php 需要配置你设置的数据库还有账号密码

```php
return [
    // 服务器地址
    'hostname' => '127.0.0.1',
    // 数据库名
    'database' => 'simple_cms',
    // 用户名
    'username' => 'root',
    // 密码
    'password' => '123123',
];
```

python目录文件迭代器yield所有文件

Python网站管理员 Published the article • 0 comments • 300 views • 2018-02-23 18:11 • 来自相关话题

### python中递归使用关键词`yield`

```
yield anything
```

### python脚本
```python
# -*- coding: utf-8 -*-

import os


def get_recursive_file_list(path, base_name):
    # 递归遍历path下的所有文件,yield相对于base_name的路径
    current_files = os.listdir(path)
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        file_name = full_file_name.replace(base_name, '')

        if os.path.isdir(full_file_name):
            # 子目录则递归,把子生成器的结果逐个yield出去
            next_level_files = get_recursive_file_list(full_file_name, base_name)
            for files in next_level_files:
                yield files
        else:
            yield file_name


a = get_recursive_file_list('d://test', 'd://test')
for item in a:
    print(item)
```
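
顺便一提,标准库的os.walk也能完成同样的遍历,可以和上面的递归yield写法对照着看(示意,目录参数沿用上面的例子):

```python
import os


def walk_files(path, base_name):
    # os.walk会逐层走完整个目录树,这里同样以生成器的方式逐个yield相对路径
    for root, dirs, files in os.walk(path):
        for name in files:
            full_file_name = os.path.join(root, name)
            yield full_file_name.replace(base_name, '')


for item in walk_files('d://test', 'd://test'):
    print(item)
```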
### python中递归使用关键词`yield`

```
yield anthing;
```

### python脚本
```python
# -*-- coding:utf-8 -*--

import os

def get_recursive_file_list( path, base_name):
current_files = os.listdir(path)
for file_name in current_files:
full_file_name = os.path.join(path, file_name)
file_name = full_file_name.replace(base_name, '')

if os.path.isdir(full_file_name):
next_level_files = get_recursive_file_list(full_file_name, base_name)
for files in next_level_files:
yield files
else:
yield file_name

a = (get_recursive_file_list('d://test', 'd://test'))
for item in a:
print(item)
```

解决Ubuntu下,/usr/bin/pycompile无法找到模块ConfigParser

Python网站管理员 Published the article • 0 comments • 360 views • 2019-04-17 14:52 • 来自相关话题

pycompile出现异常,找不到模块ConfigParser,期初以为是自己没有安装,后来用pip安装尝试安装,已经安装了

```
Traceback (most recent call last):
File "/usr/bin/pycompile", line 35, in
from debpython.version import SUPPORTED, debsorted, vrepr, \
File "/usr/share/python/debpython/version.py", line 24, in
from ConfigParser import SafeConfigParser
ImportError: No module named 'ConfigParser'
```

ConfigParser是python2.x的一个参数parse模块,但是python3.x已经是用小写了```configparser```,加上自己的linux环境主要用的是python3.5,所以断定这个pycompile还是用的python2.x

```bash
whereis pycompile
# /usr/bin/pycompile
mv /usr/bin/pycompile /usr/bin/pycompile.backup
ln -s /usr/bin/py3compile /usr/bin/pycompile
```

用命令```whereis```查看了一下pycompile的路径,然后在该目录下找到了3.x版本的,果断备份,添加新的软链。 查看全部


pycompile出现异常,找不到模块ConfigParser,期初以为是自己没有安装,后来用pip安装尝试安装,已经安装了

```
Traceback (most recent call last):
File "/usr/bin/pycompile", line 35, in
from debpython.version import SUPPORTED, debsorted, vrepr, \
File "/usr/share/python/debpython/version.py", line 24, in
from ConfigParser import SafeConfigParser
ImportError: No module named 'ConfigParser'
```

ConfigParser是python2.x的一个参数parse模块,但是python3.x已经是用小写了```configparser```,加上自己的linux环境主要用的是python3.5,所以断定这个pycompile还是用的python2.x

```bash
whereis pycompile
# /usr/bin/pycompile
mv /usr/bin/pycompile /usr/bin/pycompile.backup
ln -s /usr/bin/py3compile /usr/bin/pycompile
```

用命令```whereis```查看了一下pycompile的路径,然后在该目录下找到了3.x版本的,果断备份,添加新的软链。

How to deal with the CryptographyDeprecationWarning in python fabric?

Python网站管理员 Published the article • 0 comments • 3757 views • 2019-01-28 15:57 • 来自相关话题

```text
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fut
ure version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
m.add_string(self.Q_C.public_numbers().encode_point())
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a fu
ture version. Please use EllipticCurvePublicKey.from_encoded_point
self.curve, Q_S_bytes
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fu
ture version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
hm.add_string(self.Q_C.public_numbers().encode_point())
```

finding the keywork in the source code, you will see class CryptographyDeprecationWarning. invoked by function ```_verify_openssl_version```.

```
"OpenSSL version 1.0.1 is no longer supported by the OpenSSL "
"project, please upgrade. A future version of cryptography will "
"drop support for it.",
``` 查看全部
```text
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fut
ure version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
m.add_string(self.Q_C.public_numbers().encode_point())
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a fu
ture version. Please use EllipticCurvePublicKey.from_encoded_point
self.curve, Q_S_bytes
c:\users\administrator\.virtualenvs\spiderworker-dgts38t8\lib\site-packages\paramiko\kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a fu
ture version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
hm.add_string(self.Q_C.public_numbers().encode_point())
```

finding the keywork in the source code, you will see class CryptographyDeprecationWarning. invoked by function ```_verify_openssl_version```.

```
"OpenSSL version 1.0.1 is no longer supported by the OpenSSL "
"project, please upgrade. A future version of cryptography will "
"drop support for it.",
```

python解决唱吧歌词解密的问题?

Python网站管理员 Published the article • 0 comments • 279 views • 2018-11-22 16:14 • 来自相关话题

做唱吧歌词解密的时候选择了语言python,对于字节解码的时候用到了chr函数,但是chr函数参数限制在0 ~ 0xff(255),如果需要chr的值出现负数怎么办呢?我记得php用chr函数的时候支持负数,于是翻阅了一下php的源码,发现php做了一次按位与操作 c &= c,这样一来就不会再出现错误:

```python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(-165)
Traceback (most recent call last):
File "", line 1, in
ValueError: chr() arg not in range(256)
>>> chr(-165 & 0xff)
'['
>>>
```

python解决唱吧歌词解密的问题?

```python
# -*-- coding:utf-8 -*--
import re
import os


class ChangBaDecrypt(object):
encrypt_key = [-50, -45, 110, 105, 64, 90, 97, 119, 94, 50, 116, 71, 81, 54, -91, -68, ]

def __init__(self):
pass

def decrypt(self, content):
decrypt_content = bytearray()
for i in range(0, len(content)):
var = content[i] ^ self.encrypt_key[i % 16]
decrypt_content.append(var & 0xff)
return decrypt_content.decode('utf-8')

def decrypt_by_file(self, filename):
with open(filename, 'rb') as f:
content = f.read()
f.close()
decrypt = self.decrypt(content)
if re.match(r'\[\d+,\d+\]', decrypt):
return decrypt


changba = ChangBaDecrypt()
decrypt = changba.decrypt_by_file(os.path.join(os.path.curdir, '../tests/data/a89f8523a6724a915c6b2038c928b342.zrce'))
print(decrypt)
``` 查看全部
做唱吧歌词解密的时候选择了语言python,对于字节解码的时候用到了chr函数,但是chr函数参数限制在0 ~ 0xff(255),如果需要chr的值出现负数怎么办呢?我记得php用chr函数的时候支持负数,于是翻阅了一下php的源码,发现php做了一次按位与操作 c &= c,这样一来就不会再出现错误:

```python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(-165)
Traceback (most recent call last):
File "", line 1, in
ValueError: chr() arg not in range(256)
>>> chr(-165 & 0xff)
'['
>>>
```

python解决唱吧歌词解密的问题?

```python
# -*-- coding:utf-8 -*--
import re
import os


class ChangBaDecrypt(object):
encrypt_key = [-50, -45, 110, 105, 64, 90, 97, 119, 94, 50, 116, 71, 81, 54, -91, -68, ]

def __init__(self):
pass

def decrypt(self, content):
decrypt_content = bytearray()
for i in range(0, len(content)):
var = content[i] ^ self.encrypt_key[i % 16]
decrypt_content.append(var & 0xff)
return decrypt_content.decode('utf-8')

def decrypt_by_file(self, filename):
with open(filename, 'rb') as f:
content = f.read()
f.close()
decrypt = self.decrypt(content)
if re.match(r'\[\d+,\d+\]', decrypt):
return decrypt


changba = ChangBaDecrypt()
decrypt = changba.decrypt_by_file(os.path.join(os.path.curdir, '../tests/data/a89f8523a6724a915c6b2038c928b342.zrce'))
print(decrypt)
```

python解决window的进程powershell.exe占用cpu使用率过高

Python网站管理员 Published the article • 0 comments • 3460 views • 2018-11-13 18:38 • 来自相关话题

### 背景

window的进程powershell.exe占用cpu使用率过高,时常导致计算机卡顿,于是写了一个python的脚本监测这个脚本,当cpu使用到达了40%以上,直接终结这个进程。

### 依赖
这里用的库是```psutil```,这是一个跨平台的进程和系统工具,能够方便我们对cpu和内存等监测

```bash
pip install pipenv
pipenv install psutil
```

### 解决方法

脚本为循环任务,间隔为10s

```python
import psutil
import time

while True:
for proc in psutil.process_iter(attrs=['pid', 'name']):
if proc.name() == 'powershell.exe':
cpu_percent = proc.cpu_percent()
print('current cpu percent: %s' % str(cpu_percent))
if cpu_percent > 40:
proc.terminate()
print('powershell.exe has been terminate')

time.sleep(5)
``` 查看全部
### 背景

window的进程powershell.exe占用cpu使用率过高,时常导致计算机卡顿,于是写了一个python的脚本监测这个脚本,当cpu使用到达了40%以上,直接终结这个进程。

### 依赖
这里用的库是```psutil```,这是一个跨平台的进程和系统工具,能够方便我们对cpu和内存等监测

```bash
pip install pipenv
pipenv install psutil
```

### 解决方法

脚本为循环任务,间隔为10s

```python
import psutil
import time

while True:
for proc in psutil.process_iter(attrs=['pid', 'name']):
if proc.name() == 'powershell.exe':
cpu_percent = proc.cpu_percent()
print('current cpu percent: %s' % str(cpu_percent))
if cpu_percent > 40:
proc.terminate()
print('powershell.exe has been terminate')

time.sleep(5)
```

Python基础知识:快速了解字典的增删改查以及自定义不可变字典

Python网站管理员 Published the article • 0 comments • 341 views • 2018-11-06 10:08 • 来自相关话题

字典在很多的高级语言中是很常见的,java中的hashmap,php中的键值对的数组,python中的是dict,它是一个可变的容器模型,可以存储任意的数据结构,但是容器中的每个元素都是以键值对的形式存在的,形如key => value,python中是用冒号分隔,每个键值对用逗号分隔,所有的键值对用大括号包围起来。当然字典中还可以包含其他的字典,层级深度可以任意。有点儿像json,如果不了解python中的字典和json之间的转换可以看看这篇文章。

```python
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'};
>>> dict['Name']
'Zara'
>>> dict['notExist']
Traceback (most recent call last):
File "", line 1, in
KeyError: 'notExist'
>>> del dict['Age']
>>> dict
{'Name': 'Zara', 'Class': 'First'}
>>> dict.clear()
>>> dict
{}
>>> if 'Age' in dict:
... print True
... else:
... print False
...
False
```

那么如何遍历一个字典的对象呢?这里有几种方法,第一种是我们直接for循环,拿到key,那么值就是字典的索引key;第二种就是字典有个方法是items,我们可以遍历这个items的返回值。千万要注意字典不是元组,是不会直接返回key,value这样的结果的。

```python
>>> for key, value in dict:
... print('%s => %s' % (key, value))
...
Traceback (most recent call last):
File "", line 1, in
ValueError: too many values to unpack
>>> for key, value in dict.items():
... print('%s => %s' % (key, value))
...
Beth => 9102
Alice => 2341
Cecil => 3258
>>> for key in dict:
... print('%s => %s' % (key, dict[key]))
...
Beth => 9102
Alice => 2341
Cecil => 3258
```

要注意一点的是如果这个key不在字典内,我们直接使用的话会出现异常KeyError,我们需要预先判断这个key是否在字典内,有两种方法,has_key(key)和key in dict来判断这个key是否在字典内,不过第二种方法更容易接受点,因为它更加的接近我们的语言。del这个key的所以会把元素从字典中直接删除,调用clear()方法可以直接把字典清空,然后就剩下一个空的字典{}。

有的时候我们需要一个不可变的字典,只能通过实例化的时候初始化这个字典,不能添加、更新、和删除。这个时候我们需要用到模块collections.MultiMapping了,集成类MultiMapping,改写它的方法,将__setitem__、__delitem__直接抛出异常,这样我们就得到了一个不可变的字典了。 查看全部
字典在很多的高级语言中是很常见的,java中的hashmap,php中的键值对的数组,python中的是dict,它是一个可变的容器模型,可以存储任意的数据结构,但是容器中的每个元素都是以键值对的形式存在的,形如key => value,python中是用冒号分隔,每个键值对用逗号分隔,所有的键值对用大括号包围起来。当然字典中还可以包含其他的字典,层级深度可以任意。有点儿像json,如果不了解python中的字典和json之间的转换可以看看这篇文章。

```python
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'};
>>> dict['Name']
'Zara'
>>> dict['notExist']
Traceback (most recent call last):
File "", line 1, in
KeyError: 'notExist'
>>> del dict['Age']
>>> dict
{'Name': 'Zara', 'Class': 'First'}
>>> dict.clear()
>>> dict
{}
>>> if 'Age' in dict:
... print True
... else:
... print False
...
False
```

那么如何遍历一个字典的对象呢?这里有几种方法,第一种是我们直接for循环,拿到key,那么值就是字典的索引key;第二种就是字典有个方法是items,我们可以遍历这个items的返回值。千万要注意字典不是元组,是不会直接返回key,value这样的结果的。

```python
>>> for key, value in dict:
... print('%s => %s' % (key, value))
...
Traceback (most recent call last):
File "", line 1, in
ValueError: too many values to unpack
>>> for key, value in dict.items():
... print('%s => %s' % (key, value))
...
Beth => 9102
Alice => 2341
Cecil => 3258
>>> for key in dict:
... print('%s => %s' % (key, dict[key]))
...
Beth => 9102
Alice => 2341
Cecil => 3258
```

要注意一点的是如果这个key不在字典内,我们直接使用的话会出现异常KeyError,我们需要预先判断这个key是否在字典内,有两种方法,has_key(key)和key in dict来判断这个key是否在字典内,不过第二种方法更容易接受点,因为它更加的接近我们的语言。del这个key的所以会把元素从字典中直接删除,调用clear()方法可以直接把字典清空,然后就剩下一个空的字典{}。

有的时候我们需要一个不可变的字典,只能通过实例化的时候初始化这个字典,不能添加、更新、和删除。这个时候我们需要用到模块collections.MultiMapping了,集成类MultiMapping,改写它的方法,将__setitem__、__delitem__直接抛出异常,这样我们就得到了一个不可变的字典了。

如果你还在用xrang的话,可能low爆了,我来介绍一下itertools

Python网站管理员 Published the article • 0 comments • 254 views • 2018-10-26 16:20 • 来自相关话题

itertools是python内置的高效好用的迭代模块,迭代器有一个特色就是惰性求值,也就是只有当这个值被使用的时候才会计算,不需要预先在内存中生成这些数据,所以在遍历大文件、或者无限集合数组的时候,它的优势就格外的突出。

itertools的迭代器函数有三种类型

1. 无限迭代器:可以生成一个无限的序列,比如等差、等比、自然数都可以。
2. 有限迭代器:可以接收一个或者多个序列,然后组合、过滤、或者分组。
3. 组合生成器:序列的排列、组合,求序列的笛卡儿积等。

无限迭代器有三个函数,count(firstval=0, step=1)、cycle(iterable)、repeat(object [,times])。
```bash
>>> import itertools
>>>
>>> dir(itertools)
['__doc__', '__file__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee']
```

count的使用方法,firstval是开始数值,step(默认为1)是步长。比如开始值为5,step为2,那么生成的无限序列就是:5,7,9,11...,是一个开始值为5,差值为2的等差无限序列,你不需要担心没有更多的内存来存储这些数据,他们只有在使用的时候才会计算,如果你只需要5-20的数值,只需要用if判断一下,然后break就可以了。

cycle则是反复循环数值

```python
from __future__ import print_function
import itertools
cycle_strings = itertools.cycle('ABC')
i = 1
for string in cycle_strings:
if i == 10:
break
print('%d => %s' % (i, string), end=' ')
i += 1
```

### repeat反复生成一个对象
```python
from __future__ import print_function
import itertools
for item in itertools.repeat('hello world', 3):
print(item)
```

### itertools chain
```bash
>>> from __future__ import print_function
>>> import itertools
>>> for item in itertools.chain([1, 2, 3], ['a', 'b', 'c']):
... print(item, end=' ')
...
1 2 3 a b c
```


### 答疑:
为什么需要from __future__ import print_function?
- 由于我们需要不换行答应数据内容,所以需要在文件第一行引入这个hack,然后添加一个参数end,使其等于空格,完成我们不换行,空格相间输出。 查看全部

itertools是python内置的高效好用的迭代模块,迭代器有一个特色就是惰性求值,也就是只有当这个值被使用的时候才会计算,不需要预先在内存中生成这些数据,所以在遍历大文件、或者无限集合数组的时候,它的优势就格外的突出。

itertools的迭代器函数有三种类型

1. 无限迭代器:可以生成一个无限的序列,比如等差、等比、自然数都可以。
2. 有限迭代器:可以接收一个或者多个序列,然后组合、过滤、或者分组。
3. 组合生成器:序列的排列、组合,求序列的笛卡儿积等。

无限迭代器有三个函数,count(firstval=0, step=1)、cycle(iterable)、repeat(object [,times])。
```bash
>>> import itertools
>>>
>>> dir(itertools)
['__doc__', '__file__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee']
```

count的使用方法,firstval是开始数值,step(默认为1)是步长。比如开始值为5,step为2,那么生成的无限序列就是:5,7,9,11...,是一个开始值为5,差值为2的等差无限序列,你不需要担心没有更多的内存来存储这些数据,他们只有在使用的时候才会计算,如果你只需要5-20的数值,只需要用if判断一下,然后break就可以了。

cycle则是反复循环数值

```python
from __future__ import print_function
import itertools
cycle_strings = itertools.cycle('ABC')
i = 1
for string in cycle_strings:
if i == 10:
break
print('%d => %s' % (i, string), end=' ')
i += 1
```

### repeat反复生成一个对象
```python
from __future__ import print_function
import itertools
for item in itertools.repeat('hello world', 3):
print(item)
```

### itertools chain
```bash
>>> from __future__ import print_function
>>> import itertools
>>> for item in itertools.chain([1, 2, 3], ['a', 'b', 'c']):
... print(item, end=' ')
...
1 2 3 a b c
```


### 答疑:
为什么需要from __future__ import print_function?
- 由于我们需要不换行答应数据内容,所以需要在文件第一行引入这个hack,然后添加一个参数end,使其等于空格,完成我们不换行,空格相间输出。

scrapy爬虫的正确使用姿势,从入门安装开始(一)

Python网站管理员 Published the article • 0 comments • 692 views • 2018-08-24 17:40 • 来自相关话题

### scrapy 官方介绍

```
An open source and collaborative framework for extracting the data you need from websites.
In a fast, simple, yet extensible way.
```

1. 这是一个开源协作的框架,用于从网站中提取你需要的数据,并且,快速,简单,可扩展。
2. 总所周知,这是一个很强大的爬虫框架,能够帮助我们从众多的网页中提取出我们需要的数据,且快速易于扩展。
3. 它是由```python```编写的,所以完成编码后,我们可以运行在```Linux```、```Windows```、```Mac``` 和 ```BSD```

#### 官方数据

- 24k stars, 6k forks and 1.6k watchers on GitHub
- 4.0k followers on Twitter
- 8.7k questions on StackOverflow

#### 版本要求

- Python 2.7 or Python 3.4+
- Works on Linux, Windows, Mac OSX, BSD

#### 快速安装

如果不知道怎么安装```pip```,可以查看这篇文章[《如何快速了解pip编译安装python的包管理工具pip?》](http://www.sourcedev.cc/article/131)

```
pip install scrapy
```

### 创建项目

```bash
root@ubuntu:/# scrapy startproject -h
Usage
=====
scrapy startproject [project_dir]

Create new project

Options
=======
--help, -h show this help message and exit

Global Options
--------------
--logfile=FILE log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
log level (default: DEBUG)
--nolog disable logging completely
--profile=FILE write python cProfile stats to FILE
--pidfile=FILE write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
set/override setting (may be repeated)
--pdb enable pdb on failure
root@ubuntu:/# scrapy startproject helloDemo
New Scrapy project 'helloDemo', using template directory '/usr/local/lib/python3.5/dist-packages/scrapy/templates/project', created in:
/helloDemo

You can start your first spider with:
cd helloDemo
scrapy genspider example example.com
root@ubuntu:/# cd helloDemo/
root@ubuntu:/helloDemo# ls
helloDemo scrapy.cfg

root@ubuntu:/helloDemo# scrapy crawl spider baidu
2018-08-24 17:33:00 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: helloDemo)
2018-08-24 17:33:00 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_MODULES': ['helloDemo.spiders'], 'NEWSPIDER_MODULE': 'helloDemo.spiders', 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'helloDemo'}
Usage
=====
scrapy crawl [options]

crawl: error: running 'scrapy crawl' with more than one spider is no longer supported
root@ubuntu:/helloDemo# scrapy crawl baidu
2018-08-24 17:33:06 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: helloDemo)
2018-08-24 17:33:06 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'helloDemo', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'helloDemo.spiders', 'SPIDER_MODULES': ['helloDemo.spiders']}
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-24 17:33:07 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-24 17:33:07 [scrapy.core.engine] INFO: Spider opened
2018-08-24 17:33:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-24 17:33:07 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-08-24 17:33:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to from
2018-08-24 17:33:07 [scrapy.core.engine] DEBUG: Crawled (200) (referer: None)
2018-08-24 17:33:07 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt:
2018-08-24 17:33:08 [scrapy.core.engine] INFO: Closing spider (finished)
2018-08-24 17:33:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1,
'downloader/request_bytes': 443,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 1125,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 8, 24, 9, 33, 8, 11376),
'log_count/DEBUG': 4,
'log_count/INFO': 7,
'memusage/max': 52117504,
'memusage/startup': 52117504,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2018, 8, 24, 9, 33, 7, 430751)}
2018-08-24 17:33:08 [scrapy.core.engine] INFO: Spider closed (finished)
```

#### Project directory structure
```bash
root@ubuntu:/helloDemo# tree
.
├── helloDemo
│   ├── __init__.py
│   ├── items.py            # item definitions (the data structures)
│   ├── middlewares.py      # spider / downloader middlewares
│   ├── pipelines.py        # item pipelines (data persistence)
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   └── settings.cpython-35.pyc
│   ├── settings.py         # project-wide settings
│   └── spiders             # the project's spiders
│       ├── baidu.py        # the baidu spider created above
│       ├── __init__.py
│       └── __pycache__
│           └── __init__.cpython-35.pyc
└── scrapy.cfg
```
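To make the roles noted in the tree concrete, here is a minimal, hypothetical sketch of what items.py and pipelines.py could contain; the class names (ArticleItem, PrintPipeline) are illustrative and are not what `scrapy startproject` generates:

```python
# -*- coding: utf-8 -*-
# Hypothetical contents for items.py / pipelines.py; class names are illustrative.
import scrapy


class ArticleItem(scrapy.Item):
    # items.py: declare the fields a spider will fill in
    title = scrapy.Field()
    url = scrapy.Field()


class PrintPipeline(object):
    # pipelines.py: every item yielded by a spider passes through process_item()
    def process_item(self, item, spider):
        print(item)
        return item
```

A pipeline only takes effect once it is registered under the ITEM_PIPELINES setting in settings.py.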

spiders/baidu.py is where we process the scraped data; `response` holds the entire HTML DOM returned for the crawled page.
```python
# -*- coding: utf-8 -*-
import scrapy


class BaiduSpider(scrapy.Spider):
    name = 'baidu'
    allowed_domains = ['wwww.baidu.com']
    start_urls = ['http://wwww.baidu.com/']

    def parse(self, response):
        pass
```
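As a minimal sketch of what parse() might do once we start extracting data (this is not the author's final spider; the spider name and extracted field below are illustrative):

```python
# -*- coding: utf-8 -*-
import scrapy


class BaiduTitleSpider(scrapy.Spider):
    """Illustrative only: extract the page <title> and yield it as an item dict."""
    name = 'baidu_title'                      # hypothetical spider name
    allowed_domains = ['www.baidu.com']
    start_urls = ['http://www.baidu.com/']

    def parse(self, response):
        # response wraps the downloaded HTML; XPath selectors query the DOM
        title = response.xpath('//title/text()').extract_first()
        yield {'title': title}
```

With Scrapy's feed export, `scrapy crawl baidu_title -o titles.json` would write any yielded dicts to a JSON file (again assuming the hypothetical spider above).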

### I will continue to cover Scrapy usage in follow-up articles


### References

1. [Scrapy official site](https://scrapy.org)
2. [Scrapy official documentation](https://docs.scrapy.org/en/latest/)
3. [Scrapy source code on GitHub](https://github.com/scrapy/scrapy)
4. [Installing pip](http://www.sourcedev.cc/article/131)


How to quickly get to know and install pip, Python's package management tool?

Python网站管理员 Published the article • 0 comments • 345 views • 2018-08-24 17:30 • 来自相关话题

### Background

```pip``` is Python's package management tool; whenever we want to pull third-party libraries into a Python project, it makes the job very easy.

### Installing a package

```bash
pip install SomePackage

# for example, installing from a downloaded wheel file
pip install SomePackage-1.0-py2.py3-none-any.whl
```
Yes, it really is that easy: a single ```pip install``` is all it takes.

### Searching PyPI, Python's package index

Take ```https://pypi.org/search/?q=docker``` as an example. Searching for projects or dependencies used to be fairly awkward, because the official search feature was only launched recently; now you simply submit the keyword you want in the search form.
A project's detail page lists the install command, a short description, usage notes, documentation, the author, the source repository and its popularity. Building Python projects therefore keeps getting easier: you can find more packages
and learn from more well-written open-source projects.

### Installing pip

Fetch the installer script:
```bash
wget https://bootstrap.pypa.io/get-pip.py

# or

curl https://bootstrap.pypa.io/get-pip.py -o setup-pip.py

# install
python setup-pip.py

# check install and version
root@ubuntu:/# pip -V
pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)
```
Installation really is that simple.

### Common pip commands
```
root@ubuntu:/# pip -h

Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

```

- pip install: install packages
- pip download: download packages
- pip uninstall: uninstall packages
- pip list: list the packages already installed
- pip config: manage configuration
- pip search: search PyPI for packages
- pip help: show help for commands

### Chinese pip mirrors to speed up downloads and installs

Installing packages with pip is a frequent operation, but downloading them from servers outside China is very slow and error-prone, so it is worth switching to a domestic mirror.

#### Domestic mirrors

Note that recent Ubuntu releases require HTTPS package sources.
- Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
- Aliyun: http://mirrors.aliyun.com/pypi/simple/
- University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
- Huazhong University of Science and Technology: http://pypi.hustunique.com/
- Shandong University of Technology: http://pypi.sdutlinux.org/
- Douban: http://pypi.douban.com/simple/

##### One-off use (per command)

```bash
pip install -i http://pypi.douban.com/simple scrapy
```

#### Permanent configuration

```bash
vim ~/.pip/pip.conf

[global]
timeout = 60
index-url = https://pypi.douban.com/simple
```
Changing the index-url entry is all that is needed.
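If you prefer to generate this file from Python instead of editing it by hand, a small sketch using the standard-library configparser could look like the following (the index URL is the Douban mirror from above; note that on Windows the user config lives at %APPDATA%\pip\pip.ini instead):

```python
# A sketch: write ~/.pip/pip.conf programmatically with configparser.
import configparser
import os

conf_dir = os.path.expanduser('~/.pip')
os.makedirs(conf_dir, exist_ok=True)   # create ~/.pip if it does not exist

config = configparser.ConfigParser()
config['global'] = {
    'timeout': '60',
    'index-url': 'https://pypi.douban.com/simple',
}
with open(os.path.join(conf_dir, 'pip.conf'), 'w') as f:
    config.write(f)
```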

### Using requirement.txt

```requirement.txt``` holds every dependency the project needs as a plain list, one dependency per line.
For example:
scrapy
pymysql
...
How do we get pip to install everything in requirement.txt in one batch?
```bash
pip install -r requirement.txt

# usage
pip install -h

# -r, --requirement <file>   Install from the given requirements file. This option can be used multiple times.
#
```
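The same batch install can also be triggered from a Python script. The sketch below simply shells out to pip for the running interpreter and assumes requirement.txt sits in the current working directory:

```python
# A sketch: equivalent to `pip install -r requirement.txt` on the command line,
# run for the interpreter executing this script.
import subprocess
import sys

subprocess.check_call(
    [sys.executable, '-m', 'pip', 'install', '-r', 'requirement.txt']
)
```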

### Summary

- As a project grows, we need one consistent tool to manage its modules and third-party libraries, and ```pip``` is the obvious choice.
- Make good use of tools and of searching: ```google```, a Linux command with ```-h``` or ```--help```, or the ```man``` command will all quickly surface the documentation we need.
- The prerequisite, of course, is that our English is at least good enough for reading comprehension.

### References

1. [pip official installation guide](https://pip.pypa.io/en/stable/installing/)
2. [PyPI search](https://pypi.org/search)
3. [List of Chinese mirrors](http://www.cnblogs.com/microman/p/6107879.html)


Configuring a scrapy + PHP crawler website

PHP网站管理员 Published the article • 0 comments • 319 views • 2018-02-23 19:58 • 来自相关话题

### Environment requirements

1. A configured Python environment, Python 3.6
Download:
`https://www.python.org/ftp/pyt ... 4.zip`
After installing, you may also need to set the system environment variables:
`https://www.cnblogs.com/dangeal/p/5455005.html`
2. A PHP website built on ThinkPHP
Reference:
`https://jingyan.baidu.com/arti ... .html`
3. For the Python dependencies, install pip first:
`http://blog.csdn.net/nomey_mr/ ... 95984`

#### pip is Python's dependency management tool and installs the extensions Python needs

#### Run the following in a cmd shell
```
pip install scrapy
pip install pymysql
```

#### Starting the crawler
```
cd SpiderArticle      # change into the SpiderArticle directory
python run.py
```

### Database and crawler configuration

> The last few lines of SpiderArticle/settings.py

```python
MYSQL_HOST = '127.0.0.1'
MYSQL_DBNAME = 'simple_cms'
MYSQL_USER = 'root'
MYSQL_PASSWD = '123123'
```
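These settings are presumably consumed by a pipeline in SpiderArticle/pipelines.py. The following is only a sketch of how such a pipeline might read them with pymysql; the class name, table and SQL are illustrative, not the project's actual code:

```python
# -*- coding: utf-8 -*-
# Sketch of a Scrapy pipeline that reads the MYSQL_* settings and writes items
# with pymysql. Class name, table name and item fields are hypothetical.
import pymysql


class MysqlPipeline(object):

    def open_spider(self, spider):
        settings = spider.settings
        self.conn = pymysql.connect(
            host=settings.get('MYSQL_HOST'),
            user=settings.get('MYSQL_USER'),
            password=settings.get('MYSQL_PASSWD'),
            db=settings.get('MYSQL_DBNAME'),
            charset='utf8',
        )

    def process_item(self, item, spider):
        with self.conn.cursor() as cursor:
            # hypothetical table and column
            cursor.execute('INSERT INTO article (title) VALUES (%s)',
                           (item.get('title'),))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```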

4. Import the database: use phpMyAdmin or a similar tool to import
doc/database.sql

5. Database configuration for the PHP site

application/database.php must be filled in with the database name, username and password you set up

```php
return [
    // server address
    'hostname' => '127.0.0.1',
    // database name
    'database' => 'simple_cms',
    // username
    'username' => 'root',
    // password
    'password' => '123123',
];
```

Using a Python generator to yield all files under a directory

Python网站管理员 Published the article • 0 comments • 300 views • 2018-02-23 18:11 • 来自相关话题

### Using the `yield` keyword in recursive Python functions

```
yield anything
```

### The Python script
```python
# -*- coding: utf-8 -*-

import os

def get_recursive_file_list(path, base_name):
    # list the entries in `path`, yielding file paths relative to `base_name`
    current_files = os.listdir(path)
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        file_name = full_file_name.replace(base_name, '')

        if os.path.isdir(full_file_name):
            # recurse into sub-directories and re-yield their files
            next_level_files = get_recursive_file_list(full_file_name, base_name)
            for files in next_level_files:
                yield files
        else:
            yield file_name

a = get_recursive_file_list('d://test', 'd://test')
for item in a:
    print(item)
```
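
On Python 3.3+ the inner `for ... yield` loop can be delegated with `yield from`. The sketch below is an equivalent rewrite (it reuses the same example 'd://test' paths as above):

```python
# -*- coding: utf-8 -*-
# Equivalent sketch using `yield from` to delegate to the recursive generator.
import os


def iter_files(path, base_name):
    for file_name in os.listdir(path):
        full_file_name = os.path.join(path, file_name)
        if os.path.isdir(full_file_name):
            # hand the nested generator's values straight through
            yield from iter_files(full_file_name, base_name)
        else:
            yield full_file_name.replace(base_name, '')


for item in iter_files('d://test', 'd://test'):
    print(item)
```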