Air9

And it's turtles all the way down...

bootstrap-keyboard-tabindex

2016-06-28

I've spent the last couple of days on the Bootstrap course on Coursera.
The week-4 assignment is to take a Modal built without JavaScript and reimplement it using JavaScript (https://www.coursera.org/learn/web-frameworks/peer/jAXUU/assignment-4-detailed-instructions-and-submission).

Following http://getbootstrap.com/javascript/#modals-methods, I found a solution:

<script>
$('#reserveButton').on('click', function () {
    $('#reserveModal').modal('toggle');
});
</script>

That solved the assignment, but I noticed a small problem: no matter how I set the keyboard option, I couldn't close the modal with Esc.

A search turned up the fix: add a tabindex="-1" attribute to the modal's div.
Which raises the next question... what on earth is tabindex="-1"?

After some more searching and testing, I finally worked it out.
tabindex helps keyboard users jump between page elements with the Tab key, moving from smaller values to larger; the default is 0.
Elements with equal values are visited in document order.

Without tabindex="-1", the modal and every element on the main page share the value 0,
so keyboard focus stays on the main page rather than on the modal; Esc is effectively pressed on the page, the modal never sees it, and nothing happens.
With tabindex="-1", the modal's div is taken out of the Tab order but becomes focusable from script, so Bootstrap can move keyboard focus onto the modal; pressing Esc then correctly triggers the modal's close event.

References

  • http://blog.163.com/huan12_8/blog/static/1305190902011274739628/
  • https://segmentfault.com/q/1010000004954562
  • http://www.w3school.com.cn/tags/att_standard_tabindex.asp
  • http://www.mamicode.com/info-detail-494399.html

Permission denied (publickey)

2016-06-16

Today I got an error while pushing my local Flask study repo to GitHub:

$ git push -u origin master
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Found the solution on Stack Overflow:

git remote set-url origin git@github.com:lut/EvolutionApp.git
git remote show origin

Just a quick note for the record...


Python-Cookbook-1.09 Simplifying the translate Method

2016-05-25

string.maketrans

Help on built-in function maketrans in module strop:
maketrans(...)
maketrans(frm, to) -> string
Return a translation table (a string of 256 bytes long)
suitable for use in string.translate. The strings frm and to
must be of the same length.
(END)

This builds a 256-character table for string.translate, in which every character of frm is replaced, in order, by the corresponding character of to:

>>> maketrans('abc', 'fed')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`feddefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>>

string.translate

Help on function translate in module string:
translate(s, table, deletions='')
translate(s,table [,deletions]) -> string
Return a copy of the string s, where all characters occurring
in the optional argument deletions are removed, and the
remaining characters have been mapped through the given
translation table, which must be a string of length 256. The
deletions argument is not allowed for Unicode strings.
(END)

(Equivalently, it can be invoked as the string method s.translate(table[, deletions]).)

Characters are converted according to the mapping table generated by maketrans:

>>> from string import maketrans, translate
>>> a = maketrans('abc', 'fed')
>>> translate('abcdef', a)
'feddef'
>>> translate('abcdef', a, 'd')
'fedef'
>>> translate('abcdef', a, 'dd')
'fedef'
>>> translate('abcdef', a, 'de')
'fedf'
>>> translate('abcdef', a, 'ade')
'edf'
>>>
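For reference, Python 3 drops string.maketrans/string.translate in this form; the equivalents live on str, and deletions are folded into a third maketrans argument. A small sketch (Python 3, not from the recipe):

```python
# Python 3: maketrans/translate are str methods; the table is a dict.
table = str.maketrans('abc', 'fed')
print('abcdef'.translate(table))        # feddef

# Deletions become a third argument to maketrans:
table_d = str.maketrans('abc', 'fed', 'd')
print('abcdef'.translate(table_d))      # fedef
```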

A homemade factory function, translator, that returns a closure:

import string

def translator(frm='', to='', delete='', keep=None):
    if len(to) == 1:
        to = to * len(frm)
    trans = string.maketrans(frm, to)
    if keep is not None:
        allchars = string.maketrans('', '')
        delete = allchars.translate(allchars, keep.translate(allchars, delete))
    def translate(s):
        return s.translate(trans, delete)
    return translate

if __name__ == '__main__':
    digits_only = translator(keep=string.digits)
    print digits_only('qwedwefaf24215')
    no_digits = translator(delete=string.digits)
    print no_digits('qwedwefaf24215')
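A rough Python 3 port of the same factory (my own sketch, not from the book; as in the original, keep wins over delete):

```python
import string

def translator(frm='', to='', delete='', keep=None):
    if len(to) == 1:
        to = to * len(frm)
    table = str.maketrans(frm, to)
    if keep is not None:
        keep_set = set(keep) - set(delete)
        def translate(s):
            # drop everything outside keep, then apply the mapping
            return ''.join(c for c in s if c in keep_set).translate(table)
    else:
        def translate(s):
            return s.translate(str.maketrans(frm, to, delete))
    return translate

digits_only = translator(keep=string.digits)
print(digits_only('qwedwefaf24215'))   # 24215
no_digits = translator(delete=string.digits)
print(no_digits('qwedwefaf24215'))     # qwedwefaf
```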

Python - The key Parameter of sorted

2016-05-19

The parameters accepted by sorted are as follows:

sorted(iterable[, cmp[, key[, reverse]]])

The others are easy enough to understand; it's the usage of key that I keep forgetting, so I'm writing it down for reference.

The documentation says:

key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None (compare the elements directly).

My understanding:
key supplies a function that takes each element of the iterable as its only argument and returns a key value in one-to-one correspondence with that element.
The iterable is then sorted as if it consisted of those key values,
and finally each key value is replaced by its original element.
Done.

The one thing to remember: False < True.
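Since False sorts before True, a boolean-returning key pushes the "False" elements to the front; a tiny check (Python 3 sketch of mine):

```python
print(False < True)   # True

# key is True for even numbers, so they sort last;
# ties keep their original relative order (sorted is stable)
print(sorted([2, 3, 1], key=lambda x: x % 2 == 0))   # [3, 1, 2]
```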

Now an example (adapted from https://segmentfault.com/q/1010000005111826/a-1020000005112829):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = 'aB23'

def sorted_with_key(s, key):
    s = sorted(s, key=key)
    print s
    print 'keys: ',
    print [key(x) for x in s]

print '\nstr.lower'
sorted_with_key(s, str.lower)
print '\nstr.islower'
sorted_with_key(s, str.islower)
print '\nlambda x: x.isdigit() and int(x) % 2 == 0'
sorted_with_key(s, lambda x: x.isdigit() and int(x) % 2 == 0)
print '\nlambda x: x.isdigit(), x.isdigit() and int(x) % 2==0, x.isupper(), x.islower(), x'
# Sort order: lowercase - uppercase - odd - even
sorted_with_key(s, lambda x: (x.isdigit(), x.isdigit() and int(x) % 2 == 0,
                              x.isupper(), x.islower(), x))

output

str.lower
['2', '3', 'a', 'B']
keys: ['2', '3', 'a', 'b']
str.islower
['B', '2', '3', 'a']
keys: [False, False, False, True]
lambda x: x.isdigit() and int(x) % 2 == 0
['a', 'B', '3', '2']
keys: [False, False, False, True]
lambda x: (x.isdigit(), x.isdigit() and int(x) % 2==0, x.isupper(), x.islower(), x)
['a', 'B', '3', '2']
keys: [(False, False, False, True, 'a'), (False, False, True, False, 'B'), (True, False, False, False, '3'), (True, True, False, False, '2')]

Addendum, 2016-07-09

Today I read Cookbook recipe 5.2. Before Python 2.4, key wasn't supported; the book gives a similar approach that I found very helpful for understanding how key is implemented. Excerpt:

def case_insensitive_sorted(string_list):
    auxiliary_list = [(x.lower(), x) for x in string_list]  # decorate
    auxiliary_list.sort()                                   # sort
    return [x[1] for x in auxiliary_list]                   # undecorate
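The decorate-sort-undecorate version and key=str.lower are interchangeable here, which can be checked directly (Python 3 syntax; the book's code is Python 2):

```python
def case_insensitive_sorted(string_list):
    auxiliary_list = [(x.lower(), x) for x in string_list]  # decorate
    auxiliary_list.sort()                                   # sort
    return [x[1] for x in auxiliary_list]                   # undecorate

words = ['banana', 'Apple', 'cherry']
print(case_insensitive_sorted(words))   # ['Apple', 'banana', 'cherry']
print(sorted(words, key=str.lower))     # ['Apple', 'banana', 'cherry']
```

(The two differ only on ties: the decorated version breaks ties by comparing the original elements, while key= keeps the original order.)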

Flask Database Update Issues

2016-05-16

Updating columns

The problem

When I got to 8.4.6 and tried testing the login:

(venv) $ python manage.py shell
>>> u = User(email='john@example.com', username='john', password='cat')
>>> db.session.add(u)
>>> db.session.commit()

It errored out saying the table has no email column.

After confirming the code was correct, I concluded that the newly added email column in models.py had not been applied to the database. A few quick attempts failed, so I decided to re-read the relevant material to understand it properly before tackling the problem.

The brute-force update

If a table already exists in the database, db.create_all() will not re-create or update it. This is inconvenient when you modify a model and need the change applied to the existing database. The brute-force way to update an existing table is to drop it and re-create it:

>>> db.drop_all()
>>> db.create_all()

Database migrations with Flask-Migrate

The better way to update tables is with a database migration framework. Much as source control tools track changes to source files, a migration framework tracks changes to the database schema and applies them to the database incrementally.

The lead developer of SQLAlchemy wrote a migration framework called Alembic (https://alembic.readthedocs.org/en/latest/index.html). Instead of using Alembic directly, a Flask application can use the Flask-Migrate extension (http://flask-migrate.readthedocs.org/en/latest/), a lightweight wrapper around Alembic that integrates with Flask-Script, so all operations are done through Flask-Script commands.

I'll skip installation and configuration.
Only after a few attempts did I arrive at a correct understanding of what the main commands actually do.

init

Creates the migration repository and scripts; it does not generate or update the database file. migrations/versions/ is empty at this point, and running upgrade now produces a database containing only the alembic_version table:

$ sqlite3 data-dev.sqlite
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL
);
COMMIT;
sqlite>

migrate

Detects changes to the database models and writes a migration script into migrations/versions/, to be used for the actual migration.

That said, per the official docs it won't necessarily detect every change, so you need to review the generated migration script yourself and fill in anything it missed.

upgrade

Applies the migrations above to the database; only at this point is the database actually updated:

$ sqlite3 data-dev.sqlite
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL
);
INSERT INTO "alembic_version" VALUES('bb488872a057');
CREATE TABLE roles (
id INTEGER NOT NULL,
name VARCHAR(64),
PRIMARY KEY (id),
UNIQUE (name)
);
CREATE TABLE users (
id INTEGER NOT NULL,
email VARCHAR(64),
username VARCHAR(64),
role_id INTEGER,
password_hash VARCHAR(128),
PRIMARY KEY (id),
FOREIGN KEY(role_id) REFERENCES roles (id)
);
CREATE UNIQUE INDEX ix_users_email ON users (email);
CREATE UNIQUE INDEX ix_users_username ON users (username);
COMMIT;
sqlite>

The administrator role

The problem

After reaching 10.3.2, I found that an account registered with the administrator email could not open the administrator-level profile editor.
Inspecting the database by hand:

INSERT INTO "users" VALUES(1,'huamingrui@163.com','huamingrui',NULL,'pbkdf2:sha1:1000$wpqGMEz8$a3bf86fcb0be120a7510a8f702077eb2fdfa1980',1,NULL,NULL,NULL,'2016-05-17 14:17:48.930851','2016-05-17 14:19:00.735339');

The role_id field is NULL, i.e. the role was never assigned.

Role.insert_roles()

Going back to the book, the end of 9.3 says:

Before you read the next chapter, it is best to re-create or update the development database, so that user accounts created before roles and permissions existed are assigned a role.

In practice, however, I found that for the administrator account the Role.insert_roles() step at the end of 9.1 must be completed before registering; only then is the administrator-email user assigned the administrator role:
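The timing matters because, in the book's code, User.__init__ looks up the role at construction time; if the roles table is still empty, role_id simply stays NULL. A stripped-down illustration of that pattern (names like ADMIN_EMAIL and roles_by_name are placeholders of mine, not the book's API):

```python
class User(object):
    ADMIN_EMAIL = 'admin@example.com'   # stands in for the FLASKY_ADMIN config

    def __init__(self, email, roles_by_name):
        self.email = email
        # The lookup happens *now*: if no roles have been inserted yet,
        # this is None and stays None for the already-registered account.
        name = 'Administrator' if email == self.ADMIN_EMAIL else 'User'
        self.role = roles_by_name.get(name)

# registering before insert_roles(): the roles table is empty
print(User('admin@example.com', {}).role)   # None
# registering after insert_roles(): the role is found
print(User('admin@example.com', {'Administrator': 'Administrator'}).role)
```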

$ rm data-dev.sqlite
$ python manage.py db upgrade
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 02ccb3e6a553, empty message
$ sqlite3 data-dev.sqlite
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL
);
INSERT INTO "alembic_version" VALUES('02ccb3e6a553');
CREATE TABLE roles (
id INTEGER NOT NULL,
name VARCHAR(64),
"default" BOOLEAN,
permissions INTEGER,
PRIMARY KEY (id),
UNIQUE (name),
CHECK ("default" IN (0, 1))
);
CREATE TABLE users (
id INTEGER NOT NULL,
email VARCHAR(64),
username VARCHAR(64),
role_id INTEGER,
password_hash VARCHAR(128),
confirmed BOOLEAN,
name VARCHAR(64),
location VARCHAR(64),
about_me TEXT,
member_since DATETIME,
last_seen DATETIME,
PRIMARY KEY (id),
FOREIGN KEY(role_id) REFERENCES roles (id),
CHECK (confirmed IN (0, 1))
);
CREATE INDEX ix_roles_default ON roles ("default");
CREATE UNIQUE INDEX ix_users_email ON users (email);
CREATE UNIQUE INDEX ix_users_username ON users (username);
COMMIT;
sqlite> ^D
$ python manage.py shell
>>> Role.insert_roles()
>>> Role.query.all()
[<Role u'Moderator'>, <Role u'Administrator'>, <Role u'User'>]
>>> ^D
$ sqlite3 data-dev.sqlite
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL
);
INSERT INTO "alembic_version" VALUES('02ccb3e6a553');
CREATE TABLE roles (
id INTEGER NOT NULL,
name VARCHAR(64),
"default" BOOLEAN,
permissions INTEGER,
PRIMARY KEY (id),
UNIQUE (name),
CHECK ("default" IN (0, 1))
);
INSERT INTO "roles" VALUES(1,'Moderator',0,15);
INSERT INTO "roles" VALUES(2,'Administrator',0,255);
INSERT INTO "roles" VALUES(3,'User',1,7);
CREATE TABLE users (
id INTEGER NOT NULL,
email VARCHAR(64),
username VARCHAR(64),
role_id INTEGER,
password_hash VARCHAR(128),
confirmed BOOLEAN,
name VARCHAR(64),
location VARCHAR(64),
about_me TEXT,
member_since DATETIME,
last_seen DATETIME,
PRIMARY KEY (id),
FOREIGN KEY(role_id) REFERENCES roles (id),
CHECK (confirmed IN (0, 1))
);
CREATE INDEX ix_roles_default ON roles ("default");
CREATE UNIQUE INDEX ix_users_email ON users (email);
CREATE UNIQUE INDEX ix_users_username ON users (username);
COMMIT;
sqlite> ^D
$ python manage.py runserver

Register with the administrator email

$ sqlite3 data-dev.sqlite
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE alembic_version (
version_num VARCHAR(32) NOT NULL
);
INSERT INTO "alembic_version" VALUES('02ccb3e6a553');
CREATE TABLE roles (
id INTEGER NOT NULL,
name VARCHAR(64),
"default" BOOLEAN,
permissions INTEGER,
PRIMARY KEY (id),
UNIQUE (name),
CHECK ("default" IN (0, 1))
);
INSERT INTO "roles" VALUES(1,'Moderator',0,15);
INSERT INTO "roles" VALUES(2,'Administrator',0,255);
INSERT INTO "roles" VALUES(3,'User',1,7);
CREATE TABLE users (
id INTEGER NOT NULL,
email VARCHAR(64),
username VARCHAR(64),
role_id INTEGER,
password_hash VARCHAR(128),
confirmed BOOLEAN,
name VARCHAR(64),
location VARCHAR(64),
about_me TEXT,
member_since DATETIME,
last_seen DATETIME,
PRIMARY KEY (id),
FOREIGN KEY(role_id) REFERENCES roles (id),
CHECK (confirmed IN (0, 1))
);
INSERT INTO "users" VALUES(1,'huamingrui@163.com','MrHua',2,'pbkdf2:sha1:1000$tSmBVC7j$6f3d994eb5b6b455347b56d3112a4cac26fc97e1',1,NULL,NULL,NULL,'2016-05-17 14:34:36.781740','2016-05-17 14:52:08.764862');
CREATE INDEX ix_roles_default ON roles ("default");
CREATE UNIQUE INDEX ix_users_email ON users (email);
CREATE UNIQUE INDEX ix_users_username ON users (username);
COMMIT;
sqlite>


Spider-07-doctest

2016-04-08

Knownsec spider project, day 7: doctest

doctest is a built-in Python module for testing code embedded in documentation, which happens to be a good fit for the spider's self-test feature.

The module is fairly simple, so straight to the code.

Code

...
def main():
    ''' Prepare and run the spider

    Self-test:
    >>> class Args(object):
    ...     pass
    ...
    >>> args = Args()
    >>> args.url = 'www.baidu.com'
    >>> args.depth = 1
    >>> args.logfile = 'testself.log'
    >>> args.loglevel = 4
    >>> args.dbfile = 'testself.db'
    >>> args.num_threads = 1
    >>> args.key = ''
    >>> set_logger(args.loglevel, args.logfile)
    >>> logger.info(vars(args))
    >>> spider = MySpider(args)
    >>> spider.run()
    '''
    ...
    # Self test
    if args.testself:
        import doctest
        print doctest.testmod()
        return
    ...

There are a few small changes elsewhere too; everything is at https://github.com/answerrrrrrrr/KnownsecSpider

References

  • http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014319170285543a4d04751f8846908770660de849f285000
  • http://devdocs.io/python~2.7/library/doctest#doctest.DocTest
  • http://devdocs.io/python~2.7/library/argparse#argparse.Namespace

Installing MySQL on Mac and Setting utf-8

2016-04-07

Download and install

http://dev.mysql.com/downloads/mysql/5.6.html

Stop the service

System Preferences - MySQL - Stop MySQL Server

Environment variables

$ vim ~/.zshrc
...
# Add mysql
export PATH="$PATH":/usr/local/mysql/bin
...
$ source ~/.zshrc

Set utf-8

$ sudo cp /usr/local/mysql/support-files/my-default.cnf /etc/my.cnf
$ sudo vim /etc/my.cnf
...
[client]
default-character-set = utf8
[mysqld]
default-storage-engine = INNODB
character-set-server = utf8
collation-server = utf8_general_ci
...

Start the service

System Preferences - MySQL - Start MySQL Server

Verify utf-8

$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.29 MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like '%char%';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.29-osx10.8-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)
mysql>

mysql.connector

(venv3.5)$ pip3 install mysql-connector-python-rf

References

  • http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014320107391860b39da6901ed41a296e574ed37104752000
  • http://blog.csdn.net/waleking/article/details/7620983
  • http://jingyan.baidu.com/article/48a42057e2b2b9a9242504a2.html

Spider-06-requests

2016-04-07

Knownsec spider project, day 6: requests

After moving back to campus from the lab, nothing changed but the network, yet garbled text suddenly started showing up.

After half a day of fruitless fiddling with encodings, I discovered by accident that swapping urllib2 for requests fixed it.
The requests documentation explains why:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text.

No wonder people keep recommending it on Zhihu; it really is nicer to use...

Usage is simple and convenient.
However, when returning results via request.text, titles still came out garbled.

You can also access the response body as bytes, for non-text requests

Switching to request.content, everything worked.
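The r.text vs r.content distinction can be reproduced without the network: content is the raw bytes, text is those bytes decoded with the guessed encoding, and a wrong guess (e.g. the HTTP-default ISO-8859-1 instead of the page's actual UTF-8) is exactly what produces garbled titles. A sketch of mine (Python 3):

```python
raw = '标题'.encode('utf-8')      # what r.content holds: raw bytes
good = raw.decode('utf-8')        # r.text when the encoding guess is right
bad = raw.decode('iso-8859-1')    # r.text with a wrong guess: mojibake

print(good)   # 标题
print(bad)    # garbled Latin-1 characters
```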

Code

...
# request = urllib2.Request(url, headers=headers)
# result = urllib2.urlopen(request).read()
request = requests.get(url, headers=headers)
# result = request.text
result = request.content
...

References

  • http://www.python-requests.org/en/master/
  • http://www.python-requests.org/en/master/user/quickstart/
  • http://blog.csdn.net/alpha5/article/details/24964009

Spider-05-Spider

2016-04-06

Knownsec spider project, day 5: Spider

With the groundwork mostly done, today I tried wiring the earlier modules together into a first version.

First: What is the difference between web-crawling and web-scraping?
I actually find this answer more concise and clear than the accepted one:

Web Crawling is what Google does - it goes around a website looking at links and building a database of the layout of that site and sites it links to

Web Scraping would be the progamatic analysis of a web page to load some data off of it, EG loading up BBC weather and ripping (scraping) the weather forcast off of it and placing it elsewhere or using it in another program.

My takeaway: web-crawling is about breadth, web-scraping about precision.

BeautifulSoup

To crawl, we need to extract from each page the URLs to crawl next. BeautifulSoup does exactly this, quickly and conveniently, and it's easy to pick up; the official docs are all you need.
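BeautifulSoup's soup.find_all('a') does the link extraction in one call; purely for illustration, here is the same step with only the standard library (my sketch, Python 3):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

p = LinkExtractor()
p.feed('<a href="/one">1</a><p>x</p><a href="/two">2</a>')
print(p.links)   # ['/one', '/two']
```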

FileHandler

While integrating the logger I hit a problem: with logging.config.fileConfig('logging.conf'), the log path has to be fixed in the config file beforehand. To support a path given as a command-line argument, it seems (after reading up on FileHandler) the only option is to add an extra handler:

...
# If logfile is not 'spider.log'
formatter = logging.Formatter(
    '%(asctime)s - %(levelname)s - %(message)s')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
...

Code

The spider is basically working now, though it still occasionally hits a 502; what exactly to store in the database is still open, and the self-test feature isn't in yet.

myspider.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from threading import Thread
from Queue import Queue
from bs4 import BeautifulSoup
import urllib2
import argparse
import sqlite3
import logging
import logging.config

logging.config.fileConfig('logging.conf')
logger = logging.getLogger('spider')

levels = {
    1: 'CRITICAL',
    2: 'ERROR',
    3: 'WARNING',
    4: 'INFO',
    5: 'DEBUG',
}


class MySqlite(object):
    def __init__(self, dbfile):
        try:
            logger.warning("Open database %s" % dbfile)
            self.conn = sqlite3.connect(dbfile)
        except sqlite3.Error as e:
            # print "Fail to connect %s: %s" % (dbfile, e)  # e.args[0]
            logger.error("Fail to connect %s: %s" % (dbfile, e))
            return
        self.cursor = self.conn.cursor()

    def create(self, table):
        try:
            logger.warning("Create table %s if not exists" % table)
            self.cursor.execute(
                "CREATE TABLE IF NOT EXISTS %s (id INTEGER PRIMARY KEY \
                AUTOINCREMENT, url VARCHAR(100), data VARCHAR(40))" % table)
            self.conn.commit()
        except sqlite3.Error as e:
            logger.error("Fail to create %s: %s" % (table, e))
            self.conn.rollback()

    def insert(self, table, url, data):
        try:
            logger.warning(
                "Insert (%s, %s) into table %s" % (url, data, table))
            self.cursor.execute(
                "INSERT INTO %s (url, data) VALUES ('%s', '%s')" %
                (table, url, data))
            self.conn.commit()
        except sqlite3.Error as e:
            logger.error(
                "Fail to insert (%s, %s) into %s: %s" %
                (url, data, table, e))
            self.conn.rollback()

    def close(self):
        logger.info("Close database")
        self.cursor.close()
        self.conn.close()


class MyThreadPool(object):
    def __init__(self, num_threads=10):
        self.tasks = Queue(num_threads)
        for i in xrange(1, num_threads+1):
            # Initialize the pool with the number of num_threads
            logger.info('Initialize thread %d' % i)
            MyThread(self.tasks)

    def add_task(self, func, *args, **kwargs):
        self.tasks.put((func, args, kwargs))
        logger.debug('Add task')

    def wait_completion(self):
        # Blocks until all items in the queue have been gotten and processed.
        self.tasks.join()
        logger.info('All tasks are done')


class MyThread(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        # This must be set before start() is called. The entire Python program
        # exits when no alive non-daemon threads are left.
        self.daemon = True
        self.start()
        logger.debug('Thread started...')

    def run(self):
        while True:
            # Block until an item is available.
            func, args, kwargs = self.tasks.get()
            try:
                logger.warning('Thread is working...')
                func(*args, **kwargs)
            except Exception as e:
                logger.error(e)
            # Tells the queue that the processing on the task is complete.
            self.tasks.task_done()


class MySpider(object):
    def __init__(self, args):
        ''' Initialize the spider
        '''
        # Initialize args
        self.url = args.url
        self.depth = args.depth
        self.logfile = args.logfile
        self.dbfile = args.dbfile
        self.num_threads = args.num_threads
        self.key = args.key.lower()
        self.selftest = args.selftest
        # Store visited url
        self.visited_urls = set()
        # Initialize threadpool
        self.threadpool = MyThreadPool(self.num_threads)

    def run(self):
        ''' Run the spider
        '''
        if not self.url.startswith('http://'):
            self.url = 'http://' + self.url
        logger.critical('Start crawl on %s' % self.url)
        self.threadpool.add_task(self.scrape, self.url, self.depth)
        self.threadpool.wait_completion()

    def scrape(self, url, depth):
        ''' Scrape the content of page
        '''
        # Open database with dbfile
        db = MySqlite(self.dbfile)
        # Create table with keyword
        table = 'none' if not self.key else self.key
        db.create(table)
        # Avoid being recognized as robot
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) \
                AppleWebKit/537.36 (KHTML, like Gecko) \
                Chrome/49.0.2623.110 Safari/537.36'
        }
        # Avoid repeat
        if url in self.visited_urls:
            logger.debug('%s had been crawled' % url)
            return
        else:
            self.visited_urls.add(url)
            logger.info('Crawling on %s' % url)
        # Request with headers
        try:
            logger.warning('Open %s' % url)
            request = urllib2.Request(url, headers=headers)
            result = urllib2.urlopen(request).read()
        except ValueError as e:
            logger.error(e)
            return
        # Extract the title by BeautifulSoup
        soup = BeautifulSoup(result, "lxml")
        title = soup.title.string
        logger.debug('title = %s' % title)
        # Store url and title of the page with keyword into database
        if self.key in result.lower():
            table = 'none' if not self.key else self.key
            db.insert(table, url, title)
            logger.critical(
                'KEYWORD:\'%s\' - URL:\'%s\' - TITLE:\'%s\' (DEPTH:%d)' %
                (self.key, url, title, depth))
        # Close database after modification
        db.close()
        # Go deeper into urls in result
        self.crawl(soup, depth-1)

    def crawl(self, soup, depth):
        ''' Crawl to new pages
        '''
        if depth > 0:
            for link in soup.find_all('a'):
                url = link.get('href')
                # scrape new url
                self.threadpool.add_task(self.scrape, url, depth)

    # def stop(self):
    #     ''' Stop the spider
    #     '''
    #     logger.critical('OVER')


def args_parser():
    ''' Parse the args
    '''
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-u', '--url', dest='url', required=True,
        help='specify the URL to start crawl'
    )
    parser.add_argument(
        '-d', '--depth', dest='depth', default=1, type=int,
        help='specify the depth of the spider (default: 1)'
    )
    parser.add_argument(
        '-f', '--file', dest='logfile', default='spider.log',
        help='specify the path of logfile (default: spider.log)'
    )
    parser.add_argument(
        '-l', '--level', dest='loglevel', default=5, type=int,
        choices=range(1, 6),
        help='specify the verbose level of the log (default: 5)'
    )
    parser.add_argument(
        '--dbfile', dest='dbfile', default='spider.db',
        help='specify the path of sqlite dbfile (default: spider.db)'
    )
    parser.add_argument(
        '--thread', dest='num_threads', default=10, type=int,
        help='specify the size of thread pool (default: 10)'
    )
    parser.add_argument(
        '--key', dest='key', default='',
        help='specify the keyword (default: '')'
    )
    parser.add_argument(
        '--selftest', action='store_true',
        help='self-test'
    )
    args = parser.parse_args()
    return args


def set_logger(loglevel, logfile):
    ''' Set the logger with loglevel and logfile
    '''
    logger.setLevel(levels[loglevel])
    file_handler = logging.FileHandler(logfile)
    # If logfile is not 'spider.log'
    formatter = logging.Formatter(
        '%(asctime)s - %(levelname)s - %(message)s')
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)


def main():
    args = args_parser()
    set_logger(args.loglevel, args.logfile)
    logger.debug(args)
    spider = MySpider(args)
    spider.run()


if __name__ == '__main__':
    main()

References

  • http://beautifulsoup.readthedocs.org/zh_CN/latest/
  • http://devdocs.io/python~2.7/library/logging.handlers#logging.FileHandler
  • http://dongweiming.github.io/blog/archives/pa-chong-lian-xi/

Sublime Preferences

2016-04-05

Switching tabs

First, Sublime's so-called "smart" tab switching is annoying; change it to the conventional shortcuts:

Default (OSX).sublime-keymap
{
    "keys": ["ctrl+tab"],
    "command": "next_view"
},
{
    "keys": ["ctrl+shift+tab"],
    "command": "prev_view"
},

Vintage

This is Sublime's built-in vi mode.
First, comment out Vintage in the ignored_packages list of Preferences.sublime-settings:

Preferences.sublime-settings
...
"ignored_packages": [
    // "Vintage"
],
...

Then, as is my habit, remap esc to j j:

Default (OSX).sublime-keymap
...
// Vintage
{
    "keys": ["j", "j"],
    "command": "exit_insert_mode",
    "context": [
        {
            "key": "setting.command_mode",
            "operand": false
        },
        {
            "key": "setting.is_widget",
            "operand": false
        }
    ]
},
...

There is also VintageEx, but I don't have much need for it.

SublimeREPL

I stumbled on SublimeREPL, a plugin similar to quickrun in Vim.

Add to Preferences - Key Bindings - User:

Default (OSX).sublime-keymap
...
// SublimeREPL - Python
{
    "keys": ["f2"],
    "caption": "SublimeREPL: Python - RUN current file",
    "command": "run_existing_window_command",
    "args": {
        "id": "repl_python_run",
        "file": "config/Python/Main.sublime-menu",
    }
},
...

Now, after saving, F2 runs the current Python script without switching back and forth to a terminal.
C'est bon!

ExpandRegion

Saw this recommended on Zhihu and it looks quite useful: it quickly expands the selection. Also added in Preferences - Key Bindings - User:

Default (OSX).sublime-keymap
...
{
    "keys": ["super+e"],
    "command": "expand_region"
},
{
    "keys": ["super+u"],
    "command": "expand_region",
    "args": {
        "undo": true
    },
    "context": [
        {
            "key": "expand_region_soft_undo"
        }
    ]
},
...

Anaconda

This plugin really is powerful, though it has a couple of minor issues.

No autocompletion on import

Found the fix on Stack Overflow:
create /Users/air9/Library/Application\ Support/Sublime\ Text\ 3/Packages/Python/Completion\ Rules.tmPreferences with the following content:

Completion Rules.tmPreferences
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>scope</key>
    <string>source.python</string>
    <key>settings</key>
    <dict>
        <key>cancelCompletion</key>
        <string>^(.*\b(and|or)$)|(\s*(pass|return|and|or|(class|def)\s*[a-zA-Z_0-9]+)$)</string>
    </dict>
</dict>
</plist>

Save and restart Sublime.

The first time I did this, though, my config file somehow got reset and all plugins and key remaps were lost... Luckily I happened to have just made a backup, so: next time, remember to back up first.

Overly frequent completion popups

Typing a mere space pops up completions starting with a, and sometimes the popup wouldn't even go away after completing with Enter, so in exasperation I disabled it:

Anaconda.sublime-settings
...
/*
Disable anaconda completion
WARNING: set this as true will totally disable anaconda completion
*/
// "disable_anaconda_completion": false,
"disable_anaconda_completion": true,
/*
...

Sublime's built-in lightweight completion already covers my daily needs; I installed Anaconda mainly for the tooltips and PEP8 hints.

MarkdownEditing

This one isn't of much use; on the Mac I normally write Markdown in MacDown.
Still, its default rendering of txt files is annoying, so comment that out,
and change the theme while we're at it:

Markdown.sublime-settings
{
    "extensions":
    [
        "md",
        "mdown",
        // "txt"
    ],
    "color_scheme": "Packages/MarkdownEditing/MarkdownEditor-Dark.tmTheme",
}

Spider-04-sqlite3

2016-04-05

Knownsec spider project, day 4: sqlite3

This part is fairly simple; the points to keep in mind:

  • connect, cursor
  • execute
  • commit, rollback
  • close

Having already learned to use logging, I can try using the two together.

Code

MySqlite.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sqlite3
import logging
import logging.config

logging.config.fileConfig('logging.conf')

levels = {
    1: 'CRITICAL',
    2: 'ERROR',
    3: 'WARNING',
    4: 'INFO',
    5: 'DEBUG',
}
loglevel = 4

logger = logging.getLogger('spider')
logger.setLevel(levels[loglevel])


class MySqlite(object):
    def __init__(self, dbfile):
        try:
            self.conn = sqlite3.connect(dbfile)
            logger.info("Open database %s" % dbfile)
            logger.debug("Open database %s" % dbfile)
        except sqlite3.Error as e:
            # print "Fail to connect %s: %s" % (dbfile, e)  # e.args[0]
            logger.error("Fail to connect %s: %s" % (dbfile, e))
            return
        self.cursor = self.conn.cursor()

    def create(self, table):
        try:
            logger.info("Create table %s" % table)
            self.cursor.execute(
                "CREATE TABLE IF NOT EXISTS %s(Id INTEGER PRIMARY KEY \
                AUTOINCREMENT, Data VARCHAR(40))" % table
            )
            self.conn.commit()
        except sqlite3.Error as e:
            logger.error("Fail to create %s: %s" % (table, e))
            self.conn.rollback()

    def insert(self, table, data):
        try:
            logger.info("Insert %s into table %s" % (data, table))
            self.cursor.execute(
                "INSERT INTO %s(Data) VALUES('%s')" % (table, data))
            self.conn.commit()
        except sqlite3.Error as e:
            logger.error("Fail to insert %s into %s: %s" % (data, table, e))
            self.conn.rollback()

    def close(self):
        logger.info("Close database")
        self.cursor.close()
        self.conn.close()


if __name__ == '__main__':
    ms = MySqlite('spider.db')
    ms.create('t1')
    ms.insert('t1', 'test')
    ms.close()
logging.conf
[loggers]
keys = root, spider
[handlers]
keys = consoleHandler, fileHandler
[formatters]
keys = simpleFormatter
[logger_root]
level = DEBUG
handlers = consoleHandler
[logger_spider]
level = DEBUG
handlers = consoleHandler, fileHandler
qualname = spider
propagate = 0
[handler_consoleHandler]
class = StreamHandler
level = DEBUG
formatter = simpleFormatter
args = (sys.stdout,)
[handler_fileHandler]
class = FileHandler
level = DEBUG
formatter = simpleFormatter
args = ('spider.log', 'w')
[formatter_simpleFormatter]
format = %(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt =
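One caveat worth noting about insert() above: interpolating data straight into the SQL string breaks on quotes and invites SQL injection. sqlite3 supports ? placeholders for values (identifiers such as table names still cannot be parameterized). A small sketch of mine using an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE t1 (Id INTEGER PRIMARY KEY AUTOINCREMENT, '
            'Data VARCHAR(40))')
# the ? placeholder handles quoting for us
cur.execute('INSERT INTO t1 (Data) VALUES (?)', ("it's quoted safely",))
conn.commit()
row = cur.execute('SELECT Data FROM t1').fetchone()
print(row[0])   # it's quoted safely
conn.close()
```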

References

  • https://docs.python.org/2/library/sqlite3.html
  • http://blog.sina.com.cn/s/blog_72603eac01013pbc.html
  • http://blog.csdn.net/jeepxiaozi/article/details/8808435
  • http://dongweiming.github.io/blog/archives/pa-chong-lian-xi/
  • http://devdocs.io/python~2.7/library/logging.config

Spider-03-argparse

2016-04-04

Knownsec spider project, day 3: argparse

The assignment dates from 2012 and calls for the optparse module, which is now deprecated, so I studied argparse instead.

Key points:

  • class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True)
  • ArgumentParser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])
  • ArgumentParser parses arguments with the parse_args() method. It inspects the command line, converts each argument to the appropriate type, and takes the appropriate action. In most cases this means building a simple Namespace object from the attributes parsed out of the command line.

Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse


def parse():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-u', '--url', dest='url', required=True,
        help='specify the URL to start crawl'
    )
    parser.add_argument(
        '-d', '--depth', dest='depth',
        default=1, type=int,
        help='specify the depth of the spider (default: 1)'
    )
    parser.add_argument(
        '-f', '--file', dest='logfile',
        default='spider.log',
        help='specify the path of logfile (default: spider.log)'
    )
    parser.add_argument(
        '-l', '--level', dest='loglevel', choices=range(1, 6),
        default=1, type=int,
        help='specify the verbose level of the log (default: 1)'
    )
    parser.add_argument(
        '--dbfile', dest='dbfile',
        default='spider.db',
        help='specify the path of sqlite dbfile (default: spider.db)'
    )
    parser.add_argument(
        '--thread', dest='num_threads',
        default=10, type=int,
        help='specify the size of thread pool (default: 10)'
    )
    parser.add_argument(
        '--keyword', dest='keyword',
        help='specify the keyword'
    )
    parser.add_argument(
        '--selftest', action='store_true',
        help='self-test'
    )
    args = parser.parse_args()
    # > Namespace(
    #     dbfile='spider.db', depth=1, keyword=None,
    #     logfile='spider.log', loglevel=1, num_threads=10,
    #     selftest=False, url='www.baidu.com'
    # )
    return args


if __name__ == '__main__':
    parse()
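parse_args also accepts an explicit argument list, which makes a parser easy to exercise without the shell; a cut-down sketch of the options above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-u', '--url', dest='url', required=True)
parser.add_argument('-d', '--depth', dest='depth', default=1, type=int)

# pass argv explicitly instead of reading sys.argv
args = parser.parse_args(['-u', 'www.baidu.com', '-d', '3'])
print(args.url)     # www.baidu.com
print(args.depth)   # 3
```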

References

  • https://docs.python.org/2/howto/argparse.html
  • https://docs.python.org/2/library/argparse.html#choices
  • http://python.usyiyi.cn/python_278/library/argparse.html
  • http://blog.xiayf.cn/2013/03/30/argparse/

Spider-02-logging

2016-04-03

Knownsec spider project, day 2: logging

This article has detailed, practical test cases and a good analysis of the logging module. My takeaways:

  • As long as the name argument to logging.getLogger(name) is the same, the Logger instance returned is the same single object; name and Logger instance correspond one-to-one
  • A descendant Logger both dispatches a message to its own handlers and passes it on to all of its ancestor Loggers
  • A Filter added to a Handler affects every Logger that uses that Handler, while a Filter added to a Logger affects only that Logger
  • In the typical multi-module scenario, logging is configured in the main module, and that configuration applies to all of its submodules
  • A config file can be used: logging.config.fileConfig("logging.conf") (source)
logging.conf
# Define the loggers. root is the parent and must exist; the others are custom.
# logging.getLogger(NAME) effectively registers a kind of log output with the logging module.
# A dot in the name expresses the logger inheritance hierarchy.
[loggers]
keys=root,infoLogger,errorLogger
# Define the handlers
[handlers]
keys=infoHandler,errorHandler
# Define the output formatters
[formatters]
keys=infoFmt,errorFmt
#--------------------------------------------------
# Implement the loggers defined above; sections must take the form [logger_xxxx]
#--------------------------------------------------
# [logger_xxxx]  logger_<name>
# level      one of DEBUG, INFO, WARNING, ERROR, CRITICAL
# handlers   handler names, multiple allowed, comma-separated
# qualname   logger name, retrieved by the application via logging.getLogger; names not listed here fall back to the root logger
# propagate  whether to pass records on to the parent logger, 0: no  1: yes
[logger_root]
level=INFO
handlers=errorHandler
[logger_errorLogger]
level=ERROR
handlers=errorHandler
propagate=0
qualname=errorLogger
[logger_infoLogger]
level=INFO
handlers=infoHandler
propagate=0
qualname=infoLogger
#--------------------------------------------------
# handler
#--------------------------------------------------
# [handler_xxxx]
# class      handler class name
# level      log level
# formatter  one of the formatters defined above
# args       arguments for the handler's constructor
[handler_infoHandler]
class=StreamHandler
level=INFO
formatter=infoFmt
args=(sys.stdout,)
[handler_errorHandler]
class=logging.handlers.TimedRotatingFileHandler
level=ERROR
formatter=errorFmt
# When computing the next rollover time for the first time (when the handler is created),
# the last modification time of an existing log file, or else the current time,
# is used to compute when the next rotation will occur.
# This feature is half-baked: the rollover is computed from when the handler was created rather
# than on calendar boundaries, so unless the program runs continuously the rotation goes wrong.
# A workaround is described in the link below: Python multi-process logging
# http://blogread.cn/it/article/4175?f=wb2
args=('C:\\Users\\june\\Desktop\\error.log', 'M', 1, 5)
#--------------------------------------------------
# Log format
#--------------------------------------------------
# %(asctime)s      date and time with milliseconds, e.g. 2013-04-26 20:10:43,745
# %(filename)s     file name, without the directory
# %(pathname)s     full path
# %(funcName)s     function name
# %(levelname)s    level name
# %(lineno)d       line number
# %(module)s       module name
# %(message)s      message body
# %(name)s         logger name
# %(process)d      process id
# %(processName)s  process name
# %(thread)d       thread id
# %(threadName)s   thread name
[formatter_infoFmt]
format=%(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter
[formatter_errorFmt]
format=%(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter

Test case

main.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Air9'
import logging
import logging.config
logging.config.fileConfig('main.conf')
root_logger = logging.getLogger('root')
root_logger.debug('test root logger')
logger = logging.getLogger('main')
logger.info('test main logger')
logger.info('start import mod')
import mod
logger.debug('test mod.testmod()')
mod.testmod()
root_logger.info('finish test')
main.conf
[loggers]
keys = root, main
[handlers]
keys = consoleHandler
[formatters]
keys = simpleFormatter
[logger_root]
level = DEBUG
handlers = consoleHandler
[logger_main]
level = DEBUG
handlers = consoleHandler
qualname = main
propagate = 0
[handler_consoleHandler]
class = StreamHandler
level = DEBUG
formatter = simpleFormatter
args = (sys.stdout,)
[formatter_simpleFormatter]
format = %(asctime)s - %(name)s - [line:%(lineno)d] - %(levelname)s - %(message)s
datefmt =
mod.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Air9'
import logging
import submod
logger = logging.getLogger('main.mod')
logger.info('logger main.mod')
def testmod():
    logger.debug('test mod.testmod()')
    submod.testsubmod()
submod.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Air9'
import logging
logger = logging.getLogger('main.mod.submod')
logger.info('submod.logger')
def testsubmod():
    logger.debug('test submod.testsubmod()')
output
2016-04-04 14:46:24,148 - root - [line:13] - DEBUG - test root logger
2016-04-04 14:46:24,148 - main - [line:16] - INFO - test main logger
2016-04-04 14:46:24,148 - main - [line:17] - INFO - start import mod
2016-04-04 14:46:24,148 - main.mod.submod - [line:11] - INFO - submod.logger
2016-04-04 14:46:24,148 - main.mod - [line:12] - INFO - logger main.mod
2016-04-04 14:46:24,149 - main - [line:20] - DEBUG - test mod.testmod()
2016-04-04 14:46:24,149 - main.mod - [line:15] - DEBUG - test mod.testmod()
2016-04-04 14:46:24,149 - main.mod.submod - [line:14] - DEBUG - test submod.testsubmod()
2016-04-04 14:46:24,149 - root - [line:23] - INFO - finish test
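The identity and propagation rules from the bullet list can also be seen without any config file. A minimal sketch in Python 3 syntax, logging into an in-memory stream instead of stdout:

```python
import io
import logging

# Same name -> same (and only) Logger instance.
parent = logging.getLogger('app')
assert logging.getLogger('app') is parent

# Give only the parent a handler; the child has none of its own.
stream = io.StringIO()
parent.addHandler(logging.StreamHandler(stream))
parent.setLevel(logging.DEBUG)

child = logging.getLogger('app.db')   # the dot expresses the hierarchy
child.info('hello from the child')    # propagates up to 'app'

print(stream.getvalue())              # -> hello from the child
```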

References

  • http://my.oschina.net/leejun2005/blog/126713
  • http://blog.chinaunix.net/uid-26000296-id-4372063.html
  • http://www.tuicool.com/articles/bmMfUfE
  • Python
  • Spider
  • logging

Spider-01-MyThreadPool

2016-04-02

Day 1 of the Knownsec spider project: a thread pool

The task requires implementing a thread pool from scratch. After working through several blog posts, the key points are roughly:

  • With the default Thread(), a thread is usually bound directly to one concrete func.
    With a pool, however, the threads created at init time may at times outnumber the tasks,
    so the custom MyThread() instead binds to the whole task queue and pulls task funcs off it one by one
  • MyThreadPool creates the required number of threads up front, then assigns concrete task funcs to them
  • The custom MyThread() must set self.daemon = True, otherwise the program will not exit even after all tasks are done
  • Queue() provides two very handy methods
    • task_done() notifies the queue when a task is finished, so an idle thread can be given a new one
    • join() blocks the main thread until all tasks have completed

Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from threading import Thread
from Queue import Queue


class MyThreadPool(object):
    def __init__(self, num_threads=20):
        self.tasks = Queue(num_threads)
        for _ in xrange(num_threads):
            # Init the pool with the number of num_threads
            MyThread(self.tasks)

    def add_task(self, func, *args, **kwargs):
        self.tasks.put((func, args, kwargs))

    def wait_completion(self):
        # Blocks until all items in the queue have been gotten and processed.
        self.tasks.join()


class MyThread(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        # This must be set before start() is called. The entire Python program
        # exits when no alive non-daemon threads are left.
        self.daemon = True
        self.start()

    def run(self):
        while True:
            # Block until an item is available.
            func, args, kwargs = self.tasks.get()
            try:
                func(*args, **kwargs)
            except Exception as e:
                print e
            # Tells the queue that the processing on the task is complete.
            self.tasks.task_done()


if __name__ == '__main__':
    ''' test task '''
    from time import sleep

    def nap(sec):
        sleep(sec)
        print 'Had a %ds nap...' % sec

    tp = MyThreadPool(5)
    nap_time = [i for i in xrange(1, 11)]
    for i, t in enumerate(nap_time, 1):
        print 'Worker No.%d needs a %ds nap.' % (i, t)
        tp.add_task(nap, t)
    tp.wait_completion()
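The task_done()/join() pairing the pool relies on can be watched in isolation. A minimal sketch (in Python 3 the module is named queue rather than Queue):

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    while True:
        item = tasks.get()        # blocks until an item is available
        results.append(item * 2)
        tasks.task_done()         # tell the queue this item is fully processed

threading.Thread(target=worker, daemon=True).start()

for i in range(5):
    tasks.put(i)
tasks.join()                      # returns only after every put() got a task_done()
print(sorted(results))            # -> [0, 2, 4, 6, 8]
```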

References

  1. python线程池
  2. 【Howie玩python】-多线程从0到1到澡堂子洗澡
  • Python
  • Spider
  • ThreadPool

ropgadget capstone

2016-03-10

import error

capstone is a well-known disassembly framework, and ropgadget depends on it.

But every time after pip install ropgadget, it fails with:
import error:ERROR: fail to load the dynamic library.

Update 2016-05-10: ropgadget depends on capstone 2.1 while the current release is 3.0.4, so updating ropgadget via pip downgrades capstone... fix it afterwards with pip install -U capstone

The cause is that the required dynamic library libcapstone.dylib is not in the capstone package directory,
so finding libcapstone.dylib and copying it there solves the problem.

There are two ways to locate it:

sudo find / -name 'libcapstone.*'

It turns out the library was installed to this bizarre path:
/usr/local/lib/python2.7/site-packages/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/capstone/libcapstone.dylib
probably a missing /, see https://github.com/aquynh/capstone/pull/406

pip uninstall capstone

This trick is a bit sneaky but faster:
the command lists every file belonging to capstone,
and the last one is exactly the libcapstone.dylib we are looking for.

then

$ cp /usr/local/lib/python2.7/site-packages/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/capstone/libcapstone.dylib /usr/local/lib/python2.7/site-packages/capstone/libcapstone.dylib
$ ropgadget
Need a binary filename (--binary/--console or --help)

Now it works again.

Addendum

Tried this on Kali rolling:
the second method doesn't work there, only find does.
It turns up /usr/lib/libcapstone.so.3, so
cp /usr/lib/libcapstone.so.3 /usr/local/lib/python2.7/dist-packages/capstone/libcapstone.dylib
solves it.
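To check from Python where (or whether) the loader can see a libcapstone shared library at all, the stdlib ctypes helper is enough. A sketch; a None result matches the "fail to load the dynamic library" error above:

```python
from ctypes.util import find_library

# Asks the platform loader for libcapstone (libcapstone.dylib on macOS,
# libcapstone.so on Linux); returns None when it is not on the search path.
path = find_library('capstone')
print(path)
```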

  • capstone
  • python
  • ropgadget
  • rop


hexo github

2016-03-09

hexo & github

At first I didn't understand the difference between .deploy_git and .git.
Only after noticing that what hexo deploy pushes to GitHub is nothing but the generated pages
did hexo's workflow become clear.

First,
hexo does not keep generated pages around:
apart from config files, the project directory contains only the .md files created by hexo new.
Only after hexo generate does it produce the html, css and other page files under the public directory.

After hexo deploy,
hexo pushes all the page files to the project's master branch (as specified in /_config.yml),
which is what makes the site work.

But this way,
only the page files end up in the remote repository.
To keep the config files and .md sources on GitHub as well,
a separate hexo branch can be created for them:

git checkout -b hexo
git push origin hexo:hexo

With that,
changes to the hexo setup are also tracked on GitHub.

From then on, after generating pages with hexo g,
first publish to the site (i.e. the master branch) with hexo d,
then add-commit-push all changes to the hexo branch.

push conflict

On the second real push, however,
a conflict came up:

$ git push origin hexo:hexo
To git@github.com:answerrrrrrrrr/answerrrrrrrrr.github.com.git
! [rejected] hexo -> hexo (non-fast-forward)
error: failed to push some refs to 'git@github.com:answerrrrrrrrr/answerrrrrrrrr.github.com.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Solved by following Liao Xuefeng's article:

$ git pull
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=origin/<branch> hexo
$ git branch --set-upstream hexo origin/hexo
The --set-upstream flag is deprecated and will be removed. Consider using --track or --set-upstream-to
Branch hexo set up to track remote branch hexo from origin.
$ git pull
Auto-merging index.html
CONFLICT (add/add): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
$ git status
On branch hexo
Your branch and 'origin/hexo' have diverged,
and have 1 and 1 different commit each, respectively.
(use "git pull" to merge the remote branch into yours)
You have unmerged paths.
(fix conflicts and run "git commit")
Unmerged paths:
(use "git add <file>..." to mark resolution)
both added: index.html
no changes added to commit (use "git add" and/or "git commit -a")
$ subl index.html
$ git add .
$ git commit -m 'index'
[hexo b7ed9be] index
$ git push origin hexo:hexo
  • git
  • github
  • hexo

10.11 privoxy [Errno 61] Connection refused

2016-03-09

Ever since updating to 10.11,
a small scrapy spider I wrote stopped working,
always failing with [Errno 61] Connection refused.
Since I rarely used it I didn't look into it much:
a search on Stack Overflow
suggested the site had added anti-crawling measures,
but after some fruitless fiddling I went back to real work.

Then today, running pip list --outdated through the proxy
threw the very same [Errno 61] Connection refused,
and I realized the proxy itself might be the problem.

Shadowsocks had been working fine the whole time,
so the culprit had to be Privoxy.

Sure enough, lsof -i:8118 showed
that Privoxy wasn't bound to port 8118 at all,
no wonder the connection was refused...

After some googling,
an article mentioned that Privoxy must be reinstalled after updating to 10.11.
So reinstall it:
download the latest version from SourceForge,
then, as before,
sudo vim /usr/local/etc/privoxy/config
and change these two lines:
listen-address 127.0.0.1:8118
forward-socks5t / 127.0.0.1:1080 .
then start Privoxy:
sudo /Applications/Privoxy/startPrivoxy.sh
In the past this would have been enough,
yet lsof -i:8118 still showed nothing.

Another post pointed out the problem:
the modified config has to be loaded explicitly:
cd /usr/local/sbin/
./privoxy --no-daemon /usr/local/etc/privoxy/config

This time lsof -i:8118
finally shows the long-missed Privoxy,
and the spider is back to life.

As a bonus,
the same post shows how Privoxy can also serve as a proxy for a phone,
an unexpected win.

  • Privoxy
  • Shadowsocks
  • gfw
  • Mac


arch vmtools

2016-03-09

http://blog.csdn.net/zhxlianxin/article/details/17636933

pacman -S open-vm-tools open-vm-tools-modules

pacman -S gtkmm

cat /proc/version > /etc/arch-release

systemctl start vmtoolsd

systemctl enable vmtoolsd

  • Archlinux
  • vmtools

arch gdb pip ropgadget

2016-03-09

gdb plugin: gdb-peda

sudo pacman -S gdb

git clone https://github.com/longld/peda.git ~/peda

echo "source ~/peda/peda.py" >> ~/.gdbinit

gdb plugin: libheap

curl https://raw.githubusercontent.com/answerrrrrrrrr/VRL/master/test/exploits/lib/libheap.py > libheap.py

sudo mv libheap.py /usr/lib/python2.7

gdb

(gdb) python from libheap import *

(gdb) heap -h

pip

curl https://bootstrap.pypa.io/get-pip.py > getpip.py

sudo python getpip.py

ROPgadget

sudo pacman -S python-capstone

sudo pip install ropgadget

If pip errors out about capstone:

sudo pip install ropgadget --upgrade

  • gdb
  • Archlinux
  • pip
  • rop


tmux copy2clipboard

2016-03-09

https://wiki.archlinux.org/index.php/Tmux#Mouse_scrolling

Mouse scrolling

Note: This interferes with selection buffer copying and pasting. To copy/paste to/from the selection buffer hold the shift key.

If you want to scroll with your mouse wheel, ensure mode-mouse is on in .tmux.conf

set -g mouse on

You can set scroll History with:

set -g history-limit 30000

For mouse wheel scrolling as from tmux 2.1 try adding one or both of these to ~/.tmux.conf

bind-key -T root WheelUpPane   if-shell -F -t = "#{alternate_on}" "send-keys -M" "select-pane -t =; copy-mode -e; send-keys -M"   
bind-key -T root WheelDownPane if-shell -F -t = "#{alternate_on}" "send-keys -M" "select-pane -t =; send-keys -M"

Copy from tmux to clipboard

Hold the shift key to select the text, then copy & paste.

  • Archlinux
  • tmux

© 2018 Air9
Hexo Theme Yilia by Litten