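Downloading Danmaku from Mainstream Video Sites

Most mainstream video sites today (bilibili, Tencent Video, iQiyi, Youku, Mango TV and others) support danmaku (bullet comments). This article shows how to download a video's danmaku as an .xml file and convert it into an .ass subtitle file for local playback.

XML Danmaku Format

Bilibili was one of the earliest danmaku sites and is fairly mature. Its danmaku can be downloaded directly as XML, which is very convenient, so every danmaku file produced in this article uses a simplified version of Bilibili's XML danmaku format as the standard format.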

<?xml version="1.0" encoding="UTF-8"?>
<i>
<d p="5,1,20,16777215">这是一条弹幕</d>
...
</i>

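The attribute p of each comment has the following fields:

  1. Time at which the comment appears, relative to the start of the video, in seconds
  2. Comment type: 1-3 scrolling, 4 bottom, 5 top, 6 reverse, 7 precise positioning, 8 advanced
  3. Font size: 25 is medium and 18 is small. Bilibili has only these two sizes; for local playback, 20 works well (on a 1920*1080 display)
  4. Comment color: the RGB color converted to a decimal value, 16777215 being white
  5. Time the comment was sent, as a Unix timestamp
  6. Comment pool: 0 normal, 1 subtitle, 2 special
  7. Sender id
  8. Comment id

In general only the first 4 fields are needed.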
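
As a small illustration of the field layout above, the sketch below parses the first four fields of a p attribute. The names DanmakuAttr, parse_p and color_to_hex are hypothetical helpers introduced here, not part of any site's API.

```python
from typing import NamedTuple


class DanmakuAttr(NamedTuple):
    time: float   # seconds from the start of the video
    mode: int     # comment type (1-3 scrolling, 4 bottom, 5 top, ...)
    size: int     # font size
    color: int    # decimal RGB value


def parse_p(p: str) -> DanmakuAttr:
    """Parse the first four comma-separated fields of a p attribute."""
    fields = p.split(',')
    return DanmakuAttr(float(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]))


def color_to_hex(color: int) -> str:
    """Render a decimal RGB value as a six-digit hex string."""
    return format(color, '06X')


attr = parse_p('5,1,20,16777215')
print(attr.time, attr.mode, attr.size, color_to_hex(attr.color))  # 5.0 1 20 FFFFFF
```

Extra fields after the fourth are simply ignored, so the same helper should also accept a full eight-field attribute.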

In Python, the standard-library urllib.request module can be used to fetch the page content:

import urllib.request

def get_response(url):
    # Send a browser-like User-Agent header with the request.
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')
    response = urllib.request.urlopen(req).read().decode('utf-8')
    return response

When generating the XML danmaku file, each comment must be checked for illegal XML characters; a word blacklist can also be applied:

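filename = 'XML/' + title + '.xml'
contents = []  # comments already written, used to drop duplicates
black_list = []  # blacklisted words; never include '' because it is a substring of every comment
with open(filename, 'w', encoding='utf-8') as fout:
    fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    fout.write('<i>\n')
    # Fragment: comments stands for the site-specific list of fetched comments;
    # timepoint, ct, size and color are taken from each comment j (see the per-site code).
    for j in comments:
        content = j['content']
        illegal = False  # whether the comment contains characters that are illegal in XML
        for char in ['<', '>', '&', '\u0000', '\b']:
            if char in content:
                illegal = True
                break
        if illegal:
            continue
        if content not in contents and all(word not in content for word in black_list):
            contents.append(content)
            fout.write('<d p="' + str(timepoint) + ',' + str(ct) + ',' + str(size) + ',' + str(color) + '">' + content + '</d>\n')
    fout.write('</i>')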
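
The fragment above skips any comment that contains a problematic character. A different approach, sketched here as an alternative rather than as the article's method, is to drop only the characters XML 1.0 forbids and escape the rest with the standard library's xml.sax.saxutils.escape. The helper names clean_text and make_d are hypothetical.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def clean_text(text: str) -> str:
    """Drop characters that XML 1.0 forbids, then escape &, < and >."""
    def allowed(ch: str) -> bool:
        cp = ord(ch)
        return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
    return escape(''.join(ch for ch in text if allowed(ch)))


def make_d(timepoint, ct, size, color, content) -> str:
    """Build one <d> element in the simplified format used in this article."""
    return '<d p="{},{},{},{}">{}</d>'.format(timepoint, ct, size, color, clean_text(content))


line = make_d(5, 1, 20, 16777215, 'a < b & c\b')
print(line)                       # <d p="5,1,20,16777215">a &lt; b &amp; c</d>
print(ET.fromstring(line).text)   # a < b & c
```

Whether a given XML-to-ASS converter decodes the escaped entities depends on how it reads the file; a converter built on a real XML parser should.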

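Many tools available online (such as a danmaku-to-ASS converter) can turn an XML danmaku file into an ASS subtitle file.
Custom settings based on the danmaku-to-ASS converter: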

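// Settings, suited to playing the video at 2x speed
var config = {
    'playResX': 1440, // screen resolution width (pixels)
    'playResY': 810, // screen resolution height (pixels)
    'fontlist': [ // fonts (the first available one is chosen automatically)
        '黑体',
        'Microsoft YaHei UI',
        'Microsoft YaHei',
        '文泉驿正黑',
        'STHeitiSC',
    ],
    'font_size': 1.2, // font size (ratio)
    'r2ltime': 20, // duration of right-to-left comments (seconds)
    'fixtime': 5, // duration of fixed comments (seconds)
    'opacity': 0.8, // opacity (ratio)
    'space': 0, // minimum horizontal gap between comments (pixels)
    'max_delay': 6, // maximum delay allowed before a comment appears (seconds)
    'bottom': 0, // space reserved for subtitles at the bottom (pixels)
    'use_canvas': true, // whether to use canvas to measure text width (boolean; defaults to false on Firefox under Linux and true elsewhere, Firefox bug #561361)
    'debug': false, // print debugging information
};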
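
To give a feel for what such a conversion involves, here is a minimal sketch of two value conversions an XML-to-ASS step needs: ASS event timestamps use the form H:MM:SS.cc, and ASS override tags write colors as &HBBGGRR&. This is not the converter's actual code; ass_time and ass_color are hypothetical names, and the 5-second duration mirrors the fixtime setting above.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as the H:MM:SS.cc timestamp used by ASS events."""
    cs = int(round(seconds * 100))        # total centiseconds
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return '{}:{:02d}:{:02d}.{:02d}'.format(h, m, s, cs)


def ass_color(color: int) -> str:
    """Convert a decimal RGB value to the &HBBGGRR& form used in ASS override tags."""
    r, g, b = (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
    return '&H{:02X}{:02X}{:02X}&'.format(b, g, r)


fixtime = 5                                  # seconds a fixed comment stays on screen
start = 5                                    # first field of p="5,1,20,16777215"
print(ass_time(start), ass_time(start + fixtime))   # 0:00:05.00 0:00:10.00
print(ass_color(16777215))                           # &HFFFFFF&
```

Note the byte order: pure red (decimal 16711680) becomes &H0000FF& because ASS lists blue first.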

Downloading Tencent Video Danmaku

Open a Tencent Video page in a desktop browser; its source contains a VIDEO_INFO field:

var VIDEO_INFO = {
    "publish_date": "",
    "leading_actor_id": [""],
    "duration": ,
    "guests": ,
    "race_teams_id": ,
    "type_name": ,
    "tag": [

    ],
    "singer_id": ,
    "episode": ,
    "race_stars_id": ,
    "srcsite_name": ,
    "type": ,
    "title": ,
    "leading_actor": [""],
    "show_type": ,
    "singer_name": ,
    "danmu_status": ,
    "second_title": ,
    "positive_trailer": ,
    "athlete": ,
    "mv_stars": ,
    "trytime_second": ,
    "c_full": ,
    "update_flag": ,
    "first_recommand": ,
    "desc": ,
    "pioneer_tag": ,
    "begin_time": ,
    "upload_qq": ,
    "category_map": [, ""],
    "is_trailer": ,
    "stars_name": ,
    "pic_640_360": ,
    "c_title_segment": ,
    "guests_id": ,
    "presenter_id": ,
    "upload_src": ,
    "athlete_id": ,
    "sec_recommand": ,
    "costar_id": ,
    "relative_stars_id": ,
    "relative_stars": ,
    "drm": ,
    "modify_time": ,
    "tail_time": ,
    "valid_tag_id": ,
    "vid": ,
    "pic_url": ,
    "costar": ,
    "race_teams_name": ,
    "c_title_output": ,
    "director_id": [""],
    "title_en": ,
    "stars": ,
    "danmu": ,
    "mv_stars_id": ,
    "playright": [""],
    "presenter": ,
    "race_stars": ,
    "view_all_count": ,
    "c_tags_flag": ,
    "c_has_adv_danmu": ,
    "head_time": ,
    "state": ,
    "copyright_id": ,
    "pic160x90": ,
    "director": [""],
    "famous_id": ,
    "pioneer_tag_ids": ,
    "trytime": ,
    "famous_actor": ,
    "video_checkup_time": ,
    "": ,
    "isFull":
};

The fields needed are duration, title and vid.
Next, use vid to look up targetid: http://bullet.video.qq.com/fcgi-bin/target/regist?otype=json&vid=(%vid%). Opening this link returns:
QZOutputJson = {
    "danmukey":"bubble_flag=&targetid=&vid=&type=",
    "display":,
    "is_has_adv":,
    "is_has_bubble":,
    "open":,
    "returncode":,
    "returnmsg":,
    "targetid":,
    "userstatus":
}

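The danmaku can then be fetched with targetid: http://mfm.video.qq.com/danmu?timestamp=(%timestamp%)&target_id=(%targetid%), where timestamp starts at 0 and increases in steps of 30. Opening this link returns (only the first comment is shown):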
{
    "err_code":,
    "err_msg":,
    "peroid":,
    "target_id":,
    "count":,
    "tol_up":,
    "single_max_count":,
    "session_key":,
    "comments":[
        {
            "commentid":,
            "content":,
            "upcount":,
            "isfriend":,
            "isop":,
            "isself":,
            "timepoint":,
            "headurl":,
            "opername":,
            "bb_bcolor":,
            "bb_head":,
            "bb_level":,
            "bb_id":,
            "rich_type":,
            "uservip_degree":,
            "content_style": "{\"color\":\"\",\"position\":}"
        }
    ]
}

The timepoint field, the color inside content_style, and the content field together make up an entry in the XML danmaku format.
The complete Python code:
import json
import os

import requests

def getHTMLText(url):
    # Fetch a URL and return the body as text, or '' on any request error.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except Exception as e:
        print(e)
        return ''

def extract_json(text):
    # Keep the text from the first '{' to the last '}', dropping wrappers
    # such as 'var VIDEO_INFO = ' or 'QZOutputJson=' and a trailing ';'.
    return text[text.index('{'):text.rindex('}') + 1]

def get_tencent_danmu(url):
    # VIDEO_INFO is assumed to sit on a single line of the page source.
    info_line = next(s for s in getHTMLText(url).split('\n') if 'VIDEO_INFO' in s)
    video_info = json.loads(extract_json(info_line))
    duration = video_info['duration']
    title = video_info['title']
    vid = video_info['vid']
    regist = getHTMLText('http://bullet.video.qq.com/fcgi-bin/target/regist?otype=json&vid=' + vid)
    targetid = str(json.loads(extract_json(regist))['targetid'])
    os.makedirs('XML', exist_ok=True)  # the output directory must exist
    filename = 'XML/' + title + '.xml'
    contents = []  # comments already written, used to drop duplicates
    black_list = ['word']  # blacklisted words; never include ''
    print('\n' + title + ': ', end='')
    with open(filename, 'w', encoding='utf-8') as fout:
        fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write('<i>\n')
        for i in range(int(duration) // 30 + 1):
            timestamp = i * 30
            print(i / 2, end='min, ')
            response = getHTMLText('http://mfm.video.qq.com/danmu?timestamp=' + str(timestamp) + '&target_id=' + targetid)
            if response == '':
                continue
            try:
                danmu = json.loads(response, strict=False)
                for j in danmu['comments']:
                    content = j['content']  # comment text
                    # Skip comments containing characters that are illegal in XML.
                    if any(char in content for char in ['<', '>', '&', '\u0000', '\b']):
                        continue
                    timepoint = j['timepoint']  # time at which the comment appears
                    ct = 1  # comment type (scrolling)
                    size = 20  # font size
                    # Use the comment's own color if it has one, otherwise white.
                    if 'color' in j['content_style']:
                        content_style = json.loads(j['content_style'])
                        color = int(content_style['color'], 16)
                    else:
                        color = 16777215
                    # Drop the prefix before ':' if there is one, then trim whitespace.
                    if ':' in content:
                        content = content.split(':')[1]
                    content = content.strip()
                    if content not in contents and all(word not in content for word in black_list):
                        contents.append(content)
                        fout.write('<d p="' + str(timepoint) + ',' + str(ct) + ',' + str(size) + ',' + str(color) + '">' + content + '</d>\n')
            except Exception as e:
                print(e)
                continue
        fout.write('</i>')
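
The paging and color handling in the code above can be checked in isolation, without touching the network. The following sketch uses two hypothetical helpers, danmu_timestamps and parse_color; the sample content_style value with "ff0000" is made up for illustration.

```python
import json


def danmu_timestamps(duration, step=30):
    """Return the timestamp values to request: 0, 30, 60, ... up to the duration."""
    return [i * step for i in range(int(duration) // step + 1)]


def parse_color(content_style, default=16777215):
    """Extract the hex color from a content_style JSON string as a decimal value."""
    try:
        style = json.loads(content_style)
        return int(style['color'], 16)
    except (ValueError, KeyError, TypeError):
        # Empty string, missing key, or a color that is not plain hex: fall back to white.
        return default


print(danmu_timestamps(95))                                  # [0, 30, 60, 90]
print(parse_color('{"color":"ff0000","position":1}'))        # 16711680
print(parse_color(''))                                       # 16777215
```

Catching the parse errors per comment keeps one malformed content_style from discarding the rest of a 30-second batch.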

Downloading iQiyi Video Danmaku

Open an iQiyi video page in a desktop browser; its source contains a page-info field:

{
    "albumId":,
    "albumName":,
    "imageUrl":,
    "tvId":,
    "vid":,
    "cid":,
    "isSource":,
    "contentType":,
    "vType":,
    "pType":,
    "pageNo":,
    "pageType":,
    "userId":,
    "pageUrl":,
    "tvName":,
    "isfeizhengpian":,
    "categoryName":,
    "categories":,
    "downloadAllowed":,
    "publicLevel":,
    "payMark":,
    "payMarkUrl":,
    "vipType":[

    ],
    "qiyiProduced":,
    "exclusive":,
    "tvYear":,
    "duration":"::",
    "wallId":,
    "rewardAllowed":,
    "commentAllowed":,
    "heatShowTypes":,
    "videoTemplate":,
    "issueTime":
}

The fields needed are duration, tvName, albumId, tvId and cid.
duration is converted from 'hours:minutes:seconds' format to seconds:
duration_str = page_info['duration'].split(':')
duration = 0
# Fold every field except the last into the total, multiplying by 60 at each step.
for i in range(len(duration_str) - 1):
    duration = (duration + int(duration_str[i])) * 60
duration = duration + int(duration_str[-1])

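The danmaku can then be fetched with albumId, tvId and cid: http://cmts.iqiyi.com/bullet/(%tvId[-4:-2]%)/(%tvId[-2:]%)/(%tvId%)_300_(%page%).z?rn=0.(%16 random digits%)&business=danmu&is_iqiyi=true&is_video_page=true&tvid=(%tvid%)&albumid=(%albumid%)&categoryid=(%cid%)&qypid=01010021010000000000. Here tvId has to be sliced into its fourth-to-third-from-last digits and its last two digits, and page starts at 1 and increases in steps of 1. Opening this link downloads a file named (%tvId%)_300_(%page%).z, which is a compressed byte stream that must be decompressed.
In Python this is done with the zlib library: dec = zlib.decompressobj(32 + zlib.MAX_WBITS), then b = dec.decompress(<bytes of the .z file>).decode("utf-8"), which yields the danmaku in XML format (only the first comment is shown):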
<?xml version="1.0" encoding="utf-8"?>
<danmu>
    <code></code>
    <data>
        <entry>
            <int></int>
            <list>
                <bulletInfo>
                    <contentId></contentId>
                    <content></content>
                    <showTime>1</showTime>
                    <font></font>
                    <color></color>
                    <opacity></opacity>
                    <position></position>
                    <background></background>
                    <contentType></contentType>
                    <isReply></isReply>
                    <likeCount></likeCount>
                    <plusCount></plusCount>
                    <dissCount></dissCount>
                    <userInfo>
                        <senderAvatar></senderAvatar>
                        <uid></uid>
                        <udid></udid>
                        <name></name>
                    </userInfo>
                </bulletInfo>
            </list>
        </entry>
    </data>
    <sum></sum>
    <validSum></validSum>
    <duration></duration>
    <ts></ts>
</danmu>


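The showTime, color and content fields together make up an entry in the XML danmaku format (color has to be converted from hexadecimal to decimal).
The complete Python code: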
import json
import os
import re
import zlib
from random import randint
import xml.etree.ElementTree as ET

import requests

def getHTMLText(url, encode):
    # Fetch a URL: text for encode == 'utf-8', raw bytes for encode == 'byte',
    # or '' on any request error.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        if encode == 'utf-8':
            r.encoding = 'utf-8'
            return r.text
        elif encode == 'byte':
            return r.content
    except Exception as e:
        print(e)
        return ''

def get_iqiyi_danmu(url):
    page_info = json.loads(re.search(r'page-info=\'(.*)\'( *):video-info', getHTMLText(url, 'utf-8')).group(1))
    # Convert 'hours:minutes:seconds' to seconds.
    duration_str = page_info['duration'].split(':')
    duration = 0
    for i in range(len(duration_str) - 1):
        duration = (duration + int(duration_str[i])) * 60
    duration = duration + int(duration_str[-1])
    title = page_info['tvName']
    albumid = page_info['albumId']
    tvid = str(page_info['tvId'])
    categoryid = page_info['cid']
    pages = duration // (60 * 5) + 1  # each file covers 300 seconds
    os.makedirs('XML', exist_ok=True)  # the output directory must exist
    filename = 'XML/' + title + '.xml'
    contents = []  # comments already written, used to drop duplicates
    black_list = ['word']  # blacklisted words; never include ''
    with open(filename, 'w', encoding='utf-8') as fout:
        fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write('<i>\n')
        for page in range(1, pages + 1):
            rn = ''.join(str(randint(0, 9)) for _ in range(16))
            danmu_url = ('http://cmts.iqiyi.com/bullet/' + tvid[-4:-2] + '/' + tvid[-2:] + '/' + tvid + '_300_' + str(page) + '.z?rn=0.' + rn
                         + '&business=danmu&is_iqiyi=true&is_video_page=true&tvid=' + tvid + '&albumid=' + str(albumid)
                         + '&categoryid=' + str(categoryid) + '&qypid=01010021010000000000')
            try:
                # 32 + zlib.MAX_WBITS accepts either a zlib or a gzip header.
                dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
                b = dec.decompress(getHTMLText(danmu_url, 'byte'))
                print('page: ' + str(page))
            except Exception:
                print('page not found: ' + str(page))
                continue
            try:
                root = ET.fromstring(b.decode('utf-8'))
            except Exception as e:
                print(e)
                continue
            for bulletInfo in root.iter('bulletInfo'):
                # Look fields up by tag name rather than by position.
                content = bulletInfo.findtext('content')  # comment text
                # Skip empty comments and comments with characters that are illegal in XML.
                if not content or any(char in content for char in ['<', '>', '&', '\u0000', '\b']):
                    continue
                timepoint = bulletInfo.findtext('showTime')  # time at which the comment appears
                ct = 1  # comment type (scrolling)
                size = 20  # font size
                color = int(bulletInfo.findtext('color') or 'ffffff', 16)  # hex color to decimal, white if missing
                if content not in contents and all(word not in content for word in black_list):
                    contents.append(content)
                    fout.write('<d p="' + str(timepoint) + ',' + str(ct) + ',' + str(size) + ',' + str(color) + '">' + content + '</d>\n')
        fout.write('</i>')
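
Two pieces of the iQiyi flow lend themselves to an offline check: building the request URL from tvId, and the decompression step. The sketch below is only an illustration; iqiyi_danmu_url and inflate are hypothetical helper names, and 123456789 is a placeholder tvId rather than a real video.

```python
import gzip
import zlib
from random import randint


def iqiyi_danmu_url(tvid, albumid, categoryid, page):
    """Build the danmaku URL for one 300-second page, following the pattern above."""
    tvid = str(tvid)
    rn = ''.join(str(randint(0, 9)) for _ in range(16))   # 16 random digits
    return ('http://cmts.iqiyi.com/bullet/' + tvid[-4:-2] + '/' + tvid[-2:] + '/'
            + tvid + '_300_' + str(page) + '.z?rn=0.' + rn
            + '&business=danmu&is_iqiyi=true&is_video_page=true&tvid=' + tvid
            + '&albumid=' + str(albumid) + '&categoryid=' + str(categoryid)
            + '&qypid=01010021010000000000')


def inflate(data: bytes) -> str:
    """Decompress a zlib- or gzip-wrapped byte stream and decode it as UTF-8."""
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)   # auto-detect the header type
    return (dec.decompress(data) + dec.flush()).decode('utf-8')


sample = '<danmu><content>hello</content></danmu>'
print(inflate(zlib.compress(sample.encode('utf-8'))) == sample)   # True
print(inflate(gzip.compress(sample.encode('utf-8'))) == sample)   # True
print(iqiyi_danmu_url(123456789, 1, 2, 1).split('?')[0])
# http://cmts.iqiyi.com/bullet/67/89/123456789_300_1.z
```

Passing 32 + zlib.MAX_WBITS is what lets the same call handle both header types, so the code does not need to know which one the server used.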

Downloading Youku Video Danmaku

Open a Youku video page in a desktop browser; its source contains a window.PageConfig field:

window.PageConfig = {
    transfer_mode: ,
    isDRM: ,
    videoCategoryId: ,
    isSimple: ,
    videoId: ,
    newVersion: ,
    isDebug: ,
    pid: ,
    homeHost: ,
    youku_homeurl: ,
    catId: ,
    playmode: ,
    videoOwner: ,
    videoOwner_en: ,
    videoId2: ,
    currentEncodeVid: ,
    catName: ,
    seconds: ,
    bullet: ,
    transfer: ,
    panorama: ,
    folderId: ,
    fpos: ,
    forder: ,
    ftotalpos: ,
    showid_en: ,
    showid: ,
    cp: ,
    paid: ,
    showtype: ,
    tabs: ,
    singerId: ,
    loadinglogo: ,
    lottery_open_sidetool: ,
    lottery_id_sidetool: ,
    lottery_sidetool: ,
    page: {
        type: ,
        isdatetype: ,
        year: ,
        firstMon: ,
        lastMon: ,
        currMon: ,
        episodeLast: ,
        parentvideoid: ,
        compeleted:
    },
    copytoclip: ,
    playerUrl:
};
var str = "&ct=c&cs=&td=&s=&v=&u=&paid=&tt=";

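The fields needed are seconds, tt and videoId.
The danmaku can then be fetched with videoId: https://service.danmu.youku.com/list?mat=(%mat%)&ct=1001&iid=(%videoId%), where mat starts at 0 and increases in steps of 1. Opening this link returns (only the first comment is shown):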
{
    "count": ,
    "filtered": ,
    "result": [{
        "aid": ,
        "content": "",
        "createtime": ,
        "ct": ,
        "extFields": {
            "voteUp":
        },
        "id": ,
        "iid": ,
        "ipaddr": ,
        "level": ,
        "lid": ,
        "mat": ,
        "ouid": ,
        "playat": ,
        "propertis": "{\"pos\":,\"size\":,\"effect\":,\"color\":,\"dmfid\":}",
        "status": ,
        "type": ,
        "uid": ,
        "ver":
    }],
    "scm": "0"
}

The playat field, the color inside propertis, and the content field together make up an entry in the XML danmaku format.
The complete Python code:
import json
import os
import re
import urllib.request

def get_response(url):
    # Send a browser-like User-Agent header with the request.
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')
    response = urllib.request.urlopen(req).read().decode('utf-8')
    return response

def get_youku_danmu(url):
    res = get_response(url)
    title = re.search(r'<title>(.*)</title>', res).group(1).split('—')[0]
    iid = re.search(r'videoId: \'(\d*)\'', res).group(1)
    duration = float(re.search(r'seconds: \'(.*)\',', res).group(1))
    os.makedirs('XML', exist_ok=True)  # the output directory must exist
    filename = 'XML/' + title.split('集 ')[0] + '.xml'
    contents = []  # comments already written, used to drop duplicates
    black_list = ['word']  # blacklisted words; never include ''
    with open(filename, 'w', encoding='utf-8') as fout:
        fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write('<i>\n')
        # One request per minute of video.
        for mat in range(int(duration) // 60 + 1):
            response = get_response('https://service.danmu.youku.com/list?mat=' + str(mat) + '&ct=1001&iid=' + iid)
            danmu = json.loads(response)
            print(str(mat) + '\tresult:' + str(len(danmu['result'])))
            for item in danmu['result']:
                content = item['content']  # comment text
                # Skip comments containing characters that are illegal in XML.
                if any(char in content for char in ['<', '>', '&', '\u0000', '\b']):
                    continue
                playat = item['playat'] / 1000  # time at which the comment appears, ms to s
                ct = 1  # comment type (scrolling)
                size = 20  # font size
                # Use the comment's own color if it has one, otherwise white.
                if 'color' in item['propertis']:
                    propertis = json.loads(item['propertis'])
                    color = propertis['color']
                else:
                    color = 16777215
                if content not in contents and all(word not in content for word in black_list):
                    contents.append(content)
                    fout.write('<d p="' + str(playat) + ',' + str(ct) + ',' + str(size) + ',' + str(color) + '">' + content + '</d>\n')
        fout.write('</i>')

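Downloading Mango TV Video Danmaku

Open a Mango TV video page in a desktop browser. Splitting its URL (for example https://www.mgtv.com/b/9015/4828668.html) on /, the second-to-last part is cid and the last part is vid.
The title can be taken from <title>霸王别姬 - 视频在线观看 - 霸王别姬 - 芒果TV</title> in the page source.
The danmaku can then be fetched with cid and vid: https://galaxy.bz.mgtv.com/rdbarrage?vid=(%vid%)&cid=(%cid%)&time=(%time%), where time starts at 0 and the next value of time is returned with the danmaku. Opening this link returns (only the first comment is shown):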

{
    "status":,
    "msg":"操作成功",
    "seq":"",
    "data":{
        "next":,
        "interval":,
        "items":[
            {
                "id":,
                "type":,
                "uid":,
                "content":,
                "time":
            }
        ]
    }
}

The time and content fields together make up an entry in the XML danmaku format.
The complete Python code:
import io
import json
import os
import sys
import urllib.request

# Force UTF-8 output so Chinese text prints on consoles with another default encoding.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')

def get_response(url):
    # Send a browser-like User-Agent header with the request.
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')
    response = urllib.request.urlopen(req).read().decode('utf-8')
    return response

def get_mangguo_danmu(url):
    # URL shape: https://www.mgtv.com/b/<cid>/<vid>.html
    cid = url.split('/')[4]
    vid = url.split('/')[5].split('.')[0]
    # Query the video info for this vid and cid.
    video_info = json.loads(get_response('https://pcweb.api.mgtv.com/video/info?vid=' + vid + '&cid=' + cid))
    title = video_info['data']['info']['videoName']
    os.makedirs('XML', exist_ok=True)  # the output directory must exist
    filename = 'XML/' + title + '.xml'
    contents = []  # comments already written, used to drop duplicates
    black_list = ['word']  # blacklisted words; never include ''
    with open(filename, 'w', encoding='utf-8') as fout:
        fout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write('<i>\n')
        time = 0
        while True:
            danmu_url = 'https://galaxy.bz.mgtv.com/rdbarrage?version=2.0.0&vid=' + vid + '&cid=' + cid + '&time=' + str(time)
            print(danmu_url)
            danmu = json.loads(get_response(danmu_url))
            # An empty items field marks the end of the danmaku.
            if danmu['data']['items'] is None:
                break
            for j in danmu['data']['items']:
                content = j['content']  # comment text
                # Skip comments containing characters that are illegal in XML.
                if any(char in content for char in ['<', '>', '&', '\u0000', '\b']):
                    continue
                timepoint = j['time'] / 1000  # time at which the comment appears, ms to s
                ct = 1  # comment type (scrolling)
                size = 20  # font size
                color = 16777215  # comment color (white)
                if content not in contents and all(word not in content for word in black_list):
                    contents.append(content)
                    fout.write('<d p="' + str(timepoint) + ',' + str(ct) + ',' + str(size) + ',' + str(color) + '">' + content + '</d>\n')
            time = danmu['data']['next']  # the response gives the next time value to request
        fout.write('</i>')
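
Extracting cid and vid from the page URL can be sketched with the standard library's URL tools, which tolerate a query string or trailing slash that plain index-based splitting would not. parse_mgtv_url is a hypothetical helper; the URL is the example used earlier in this section.

```python
import posixpath
from urllib.parse import urlparse


def parse_mgtv_url(url):
    """Return (cid, vid) from a URL shaped like https://www.mgtv.com/b/<cid>/<vid>.html."""
    parts = urlparse(url).path.strip('/').split('/')
    cid = parts[-2]                              # second-to-last path component
    vid = posixpath.splitext(parts[-1])[0]       # last component without its extension
    return cid, vid


print(parse_mgtv_url('https://www.mgtv.com/b/9015/4828668.html'))
# ('9015', '4828668')
```

Splitting off the extension is used here because str.strip removes a set of characters from both ends rather than a suffix, which can eat into an id that begins or ends with one of those characters.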

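Downloading Videos

You-Get is a command-line program that offers a convenient way to download media from the web.
What you-get can do:
1. Download audio and video from popular sites (see the full list of supported sites)
2. Watch online videos in a media player, without the browser or ads
3. Download images from web pages you like
4. Download any non-HTML content, such as binary files

you-get runs mainly on Linux and other open-source platforms. Since most home computers run Windows, the installation steps are as follows: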

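Download the installation packages

The following dependencies are required and must be installed separately, unless you use the prepackaged build on Windows:
Python 3
FFmpeg or Libav (https://libav.org/)

  1. Install via pip
    The official release of you-get is distributed through PyPI and can be installed from a PyPI mirror with the pip package manager. Be sure to use the Python 3 version of pip:
    $ pip3 install you-get

  2. Git clone
    $ git clone https://github.com/soimort/you-get.git
    The source can be extracted to any directory.

Upgrading
Depending on how you-get was installed, use:
$ pip3 install --upgrade you-get
or download the latest version:
$ you-get https://github.com/soimort/you-get/archive/master.zip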

Using you-get

Go into the extracted folder you-get-develop and open Windows PowerShell in that directory.

Run python you-get <video URL> to download a video (it is saved in the you-get-develop directory).

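Downloading Tencent Video

Open a Tencent Video playback page, open the developer console (F12), and search for "ts.m3u8" under the Network tab to find a URL like the following:
https://apd-(32-character string).v.smtcdns.com/moviets.tc.qq.com/(44-character string)/uwMROfz0r5xgoaQXGdGnC2df64hwtZlCglRDKOjEZ_qQW-eC/(160-character string)/(vid).(number).ts.m3u8?ver=4

This m3u8 file lists the relative addresses of the ts segments: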

#EXTM3U
#EXT-X-VERSION:
#EXT-X-MEDIA-SEQUENCE:
#EXT-X-TARGETDURATION:
#EXT-X-PLAYLIST-TYPE:
#EXTINF:(duration),
0(#)_(vid).(number).(#).ts?index=(number)&start=(number)&end=(number)&brs=(number)&bre=(number)&ver=4

The following code downloads the ts files and merges them:
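import urllib.request

def get_response(url):
    # Return the raw response bytes (the ts segments are binary).
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')
    response = urllib.request.urlopen(req).read()
    return response

url_m3u8 = '(m3u8 address)'
EXTM3U = get_response(url_m3u8).decode('utf-8').split('\n')
# Segment entries are the non-empty lines that do not start with '#'.
ts = [i.strip() for i in EXTM3U if i.strip() and not i.startswith('#')]
# Everything before the last '/' is the base for the relative segment addresses.
url_header = url_m3u8.rsplit('/', 1)[0]
# Open the output once and write the segments in playlist order.
with open('(file name).ts', 'wb') as f:
    for i in ts:
        url_ts = url_header + '/' + i
        f.write(get_response(url_ts))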
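
The step of turning playlist entries into absolute segment URLs can be tried offline. The sketch below uses the standard library's urllib.parse.urljoin in place of manual string splitting; segment_urls is a hypothetical helper, and the playlist and example.com URL are made-up sample data.

```python
from urllib.parse import urljoin


def segment_urls(m3u8_url, playlist_text):
    """Return absolute URLs for the media segments listed in an M3U8 playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):    # tags and comments start with '#'
            urls.append(urljoin(m3u8_url, line)) # resolve relative to the playlist URL
    return urls


playlist = '\n'.join([
    '#EXTM3U',
    '#EXT-X-TARGETDURATION:10',
    '#EXTINF:10.0,',
    'seg0.ts?index=0',
    '#EXTINF:10.0,',
    'seg1.ts?index=1',
    '#EXT-X-ENDLIST',
])
for u in segment_urls('https://example.com/video/list.m3u8?ver=4', playlist):
    print(u)
# https://example.com/video/seg0.ts?index=0
# https://example.com/video/seg1.ts?index=1
```

urljoin also copes with playlists that list absolute segment URLs, which simple concatenation would mangle.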

Batch Danmaku-to-ASS Conversion

Install selenium
pip install selenium

If you use Chrome
Check Chrome's version number (Chromium 72.0.3626.121)
https://chromedriver.storage.googleapis.com/LATEST_RELEASE_72.0.3626
https://chromedriver.storage.googleapis.com/index.html?path=72.0.3626.69/
Download the matching win32 build
Extract it into the Python root directory
Modify common.js
startDownload('\ufeff' + ass, name.replace(/\.[^.]*$/, '') + '.ass');
Change it to return ass;