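Motivation
Recently my favorite Bilibili uploader team announced they were disbanding, and in their final video they said the account would be handed back to the company. Worried that the videos might be deleted, I decided to download them all locally.
There are browser plugins that can download videos, but they require downloading one at a time by hand, which is tedious and inelegant, so I decided to write a Python program to download them in bulk.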
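Disclaimer
- This post only records what I did for personal use; if it infringes on anything, please let me know and I will remove it.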
Environment
Windows 10
Python 3.8.10
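Approach
First, fetch the BV IDs of the uploader's videos through the API. This endpoint often responds with "requests too frequent, please try again later", so some retry-on-failure logic is needed.
The code below waits 1 s after a failure and then retries, up to 100 times. Once the BV IDs for the requested page have been retrieved, it returns them as a list.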

    import time

    import requests

    # user_id (the uploader's mid) is a module-level variable set elsewhere in
    # the script; author is filled in here and used later for the folder name.


    def get_video_lists(page, max_retries=100):
        global author
        url = 'https://api.bilibili.com/x/space/arc/search?mid={}&ps=30&tid=0&pn={}&keyword=&order=pubdate&jsonp=jsonp'.format(
            user_id, page)
        headers = {
            "accept": "application/json, text/plain, */*",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "zh,en;q=0.9,en-US;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6",
            "cookie": "buvid3=52EE1424-8352-DE0D-C2F9-8CEFBD6D7D2024853infoc; i-wanna-go-back=-1; _uuid=D7F4D7102-F510C-9EFD-B44C-5A15BB3D2B9825216infoc; buvid4=79C7023E-28E0-B231-6510-54E406718DAA25965-022021913-c0D4n8mIkOPQS7cPZ5EOlQ%3D%3D; CURRENT_BLACKGAP=0; LIVE_BUVID=AUTO7016452474409017; rpdid=|(Rlllkm)mY0J'uYRlkRmRum; buvid_fp_plain=undefined; blackside_state=0; fingerprint=6c8532a24d1ddc22356289c4c2d1958f; buvid_fp=34e58163f7b4e31c1736ba5b8416e000; SESSDATA=c35a2a31%2C1662290982%2Ca3c0d%2A31; bili_jct=de750fd4e484b47f40b8bb42a5a72869; DedeUserID=73827743; DedeUserID__ckMd5=9d571d9b5b827b73; sid=c3w73yp7; b_ut=5; hit-dyn-v2=1; nostalgia_conf=-1; PVID=2; innersign=0; b_lsid=B710CBE88_180E5C4ABA4; bp_video_offset_73827743=662643097963855900; CURRENT_FNVAL=80; b_timer=%7B%22ffp%22%3A%7B%22333.1007.fp.risk_52EE1424%22%3A%22180E5C4B0BF%22%2C%22333.337.fp.risk_52EE1424%22%3A%22180E5C521EF%22%2C%22333.999.fp.risk_52EE1424%22%3A%22180E5C5494B%22%7D%7D",
            "origin": "https://space.bilibili.com",
            "referer": "https://space.bilibili.com/518973111/video?tid=0&page=2&keyword=&order=pubdate",
            "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
        }
        for _ in range(max_retries):
            resp = requests.get(
                url=url,
                headers=headers
            )
            if resp.status_code != 200:
                print(f"Error: HTTP status code {resp.status_code}")
                time.sleep(1)  # wait for a while before retrying
                continue
            if not resp.text:
                print("Error: Response is empty")
                time.sleep(1)  # wait for a while before retrying
                continue
            try:
                js = resp.json()
            except ValueError:
                # JSON decode errors from json/simplejson/requests all subclass ValueError
                print("Error: Unable to parse JSON, trying again")
                time.sleep(1)  # wait for a while before retrying
                continue
            if 'code' in js and js['code'] == -799:
                # -799 is the "requests too frequent" response
                print(f"Error: {js['message']}")
                time.sleep(1)  # wait for a while before retrying
                continue
            if 'data' not in js or 'list' not in js['data'] or 'vlist' not in js['data']['list']:
                print("Error: Unexpected response")
                time.sleep(1)  # wait for a while before retrying
                continue
            vlist = js['data']['list']['vlist']
            if not vlist:
                # Empty page (e.g. past the last page): nothing to return
                return []
            author = vlist[0]['author']
            bvid_list = [x.get('bvid') for x in vlist]
            return bvid_list
        print(f"Error: Failed to get data after {max_retries} attempts")
        return []

Create a directory named after the uploader under the current directory; all videos are downloaded into it.

    import os

    current_directory = os.getcwd()
    folder_path = os.path.join(current_directory, author)
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Created folder: {folder_path}")

Call you-get to download each video.
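
    import subprocess
    import time


    def download_videos(bv):
        url = 'https://www.bilibili.com/video/{}'.format(bv)
        print("download link", url, flush=True)
        # Save into the uploader's folder created earlier
        command = ['you-get', '-o', folder_path, url]
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError(f'Failed to download video {bv}')
        print(f'Video {bv} downloaded successfully', flush=True)
        time.sleep(5)  # pause before this worker picks up the next video
        return bv

Calling the download function above directly fetches only one video at a time, so a thread pool is used to raise concurrency. Here 5 threads run at once.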

    import concurrent.futures

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Map each future back to its BV ID so failures can be reported
        futures = {executor.submit(download_videos, bv): bv for bv in bv_lists}
        for future in concurrent.futures.as_completed(futures):
            bv = futures[future]
            try:
                future.result()
            except Exception as e:
                print(f'Error downloading video {bv}: {e}', flush=True)

Finally, you-get downloads not only the video but also the danmaku (bullet comments), saved as files with the .xml suffix. If you don't need them, they can be deleted in bulk.

    import glob
    import os

    xml_files = glob.glob(os.path.join(folder_path, '*.xml'))
    for xml_file in xml_files:
        try:
            os.remove(xml_file)
            print(f'Successfully deleted {xml_file}')
        except OSError as e:
            print(f'Error deleting {xml_file}: {e}')

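
The thread-pool snippet iterates over `bv_lists`, which the post never shows being built, and `get_video_lists` fetches only one page (up to 30 videos) per call. A minimal sketch of collecting every page might look like the following. It assumes that the page-fetching function returns an empty list once the page number runs past the last page; the helper name `collect_all_bvids`, the `fetch_page` parameter, and the `max_pages` cap are illustrative and not part of the original script.

```python
from typing import Callable, List


def collect_all_bvids(fetch_page: Callable[[int], List[str]],
                      max_pages: int = 1000) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    all_bvids: List[str] = []
    for page in range(1, max_pages + 1):
        bvids = fetch_page(page)
        if not bvids:
            # Assumption: an empty list means we are past the last page
            # (or the fetcher gave up after exhausting its retries).
            break
        all_bvids.extend(bvids)
    return all_bvids
```

With the functions from this post, it would be used as `bv_lists = collect_all_bvids(get_video_lists)`. Passing the fetcher in as an argument keeps the loop testable without hitting the network.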
Limitations
- Without a login, Bilibili only serves lower-resolution video by default. To download higher-quality video, a logged-in Cookie has to be supplied.
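
One possible way to supply that Cookie is you-get's `--cookies` option, which, as far as I know, loads an exported cookies file such as `cookies.txt` (check `you-get --help` for your version). The sketch below only builds the argument list; the helper name `build_you_get_command` and the `cookies_file` parameter are hypothetical, and the cookies file itself has to be exported from a logged-in browser session.

```python
from typing import List, Optional


def build_you_get_command(url: str, output_dir: str,
                          cookies_file: Optional[str] = None) -> List[str]:
    """Build the you-get argument list, optionally passing a cookies file."""
    command = ["you-get", "-o", output_dir]
    if cookies_file:
        # Assumption: --cookies takes the path of an exported cookies file
        command += ["--cookies", cookies_file]
    command.append(url)
    return command
```

The returned list can be handed to `subprocess.run` in place of the hard-coded `command` in `download_videos`.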
Follow-up
While looking into how to download high-quality video, I found a more convenient tool: https://github.com/HFrost0/bilix
Download the videos of a specific uploader:

    bilix get_up 'https://space.bilibili.com/<uploader id>' --num [number of videos]