
Download large file in Python with Requests

https://requests.readthedocs.io/
This is a really nice library. I'd like to use it to download large files (> 1 GB).
The problem is that it's not possible to keep the whole file in memory, so I need to read it in chunks. And this is where the following code runs into trouble:


import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return


For some reason this doesn't work as intended: it still loads the whole response into memory before saving it to a file.

UPDATE

If you need a small client (Python 2.x / 3.x) which can download big files from FTP, you can find it here:
https://github.com/keepitsimple/pyFTPclient
It supports multithreading and reconnects (it monitors connections), and it also tunes socket parameters for the download task.
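
For context, here is a minimal sketch of the same idea using only the standard library's ftplib (this is not pyFTPclient's API; the host, path, and chunk size below are placeholder assumptions):


from ftplib import FTP

def ftp_download(host, remote_path, local_path, chunk_size=1024 * 1024):
    """Stream a (possibly large) FTP file to disk in chunks."""
    with FTP(host) as ftp, open(local_path, 'wb') as f:
        ftp.login()  # anonymous login; pass user/passwd here if required
        # retrbinary hands each received block to the callback,
        # so the whole file is never held in memory at once
        ftp.retrbinary(f'RETR {remote_path}', f.write, blocksize=chunk_size)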

三叔


With the following streaming code, Python memory usage stays limited regardless of the size of the downloaded file:


import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the if
                # below and set the chunk_size parameter to None.
                # if chunk:
                f.write(chunk)
    return local_filename


Note that the number of bytes returned by
iter_content
is not exactly
chunk_size
; it is expected to be a somewhat arbitrary number that is often bigger, and it is expected to differ on every iteration.


See
https://requests.readthedocs.i ... kflow
and
https://requests.readthedocs.i ... ntent
for further reference.
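
As a quick illustration of the point above (the URL is a placeholder assumption), printing the length of each chunk shows that the sizes vary and rarely equal chunk_size exactly:


import requests

url = 'https://example.com/some-large-file.bin'  # placeholder URL
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    for i, chunk in enumerate(r.iter_content(chunk_size=8192)):
        print(f'chunk {i}: {len(chunk)} bytes')  # lengths typically differ from 8192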

窦买办


It's much easier if you use
https://requests.readthedocs.i ... e.raw
together with
https://docs.python.org/3/libr ... leobj
:


import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename


That way the file is streamed to disk without using excessive memory, and the code is simple.
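
A minimal usage sketch of the helper above (the URL is a placeholder assumption):


path = download_file('https://example.com/ubuntu.iso')  # placeholder URL
print(path)  # 'ubuntu.iso', written to the current directory via shutil.copyfileobj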

郭文康


Not exactly what OP was asking, but... it's ridiculously easy to do this with
urllib
:


from urllib.request import urlretrieve

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)


Or this way, if you want to save it to a temporary file:


from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)


I watched the process with:


watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'


I could see the file growing, but memory usage stayed at around 17 MB. Am I missing something?

小明明


Your chunk size could be too large; have you tried dropping it to, say, 1024 bytes at a time? (Also, you could use
with
to tidy up the syntax.)


def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return


Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if Python isn't flushing the data to the file. Based on another SO question
https://coderoad.ru/7127075/
you could try
f.flush()
and
os.fsync()
to force the write to disk and free the memory:


import os

with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())

知食


Based on Roman's most upvoted comment above, here is my implementation,
including a "download as" mechanism and retries:


import logging
import os
import time
from urllib.parse import urlparse

import requests

logger = logging.getLogger(__name__)


def download(url: str, file_path='', attempts=2):
    """Downloads a URL content into a file (with large file support by streaming)

    :param url: URL to download
    :param file_path: Local file name to contain the data downloaded
    :param attempts: Number of attempts
    :return: New file path. Empty string if the download failed
    """
    if not file_path:
        file_path = os.path.realpath(os.path.basename(url))
    logger.info(f'Downloading {url} content to {file_path}')
    url_sections = urlparse(url)
    if not url_sections.scheme:
        logger.debug('The given url is missing a scheme. Adding http scheme')
        url = f'http://{url}'
        logger.debug(f'New url: {url}')
    for attempt in range(1, attempts + 1):
        try:
            if attempt > 1:
                time.sleep(10)  # 10 seconds wait time between downloads
            with requests.get(url, stream=True) as response:
                response.raise_for_status()
                with open(file_path, 'wb') as out_file:
                    for chunk in response.iter_content(chunk_size=1024 * 1024):  # 1MB chunks
                        out_file.write(chunk)
                logger.info('Download finished successfully')
                return file_path
        except Exception as ex:
            logger.error(f'Attempt #{attempt} failed with error: {ex}')
    return ''
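
A possible usage sketch for the function above (the URL is a placeholder assumption; logging configuration is up to the caller):


import logging

logging.basicConfig(level=logging.INFO)

saved_path = download('https://example.com/big-archive.zip', attempts=3)  # placeholder URL
if saved_path:
    print(f'Saved to {saved_path}')
else:
    print('Download failed after all attempts')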
