Can I calculate the total size of a publicly accessible website?

Question

Suppose I want to download all public pages of the website www.psychocats.net, or build an offline database of it. Before starting the download, how can I first calculate the total size of the website?

mariomaric · Answer

Based on a similar Q&A ("Get the file size of a file before running wget?"), I wrote a bash wrapper script around wget that does exactly what you need. :)

The latest version of the code is in this GitHub repository:

https://github.com/mariomaric/website-size

#!/bin/bash

# Info: https://github.com/mariomaric/website-size#readme

# Prepare wget logfile
log=/tmp/wget-website-size-log

# Do the spider magic
echo "### Crawling ${!#} website... ###"
sleep 2s
echo "### This will take some time to finish, please wait. ###"

wget \
  --recursive --level=inf \
  --spider --server-response \
  --no-directories \
  --output-file="$log" "$@"

echo "Finished with crawling!"
sleep 1s

# Check if prepared logfile is used
if [ -f "$log" ]; then
    # Calculate and print estimated website size
    echo "Estimated size: $(\
        grep -e "Content-Length" "$log" | \
        awk '{sum+=$2} END {printf("%.0f", sum / 1024 / 1024)}'\
    ) Mb"

    # Delete wget log file
    rm "$log"
else
    echo "Unable to calculate estimated size."
fi

exit
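
The size calculation at the end of the script can be tried on its own, without crawling anything. What follows is a minimal sketch, assuming a hand-written log excerpt in the shape that wget --server-response writes. The log contents and the helper name sum_content_length are made up for illustration and are not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): sum every
# Content-Length value found in a wget log and print the total in bytes.
sum_content_length() {
    grep -e "Content-Length" "$1" | awk '{sum += $2} END {printf("%.0f\n", sum)}'
}

# Made-up log excerpt imitating `wget --spider --server-response` output.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Length: 2048
  HTTP/1.1 200 OK
  Content-Type: image/png
  Content-Length: 1048576
EOF

total_bytes=$(sum_content_length "$sample_log")
echo "Total bytes: $total_bytes"

# Same division as the script: bytes / 1024 / 1024, rounded to a whole number.
echo "Estimated size: $(awk -v b="$total_bytes" 'BEGIN {printf("%.0f", b / 1024 / 1024)}') Mb"

rm "$sample_log"
```

Only responses that carry a Content-Length header contribute to the sum, so pages served without one (for example with chunked transfer encoding) are not counted. The result is an estimate, as the script's own output says.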

This answer was also very helpful: "Shell command to sum integers, one per line?"
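
The awk summation in the script appears to follow that pattern: keep a running total while reading one number per line, then print it at the end. A minimal sketch, with made-up byte counts standing in for real Content-Length values:

```shell
# Made-up byte counts, one per line, standing in for Content-Length values.
sizes=$(printf '%s\n' 512 1024 2048)

# awk adds the first field of every line to a running total and
# prints the total once all input has been read.
total=$(printf '%s\n' "$sizes" | awk '{sum += $1} END {printf("%.0f\n", sum)}')
echo "$total"
```

Using printf("%.0f") rather than a plain print is a cautious choice here: with very large totals some awk implementations may print the sum in exponent notation, which is presumably why the script formats its result the same way.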