About downloading comics

kingoftime3 · posted 2009-10-2 19:01:40

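Today I saw on Qidian that there is a comic version of 阳神 and wanted to download it, but the page source shows that the image file names are not consecutive and follow no obvious pattern. I came up with the following approach:
First, fetch the comic's table-of-contents page from Qidian
wget http://www.qidian.com/BookReader/1322325.aspx
Pull the one key line out of that file
sed -n '/BookReader/w temp.txt' 1322325.aspx
Then extract the image file names (1322325 is the ID of the whole comic)
grep -o "1322325,[0-9]*" temp.txt | awk -F , '{print $2}' > yangshen.txt
Now the comic can be downloaded
for i in $(cat yangshen.txt); do wget "http://image.cmfu.com/books/1322325/$i.jpg"; done
Clean up once the download is finished
rm 1322325.aspx temp.txt yangshen.txt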

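Note: you can append & to the wget command in the loop above to run the downloads in parallel, but I don't know how to cap the number of simultaneous jobs~~~ Too many requests at once will cause trouble with the server, so could some expert improve on this?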
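One possible way to cap the number of simultaneous downloads is to hand the jobs to xargs, sketched below. It assumes an xargs that supports the -P option (the GNU and BSD versions do, but it is not in POSIX). The limit of 4 and the function names ids_to_urls and fetch_parallel are made up for illustration.

```shell
# Assumed cap on simultaneous downloads; tune it to what the server tolerates.
max_jobs=4
bookid=1322325

# ids_to_urls BOOKID: read chapter IDs on stdin, print one image URL per ID.
ids_to_urls() {
    while IFS= read -r id; do
        printf 'http://image.cmfu.com/books/%s/%s.jpg\n' "$1" "$id"
    done
}

# fetch_parallel N: read URLs on stdin and run at most N wget processes at once.
# -n 1 gives each wget a single URL; -P N is the concurrency limit.
fetch_parallel() {
    xargs -n 1 -P "$1" wget -q
}

# Usage (commented out so nothing is downloaded when the block is pasted):
# ids_to_urls "$bookid" < yangshen.txt | fetch_parallel "$max_jobs"
```

The usage line expects yangshen.txt to still exist, so it has to run before the cleanup step.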

I found that gqview is a nice choice for reading comics :)

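Script (now with an update feature: if you run it again without deleting the original files, it downloads only the chapters newly published online)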

#!/bin/bash
# Set bookname for your book here; it is also the folder that will hold the pictures
bookname=yangshen
# Set the URL of the table-of-contents page here
directoryurl=http://www.qidian.com/BookReader/1322325.aspx
if [ ! -d "$bookname" ];then
    mkdir "$bookname"
fi
cd "$bookname" || exit 1
baseurl=$(basename "$directoryurl")
bookid=${baseurl%.*}
if [ -f "$baseurl" ];then
    echo "$baseurl already exists, removing it"
    rm "$baseurl"
fi
if wget "$directoryurl" 2>/dev/null;then
    echo "Downloading menulist $baseurl [SUCCESS]"
else
    echo "Downloading menulist $baseurl [FAIL]"
    exit 1
fi
sed -n '/BookReader/w temp.txt' "$baseurl"
# The chapter list lives in a file named after the book, inside the book folder
if [ -f "$bookname" ];then
    echo "book list file already exists, backing it up first"
    mv "$bookname" "${bookname}_bak"
fi
grep -o "$bookid,[0-9]*" temp.txt | awk -F , '{print $2}' > "$bookname"
if [ -f "${bookname}_bak" ];then
    echo "this book has already been downloaded, only updating"
    if [ $(wc -l < "$bookname") -gt $(wc -l < "${bookname}_bak") ];then
        echo "new chapters found"
        # Keep only the lines that diff marks as added in the new list
        diff -ruN "${bookname}_bak" "$bookname" | grep -o '^+[0-9].*' | awk -F + '{print $2}' > temp2.txt
    else
        echo "no new chapters found"
        rm "$baseurl" temp.txt "${bookname}_bak"
        exit 0
    fi
    rm "${bookname}_bak"
else
    echo "this is the first download"
    cp "$bookname" temp2.txt
fi
echo "Downloading comics $bookname..."
for i in $(cat temp2.txt);do
    if wget "http://image.cmfu.com/books/$bookid/$i.jpg" 2>/dev/null;then
        echo "$i.jpg [SUCCESS]"
    else
        echo "$i.jpg [FAIL]"
    fi
done
rm "$baseurl" temp.txt temp2.txt
echo "Download finished!"
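The update step in the script compares line counts and then parses diff output. A possibly simpler variant, which is not what the script itself does, is to ask grep for the lines of the new list that are missing from the old one. new_ids is a hypothetical helper name, and the sketch assumes one chapter ID per line in both files.

```shell
# new_ids OLD NEW: print the lines of NEW that do not appear in OLD,
# in the order they have in NEW.
#   -f OLD  read the patterns from OLD, one per line
#   -F      treat each pattern as a fixed string, not a regex
#   -x      match whole lines only
#   -v      invert: keep the lines that match none of the patterns
new_ids() {
    grep -vxFf "$1" "$2"
}

# Example with the file names used in the script (commented out):
# new_ids "${bookname}_bak" "$bookname" > temp2.txt
```

grep exits with a non-zero status when it prints nothing, so the return value of new_ids could also stand in for the "no new chapters" check.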


kookc · posted 2009-10-2 21:37:02

Heh, this shell script is really practical.

殺 · posted 2009-10-3 19:42:53

http://www.qidian.com/BookReader/1322325.aspx
The link markup in the page:
<A href="1322325,24523011.aspx"> 03 草堂笔记</A>
The corresponding image URL:
http://image.cmfu.com/books/1322325/24523011.jpg

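Use a text editor, or debug a bit of js from the address bar, to do a simple character replacement on the page's links, and you get the list of image URLs...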
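The same character replacement can also be done in the shell instead of an editor or js. The sketch below assumes the links look exactly like the sample above (lowercase href, double quotes, "bookid,chapterid.aspx"); links_to_urls is a made-up function name.

```shell
# links_to_urls: read page source on stdin, print one image URL per chapter link.
# The image host and path layout are taken from the example URL above.
links_to_urls() {
    # grep -o puts each link on its own line, even if the page keeps them all on one line;
    # sed then rewrites "bookid,chapterid.aspx" into the image URL.
    grep -o 'href="[0-9]*,[0-9]*\.aspx"' |
        sed 's|href="\([0-9]*\),\([0-9]*\)\.aspx"|http://image.cmfu.com/books/\1/\2.jpg|'
}

# Usage (commented out; 1322325.aspx is the downloaded contents page):
# links_to_urls < 1322325.aspx > urls.txt
```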
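Write a loop in a script that renames the files to 1.jpg, 2.jpg... in the order the image links appear in the page, so they can be played back directly in an image viewer...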
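A rough sketch of such a renaming loop follows. It assumes a list file with one chapter ID per line in page order and the downloaded ID.jpg files in the current directory. It also deviates slightly from the suggestion by zero-padding the numbers (001.jpg rather than 1.jpg), on the assumption that the viewer sorts file names alphabetically. number_in_order is a hypothetical name.

```shell
# number_in_order LISTFILE: rename ID.jpg to 001.jpg, 002.jpg, ... following
# the order of the IDs in LISTFILE (one ID per line).
number_in_order() {
    n=1
    while IFS= read -r id; do
        if [ -f "$id.jpg" ]; then
            mv -- "$id.jpg" "$(printf '%03d.jpg' "$n")"
        fi
        # Count every ID, even a missing file, so numbers stay tied to page order.
        n=$((n + 1))
    done < "$1"
}

# Usage (commented out; yangshen is the ID list kept by the script):
# number_in_order yangshen
```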
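Or just keep the original file names... do one more replacement in the page source so the links point to the local files, and it can serve as a local htm navigation page... write a frame page in htm yourself... navigation page in the left pane... image shown in the right pane... then you can read it in a browser...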
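As a loose illustration of the frame idea, the sketch below builds the navigation page from a list of chapter IDs instead of editing the original page source, so it is a variant rather than the exact procedure described. The file names nav.htm and index.htm, the frame name view, and both function names are assumptions.

```shell
# make_nav LISTFILE: print a navigation page with one link per local image.
# target="view" sends each clicked image to the right-hand frame.
make_nav() {
    echo '<html><body>'
    while IFS= read -r id; do
        printf '<a href="%s.jpg" target="view">%s</a><br>\n' "$id" "$id"
    done < "$1"
    echo '</body></html>'
}

# make_frameset: print a two-pane page, navigation on the left, image on the right.
make_frameset() {
    cat <<'EOF'
<html>
<frameset cols="20%,80%">
<frame src="nav.htm" name="nav">
<frame name="view">
</frameset>
</html>
EOF
}

# Usage (commented out), run in the folder that holds the images:
# make_nav yangshen > nav.htm
# make_frameset > index.htm
```

Opening index.htm in a browser should then show the link list on the left and the selected image on the right. Frames are an old HTML feature, so how well this renders depends on the browser.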

That's how I download novels, comics and the like... heh~~
