I'm trying to parse a website and generate an RSS feed from it. I created this Scrapy project, whose folder structure is:
afferin
├── afferin
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   ├── rssEkle.py
│   ├── rsss.py
│   └── spiders
│       ├── __init__.py
│       └── taraf.py
├── run.py
└── scrapy.cfg
When I run run.py, it creates a "pyrss2gen.xml" file in the main folder without any problem. run.py is:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from afferin.spiders.taraf import taraf1
from scrapy.utils.project import get_project_settings
from afferin import rsss
import PyRSS2Gen
import csv
spider = taraf1(domain='http://ift.tt/1EjxVcK')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
rsss.rss.write_xml(open("pyrss2gen.xml", "w"))
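Before moving to GAE, it can help to confirm that the locally written pyrss2gen.xml really is a well-formed RSS 2.0 document, so that any later failure can be blamed on the deployment rather than on the feed. A small standard-library check (the function name `feed_item_titles` is hypothetical):

```python
import xml.etree.ElementTree as ET


def feed_item_titles(path):
    """Return the <title> text of every <item> in an RSS 2.0 file.

    Raises ET.ParseError if the file is not well-formed XML and
    ValueError if the root element is not <rss>.
    """
    root = ET.parse(path).getroot()
    if root.tag != "rss":
        raise ValueError("root element is %r, expected 'rss'" % root.tag)
    # iter() walks the whole tree; findtext() looks only at each item's children.
    return [item.findtext("title") for item in root.iter("item")]
```

For example, calling `feed_item_titles("pyrss2gen.xml")` after run.py finishes should list one title per scraped item.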
Everything works up to this point, but I can't move this project to Google App Engine (GAE). I want this XML file to be served at http://ift.tt/15wsQyB.
I tried what this page describes: http://ift.tt/1EjxX4r
It creates http://ift.tt/15wsSXf correctly.
Then I tried to modify run.py based on rss.py, the example from that site:
from libs.twisted.internet import reactor
from libs.scrapy.crawler import Crawler
from libs.scrapy import log, signals
from afferin.spiders.taraf import taraf1
from libs.scrapy.utils.project import get_project_settings
from afferin import rsss
from libs import PyRSS2Gen
spider = taraf1(domain='http://ift.tt/1EjxVcK')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
##rsss.rss.write_xml(open("pyrss2gen.xml", "w"))
print 'Content-Type: text/xml'
print rsss.rss.to_xml()
But it does nothing. Is the problem with how I import the modules, or can I simply not generate XML like this?
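A CGI script's output has to start with its header lines, then one empty line, then the body (RFC 3875). The script above prints the Content-Type line and then the XML straight away, with no blank line in between, which may be enough for the response to come out broken or empty. It also gives no hint when an import or the crawl raises. A minimal sketch, assuming the handler stays a CGI script (threadsafe: no); `write_cgi_response` is a hypothetical helper, and passing `rsss.rss.to_xml` as `render` assumes the feed object has been filled in by the time it is called:

```python
import sys
import traceback


def write_cgi_response(render, out=None):
    """Write a complete CGI response to `out` (stdout by default).

    `render` is any zero-argument callable returning the XML body as a str
    (hypothetically `rsss.rss.to_xml` once the crawl has finished).
    """
    if out is None:
        out = sys.stdout
    try:
        body = render()
        content_type = "text/xml"
    except Exception:
        # Show the traceback in the browser instead of an empty page.
        body = traceback.format_exc()
        content_type = "text/plain"
    out.write("Content-Type: %s\n" % content_type)
    out.write("\n")  # the empty line that ends the CGI header block
    out.write(body)
```

In run.py this would replace the two print statements, e.g. `write_cgi_response(rsss.rss.to_xml)` after `reactor.run()` returns.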
I created a "libs" folder with this content:
libs
├── scrapy
├── twisted
├── __init__.py
├── datetime.py
├── PyRSS2Gen.py
└── locale.py
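The bundled packages import each other by their top-level names: Scrapy's own modules do `from twisted.internet import ...` and `from scrapy... import ...`, not `from libs.twisted ...`. Importing them as `libs.scrapy` therefore does not make those internal imports resolve; the usual approach is to put the `libs` folder itself on `sys.path` before the first Scrapy or Twisted import. A minimal sketch; the helper name is hypothetical, and the path in the comments assumes `libs` sits next to `app.yaml`, one level above `afferin/run.py`:

```python
import os
import sys


def prepend_to_sys_path(path):
    """Put `path` first on sys.path (only once) so top-level imports resolve there."""
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path

# Hypothetical use at the very top of afferin/run.py (layout is an assumption):
#   project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
#   prepend_to_sys_path(os.path.join(project_root, "libs"))
#   from twisted.internet import reactor   # instead of libs.twisted.internet
```

One caveat: with `libs` first on the path, `libs/datetime.py` and `libs/locale.py` can shadow the standard-library modules of the same name. This only addresses imports; whether Twisted's reactor can run inside the App Engine sandbox at all is a separate question.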
So the GAE folder is:
├── afferin
├── libs
└── app.yaml
app.yaml is:
application: rssdeneme
version: 1
runtime: python27
api_version: 1
threadsafe: no

handlers:
- url: /rss.xml
  script: afferin/run.py
- url: /static
  static_dir: static
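In this app.yaml, /rss.xml is routed to a CGI script, which the python27 runtime only allows with `threadsafe: no`. The same runtime also accepts a WSGI application object in `script:` (for example `script: afferin.feed.app`, where the module `afferin/feed.py` and the name `app` are hypothetical). A minimal, standard-library-only sketch of such an application; `build_feed_xml` is a hypothetical stand-in for the crawl plus `rsss.rss.to_xml()` step:

```python
def build_feed_xml():
    """Hypothetical stand-in for running the crawl and calling rsss.rss.to_xml()."""
    return '<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel/></rss>'


def app(environ, start_response):
    """Minimal WSGI application that serves the feed as XML."""
    body = build_feed_xml().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With WSGI, the runtime builds the HTTP headers from `start_response`, so the CGI header formatting no longer has to be done by hand; whether the crawl itself can finish within one request is not addressed here.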
These are the imports in each file (I can upload the whole project if you want).
taraf.py:
from libs.scrapy.selector import HtmlXPathSelector
from libs import PyRSS2Gen
from afferin import rsss
from libs.scrapy.spider import Spider
from afferin.items import AfferinItem
from libs.scrapy import signals
from libs.scrapy.xlib.pydispatch import dispatcher
items.py:
from libs.scrapy.item import Item,Field
rssEkle.py:
from afferin import settings
from libs import PyRSS2Gen
from afferin import rsss
from libs import locale
from libs import datetime
rsss.py:
from libs import PyRSS2Gen
from libs import datetime
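rsss.py itself is not shown. From its imports and from run.py, it presumably defines a module-level `rss` object (a `PyRSS2Gen.RSS2` instance) that the spider fills in. To rule out problems with the bundled PyRSS2Gen.py and datetime.py, the same RSS 2.0 structure can also be built with the standard library only. This is a swapped-in technique (xml.etree.ElementTree instead of PyRSS2Gen), and the `build_rss` name and the item dict shape are assumptions:

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate  # RFC 822 dates, as RSS 2.0 uses


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document as UTF-8 bytes.

    `items` is a list of dicts with "title", "link" and "description" keys
    (a hypothetical shape; adapt it to whatever AfferinItem holds).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, "lastBuildDate").text = formatdate(usegmt=True)
    for entry in items:
        item = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            # Missing keys become empty elements rather than raising KeyError.
            ET.SubElement(item, key).text = entry.get(key, "")
    return ET.tostring(rss, encoding="utf-8")
```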