Required Input
• Parsing code (a Python program that defines what data to extract from the HTML)
https://docs.scrapy.org/en/latest/intro/tutorial.html
• A URL or a list of URLs (a text file can be used as well)
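As a minimal sketch of the text-file option, assuming the file holds one URL per line (the exact file format the plugin expects is not stated here), the URLs could be loaded as follows. The helper name `read_urls` and the example.com addresses are hypothetical.

```python
import io


def read_urls(stream):
    """Return the non-empty lines of a text stream as a list of URLs."""
    # Assumption: one URL per line; blank lines are skipped.
    return [line.strip() for line in stream if line.strip()]


# An in-memory stand-in for a text file; the URLs are placeholders.
sample_file = io.StringIO(
    "https://example.com/page1\n"
    "\n"
    "https://example.com/page2\n"
)
urls = read_urls(sample_file)
print(urls)
```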
Output/Return Value
• A CSV is returned. (Preferred)
Headers are defined in the parsing code.
• Any other string can be returned instead, depending on the user’s purpose.
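A downstream step could read the returned CSV with Python's standard `csv` module. The sketch below is illustrative only: the CSV text and its values are made up, and the header names simply mirror the kind of header the parsing code defines.

```python
import csv
import io

# Made-up CSV text standing in for the plugin's return value.
csv_text = "symbol,name,price\nAAA,Example Corp,1.23\nBBB,Sample Inc,4.56\n"

# DictReader uses the first line as the header, so each row becomes a dict.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row["symbol"], row["price"])
```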
Advanced Feature
• Parameters
You can pass “values” to either your parsing code or the URL(s).
The syntax follows Python's standard named-placeholder string formatting.
https://riptutorial.com/python/example/13577/named-placeholders
CAUTION
In STU, variables are written with double curly brackets, as in {{variable.variable}}. Python string-format named placeholders use single curly brackets, as in {placeholder}. This plugin accepts both STU variables and standard Python named placeholders.
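The named-placeholder rule can be tried out in plain Python. This is a sketch under the assumption that the plugin applies `str.format`-style substitution, which the doubled braces in the sample spider suggest but this page does not confirm. The template fragment, the values, and the example.com URL are hypothetical.

```python
# Hypothetical fragment of parsing code, written as a template string.
# {symbol} and {name} are named placeholders; {{ and }} stand for literal braces.
template = (
    "header = ('{symbol}', '{name}', 'price')\n"
    "custom_settings = {{}}\n"
)
rendered = template.format(symbol="Symbol", name="Name")
print(rendered)

# The same rule applied to a URL; the address and parameter are placeholders.
url = "https://example.com/quote/{ticker}".format(ticker="AAA")
print(url)
```

Under this assumption, any literal curly bracket in the parsing code (for example in a dict literal) has to be doubled so that it is not mistaken for a placeholder.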
How to set parameters.
Sample Spider Code
import sys
import csv
import scrapy
from random import randint


################################################################################
class MySpider(scrapy.Spider):
    name = 'finance_yahoo_most_active'
    start_urls = START_URLS
    custom_settings = {{
    }}
    header = (
        '{symbol}', '{name}', 'price',
        'change', 'p_change', 'volume',
        'avg_vol_3m', 'market_cap', 'pe_ratio'
    )
    csv_writer = csv.writer(sys.stdout, lineterminator='\n')
    csv_writer.writerow(header)

    # --------------------------------------------------------------------------
    # noinspection PyMethodOverriding
    def parse(self, response):
        texts = response.xpath('//*[@id="scr-res-table"]/div[1]/table/tbody/tr//text()').getall()
        n_rows = len(texts) // 9
        for i in range(n_rows):
            row = (
                texts[i*9 + 0], texts[i*9 + 1], texts[i*9 + 2],
                texts[i*9 + 3], texts[i*9 + 4], texts[i*9 + 5],
                texts[i*9 + 6], texts[i*9 + 7], texts[i*9 + 8],
            )
            row_info = {{
                '{symbol}': texts[i*9 + 0],
                '{name}': texts[i * 9 + 1],
                'price': texts[i * 9 + 2],
                'change': texts[i * 9 + 3],
                'p_change': texts[i * 9 + 4],
                'volume': texts[i * 9 + 5],
                'avg_vol_3m': texts[i * 9 + 6],
                'market_cap': texts[i * 9 + 7],
                'pe_ratio': texts[i * 9 + 8],
            }}
            self.csv_writer.writerow(row)
            yield row_info
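The row-building loop in the sample's parse method takes a flat list of cell texts and groups it into consecutive rows of nine fields. The standalone sketch below reproduces that grouping without Scrapy so it can be tried offline. The helper name `chunk_rows` and the cell texts are made up, and the field names use plain strings instead of the sample's placeholders.

```python
FIELDS = (
    "symbol", "name", "price",
    "change", "p_change", "volume",
    "avg_vol_3m", "market_cap", "pe_ratio",
)


def chunk_rows(texts, width=len(FIELDS)):
    """Split a flat list into consecutive tuples of `width` items."""
    n_rows = len(texts) // width  # a trailing partial row is dropped
    return [tuple(texts[i * width:(i + 1) * width]) for i in range(n_rows)]


# Made-up cell texts: two full rows plus one leftover item.
texts = [f"r{r}c{c}" for r in range(2) for c in range(9)] + ["extra"]
rows = chunk_rows(texts)
row_dicts = [dict(zip(FIELDS, row)) for row in rows]
print(row_dicts[0]["symbol"], row_dicts[1]["pe_ratio"])
```

This grouping only lines up if every table row yields exactly nine text nodes. An empty cell or a cell with nested markup would shift all later fields, so selecting cells row by row may be more robust on real pages.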