# returns a Tag
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)'))
# returns a NavigableString
bsObj.find(text=re.compile('var dtmDataLayer= (.*?)'))
# another way: get_text() returns all the text in a document or beneath a tag
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)')).get_text()
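Once the script text is in hand, the value assigned to dtmDataLayer still has to be pulled out of it. The sketch below uses only the standard library and a made-up script string (the real page content is not shown in these notes, and it is an assumption that the value is valid JSON ending in a semicolon). It also shows that a lazy group at the very end of a pattern, as in '(.*?)', captures nothing on its own.

```python
import json
import re

# Hypothetical script text; the real page's content is not shown in these notes.
script_text = 'var dtmDataLayer= {"page": "home", "items": [1, 2]};'

# A lazy group at the end of a pattern matches as little as possible,
# so '(.*?)' by itself captures an empty string.
lazy = re.search(r'var dtmDataLayer= (.*?)', script_text)

# Anchoring the group on the trailing semicolon captures the whole value.
anchored = re.search(r'var dtmDataLayer= (.*?);\s*$', script_text, re.DOTALL)

# Assumes the assigned value is valid JSON.
data = json.loads(anchored.group(1))

print(repr(lazy.group(1)))
print(data["page"])
```

The original pattern is still fine for locating the right script tag; the anchored version only matters when the captured group itself is needed.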
Setting up the crawling app on Docker
ssh -i <certificate> <address>
docker ps -a # list all containers, including stopped ones
docker exec -it <container name> /bin/bash
install sudo first (it is needed to install wget and the PhantomJS web driver)
apt-get update
apt-get install sudo
install the PhantomJS web driver
sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev -y
sudo apt-get install libfreetype6 libfreetype6-dev -y
sudo apt-get install libfontconfig1 libfontconfig1-dev -y
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://github.com/Medium/phantomjs/releases/download/v2.1.1/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
phantomjs --version
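Before pointing Selenium at PhantomJS, it can help to confirm from Python that the binary really is on the PATH. This is a minimal sketch using only the standard library; the helper name find_binary is made up for illustration.

```python
import shutil
import subprocess


def find_binary(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_binary("phantomjs")
if path is None:
    print("phantomjs is not on PATH; check the symlink in /usr/local/bin")
else:
    # Equivalent to running `phantomjs --version` in the shell.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    print(path, result.stdout.strip())
```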
git status
git fetch
git pull
git diff <file>
git checkout <file>
git branch
git checkout -b feature/crawling origin/feature/crawling
pip3 install beautifulsoup4
pip3 install selenium
python3 <python file>.py
check the current locale
locale
set the locale so that print() can output Korean
apt-get clean && apt-get update && apt-get install -y locales
locale-gen ko_KR.UTF-8
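Whether print() can output Korean comes down to whether the stdout encoding can encode the characters. The check below is a small sketch using only the standard library; can_encode is a hypothetical helper name, not something from the crawling app.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


# Encoding that print() uses for stdout; fall back to ASCII if it is unset.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

print("preferred encoding:", locale.getpreferredencoding(False))
print("stdout encoding:", stdout_encoding)
print("Korean printable:", can_encode("한글", stdout_encoding))
```

If the stdout encoding still reports ASCII after locale-gen, the LANG or LC_ALL environment variable is probably not set to ko_KR.UTF-8 in that shell.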
Django app init
python manage.py: shows the available commands
python manage.py startapp [app name]
Views: api
Models: Library
element click
# element.click() didn't work; clicking through JavaScript did:
driver.execute_script("arguments[0].click();", element)
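The two approaches can be combined so the JavaScript click is used only when the native one fails. This sketch assumes the failure shows up as an exception (the notes do not say how element.click() failed); click_with_fallback, FakeElement and FakeDriver are hypothetical names, and the fakes exist only so the example runs without a browser.

```python
def click_with_fallback(driver, element):
    """Try a native click first; fall back to a JavaScript click on failure."""
    try:
        element.click()
        return "native"
    except Exception:
        # With Selenium installed, selenium.common.exceptions.WebDriverException
        # would be a narrower choice than Exception.
        driver.execute_script("arguments[0].click();", element)
        return "javascript"


# Stand-ins so the sketch runs without a browser; not part of Selenium.
class FakeElement:
    def click(self):
        raise RuntimeError("element not clickable")


class FakeDriver:
    def __init__(self):
        self.scripts = []

    def execute_script(self, script, *args):
        # Record the call instead of running JavaScript.
        self.scripts.append((script, args))


driver = FakeDriver()
print(click_with_fallback(driver, FakeElement()))
```

With a real WebDriver and WebElement the same function can be called unchanged, since it relies only on element.click() and driver.execute_script().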
scrapy
scrapy crawl article
"article" is the spider's name: class ArticleSpider(Spider) with name = "article", defined in the spiders folder.