bsObj.find("script", text=) vs. bsObj.find(text=)

#returns Tag
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)'))

#returns NavigableString
bsObj.find(text=re.compile('var dtmDataLayer= (.*?)'))

#alternative: get_text() on the Tag returns all the text beneath it as a plain str
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)')).get_text()
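Once the script text has been retrieved, the assigned value still has to be pulled out of it. The sketch below is one possible way to do that with only the standard library. It assumes the script body holds a `var dtmDataLayer= {...};` statement whose right-hand side is valid JSON; `script_text`, `loose`, `strict` and `extract_data_layer` are made-up names, and the sample data is invented. It also shows that a lazy group at the very end of a pattern, such as `(.*?)`, captures an empty string, so the sketch anchors the group on the closing `};` instead.

```python
import json
import re

# Invented sample; stands in for the str that get_text() returns for the matched <script> tag.
script_text = 'var dtmDataLayer= {"page": "home", "id": 42};'

# The pattern from the notes: a trailing lazy group matches as little as possible, i.e. nothing.
loose = re.compile(r'var dtmDataLayer= (.*?)')

# Assumed variant: capture from the opening "{" up to the first "}" that is followed by ";".
strict = re.compile(r'var dtmDataLayer= (\{.*?\});', re.DOTALL)


def extract_data_layer(text):
    """Return the object assigned to dtmDataLayer as a dict, or None if it is absent."""
    match = strict.search(text)
    if match is None:
        return None
    return json.loads(match.group(1))


empty_capture = loose.search(script_text).group(1)
data = extract_data_layer(script_text)
print(repr(empty_capture))
print(data)
```

The loose pattern is still fine for locating the right script tag with find(); the stricter one is only needed when the captured value itself is used. If the real script contains `};` inside a string literal, this pattern would cut the capture short.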

Setting up the crawling app on Docker

ssh -i <certificate> <address>

docker ps -a #list all containers (running and stopped)

docker exec -it <container name> /bin/bash

Install sudo and wget (wget is needed to download the PhantomJS web driver)

apt-get update

apt-get install -y sudo wget

Install the PhantomJS web driver


sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev -y
sudo apt-get install libfreetype6 libfreetype6-dev -y
sudo apt-get install libfontconfig1 libfontconfig1-dev -y
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://github.com/Medium/phantomjs/releases/download/v2.1.1/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
phantomjs --version
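The same check can be made from Python before the crawler starts, so that a missing driver fails early. This is a minimal sketch using only the standard library; `binary_version` is a hypothetical helper, and it assumes the binary accepts a `--version` flag like the version check above.

```python
import shutil
import subprocess


def binary_version(name, flag="--version"):
    """Return the version string printed by `name`, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    # check=False: a non-zero exit code should not raise here, only yield empty output.
    result = subprocess.run([path, flag], capture_output=True, text=True, check=False)
    return result.stdout.strip()


# Prints None when phantomjs is not installed or not linked into a PATH directory.
print(binary_version("phantomjs"))
```

A result of None suggests the symlink into /usr/local/bin was not created or that directory is not on PATH.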

git status

git fetch

git pull

git diff <file>

git checkout <file>

git branch

git checkout -b feature/crawling origin/feature/crawling

pip3 install beautifulsoup4

pip3 install selenium

python3 <python file>.py

Check the current locale

locale

Generate a Korean UTF-8 locale so that print() can output Korean text

apt-get clean && apt-get update && apt-get install -y locales

locale-gen ko_KR.UTF-8
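Whether print() can handle Korean depends on the encoding Python picked for standard output, which it derives from the locale. The sketch below, using only the standard library, reports that encoding and checks whether a sample string can be encoded with it. `can_encode` is a hypothetical helper. Generating the locale may not be enough on its own: the container may also need `LANG=ko_KR.UTF-8` (or similar) exported, which is an assumption about the environment rather than something these notes confirm.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


# sys.stdout.encoding can be None when stdout is replaced; fall back to ASCII then.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print("stdout encoding:", stdout_encoding)
print("preferred encoding:", locale.getpreferredencoding(False))
print("Korean printable:", can_encode("한글", stdout_encoding))
```

If the last line reports False, print() would most likely raise UnicodeEncodeError on Korean text in that shell.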