# Returns a Tag
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)'))
# Returns a NavigableString
bsObj.find(text=re.compile('var dtmDataLayer= (.*?)'))
# Another way: get_text() returns all the text in a document or beneath a tag
bsObj.find("script", text=re.compile('var dtmDataLayer= (.*?)')).get_text()
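Once the script text is in hand, the value still has to be pulled out of it. Note that a trailing lazy group such as (.*?) at the end of a pattern captures an empty string, so it is fine for locating the tag but not for extracting the data. A minimal sketch using only the standard library, assuming the variable holds a JSON object terminated by a semicolon (SCRIPT_TEXT and extract_data_layer are made-up names for illustration):

```python
import json
import re

# Hypothetical script body; a real page will look different.
SCRIPT_TEXT = 'var dtmDataLayer= {"page": "home", "items": [1, 2, 3]};'

# Greedy match from the first "{" to the last "}" before a semicolon.
PATTERN = re.compile(r"var dtmDataLayer\s*=\s*(\{.*\})\s*;", re.DOTALL)


def extract_data_layer(script_text):
    """Return the dtmDataLayer object as a dict, or None if it is absent."""
    match = PATTERN.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


data = extract_data_layer(SCRIPT_TEXT)
print(data["page"])
```

This only works when the assigned value is valid JSON; a JavaScript object literal with unquoted keys would need a different parser.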
Setting up the crawling app on Docker
ssh -i <certificate> <address>
docker ps -a #list all containers, running or stopped
docker exec -it <container name> /bin/bash
Install sudo, which is needed to install wget and then the PhantomJS web driver
apt-get update
apt-get install sudo
Install the PhantomJS web driver
sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev -y
sudo apt-get install libfreetype6 libfreetype6-dev -y
sudo apt-get install libfontconfig1 libfontconfig1-dev -y
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://github.com/Medium/phantomjs/releases/download/v2.1.1/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
phantomjs --version
git status
git fetch
git pull
git diff <file>
git checkout <file>
git branch
git checkout -b feature/crawling origin/feature/crawling
pip3 install beautifulsoup4
pip3 install selenium
python3 <python file>.py
Check the current locale
locale
Set the encoding so that print() can output Korean
apt-get clean && apt-get update && apt-get install -y locales
locale-gen ko_KR.UTF-8
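Why the locale matters for print(): Python encodes output with the encoding of stdout, which it derives from the locale. A rough sketch of the failure mode, assuming a container whose default locale leaves stdout on an ASCII-only encoding (can_encode is a hypothetical helper, not a library function):

```python
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding print() will use; it depends on the locale settings.
print("stdout encoding:", sys.stdout.encoding)

korean = "한글"
print(can_encode(korean, "utf-8"))  # a UTF-8 locale can represent Korean
print(can_encode(korean, "ascii"))  # an ASCII-only stdout cannot
```

If the first line reports an ASCII encoding, printing Korean text raises UnicodeEncodeError until a UTF-8 locale is generated and selected.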
Django app init
python manage.py: shows the available commands
python manage.py startapp [app name]
Views: api
Models: Library
element click
# element.click() didn't work.
driver.execute_script("arguments[0].click();", element)  # worked!
scrapy
scrapy crawl article
The name given to scrapy crawl must match name = "article" in class ArticleSpider(Spider), which lives in the spiders folder.
chrome xpath
In the developer console, type:
$x("//a")
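The same query can be tried outside the browser. A small sketch with the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup; the PAGE string is made up for illustration, and ".//a" is ElementTree's way of writing "all a elements below the root":

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page; real HTML often needs a lenient parser.
PAGE = """<html><body>
<a href="/first">First</a>
<p><a href="/second">Second</a></p>
</body></html>"""

root = ET.fromstring(PAGE)

# Collect the href of every <a> element at any depth, in document order.
links = [a.get("href") for a in root.findall(".//a")]
print(links)
```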
HTTP
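Methods
GET: what the browser's address bar sends
Used to request information from a web server
POST: submitting a form (create)
Used to send information to a script on the server
e.g. login
A POST request to an API asks it to store that information in the database
PUT: updates an object or information (POST can also be used; it depends on the API)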
DELETE
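The methods can be compared by building requests without sending them. A sketch with urllib from the standard library; the /items path and the form field are assumptions, not part of any real API:

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://myapi.com/items"  # hypothetical endpoint

# Form data must be bytes; supplying data makes urllib default to POST.
form = urlencode({"name": "book"}).encode("utf-8")

get_req = Request(API_URL)                                  # read
post_req = Request(API_URL, data=form)                      # create
put_req = Request(API_URL + "/1", data=form, method="PUT")  # update
delete_req = Request(API_URL + "/1", method="DELETE")       # delete

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method(), req.full_url)
```

Nothing goes over the network here; passing any of these objects to urlopen would actually send the request.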
Authentication
Token
Passed in the URL
Or passed via a cookie in the request headers: headers={"token": token}
import urllib.request
webRequest = urllib.request.Request("http://myapi.com", headers={"token": token})
html = urllib.request.urlopen(webRequest)
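To check what actually ends up in the request, the headers can be inspected before anything is sent. A minimal sketch; build_request and the token value are placeholders for illustration:

```python
from urllib.request import Request


def build_request(url, token):
    """Build (but do not send) a request carrying the token as a header."""
    return Request(url, headers={"token": token})


req = build_request("http://myapi.com", "abc123")

# urllib stores header names capitalized, so "token" is kept as "Token".
print(req.header_items())
```

HTTP header names are case-insensitive, so the capitalization change does not matter to the server.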
Response
XML: Extensible Markup Language
JSON: JavaScript Object Notation
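Both formats can carry the same data. A short sketch that parses one made-up record from each, using only the standard library (the user record is an assumption for illustration):

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both response formats.
JSON_BODY = '{"user": {"id": 7, "name": "kim"}}'
XML_BODY = "<user><id>7</id><name>kim</name></user>"

# JSON maps directly onto dicts, lists, numbers, and strings.
json_user = json.loads(JSON_BODY)["user"]

# XML yields elements; every value arrives as text.
xml_root = ET.fromstring(XML_BODY)
xml_user = {child.tag: child.text for child in xml_root}

print(json_user)
print(xml_user)
```

The id comes back as a number from JSON but as a string from XML, so XML values usually need explicit conversion.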
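Router MAC Address
Router: a device that extracts a packet's destination, chooses the best route to that destination, and forwards the data packet along that route to the next device. A home Wi-Fi router is a common example.
*Packet: in information technology, a formatted block of data carried by a packet-switched computer network.
*MAC stands for Media Access Control. A MAC address is a unique identifier assigned to almost all networking equipment, such as routers, Ethernet cards, and other devices.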
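A MAC address is 48 bits, usually written as six hexadecimal pairs. A small sketch that formats one; format_mac is a hypothetical helper, and uuid.getnode() may return a random value when the hardware address cannot be determined:

```python
import uuid


def format_mac(value):
    """Format a 48-bit integer as a colon-separated MAC address."""
    raw = f"{value:012x}"  # 12 hex digits, zero-padded
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


# Hardware address of this machine as reported by the standard library.
print(format_mac(uuid.getnode()))
```

This shows the address of the local machine, not of the router; the router's address would have to be read from its admin page or from the ARP table.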