Posts
Scraping pt. 2 - BeautifulSoup vs. Scrapy »
Things learned today:
_______
HN 238
- Authentication cheat sheet
- Some suggestions are probably not best practice, such as enforcing password complexity rules and showing a generic error message when the username or password is incorrect.
- The HTML5 download attribute seems pretty handy.
- Meteor.js intro, and some nice D3.js charts.
- UIs that show an interface preview are great because users can start doing things that don't rely on the server. This can be taken to different levels: showing only the page layout vs. creating placeholder elements and text.
- Cool article on National Geographic maps.
_______
- Markdown needs a blank line between a bulleted list and the text around it.
- Neat tool to help prepare for technical interviews.
- BeautifulSoup: calling a tag directly, e.g. `content('section')`, is shorthand for `content.find_all('section')`, so `find_all()` can be dropped. Script from today below.
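```python
from bs4 import BeautifulSoup
import requests

url = "xxx"
second = "/yyy/"

# Listing pages are offset in steps of 10: 0, 10, ..., 90
for pagenum in range(0, 100, 10):
    finalurl = url + second + str(pagenum)
    r = requests.get(finalurl)
    soup = BeautifulSoup(r.text, "html.parser")
    content = soup.body
    # content('section') is shorthand for content.find_all('section')
    for section in content('section'):
        if section.get('id') == "product-index":
            # A string as the second argument filters on CSS class
            for link in section('a', 'product-thumbnail'):
                print(url + link.get('href'))
```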
- Scrapy is another scraping tool, more fully featured than BeautifulSoup.
- Bash aliases defined at the prompt only last for the current session; to keep them, define them in a startup file such as ~/.bash_profile.
- When using GitHub Pages with Jekyll, it's easiest to pass an empty string to the --baseurl option if you want to see changes locally. Getting the empty single quotes into a bash alias is a little awkward; wrapping the alias body in double quotes avoids escaping them.
- It is easy to be abrupt but better to be understanding.
- Basic pains are still part of lived experience. We still feel the burn of an arrow that hits us in the leg, but mindfully accepting that event avoids all the extra mental pain we add by going crazy, when all we can do is our best.
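The product-index script builds absolute links by plain string concatenation (`url + link.get('href')`), which only works when every href is a root-relative path like `/item/1`. A minimal sketch of a sturdier approach using the standard library's `urllib.parse.urljoin`; `absolute_links` is a hypothetical helper name, and the page URL and hrefs are made-up examples, not taken from the original script:

```python
from urllib.parse import urljoin

def absolute_links(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would.

    Skips None entries, since link.get('href') returns None for <a> tags
    without an href attribute.
    """
    return [urljoin(base_url, href) for href in hrefs if href]

# Hypothetical page URL and hrefs, for illustration only
page = "https://shop.example.com/yyy/10"
hrefs = ["/item/1", "item/2", "https://cdn.example.com/item/3", None]
for link in absolute_links(page, hrefs):
    print(link)
```

Note that a relative href like `item/2` resolves against the page's directory (`/yyy/`), and an already-absolute href passes through unchanged; concatenation gets both cases wrong.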
Watches »
Things learned today:
- Not too much to include today. On watches:
- Need a spring bar tool or a small flathead screwdriver to remove a strap.
- End pieces are little metal flaps used for a flush finish. They make it hard to remove a spring bar.
- Never try to undo the middle part of a metal band because you won’t get it back together.
- BeautifulSoup is easy to use. Below is an example script that prints the URL (href) of every link on a page.
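```python
from bs4 import BeautifulSoup
import requests

# Python 3: input() replaces Python 2's raw_input()
url = input("Enter a website for extraction: ")
r = requests.get("http://" + url)
data = r.text
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
# Print the href of every <a> tag (None for anchors without one)
for link in soup.find_all('a'):
    print(link.get('href'))
```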
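If installing bs4 isn't an option, the same link dump can be sketched with the standard library's `html.parser.HTMLParser`. This is a swapped-in technique, not BeautifulSoup: `LinkCollector` and `extract_links` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, like soup.find_all('a') + link.get('href')."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs

sample = ('<p><a href="/about">About</a> '
          '<A HREF="https://example.com">Ext</A> '
          '<a name="top">x</a></p>')
print(extract_links(sample))
```

Unlike BeautifulSoup, this builds no tree and does no repair of malformed markup, so it suits simple one-pass extraction rather than queries like "links inside this section".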
Quantopian, Jekyll, Boggs, and Body Language »
Things learned today:
- Playing around with Quantopian for next month’s contest.
- Using Fetcher (`fetch_csv`) for live trading is difficult, and doubly so for the contest. You need a pre-function (`pre_func`) to adjust the start date in the CSV; otherwise the algorithm works in either backtests or live trading, but not both.
- It is possible to create an algo based on a data feed from Quandl.
- Exploring the possibilities of Jekyll some more.
- Customized the footer, learned about layouts (HTML templates in the _layouts folder).
- Learned how to remove file extensions from a URL (put each page in its own folder as an index.html file).
- Based on Josh’s design, created an archive page and added Google Analytics.
- Also followed his advice to shorten blog post URLs (Jekyll docs).
- Modified index.html to display content of posts on the front page because I like it more.
- Found a subreddit on body language. Interesting techniques:
- Using rapport tests, such as glancing at your watch or touching your hair, and mirroring gestures to build rapport.
- Defusing dominance respectfully by circling behind the speaker to change the subject.
- An infographic on psychological space; sharing an object of attention builds rapport.
- Book recommendation: What Every Body is Saying
- Reading about the life of JSG Boggs, an artist who creates counterfeit bank notes as art.
- A historical and more fraudulent comparison is Emmanuel Ninger.
- Art appraisals are generally not worthwhile unless the piece is likely to be worth a lot. Bonhams offers free appraisals. Otherwise, look for galleries selling comparable pieces.
- Enable spell check for markdown in Sublime Text.
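The extension-free URL trick from the Jekyll notes works because static hosts such as GitHub Pages serve `folder/index.html` when a request ends in `folder/`. A minimal sketch of the file-layout mapping, assuming a flat site of `.html` pages; `pretty_path` is a hypothetical helper, not part of Jekyll:

```python
from pathlib import PurePosixPath

def pretty_path(page):
    """Map 'about.html' -> 'about/index.html' so the page is served at /about/.

    Non-HTML files and existing index.html files are left alone, since the
    latter already map to their folder.
    """
    path = PurePosixPath(page)
    if path.suffix != ".html" or path.name == "index.html":
        return str(path)
    # Drop the extension and nest the page inside a folder of the same name
    return str(path.with_suffix("") / "index.html")

for page in ["about.html", "archive.html", "index.html", "blog/post.html"]:
    print(page, "->", pretty_path(page))
```

Jekyll's `permalink` front-matter variable (e.g. `permalink: /about/`) produces the same output layout without moving the source files.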
Web Scraping »
Things learned today:
- From minimaxir’s blog post:
- Kimono Labs offers a nice option to make web scraping easier.
- In the case of BuzzFeed, it was unusable because of a 10-page archive limit.
- BeautifulSoup with Mechanize is a more flexible option. (Guide)
Welcome »
Welcome to my new blog inspired by John. It is a collection of things I learn each day for personal use.
Things learned today:
- How to create a blog with Jekyll and how to host said blog on GitHub Pages.
- Needed to create a new repo, install Jekyll there, and create a new branch (gh-pages).
- Will need to both build and commit (to correct branch) with each post.
- Unordered lists need a space after the bullet point in markdown.
- How to use a build system in Sublime Text, configured to open the file in Chrome, as a replacement for View in Browser.
- From patio11:
- SheetJS is a pretty cool way to make .csv imports easy.
- GenerateData can be used to generate random datasets easily.
- OSS projects should charge commercial license fees instead of asking for donations.