2010/SELECT * FROM Internet Using YQL

From Open Source Bridge Wiki

YQL treats the internet and all its sources as a database, letting developers explore government, social, API, and other external data in a standardized way. It also lets developers manipulate this data and mash different sources together, opening up the web and all its sources.

Speaker: Jonathan LeBlanc

Contributed notes

Slides: http://www.slideshare.net/jcleblanc/yql-overview

Backend of Yahoo Pipes. Aggregates feeds from various sources. Pulls from RSS or Atom, or screen scrapes a page and parses it with XPath. Can also access external APIs.

http://developer.yahoo.com/yql/console/ - sandbox for getting started.

Tables map to HTML source, retrieved XML, etc. Specify a URL and an XPath expression in the WHERE clause.

select * from html where url="..." and xpath="..."
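As a minimal sketch of the pattern above, the following Python helper fills in the two WHERE-clause values. The function name `html_query`, the example URL, and the XPath are hypothetical, and quoting is handled by simply rejecting values that contain a double quote, since the notes do not cover escaping rules.

```python
def html_query(url: str, xpath: str) -> str:
    """Build a YQL statement against the html table.

    Values are wrapped in double quotes, so a value that itself
    contains a double quote is rejected rather than escaped.
    """
    for value in (url, xpath):
        if '"' in value:
            raise ValueError("double quotes are not supported in this sketch")
    return f'select * from html where url="{url}" and xpath="{xpath}"'


# Hypothetical page and XPath, for illustration only.
query = html_query("http://example.com/", "//h1")
print(query)
```

The resulting string can be pasted into the console to try it out.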

HTTP verbs map to SQL CRUD operations, so you can use them for write operations. Specific, predefined APIs also map to tables, including community-contributed tables. You can define your own tables or look at examples of existing tables on GitHub.

Use the "desc" command to get info on a table. Column names map to elements of the given API.

No joins, only subselects.
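To illustrate what a subselect looks like in place of a join, here is a small Python sketch that nests one statement inside another with an `in (...)` clause. The table names `photos.search` and `photos.info`, the field names, and the helper `with_subselect` are hypothetical placeholders, not tables named in the talk.

```python
def with_subselect(outer_table: str, key: str, inner_query: str) -> str:
    """Feed the results of inner_query into the key field of outer_table."""
    return f"select * from {outer_table} where {key} in ({inner_query})"


# Hypothetical tables: search for photos, then look up details for each id.
inner = 'select id from photos.search where text="portland"'
query = with_subselect("photos.info", "photo_id", inner)
print(query)
```

The inner statement runs first and its results supply the values for the outer statement's key, which covers the common "look up details for each result" case that a join would otherwise handle.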

Submit a query by passing it as a parameter to a Yahoo URL. Additional parameters control the format of the returned data, the JSONP callback, etc.
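A rough sketch of building such a URL in Python follows. The endpoint and the parameter names `q`, `format`, and `callback` are assumptions (the notes only say "a Yahoo URL" and "additional parameters"), and `build_yql_url` and `handleResult` are hypothetical names. The sketch only constructs the URL and makes no request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; the notes do not give the exact URL.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="json", callback=None):
    """Return a request URL carrying the statement and output options."""
    params = {"q": query, "format": fmt}  # assumed parameter names
    if callback is not None:
        params["callback"] = callback  # JSONP wrapper function name
    return YQL_ENDPOINT + "?" + urlencode(params)


statement = 'select * from html where url="http://example.com/" and xpath="//h1"'
url = build_yql_url(statement, callback="handleResult")

# Round-trip check: decoding the query string recovers the statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == statement)
```

Using `urlencode` rather than string concatenation matters here because the statement contains spaces, quotes, and `=` characters that must be percent-encoded.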

Open Data Tables are an XML format for defining a YQL interface to some API.

You can reference an external Open Data Table definition in a YQL query via the "use" command.
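As a sketch of how a "use" command might be combined with a query, the Python helper below prefixes a statement with a reference to an external table definition. The `use "..." as alias;` form shown is an assumption about the exact syntax, and the definition URL, the alias `mytable`, and the helper `with_use` are hypothetical.

```python
def with_use(table_url: str, alias: str, statement: str) -> str:
    """Prefix a statement with a reference to an external table definition."""
    return f'use "{table_url}" as {alias}; {statement}'


# Hypothetical Open Data Table definition hosted on your own server.
query = with_use(
    "http://example.com/mytable.xml",
    "mytable",
    'select * from mytable where id="42"',
)
print(query)
```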

YQL Execute is a server-side JavaScript component with native E4X support. It can augment data, for example by converting zip codes to city names. It also allows embedding JavaScript code in your table definitions and stored YQL procedures.

Conclusion: build applications faster, and build applications that run faster.

If you hit rate limits, contact the YQL team and they will likely whitelist your IP range. Limits are on the order of 10,000 requests per hour and 100,000 requests per day.
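A quick back-of-the-envelope calculation, taking the approximate figures above at face value, shows that the daily limit is the binding one for a client that runs all day. The variable names are illustrative only.

```python
HOURLY_LIMIT = 10_000   # approximate figure from the notes
DAILY_LIMIT = 100_000   # approximate figure from the notes

SECONDS_PER_DAY = 24 * 60 * 60

# Hours of traffic at the full hourly rate before the daily limit is used up.
hours_at_full_hourly_rate = DAILY_LIMIT / HOURLY_LIMIT

# Average hourly rate that a client running around the clock can sustain.
sustained_per_hour = DAILY_LIMIT / 24

# Minimum spacing between requests to stay under the daily limit all day.
min_interval_seconds = SECONDS_PER_DAY / DAILY_LIMIT

print(hours_at_full_hourly_rate)        # 10.0
print(round(sustained_per_hour, 1))     # 4166.7
print(min_interval_seconds)             # 0.864
```

In other words, a client can burst at the hourly rate for about ten hours, but a service that runs continuously would need to average a little over one request per second.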

Honors robots.txt directives.
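To show what honoring robots.txt means in practice, this sketch uses Python's standard `urllib.robotparser` to evaluate a small rule set the way a well-behaved fetcher would. The rules, the URLs, and the user agent name `ExampleBot` are made up for illustration; the notes do not say which user agent YQL identifies itself with.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a site that fences off one directory.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly, no network access needed

# A fetcher that honors the file checks each URL before requesting it.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/page")
allowed = parser.can_fetch("ExampleBot", "http://example.com/public")

print(blocked)  # False
print(allowed)  # True
```

A site owner can therefore keep pages out of queries against the html table by disallowing them in robots.txt.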

Table definitions can support OAuth authentication. They can also support services that require API keys or other signing mechanisms, though some of these may require using YQL Execute.

The future includes support for more data sources, especially government services.

Craigslist once blocked YQL.  Yahoo worked with them to resolve the issue.