Protecting and Licensing Internet Content Databases
Eric Goldman
Marquette University Law School
eric.goldman@marquette.edu
http://eric_goldman.tripod.com
1. Introduction
- The challenge of protecting non-copyrightable data in a digital era
- Information aggregators and their techniques: scraping, harvesting and extraction
- The age-old build v. buy question; but here building means constructing a way to steal
- For complete protection, clients need to consider law, technology and business models
- Licensing requires foresight
2. Legal Protection—Copyright
- Copyright protects original works of authorship—not facts or ideas
- Some cases find copyright in data that is the product of judgment (e.g., CDN Inc. v. Kapes, on estimated coin prices)
- Even if individual items aren’t copyrightable, the compilation may still be protected through its original selection, arrangement or coordination
- Ways to “manufacture” copyright protection:
- “Meta” info (classifications/taxonomies)
- Software for formats or transfers
- Copyright management information (17 USC 1202)
3. Legal Protection—Hot News
- Misappropriation claims over intangible information are usually preempted by copyright (17 USC 301)
- But the hot news doctrine (NBA v. Motorola) survives preemption where:
- Plaintiff generates or collects the information at some expense
- Information is highly time-sensitive
- Defendant free-rides on plaintiff’s efforts
- Defendant’s use directly competes with plaintiff
- Free-riding would so reduce the incentive to produce the information that its existence or quality would be substantially threatened
- Examples: headlines, scores, weather, prices?
4. Legal Protection—Contract
- Contracts can provide excellent protection (except against after-acquirers, who never agreed to the terms)
- Online formation: use a mandatory, non-leaky clickthrough (no path to the data that bypasses assent)
- Bootscreen process should work
- Other placement can work if the notice and call to action are done carefully
- Subject to all standard contract defenses
- Incapacity, unconscionability, public policy
5. Legal Protection—Trespass
- Protect information by protecting the servers
- Trespass to chattels:
- Use or intermeddling
- Dispossession, impairment, deprivation or harm
- Notification and self-help?
- Computer Fraud & Abuse Act (18 USC 1030):
- Accessing a protected computer without authorization (or exceeding authorized access)
- Obtaining information or causing damage
- Proactive steps: onsite notice, email notice, robot exclusion headers, IP address blocks
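Of the proactive steps above, robot exclusion is the most mechanical. It is advisory only: the site publishes rules, and a well-behaved robot checks them before fetching, so it supplies notice rather than a technical barrier. A minimal sketch, assuming the robots.txt form of exclusion (not meta tags or HTTP headers) and using Python's standard `urllib.robotparser`; the rules, paths, and the `ExampleScraper` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a database site might publish.
# The ExampleScraper group excludes that (hypothetical) robot from the
# whole site; the "*" group applies to every other robot.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /listings/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("FriendlyBot", "https://example.com/about"),
        ("FriendlyBot", "https://example.com/listings/123"),
        ("ExampleScraper/1.0", "https://example.com/about"),
    ]
    for agent, url in checks:
        # can_fetch() reports whether these rules permit this agent to fetch this URL.
        print(agent, url, rp.can_fetch(agent, url))
```

Here FriendlyBot may fetch /about but not /listings/123, and ExampleScraper is refused everywhere. Because compliance is voluntary, this complements rather than replaces IP blocks and the other technical measures.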
6. Non-Legal Protections
- Anti-robot techniques:
- IP address blocks; exclusion headers
- Dynamically-created pages
- Password protection
- Monitor data served; limit amount served to any one user
- Encryption envelopes
- Provide a custom interface rather than licensing the entire database
- Sell freshness/currency
- Sell organizing info/implementation ease
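Several of the anti-robot techniques (IP address blocks, monitoring the data served, and capping what any one user receives) can be combined into a single gatekeeping check in front of the database. A minimal sketch, assuming a fixed time window per user; `DataServingGuard`, the quota, the window length, and the blocked network are all hypothetical, and the addresses come from documentation-reserved ranges.

```python
import ipaddress
import time
from typing import Callable, Dict, Iterable, List


class DataServingGuard:
    """Hypothetical gatekeeper: IP blocklist plus a per-user record quota.

    The quota, window, and blocked networks are illustrative assumptions.
    """

    def __init__(
        self,
        blocked_networks: Iterable[str],
        max_records_per_window: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.blocked = [ipaddress.ip_network(n) for n in blocked_networks]
        self.max_records = max_records_per_window
        self.window = window_seconds
        self.clock = clock
        # Monitoring state: user_id -> [window_start, records_served_in_window]
        self.usage: Dict[str, List[float]] = {}

    def is_blocked(self, ip: str) -> bool:
        """True if the address falls inside any blocked network."""
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.blocked)

    def request(self, user_id: str, ip: str, n_records: int) -> int:
        """Return how many of the requested records may be served (0 = refused)."""
        if self.is_blocked(ip):
            return 0
        now = self.clock()
        state = self.usage.get(user_id)
        if state is None or now - state[0] >= self.window:
            state = [now, 0]  # start a fresh window for this user
        remaining = self.max_records - int(state[1])
        allowed = max(0, min(n_records, remaining))
        state[1] += allowed
        self.usage[user_id] = state
        return allowed


if __name__ == "__main__":
    t = [0.0]  # fake clock so the example is deterministic
    guard = DataServingGuard(["203.0.113.0/24"], max_records_per_window=100,
                             window_seconds=3600, clock=lambda: t[0])
    print(guard.request("alice", "198.51.100.7", 60))  # 60
    print(guard.request("alice", "198.51.100.7", 60))  # 40: quota nearly used
    print(guard.request("bob", "203.0.113.5", 10))     # 0: blocked network
    t[0] = 3600.0
    print(guard.request("alice", "198.51.100.7", 10))  # 10: new window
```

A fixed window is the simplest design; a sliding window or token bucket smooths out bursts at window boundaries. Keying usage on authenticated accounts rather than addresses also matters, since scrapers can rotate IP addresses.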
7. License Grants
- Which IP rights are being licensed?
- Copyright
- Software, entire database, taxonomy?, teaser portions?, individual items?
- The challenge of weak collection practices
- Trade secret
- Software, proprietary codes, usually NOT the entire database or individual items
- Trademark
- Logos
- Scope of the copyright grant:
- Redistribution, co-branding, framing and content serving
- “Derivative works” (edits, summaries, abridgements and commingling)
- Display rules
- Post-termination rights
- Replacement data
- Replacement taxonomy
- Compliance enforcement?
8. Other Licensing Issues
- Transfer protocols and service levels
- Data dump (electronic or physical), on-demand calls or joint page serving; sales tax implications
- Data refreshing/caching
- Anti-scraping obligations
- Pass-throughs to end users
- Contract restrictions against extraction
- Liability disclaimers
- Indemnity
- Being the cheese in a sandwich
- 47 USC 230
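On the data refreshing/caching item under transfer protocols: a license might cap how long a licensee may cache licensed data before refreshing it from the licensor. A minimal licensee-side sketch, assuming a single maximum age for every item; `LicensedDataCache`, `max_age_seconds`, and the `fetch` callable (standing in for the licensor's feed or on-demand call) are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LicensedDataCache:
    """Hypothetical licensee-side cache honoring a maximum caching period.

    max_age_seconds stands in for a license's refresh/caching term;
    fetch stands in for a call to the licensor's feed.
    """

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.max_age = max_age_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (fetched_at, value)

    def get(self, key: Hashable) -> Any:
        """Serve a cached item only while it is within the permitted age."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is None or now - entry[0] > self.max_age:
            value = self.fetch(key)  # stale or missing: refresh from the licensor
            self._store[key] = (now, value)
            return value
        return entry[1]

    def purge_all(self) -> None:
        """Discard all cached data (e.g., if a license requires deletion on termination)."""
        self._store.clear()


if __name__ == "__main__":
    calls = []

    def fake_fetch(key):
        calls.append(key)
        return f"{key}-v{len(calls)}"

    t = [0.0]  # fake clock so the example is deterministic
    cache = LicensedDataCache(fake_fetch, max_age_seconds=60, clock=lambda: t[0])
    print(cache.get("price:XYZ"))  # price:XYZ-v1 (fetched)
    t[0] = 30.0
    print(cache.get("price:XYZ"))  # price:XYZ-v1 (served from cache)
    t[0] = 61.0
    print(cache.get("price:XYZ"))  # price:XYZ-v2 (refreshed)
```

Enforcing the term in code at the licensee's end gives both sides something concrete to audit, which may be easier than policing caching practices after the fact.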