Here are the ingredients for the question:
- Data: patent data from the EPO-server
- Quantity: approx. 10,000 files per year, from 1980 to 2014
- Format: xml
Example: http://ift.tt/1CzZPvl
Project: Using keywords such as "labor", "efficiency", "automation", etc., I would like to identify those patents that relate to the automation of a process and therefore replace human labor (e.g. supermarkets' self-checkout machines).
Aim: The goal is to obtain a share of patents per year (and per country) that are related to automation.
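To make the aim concrete, here is a minimal sketch of the share computation, written in Python only for illustration. It assumes each patent has already been reduced to a record with hypothetical fields `year`, `country`, and `description`, and it uses plain keyword matching as a crude baseline, not a learning algorithm. The sample records are made up.

```python
import re
from collections import defaultdict

# Keywords taken from the project description; illustrative, not exhaustive.
KEYWORDS = ("labor", "efficiency", "automation")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def automation_share(records):
    """Return {(year, country): share of records whose description hits a keyword}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        key = (rec["year"], rec["country"])
        totals[key] += 1
        if PATTERN.search(rec["description"]):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up records, purely for illustration (field names are assumptions).
sample = [
    {"year": 1990, "country": "DE", "description": "Automation of a checkout process."},
    {"year": 1990, "country": "DE", "description": "A new chemical compound."},
    {"year": 1991, "country": "FR", "description": "Improves labor efficiency."},
]
shares = automation_share(sample)
```

A keyword hit is only a rough proxy for "replaces labor", so in practice these flags might serve as noisy labels for a classifier rather than as the final answer.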
Question: I am new to machine learning, but from what I understand, this task calls for semi-supervised learning techniques. How do I incorporate the keywords mentioned above into a machine learning algorithm (e.g. k-nearest neighbours) in R? Also, do I need to merge all of the XML files into a data.frame beforehand? I am only interested in the ID number, application number, and description.
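For the merging part, what I have in mind is roughly the following: pull the three fields out of each file and collect them as rows of one table. This is a sketch in Python for illustration only. The tag and attribute names (`patent`, `id`, `application-number`, `description`) are hypothetical placeholders, not the actual EPO schema, and the sample document is made up.

```python
import xml.etree.ElementTree as ET


def extract_fields(xml_text):
    """Extract the three fields of interest from one patent document."""
    root = ET.fromstring(xml_text)
    # Hypothetical tag/attribute names; the real EPO files will differ.
    description = root.findtext("description") or ""
    return {
        "id": root.get("id"),
        "application_number": root.findtext("application-number"),
        # Collapse line breaks and repeated spaces inside the text.
        "description": " ".join(description.split()),
    }


# Made-up document, purely for illustration.
SAMPLE_XML = """<patent id="EP0000001">
  <application-number>80100001.1</application-number>
  <description>Self-checkout machine
     for supermarkets.</description>
</patent>"""

# One row per file; the list of rows plays the role of the data.frame.
rows = [extract_fields(doc) for doc in [SAMPLE_XML]]
```

With this approach the files never need to be merged as XML; only the extracted rows are combined.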
Any help is highly appreciated. A hands-on example would be amazing.