The amount of natural language text available is increasing day by day, and the complexity of natural language makes it difficult to access the information in that text. So let's dive into text processing.
Named Entity Recognition (NER) is an important sub-task of text processing: it classifies elements in text into pre-defined categories such as the names of persons, organizations and locations. Here I will share a code snippet for entity extraction using the TextRazor API in Python.
Prerequisites:
a. Python 2.7
b. Get a free API key from https://www.textrazor.com
c. Install the Python SDK for TextRazor. Download the zip and run setup.py from https://github.com/TextRazor/textrazor-python
Named Entity Recognition - ner.py
from textrazor import TextRazor

# TextRazor API details
client = TextRazor("<Text Razor API KEY>", extractors=["entities"], do_encryption=True)
client.set_do_cleanup_HTML(True)

# Resolve entities via two dictionaries
client.set_entity_dbpedia_type_filters(['Person', 'Company', 'Place', 'Organisation'])
client.set_entity_freebase_type_filters(['/people/person', '/location/location', '/government/politician', '/book/periodical', '/business/job_title'])

# Sort entities by their position in the text
response = client.analyze_url("<URL to be analyzed>")
entities = list(response.entities())
sorted_entities = sorted(entities, key=lambda entity: entity.starting_position)

# Print each entity once, in order of first appearance
seen = set()
for entity in sorted_entities:
    if entity.id not in seen:
        print entity.id, entity.freebase_types, entity.dbpedia_types, entity.relevance_score, entity.confidence_score
        seen.add(entity.id)
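The sort-and-de-duplicate step in the script can be tried without an API key. The following is a minimal sketch, assuming a hypothetical Entity stand-in that carries only the id and starting_position attributes used above. It is not part of the TextRazor SDK, and the sample data is made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a TextRazor entity; only the two fields
# used by the script are modelled here.
Entity = namedtuple("Entity", ["id", "starting_position"])


def unique_in_order(entities):
    """Return entities sorted by position, keeping the first mention of each id."""
    seen = set()
    result = []
    for entity in sorted(entities, key=lambda e: e.starting_position):
        if entity.id not in seen:
            seen.add(entity.id)
            result.append(entity)
    return result


# Made-up sample data, not real API output.
sample = [
    Entity("London", 40),
    Entity("Ada Lovelace", 5),
    Entity("London", 12),
]
for entity in unique_in_order(sample):
    print(entity.id, entity.starting_position)
```

Sorting first means that the mention kept for each id is the earliest one in the text.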
In order to validate a field, you just associate a set of validation descriptors for each input field in the form.
Each field in the form can have zero, one or more validations. For example, you can have an input field that should not be empty, should be less than 25 characters long and should be alphanumeric.
Steps to embed the form validation script:
1. Include gen_validatorv4.js in your HTML file just before the closing HEAD tag.
2. Just after defining your form, create a Validator() object, passing the name of the form.
<form id='myform' action="">
<!-- Your input fields go here -->
</form>
<script type="text/javascript">
var frmvalidator = new Validator("myform"); // where myform is the name/id of your form
</script>
3. Now add the validations required. The addValidation() function takes the field name, a validation descriptor and the error message to show.
For example, adding validation to FirstName as a required field and limiting its maximum length to 40:
frmvalidator.addValidation("FirstName","req","Please enter your First Name");
frmvalidator.addValidation("FirstName","maxlen=40","Max length for FirstName is 40");
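To make the idea of validation descriptors concrete, here is a small Python model of it. This is only a sketch of the concept, not how gen_validatorv4.js is implemented: the check() and validate() helpers are hypothetical, and only the two descriptors used above ("req" and "maxlen=N") are handled.

```python
def check(value, descriptor):
    """Return True if value satisfies a single validation descriptor."""
    if descriptor == "req":
        return value.strip() != ""
    if descriptor.startswith("maxlen="):
        return len(value) <= int(descriptor.split("=", 1)[1])
    raise ValueError("unknown descriptor: " + descriptor)


def validate(value, rules):
    """Return the error message of every rule that value fails."""
    return [message for descriptor, message in rules if not check(value, descriptor)]


# The same two rules as in the JavaScript example.
first_name_rules = [
    ("req", "Please enter your First Name"),
    ("maxlen=40", "Max length for FirstName is 40"),
]
print(validate("", first_name_rules))
print(validate("Ada", first_name_rules))
```

Each field simply carries a list of (descriptor, message) pairs, which is why a field can have zero, one or many validations.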
<?php
include 'connect.php'; # Include your Database connection script
$per_page = 10; # Number of rows to be displayed per page
$pages_query = mysql_query("SELECT COUNT(*) FROM Table_Name"); # Count the total number of rows in the table
$pages = ceil(mysql_result($pages_query, 0) / $per_page); # Total number of pages
$page = (isset($_GET['page'])) ? (int)$_GET['page'] : 1; # Default to page 1
$start = ($page - 1) * $per_page; # Offset of the first row on the current page
$query = mysql_query("SELECT Column_Names FROM Table_Name LIMIT $start, $per_page"); # Limit the rows to one page
while ($row = mysql_fetch_assoc($query)) {
    echo $row['column_name']; # echo values
    echo "<br>";
}
if ($pages >= 1 && $page <= $pages) {
    for ($x = 1; $x <= $pages; $x++) {
        echo ($x == $page) ? '<strong><a href="?page='.$x.'">'.$x.'</a> </strong>' : '<a href="?page='.$x.'">'.$x.'</a> '; # Show the current page link in bold
    }
}
}
?>
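The arithmetic behind the pagination script can be checked on its own. Below is a minimal Python sketch of the same two calculations (total pages and the LIMIT offset); the page_window() name and the sample numbers are assumptions made for illustration.

```python
import math


def page_window(total_rows, per_page, page):
    """Return (total_pages, start_offset) for a 1-based page number."""
    total_pages = math.ceil(total_rows / per_page)
    start = (page - 1) * per_page
    return total_pages, start


print(page_window(95, 10, 1))   # first page: (10, 0)
print(page_window(95, 10, 10))  # last, partly filled page: (10, 90)
```

With 95 rows and 10 rows per page there are 10 pages, and the last page starts at offset 90 and holds the remaining 5 rows.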
import xlrd
import MySQLdb
import re
# Open the workbook and define the worksheet
book= xlrd.open_workbook("Worksheet_name.xls")
sheet = book.sheet_by_name("Sheet_Name")
# Establish a MySQL connection
database = MySQLdb.connect(host="localhost", user="User_name", passwd="Password", db="Database_name")
# Get the cursor, which is used to traverse the database, line by line
cursor = database.cursor()
# Create the INSERT INTO sql query
query = """INSERT INTO Table_name VALUES (%s,%s)"""
# Loop through each row in the XLS file, starting from the second row
for r in range(1, sheet.nrows):
    # Remove digits and the surrounding "()" characters from each cell
    Column_name1 = re.sub(r"\d+", "", sheet.cell(r, 0).value).strip(")")
    Column_name1 = Column_name1[:-1]
    Column_name2 = re.sub(r"\d+", "", sheet.cell(r, 1).value).strip(")")
    Column_name2 = Column_name2[:-1]
    # Assign values from each row
    values = (Column_name1, Column_name2)
    # Execute the SQL query, skipping rows that fail to insert
    try:
        cursor.execute(query, values)
    except:
        pass
# Close the cursor
cursor.close()
# Commit the transaction
database.commit()
# Close the database connection
database.close()
# Print results
print ""
print "All Done! Bye, for now."
print ""
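The cell clean-up inside the loop is easier to follow on a single value. Here is a minimal sketch of those three operations in isolation, assuming a made-up cell value of the form "Name(123)"; the clean_cell() helper is hypothetical and not part of the script above.

```python
import re


def clean_cell(value):
    """Drop digits, strip ')' from both ends, then drop the last character."""
    without_digits = re.sub(r"\d+", "", value)   # "Physics(101)" -> "Physics()"
    stripped = without_digits.strip(")")         # "Physics()"    -> "Physics("
    return stripped[:-1]                         # "Physics("     -> "Physics"


print(clean_cell("Physics(101)"))
```

Note that the final slice always removes one character, so a cell without a trailing "(" would lose its last letter; the script assumes every cell follows the same pattern.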
Here is an amazing anecdote that illustrates a few qualities:
During a robbery in Guangzhou, China, the bank robber shouted to everyone in the bank: “Don’t move. The money belongs to the State. Your life belongs to you.”
Everyone in the bank lay down quietly. This is called “Mind Changing Concept”: changing the conventional way of thinking.
When a lady lay on the table provocatively, the robber shouted at her: “Please be civilized! This is a robbery and not a rape!”
This is called “Being Professional” Focus only on what you are trained to do!
When the bank robbers returned home, the younger robber (MBA-trained) told the older robber (who has only completed Year 6 in primary school): “Big brother, let’s count how much we got.”
The older robber rebutted and said: “You are very stupid. There is so much money it will take us a long time to count. Tonight, the TV news will tell us how much we robbed from the bank!”
This is called “Experience.” Nowadays, experience is more important than paper qualifications!
After the robbers had left, the bank manager told the bank supervisor to call the police quickly. But the supervisor said to him: “Wait! Let us take out $10 million from the bank for ourselves and add it to the $70 million that we have previously embezzled from the bank”.
This is called “Swim with the tide.” Converting an unfavorable situation to your advantage!
The supervisor says: “It will be good if there is a robbery every month.”
This is called “Killing Boredom.” Personal Happiness is more important than your job.
The next day, the TV news reported that $100 million was taken from the bank. The robbers counted and counted and counted, but they could only count $20 million. The robbers were very angry and complained: “We risked our lives and only took $20 million. The bank manager took $80 million with a snap of his fingers. It looks like it is better to be educated than to be a thief!”
This is called “Knowledge is worth as much as gold!”
The bank manager was smiling and happy because his losses in the share market are now covered by this robbery.
This is called “Seizing the opportunity.” Daring to take risks!!
A Python script to download PDF files: the gspread library retrieves the download link from a spreadsheet, and requests downloads it as a zip file. First you need to install the Python packages “gspread” and “requests”.
import requests
import gspread
gc = gspread.login('Username@gmail.com','Password')
spreadsheet = gc.open_by_url("Url of Spreadsheet")
worksheet = spreadsheet.sheet1
cell = worksheet.cell(2, 2)
value = cell.value
print(value)
print "downloading with requests"
r = requests.get(value)
with open("code.zip", "wb") as code:
    code.write(r.content)
You should be comfortable running shell commands and familiar with the basics of Git.
Git is a distributed version control system (VCS). It allows non-linear development of projects and handles large amounts of data efficiently by keeping the full repository history on your local machine.
Step 1: Install Git
$ sudo apt-get install git
sudo -> runs the command with administrative privileges.
apt-get -> the package installer; it fetches packages from repositories and installs them.
Now it’s time to configure your settings. To do this you need to open an app called Terminal.
Username:
First you need to tell Git your name, so that it can properly label the commits you make.
$ git config --global user.name "Your Name Here"
# Sets the default name for git to use when you commit
E-mail:
Git saves your email address into the commits you make. GitHub uses the email address to associate your commits with your GitHub account.
$ git config --global user.email "your_email@example.com"
# Sets the default email for git to use when you commit
To see all settings:
$ git config --list
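git config --list prints one key=value pair per line. If you want to check the settings from a script, a minimal sketch such as the following can turn that listing into a dictionary; the parse_git_config() helper and the sample text are assumptions made for illustration.

```python
def parse_git_config(listing):
    """Turn the key=value lines printed by `git config --list` into a dict."""
    settings = {}
    for line in listing.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key] = value
    return settings


# Sample text in the same shape as the command's output.
sample = "user.name=Your Name Here\nuser.email=your_email@example.com\n"
settings = parse_git_config(sample)
print(settings["user.name"])
print(settings["user.email"])
```

In practice the listing could come from subprocess.check_output(["git", "config", "--list"], text=True) instead of the sample string.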
Step 2: Install RVM
RVM is the Ruby Version Manager. It allows us to install and manage different versions and implementations of Ruby on one computer, including the ability to manage different sets of Ruby gems for each. This lets us test our application against different Ruby setups.
Run the following command from your terminal. Be sure to follow any subsequent instructions as guided by the installation process.
The above command will install both RVM and the latest version of Ruby.
curl is a client for fetching a document or file from a server, or sending one to it, using any of its supported protocols.
If the above command produces an error, run the following command from your terminal to achieve the same result.
$ wget --no-check-certificate https://raw.github.com/joshfng/railsready/master/railsready.sh && bash railsready.sh # You will be asked to choose an option (enter 1)
Gems contain package information along with the files to install. RubyGems is a package manager which became part of the standard library in Ruby 1.9. It allows developers to search, install and build gems, among other features. All of this is done by using the gem command-line utility.
Run ruby --version to be sure you’re using Ruby 1.9.3. If you’re having trouble, run the following command in your terminal to make ruby-1.9.3 your default version.
$ rvm --default use ruby-1.9.3-p429
If ruby --version doesn’t say you’re using Ruby 1.9.3, revisit your RVM installation.
The above command replaces octopress with username.github.com.
Next we need to change current working directory to octopress.
$ cd octopress # You'll be asked if you trust the .rvmrc file (say yes).
$ ruby --version # Should report Ruby 1.9.3
Next, install dependencies.
Using Bundler to manage your gem dependencies is also pretty easy. Bundler is a program for managing gem dependencies in your Ruby projects. With Bundler you can specify which gems your program needs.
$ gem install bundler
If all your programs’ dependencies are managed by Bundler, this should be the only gem you need to install yourself.
Now install the dependencies specified in your Gemfile.
$ bundle install
Install the default Octopress theme.
$ rake install
Rake is a Make-like program implemented in Ruby. Tasks and dependencies are specified in standard Ruby syntax. A Rakefile is a collection of tasks; when you call rake with an argument (in this case install), that’s the task that gets executed.
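The idea of named tasks with dependencies can be modelled in a few lines. The following Python sketch is only an analogy for how a Make-like tool resolves dependencies before running the requested task; it is not how Rake itself is implemented, and the task names compile and package are hypothetical.

```python
tasks = {}  # task name -> (dependencies, function)
log = []    # records the order in which tasks actually ran


def task(name, depends_on=()):
    """Register a function as a named task with optional dependencies."""
    def register(func):
        tasks[name] = (tuple(depends_on), func)
        return func
    return register


def run(name, done=None):
    """Run a task after its dependencies, each at most once."""
    done = set() if done is None else done
    if name in done:
        return
    dependencies, action = tasks[name]
    for dependency in dependencies:
        run(dependency, done)
    action()
    done.add(name)


@task("compile")
def compile_step():
    log.append("compile")


@task("package", depends_on=["compile"])
def package_step():
    log.append("package")


run("package")  # comparable to passing a task name on the command line
print(log)
```

Asking for package runs compile first, because it is listed as a dependency.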
Step 5: Deploying with GitHub User/Organization Pages
Creating GitHub User/Organization Pages
Create a new GitHub repository and name it username.github.com or organization.github.com, using your own user or organization name.
Switch over to the new repository created in the previous step. Click on the Settings tab, scroll down to GitHub Pages and click on the “Automatic Page Generator” button.
Author your content in the Markdown editor.
Click the Continue To Layouts button.
Preview your content in the available themes.
When you find a theme that you like, click Publish.
Generating SSH keys
We can use SSH keys to establish a secure connection between your computer and GitHub.
To generate a new SSH key, enter the command below. We want the default settings, so when asked to enter a file in which to save the key, just press Enter.
$ ssh-keygen -t rsa -C "your_email@example.com"
# Creates a new ssh key, using the provided email as a label
# Generating public/private rsa key pair.
# Enter file in which to save the key (/home/you/.ssh/id_rsa):
Now you need to enter a passphrase.
# Enter passphrase (empty for no passphrase): [Type a passphrase]
# Enter same passphrase again: [Type passphrase again]
# Your identification has been saved in /home/you/.ssh/id_rsa.
# Your public key has been saved in /home/you/.ssh/id_rsa.pub.
# The key fingerprint is:
# 01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db your_email@example.com
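The fingerprint shown above is in the older colon-separated MD5 format (newer OpenSSH versions print a SHA256 fingerprint by default). As a minimal sketch, assuming the public key file has the usual "type base64-blob comment" layout, that MD5 fingerprint is the MD5 hash of the decoded key blob. The md5_fingerprint() helper is hypothetical and the key material below is a placeholder, not a real key.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(key_blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Placeholder key material for illustration only; not a real key.
fake_blob = base64.b64encode(b"not a real key").decode("ascii")
fingerprint = md5_fingerprint("ssh-rsa " + fake_blob + " your_email@example.com")
print(fingerprint)
```

To try it on a real key, pass the single line stored in /home/you/.ssh/id_rsa.pub instead of the placeholder string.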
Add your SSH key to GitHub
Go to your Account Settings.
Click “SSH Keys” in the left sidebar.
Click “Add SSH key”
Paste your key into the “Key” field.
Click “Add key”.
Confirm the action by entering your GitHub password.
To make sure everything is working you’ll now SSH to GitHub. When you do this, you will be asked to authenticate this action using your password, which for this purpose is the passphrase you created earlier.
$ ssh -T git@github.com
# Attempts to ssh to github
You may see this warning:
# The authenticity of host 'github.com (207.97.227.239)' can't be established.
# RSA key fingerprint is 16:27:ac:a5:76:28:2d:36:63:1b:56:4d:eb:df:a6:48.
# Are you sure you want to continue connecting (yes/no)?
Don’t worry, this is supposed to happen. Verify that the fingerprint matches the one GitHub publishes and type “yes”.
# Hi username! You've successfully authenticated, but GitHub does not
# provide shell access.
If that username is correct, you’ve successfully set up your SSH key. Don’t worry about the shell access thing, you don’t want that anyway.
If it is not correct, add your key to the SSH agent and then test the connection again.
$ ssh-add
GitHub Pages for users and organizations uses the master branch like the public directory on a web server, serving up the files at your Pages URL. As a result, you’ll want to work on the source for your blog in the source branch and commit the generated content to the master branch. Octopress has a configuration task that helps you set all this up.
$ rake setup_github_pages
This will ask you for your GitHub Pages repository URL.
Next run:
This will generate your blog, copy the generated files into _deploy/, add them to Git, commit them and push them up to the master branch. In a few seconds you should get an email from GitHub telling you that your commit has been received and will be published on your site.