1. Loading Data into MySQL
mysql -u root -p
Create the tables in MySQL and insert the data into them.
Commit;
To import the data into HDFS, we first create a user in MySQL.
Now let's transfer these tables into HDFS by writing Sqoop jobs.
echo -n "myuser" > sqoop_mysql_passwrd
The -n option is required. Without it, echo appends a newline to the file, Sqoop reads that newline as part of the
password, and the connection fails with an "Access denied for user" error.
The above step is not required when performing a normal import.
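To make the trailing-newline problem concrete, the following Python sketch writes the same password with and without a newline and compares the raw bytes. The temporary file names are hypothetical. The point is that anything reading the file verbatim sees `myuser\n` instead of `myuser`, which, per the note above, is how Sqoop ends up rejecting the password.

```python
import os
import tempfile

def write_password_file(path, password, trailing_newline):
    """Write `password` to `path`, optionally followed by a newline.

    trailing_newline=True behaves like `echo "myuser" > file`;
    trailing_newline=False behaves like `echo -n "myuser" > file`.
    """
    # newline="" disables newline translation so the bytes are exact on every OS
    with open(path, "w", newline="") as f:
        f.write(password + ("\n" if trailing_newline else ""))

def read_raw(path):
    """Read the file byte-for-byte, without stripping any whitespace."""
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    with_newline = os.path.join(tmp, "pw_echo")        # hypothetical name
    without_newline = os.path.join(tmp, "pw_echo_n")   # hypothetical name
    write_password_file(with_newline, "myuser", trailing_newline=True)
    write_password_file(without_newline, "myuser", trailing_newline=False)
    raw_with = read_raw(with_newline)
    raw_without = read_raw(without_newline)

print(raw_with)     # b'myuser\n'
print(raw_without)  # b'myuser'
```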
Create database
USE bank;
Because this is an external table, we only need to specify the location of the data.
5. Analysis
Decrypting the data for analysis
6.1. Find out the list of users who have at least 2 loan instalments pending.
SELECT decrypt(user_id)
FROM loan_info
WHERE datediff(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'),
decrypt(last_payment_date)) >= 60;
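The filter in this query can be mirrored in Python to check it against a few rows. This is a minimal sketch, assuming the values are already decrypted, dates are `yyyy-MM-dd` strings, and one instalment falls due every 30 days, so 60 or more days without a payment means at least two missed instalments. The sample rows are made up, and "today" is passed in explicitly so the result is reproducible.

```python
from datetime import date

# Hypothetical, already-decrypted rows: (user_id, last_payment_date as 'yyyy-MM-dd').
loans = [
    ("u1", "2024-01-05"),
    ("u2", "2024-02-20"),
    ("u3", "2023-12-01"),
]

def users_with_pending_instalments(rows, today, min_days=60):
    """Return users whose last payment is at least `min_days` days old.

    Mirrors datediff(today, last_payment_date) >= 60 from the query,
    assuming one instalment falls due every 30 days.
    """
    result = []
    for user_id, last_payment in rows:
        days_since_payment = (today - date.fromisoformat(last_payment)).days
        if days_since_payment >= min_days:
            result.append(user_id)
    return result

print(users_with_pending_instalments(loans, today=date(2024, 3, 10)))  # ['u1', 'u3']
```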
6.2. Find the list of users who have a healthy credit card but an outstanding loan account.
A healthy credit card is one with no outstanding balance.
SELECT decrypt(li.user_id)
FROM loan_info li INNER JOIN credit_card_info cci
ON decrypt(li.user_id) = decrypt(cci.user_id)
WHERE CAST(decrypt(cci.outstanding_balance) AS double) = 0.0
AND datediff(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), decrypt(li.last_payment_date)) >= 30;
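The join and the two conditions can be sketched in Python the same way. This assumes decrypted values, at most one credit card per user (so a dict lookup stands in for the inner join), and 30 or more days since the last payment as the meaning of "outstanding loan", as in the query. All sample data is hypothetical.

```python
from datetime import date

# Hypothetical decrypted data.
loan_info = {  # user_id -> last_payment_date
    "u1": "2024-01-05",
    "u2": "2024-03-01",
    "u3": "2024-01-20",
    "u4": "2023-12-01",  # has a loan but no credit card
}
credit_card_info = {"u1": 0.0, "u2": 0.0, "u3": 150.0}  # user_id -> outstanding_balance

def healthy_card_overdue_loan(loans, cards, today, min_days=30):
    """Users with a zero card balance and no loan payment for `min_days` days."""
    result = []
    for user_id, last_payment in loans.items():
        balance = cards.get(user_id)
        if balance is None:  # inner join: users without a card are dropped
            continue
        overdue = (today - date.fromisoformat(last_payment)).days >= min_days
        if balance == 0.0 and overdue:
            result.append(user_id)
    return result

print(healthy_card_overdue_loan(loan_info, credit_card_info, today=date(2024, 3, 10)))  # ['u1']
```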
6.3. For every share and for every date, find the maximum profit one could have made on the
share. Bear in mind that the share must be bought before it is sold, and if its price falls
throughout the day, the maximum possible profit may be negative.
Output
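The logic behind 6.3 is the classic single-pass maximum-difference problem: for each (share, date) series ordered by time, track the lowest price seen so far and the best sell-minus-buy difference. Starting the best value at the first buy-then-sell pair, rather than at zero, is what lets the result go negative when prices fall all day. The Python sketch below only illustrates that logic; the (share, date, time, price) tuple layout, zero-padded `HH:MM` times and sample prices are assumptions.

```python
from collections import defaultdict

def max_intraday_profit(prices):
    """Best prices[j] - prices[i] with i < j; negative if prices only fall."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to buy and then sell")
    min_so_far = prices[0]
    best = prices[1] - prices[0]  # first legal buy-then-sell pair
    for price in prices[1:]:
        best = max(best, price - min_so_far)  # sell now, bought at the cheapest earlier price
        min_so_far = min(min_so_far, price)
    return best

# Hypothetical trades: (share, date, time, price).
trades = [
    ("ABC", "2024-03-01", "09:15", 100.0),
    ("ABC", "2024-03-01", "11:00", 95.0),
    ("ABC", "2024-03-01", "14:30", 110.0),
    ("XYZ", "2024-03-01", "09:15", 50.0),
    ("XYZ", "2024-03-01", "12:00", 48.0),
    ("XYZ", "2024-03-01", "15:00", 45.0),
]

# Group prices per (share, date) in time order.
series = defaultdict(list)
for share, day, time, price in sorted(trades):
    series[(share, day)].append(price)

profits = {key: max_intraday_profit(prices) for key, prices in series.items()}
print(profits)  # {('ABC', '2024-03-01'): 15.0, ('XYZ', '2024-03-01'): -2.0}
```

XYZ only falls during the day, so its best result is the smallest loss (buy at 50, sell at 48) rather than zero.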
7. Archival
cd /home/acadgild/survey_files
cat *.txt > survey_data
rm *.txt
Output
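The two shell commands above can be mirrored in Python, which makes two details explicit: the files are concatenated in sorted name order (the shell's own glob order can differ under some locales), and the merged file has no `.txt` suffix, so the cleanup step does not delete it. A minimal sketch; the sample file names and contents are hypothetical.

```python
import tempfile
from pathlib import Path

def archive_txt_files(directory, archive_name):
    """Concatenate every *.txt file in `directory` into `archive_name`, then delete them.

    Mirrors `cat *.txt > archive_name` followed by `rm *.txt`.
    """
    directory = Path(directory)
    parts = sorted(directory.glob("*.txt"))  # sorted by file name
    archive = directory / archive_name
    with open(archive, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    for part in parts:
        part.unlink()
    return archive

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "day1.txt").write_text("s1,4\n")
    Path(tmp, "day2.txt").write_text("s2,5\n")
    archive = archive_txt_files(tmp, "survey_data")
    archived = archive.read_text()
    leftovers = list(Path(tmp).glob("*.txt"))

print(repr(archived))  # 's1,4\ns2,5\n'
print(leftovers)       # []
```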
8.2. Find the details of the survey that received the minimum rating, considering only surveys
rated by at least 20 users.
Output
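One way to express this requirement in Python, as a sketch only: it assumes each row records one user's rating of one survey, and it interprets "minimum rating" as the lowest average rating. Both are assumptions, as are the sample rows. The 20-user threshold counts distinct users.

```python
from collections import defaultdict

def lowest_rated_survey(ratings, min_users=20):
    """Return (survey_id, average_rating) for the lowest-average survey among
    surveys rated by at least `min_users` distinct users, or None if none qualify."""
    scores = defaultdict(list)
    users = defaultdict(set)
    for survey_id, user_id, rating in ratings:
        scores[survey_id].append(rating)
        users[survey_id].add(user_id)
    eligible = [
        (sum(values) / len(values), survey_id)
        for survey_id, values in scores.items()
        if len(users[survey_id]) >= min_users
    ]
    if not eligible:
        return None
    average, survey_id = min(eligible)
    return survey_id, average

# Hypothetical rows: (survey_id, user_id, rating).
ratings = [("S1", f"u{i}", 2) for i in range(25)]
ratings += [("S2", f"u{i}", 1) for i in range(5)]   # lowest, but too few raters
ratings += [("S3", f"u{i}", 3) for i in range(20)]

print(lowest_rated_survey(ratings))  # ('S1', 2.0)
```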
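import xml.etree.ElementTree as ET
# Parse the column-definition XML (the file name here is a placeholder)
root = ET.parse("table_structure.xml").getroot()
structure_list = []
for each_col in root.findall("column"):
    name = each_col.find("name").text
    col_type = each_col.find("type").text  # avoids shadowing the built-in type()
    structure_list.append(name + " " + col_type)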
This creates a file named hive_query.hql, and a table named email_analysis in the bank
database.
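The step from `structure_list` to `hive_query.hql` could look roughly like the Python sketch below. The column names match those used in the queries that follow, but their `string` types, the comma delimiter and the plain (managed) `CREATE TABLE` are assumptions; the real statement depends on the XML definitions. The file is written to a temporary directory here only to keep the example side-effect free.

```python
import os
import tempfile

def build_create_table(table_name, columns, delimiter=","):
    """Build a HiveQL CREATE TABLE statement from 'name type' strings."""
    column_sql = ",\n  ".join(columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n  {column_sql}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}';\n"
    )

# Hypothetical column list; in the pipeline it would come from structure_list.
structure_list = ["id string", "reporting_date string", "opened string", "closed string"]
query = build_create_table("bank.email_analysis", structure_list)

# Write the statement to hive_query.hql (in a temporary directory for this sketch).
hql_path = os.path.join(tempfile.mkdtemp(), "hive_query.hql")
with open(hql_path, "w") as f:
    f.write(query)

print(query)
```

Running `hive -f hive_query.hql` would then execute the generated statement.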
cd /home/acadgild/email_files
cat *.txt > email_data
rm *.txt
SELECT id FROM
(
SELECT id, RANK() OVER (ORDER BY datediff(closed_date, opened_date) DESC) AS rank
FROM
(
SELECT id,
MIN(IF(opened="YES",reporting_date,NULL)) AS opened_date,
MIN(IF(closed="YES",reporting_date,NULL)) AS closed_date
FROM email_analysis
GROUP BY id
) inner_1
WHERE opened_date IS NOT NULL AND closed_date IS NOT NULL
) inner_2
WHERE rank = 1;
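This query returns the id(s) with the longest gap between the first date the email was reported as opened and the first date it was reported as closed. The same logic in Python, as a sketch assuming rows of (id, reporting_date, opened, closed) with `yyyy-MM-dd` dates and YES/NO flags; the tuple layout and sample rows are hypothetical. Like `RANK() ... = 1`, ties are all returned.

```python
from datetime import date

def first_dates(rows):
    """Per id, the earliest reporting_date with opened='YES' and with closed='YES'.

    Mirrors MIN(IF(opened="YES", reporting_date, NULL)) and the closed
    equivalent, grouped by id; None plays the role of NULL.
    """
    firsts = {}
    for email_id, reporting_date, opened, closed in rows:
        opened_date, closed_date = firsts.get(email_id, (None, None))
        if opened == "YES" and (opened_date is None or reporting_date < opened_date):
            opened_date = reporting_date
        if closed == "YES" and (closed_date is None or reporting_date < closed_date):
            closed_date = reporting_date
        firsts[email_id] = (opened_date, closed_date)
    return firsts

def longest_open_to_close(rows):
    """Ids with the largest datediff(closed_date, opened_date), ties included."""
    gaps = {
        email_id: (date.fromisoformat(closed) - date.fromisoformat(opened)).days
        for email_id, (opened, closed) in first_dates(rows).items()
        if opened is not None and closed is not None
    }
    if not gaps:
        return []
    best = max(gaps.values())
    return sorted(email_id for email_id, gap in gaps.items() if gap == best)

rows = [
    ("e1", "2024-03-01", "YES", "NO"),
    ("e1", "2024-03-04", "NO", "YES"),
    ("e2", "2024-03-01", "YES", "NO"),
    ("e2", "2024-03-10", "NO", "YES"),
    ("e3", "2024-03-02", "NO", "YES"),
]

print(longest_open_to_close(rows))  # ['e2']
```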
SELECT id
FROM
(
SELECT id,
MIN(IF(opened="YES",reporting_date,NULL)) AS opened_date,
MIN(IF(closed="YES",reporting_date,NULL)) AS closed_date
FROM email_analysis
GROUP BY id
) inner_1
WHERE opened_date IS NULL AND closed_date IS NOT NULL;
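This second query returns ids that were reported as closed but never as opened. Because `opened_date IS NULL` simply means no row for that id had `opened = "YES"`, the check reduces to a set difference. A Python sketch, using the same hypothetical (id, reporting_date, opened, closed) row layout and sample data as an assumption:

```python
def closed_without_opening(rows):
    """Ids with at least one closed='YES' row but no opened='YES' row."""
    opened_ids = {email_id for email_id, _, opened, _ in rows if opened == "YES"}
    closed_ids = {email_id for email_id, _, _, closed in rows if closed == "YES"}
    return sorted(closed_ids - opened_ids)

rows = [
    ("e1", "2024-03-01", "YES", "NO"),
    ("e1", "2024-03-04", "NO", "YES"),
    ("e2", "2024-03-01", "YES", "NO"),
    ("e2", "2024-03-10", "NO", "YES"),
    ("e3", "2024-03-02", "NO", "YES"),
]

print(closed_without_opening(rows))  # ['e3']
```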