Prakhar - Software Engineer Intern

The document outlines the enhancement of Wrangler with new parsers for byte size and time duration units, detailing the tasks and changes made to the ANTLR4 grammar file. It includes a comprehensive list of directives for data transformation, parsing, and output formatting, along with links to documentation and demo videos. The GitHub repository is provided for further reference and collaboration.

Assignment: Enhance Wrangler with Byte Size and Time Duration Units Parsers

AI Prompts

1. "How to implement custom token parsing in ANTLR4?"
2. "Best practices for implementing aggregate directives in CDAP Wrangler"
3. "Java code to convert between different byte size units"
4. "Example of time duration parsing in Java"
5. "How to test ANTLR grammar changes in a Maven project?"
6. "Fix the errors in the terminal"

GitHub Repository: https://github.com/prakharrrrrrsingh/Wrangler-ps

Tasks:

The work centers on Wrangler's ANTLR4 grammar file, which defines the syntax for a directive-based language. The grammar includes:

1. Parser rules for:

   - Recipe structure with statements
   - Directives with various parameter types
   - Control structures (if-else statements)
   - Expressions and code blocks
   - Macros and pragmas
   - Property lists and value definitions

2. Lexer rules for:

   - Basic tokens (braces, operators, punctuation)
   - Data types (Bool, Number, String)
   - Identifiers and columns
   - Comments and whitespace
   - Special values for byte sizes and time durations


The grammar is designed for a domain-specific language used for data transformation directives, with support for conditional logic, property configurations, and various data manipulations. A sketch of the unit-conversion semantics introduced by the new byte-size and time-duration tokens follows below.
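
At their core, the new byte-size and time-duration tokens pair a numeric value with a unit suffix, which must be converted to a canonical unit (bytes, milliseconds) before aggregate directives can work with them. The following is a minimal Python sketch of that conversion logic, purely to illustrate the semantics; the actual Wrangler implementation lives in Java on top of the ANTLR4 lexer, and the unit tables here are assumptions.

```python
import re

# Canonical multipliers: byte sizes to bytes, time durations to milliseconds (assumed unit sets).
BYTE_UNITS = {'B': 1, 'KB': 1024, 'MB': 1024**2, 'GB': 1024**3, 'TB': 1024**4}
TIME_UNITS = {'ms': 1, 's': 1000, 'm': 60_000, 'h': 3_600_000}

def parse_unit_value(text, units):
    """Split a token like '10.5MB' or '2s' into (value, unit) and return the canonical amount."""
    match = re.fullmatch(r'(\d+(?:\.\d+)?)\s*([A-Za-z]+)', text.strip())
    if not match or match.group(2) not in units:
        raise ValueError(f'unrecognized unit value: {text!r}')
    return float(match.group(1)) * units[match.group(2)]

print(parse_unit_value('10KB', BYTE_UNITS))   # 10240.0 (bytes)
print(parse_unit_value('2.5s', TIME_UNITS))   # 2500.0 (milliseconds)
```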

Changes made for the assignment

README.md:
# Data Prep

![cm-available](https://cdap-users.herokuapp.com/assets/cm-available.svg)
![cdap-transform](https://cdap-users.herokuapp.com/assets/cdap-transform.svg)
[![Build Status](https://travis-ci.org/cdapio/hydrator-plugins.svg?branch=develop)](https://travis-ci.org/cdapio/hydrator-plugins)
[![Coverity Scan Build Status](https://scan.coverity.com/projects/11434/badge.svg)](https://scan.coverity.com/projects/hydrator-wrangler-transform)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.cdap.wrangler/wrangler-core/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.cdap.wrangler/wrangler-core)
[![Javadoc](https://javadoc-emblem.rhcloud.com/doc/io.cdap.wrangler/wrangler-core/badge.svg)](http://www.javadoc.io/doc/io.cdap.wrangler/wrangler-core)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Join CDAP community](https://cdap-users.herokuapp.com/badge.svg?t=wrangler)](https://cdap-users.herokuapp.com?t=1)

A collection of libraries, a pipeline plugin, and a CDAP service for performing data cleansing, transformation, and filtering using a set of data manipulation instructions (directives). These instructions are either generated using an interactive visual tool or are manually created.

* Data Prep defines a few concepts that might be useful if you are just getting started with it. Learn about them [here](wrangler-docs/concepts.md)
* The Data Prep Transform is [separately documented](wrangler-transform/wrangler-docs/data-prep-transform.md).
* [Data Prep Cheatsheet](wrangler-docs/cheatsheet.md)

## New Features

More [here](wrangler-docs/upcoming-features.md) on upcoming features.

* **User Defined Directives, also known as UDDs**, allow you to create custom functions to transform records within CDAP DataPrep, a.k.a. Wrangler. CDAP comes with a comprehensive library of functions. There are, however, some omissions, and some specific cases for which UDDs are the solution. Additional information on how you can build your custom directives is [here](wrangler-docs/custom-directive.md).
* Migrating directives from version 1.0 to version 2.0 [here](wrangler-docs/directive-migration.md)
* Information about the grammar [here](wrangler-docs/grammar/grammar-info.md)
* Various `TokenType`s supported by the system [here](../api/src/main/java/io/cdap/wrangler/api/parser/TokenType.java)
* Custom Directive Implementation Internals [here](wrangler-docs/udd-internal.md)
* A new capability that allows CDAP Administrators to **restrict the directives** that are accessible to their users. More information on configuring this can be found [here](wrangler-docs/exclusion-and-aliasing.md)

## Demo Videos and Recipes

Videos and screencasts are the best way to learn, so we have compiled simple, short screencasts that show some of the features of Data Prep. Additional videos can be found [here](https://www.youtube.com/playlist?list=PLhmsfNvXKJn-neqefOrcl4n7zU4TWmIr)

### Videos

* [SCREENCAST] [Creating Lookup Dataset and Joining](https://www.youtube.com/watch?v=Nc1b0rsELHQ)
* [SCREENCAST] [Restricted Directives](https://www.youtube.com/watch?v=71EcMQU714U)
* [SCREENCAST] [Parse Excel files in CDAP](https://www.youtube.com/watch?v=su5L1noGlEk)
* [SCREENCAST] [Parse File As AVRO File](https://www.youtube.com/watch?v=tmwAw4dKUNc)
* [SCREENCAST] [Parsing Binary Coded AVRO Messages](https://www.youtube.com/watch?v=Ix_lPo-PDJY)
* [SCREENCAST] [Parsing Binary Coded AVRO Messages & Protobuf messages using schema registry](https://www.youtube.com/watch?v=LVLIdWnUX1k)
* [SCREENCAST] [Quantize a column - Digitize](https://www.youtube.com/watch?v=VczkYX5SRtY)
* [SCREENCAST] [Data Cleansing capability with send-to-error directive](https://www.youtube.com/watch?v=aZd5H8hIjDc)
* [SCREENCAST] [Building Data Prep from the GitHub source](https://youtu.be/pGGjKU04Y38)
* [VOICE-OVER] [End-to-End Demo Video](https://youtu.be/AnhF0qRmn24)
* [SCREENCAST] [Ingesting into Kudu](https://www.youtube.com/watch?v=KBW7a38vlUM)
* [SCREENCAST] [Realtime HL7 CCDA XML from Kafka into Time Partitioned Parquet](https://youtu.be/0fqNmnOnD-0)
* [SCREENCAST] [Parsing JSON file](https://youtu.be/vwnctcGDflE)
* [SCREENCAST] [Flattening arrays](https://youtu.be/SemHxgBYIsY)
* [SCREENCAST] [Data cleansing with send-to-error directive](https://www.youtube.com/watch?v=aZd5H8hIjDc)
* [SCREENCAST] [Publishing to Kafka](https://www.youtube.com/watch?v=xdc8pvvlI48)
* [SCREENCAST] [Fixed length to JSON](https://www.youtube.com/watch?v=3AXu4m1swuM)

### Recipes

* [Parsing Apache Log Files](wrangler-demos/parsing-apache-log-files.md)
* [Parsing CSV Files and Extracting Column Values](wrangler-demos/parsing-csv-extracting-column-values.md)
* [Parsing HL7 CCDA XML Files](wrangler-demos/parsing-hl7-ccda-xml-files.md)

## Available Directives

These directives are currently available:

| Directive | Description |
| --------- | ----------- |
| **Parsers** | |
| [JSON Path](wrangler-docs/directives/json-path.md) | Uses a DSL (a JSON path expression) for parsing JSON records |
| [Parse as AVRO](wrangler-docs/directives/parse-as-avro.md) | Parsing an AVRO encoded message - either as binary or json |
| [Parse as AVRO File](wrangler-docs/directives/parse-as-avro-file.md) | Parsing an AVRO data file |
| [Parse as CSV](wrangler-docs/directives/parse-as-csv.md) | Parsing an input record as comma-separated values |
| [Parse as Date](wrangler-docs/directives/parse-as-date.md) | Parsing dates using natural language processing |
| [Parse as Excel](wrangler-docs/directives/parse-as-excel.md) | Parsing Excel files |
| [Parse as Fixed Length](wrangler-docs/directives/parse-as-fixed-length.md) | Parses as a fixed-length record with specified widths |
| [Parse as HL7](wrangler-docs/directives/parse-as-hl7.md) | Parsing Health Level 7 Version 2 (HL7 V2) messages |
| [Parse as JSON](wrangler-docs/directives/parse-as-json.md) | Parsing a JSON object |
| [Parse as Log](wrangler-docs/directives/parse-as-log.md) | Parses access log files from Apache HTTPD and nginx servers |
| [Parse as Protobuf](wrangler-docs/directives/parse-as-log.md) | Parses a Protobuf encoded in-memory message using a descriptor |
| [Parse as Simple Date](wrangler-docs/directives/parse-as-simple-date.md) | Parses date strings |
| [Parse XML To JSON](wrangler-docs/directives/parse-xml-to-json.md) | Parses an XML document into a JSON structure |
| [Parse as Currency](wrangler-docs/directives/parse-as-currency.md) | Parses a string representation of currency into a number |
| [Parse as Datetime](wrangler-docs/directives/parse-as-datetime.md) | Parses strings with datetime values to CDAP datetime type |
| **Output Formatters** | |
| [Write as CSV](wrangler-docs/directives/write-as-csv.md) | Converts a record into CSV format |
| [Write as JSON](wrangler-docs/directives/write-as-json-map.md) | Converts the record into a JSON map |
| [Write JSON Object](wrangler-docs/directives/write-as-json-object.md) | Composes a JSON object based on the fields specified |
| [Format as Currency](wrangler-docs/directives/format-as-currency.md) | Formats a number as currency as specified by locale |
| **Transformations** | |
| [Changing Case](wrangler-docs/directives/changing-case.md) | Changes the case of column values |
| [Cut Character](wrangler-docs/directives/cut-character.md) | Selects parts of a string value |
| [Set Column](wrangler-docs/directives/set-column.md) | Sets the column value to the result of an expression execution |
| [Find and Replace](wrangler-docs/directives/find-and-replace.md) | Transforms string column values using a "sed"-like expression |
| [Index Split](wrangler-docs/directives/index-split.md) | (_Deprecated_) |
| [Invoke HTTP](wrangler-docs/directives/invoke-http.md) | Invokes an HTTP service (_Experimental_, potentially slow) |
| [Quantization](wrangler-docs/directives/quantize.md) | Quantizes a column based on specified ranges |
| [Regex Group Extractor](wrangler-docs/directives/extract-regex-groups.md) | Extracts the data from a regex group into its own column |
| [Setting Character Set](wrangler-docs/directives/set-charset.md) | Sets the encoding and then converts the data to a UTF-8 string |
| [Setting Record Delimiter](wrangler-docs/directives/set-record-delim.md) | Sets the record delimiter |
| [Split by Separator](wrangler-docs/directives/split-by-separator.md) | Splits a column based on a separator into two columns |
| [Split Email Address](wrangler-docs/directives/split-email.md) | Splits an email ID into an account and its domain |
| [Split URL](wrangler-docs/directives/split-url.md) | Splits a URL into its constituents |
| [Text Distance (Fuzzy String Match)](wrangler-docs/directives/text-distance.md) | Measures the difference between two sequences of characters |
| [Text Metric (Fuzzy String Match)](wrangler-docs/directives/text-metric.md) | Measures the difference between two sequences of characters |
| [URL Decode](wrangler-docs/directives/url-decode.md) | Decodes from the `application/x-www-form-urlencoded` MIME format |
| [URL Encode](wrangler-docs/directives/url-encode.md) | Encodes to the `application/x-www-form-urlencoded` MIME format |
| [Trim](wrangler-docs/directives/trim.md) | Functions for trimming whitespace around string data |
| **Encoders and Decoders** | |
| [Decode](wrangler-docs/directives/decode.md) | Decodes a column value as one of `base32`, `base64`, or `hex` |
| [Encode](wrangler-docs/directives/encode.md) | Encodes a column value as one of `base32`, `base64`, or `hex` |
| **Unique ID** | |
| [UUID Generation](wrangler-docs/directives/generate-uuid.md) | Generates a universally unique identifier (UUID). Recommended for use with Wrangler version 4.4.0 and above due to an important bug fix [CDAP-17732](https://cdap.atlassian.net/browse/CDAP-17732) |
| **Date Transformations** | |
| [Diff Date](wrangler-docs/directives/diff-date.md) | Calculates the difference between two dates |
| [Format Date](wrangler-docs/directives/format-date.md) | Custom patterns for date-time formatting |
| [Format Unix Timestamp](wrangler-docs/directives/format-unix-timestamp.md) | Formats a UNIX timestamp as a date |
| **DateTime Transformations** | |
| [Current DateTime](wrangler-docs/directives/current-datetime.md) | Generates the current datetime using the given zone or UTC by default |
| [Datetime To Timestamp](wrangler-docs/directives/datetime-to-timestamp.md) | Converts a datetime value to timestamp with the given zone |
| [Format Datetime](wrangler-docs/directives/format-datetime.md) | Formats a datetime value to custom date-time pattern strings |
| [Timestamp To Datetime](wrangler-docs/directives/timestamp-to-datetime.md) | Converts a timestamp value to datetime |
| **Lookups** | |
| [Catalog Lookup](wrangler-docs/directives/catalog-lookup.md) | Static catalog lookup of ICD-9, ICD-10-2016, ICD-10-2017 codes |
| [Table Lookup](wrangler-docs/directives/table-lookup.md) | Performs lookups into Table datasets |
| **Hashing & Masking** | |
| [Message Digest or Hash](wrangler-docs/directives/hash.md) | Generates a message digest |
| [Mask Number](wrangler-docs/directives/mask-number.md) | Applies substitution masking on the column values |
| [Mask Shuffle](wrangler-docs/directives/mask-shuffle.md) | Applies shuffle masking on the column values |
| **Row Operations** | |
| [Filter Row if Matched](wrangler-docs/directives/filter-row-if-matched.md) | Filters rows that match a pattern for a column |
| [Filter Row if True](wrangler-docs/directives/filter-row-if-true.md) | Filters rows if the condition is true |
| [Filter Row Empty or Null](wrangler-docs/directives/filter-empty-or-null.md) | Filters rows that are empty or null |
| [Flatten](wrangler-docs/directives/flatten.md) | Separates the elements in a repeated field |
| [Fail on condition](wrangler-docs/directives/fail.md) | Fails processing when the condition evaluates to true |
| [Send to Error](wrangler-docs/directives/send-to-error.md) | Filtering of records to an error collector |
| [Send to Error And Continue](wrangler-docs/directives/send-to-error-and-continue.md) | Filtering of records to an error collector and continues processing |
| [Split to Rows](wrangler-docs/directives/split-to-rows.md) | Splits based on a separator into multiple records |
| **Column Operations** | |
| [Change Column Case](wrangler-docs/directives/change-column-case.md) | Changes column names to either lowercase or uppercase |
| [Changing Case](wrangler-docs/directives/changing-case.md) | Change the case of column values |
| [Cleanse Column Names](wrangler-docs/directives/cleanse-column-names.md) | Sanitizes column names, following specific rules |
| [Columns Replace](wrangler-docs/directives/columns-replace.md) | Alters column names in bulk |
| [Copy](wrangler-docs/directives/copy.md) | Copies values from a source column into a destination column |
| [Drop Column](wrangler-docs/directives/drop.md) | Drops a column in a record |
| [Fill Null or Empty Columns](wrangler-docs/directives/fill-null-or-empty.md) | Fills column value with a fixed value if null or empty |
| [Keep Columns](wrangler-docs/directives/keep.md) | Keeps specified columns from the record |
| [Merge Columns](wrangler-docs/directives/merge.md) | Merges two columns by inserting a third column |
| [Rename Column](wrangler-docs/directives/rename.md) | Renames an existing column in the record |
| [Set Column Header](wrangler-docs/directives/set-headers.md) | Sets the names of columns, in the order they are specified |
| [Split to Columns](wrangler-docs/directives/split-to-columns.md) | Splits a column based on a separator into multiple columns |
| [Swap Columns](wrangler-docs/directives/swap.md) | Swaps column names of two columns |
| [Set Column Data Type](wrangler-docs/directives/set-type.md) | Convert data type of a column |
Integration Assignment: Bidirectional ClickHouse & Flat File Data Ingestion Tool

GitHub Repository: https://github.com/prakharrrrrrsingh/BidirectionalClickHouse

AI Prompts:

Prompts Used for Development

* Initial Prompt: "Integration Assignment: Bidirectional ClickHouse & Flat File Data Ingestion Tool - Create a web-based application with a simple user interface that facilitates data ingestion between a ClickHouse database and Flat Files. Support bidirectional data flow, JWT token authentication, column selection, and record count reporting."
* Detailed Project Structure Planning: "Create a project structure for a Flask web application that will handle bidirectional data transfer between ClickHouse and flat files, with user authentication via JWT tokens."
* ClickHouse Client Implementation: "Implement a Python class for interacting with ClickHouse database that supports JWT token authentication, retrieving table schema, and efficient data ingestion."
* Flat File Handling Implementation: "Create a Python class for handling flat file operations including reading CSV files with custom delimiters, column selection, and saving data to files."
* Frontend UI Design: "Design a responsive HTML/CSS/JS user interface for a data ingestion tool that allows users to select source/target, configure connections, select columns, and view progress/results."
* API Endpoint Implementation: "Implement Flask API endpoints for connecting to ClickHouse, uploading flat files, previewing data, and handling bidirectional data ingestion processes." (see the sketch after this list)
* JavaScript Client-Side Logic: "Write JavaScript code to handle form submission, API calls, UI state management, and data visualization for a data ingestion web application."
* Testing and Documentation: "Create comprehensive README documentation explaining how to install, configure and use a ClickHouse and Flat File data ingestion tool, including examples and testing instructions."
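
As an illustration of where the API-endpoint prompt leads, here is a minimal Flask route sketch for the "Connect" step. The route path, payload fields, and JWT-as-password handling are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical /api/connect endpoint (illustrative only).
from flask import Flask, jsonify, request
from clickhouse_driver import Client

app = Flask(__name__)

@app.route('/api/connect', methods=['POST'])
def connect():
    cfg = request.get_json()
    # JWT token passed in place of a password (assumption; depends on server auth setup).
    client = Client(host=cfg['host'], port=cfg['port'], database=cfg['database'],
                    user=cfg['user'], password=cfg['jwt_token'])
    tables = [t[0] for t in client.execute('SHOW TABLES')]
    return jsonify({'tables': tables})
```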

README.md:

# ClickHouse & Flat File Data Ingestion Tool

A web-based application for bidirectional data ingestion between ClickHouse databases and flat files.

## Features

- Bidirectional data flow:
  - ClickHouse to Flat File
  - Flat File to ClickHouse
- JWT token-based authentication for ClickHouse
- Column selection for ingestion
- Data preview
- Record count reporting
- Progress tracking

## Requirements

- Python 3.7+
- Flask
- clickhouse-driver
- pandas
- Other dependencies listed in requirements.txt
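
For reference, a minimal requirements.txt consistent with the list above might look like this; the version bounds are assumptions, not the repository's actual pins:

```
Flask>=2.0
clickhouse-driver>=0.2
pandas>=1.3
```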

## Installation

1. Clone the repository:

```
git clone https://github.com/yourusername/clickhouse-flat-file-tool.git
cd clickhouse-flat-file-tool
```

2. Create a virtual environment (optional but recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```
pip install -r requirements.txt
```

## Running the Application

1. Start the Flask application:

```
python app.py
```

2. Open your browser and navigate to:

```
http://localhost:5000
```

## Usage

### ClickHouse to Flat File

1. Select "ClickHouse" as the source and "Flat File" as the target


2. Enter ClickHouse connection details (Host, Port, Database, User, JWT Token)
3. Click "Connect" to establish connection
4. Select a table from the dropdown
5. Click "Load Columns" to view available columns
6. Select columns to ingest
7. (Optional) Click "Preview Data" to see a sample
8. Enter output file name and delimiter for the flat file
9. Click "Start Ingestion" to begin the data transfer
10. View the results including record count
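
Under the hood, this direction reduces to a query plus a delimited write. Below is a minimal sketch of that flow using `clickhouse-driver` and the standard `csv` module; the function name and the JWT-as-password convention are assumptions for illustration, not the application's actual code.

```python
# Sketch of the ClickHouse -> flat file path (illustrative only).
import csv
from clickhouse_driver import Client

def export_table(host, port, database, user, jwt_token, table, columns, out_path, delimiter=','):
    # Passing the JWT token where a password is expected is one common pattern;
    # whether the server accepts it depends on its authentication setup (assumption).
    client = Client(host=host, port=port, database=database, user=user, password=jwt_token)
    rows = client.execute(f"SELECT {', '.join(columns)} FROM {table}")
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(columns)   # header row
        writer.writerows(rows)     # data rows
    return len(rows)               # record count for reporting
```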

### Flat File to ClickHouse

1. Select "Flat File" as the source and "ClickHouse" as the target


2. Upload a flat file and specify its delimiter
3. Select columns to ingest
4. (Optional) Click "Preview Data" to see a sample
5. Enter ClickHouse connection details if not already connected
6. Enter target table name in ClickHouse
7. Click "Start Ingestion" to begin the data transfer
8. View the results including record count
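
The reverse direction reads the file with pandas and batch-inserts the selected columns. Another rough sketch under the same assumptions; the all-`String` DDL keeps the example simple, where a real implementation would infer column types:

```python
# Sketch of the flat file -> ClickHouse path (illustrative only).
import pandas as pd
from clickhouse_driver import Client

def ingest_file(client: Client, file_path, delimiter, columns, target_table):
    df = pd.read_csv(file_path, sep=delimiter, usecols=columns)
    # Create the target table if it doesn't exist (matches the note in this README).
    cols_ddl = ', '.join(f'`{c}` String' for c in columns)
    client.execute(
        f"CREATE TABLE IF NOT EXISTS {target_table} ({cols_ddl}) "
        "ENGINE = MergeTree() ORDER BY tuple()"
    )
    data = df.astype(str).values.tolist()  # stringify to match the all-String DDL
    client.execute(f"INSERT INTO {target_table} ({', '.join(columns)}) VALUES", data)
    return len(data)                        # record count for reporting
```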

## Testing

The application can be tested with example ClickHouse datasets:

- uk_price_paid
- ontime

For more information on these datasets, visit:
https://clickhouse.com/docs/getting-started/example-datasets
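
A quick way to sanity-check connectivity against one of these datasets, assuming it has been loaded into a local ClickHouse instance (host and port here are assumptions):

```python
# Connectivity smoke test against the uk_price_paid example dataset (assumed to be loaded).
from clickhouse_driver import Client

client = Client(host='localhost', port=9000)   # default native-protocol port
count = client.execute('SELECT count() FROM uk_price_paid')[0][0]
print(f'uk_price_paid rows: {count}')
```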

## Project Structure

```
clickhouse-flat-file-tool/
├── app/                          # Application package
│   ├── __init__.py               # Flask app initialization
│   ├── main.py                   # Main routes & API endpoints
│   ├── models/                   # Data models
│   │   ├── clickhouse_client.py  # ClickHouse client
│   │   └── flat_file.py          # Flat file handling
│   ├── static/                   # Static assets
│   │   └── main.js               # Frontend JavaScript
│   ├── templates/                # HTML templates
│   │   └── index.html            # Main UI
│   └── uploads/                  # Directory for uploaded files
├── app.py                        # Application entry point
├── requirements.txt              # Python dependencies
└── README.md                     # This file
```

## Notes

- The application creates a directory `app/uploads` to store uploaded and generated files
- For ClickHouse JWT authentication, provide a valid JWT token
- When ingesting to ClickHouse, tables will be created if they don't exist
