Understanding Different Types of File Formats
It is important to understand the underlying structure of file formats along with their
benefits and limitations.
This understanding will help you make the right decisions about the formats best suited
to your data and performance needs.
Some of the standard file formats that we will cover in this video include:
Delimited text file formats,
Microsoft Excel Open XML Spreadsheet, or XLSX,
Extensible Markup Language, or XML,
Portable Document Format, or PDF, and
JavaScript Object Notation, or JSON.
Delimited text files are text files in which each line, or row, holds values separated
by a delimiter.
A delimiter is a sequence of one or more characters that marks the boundary between
independent entities or values.
Any character can be used to separate the values, but the most common delimiters are the
comma, tab, colon, vertical bar, and space.
Comma-separated values (or CSVs) and tab-separated values (or TSVs) are the most
commonly used
file types in this category.
In CSVs, the delimiter is a comma while in TSVs, the delimiter is a tab.
When the text data itself contains literal commas, a plain comma can no longer mark
field boundaries unambiguously, and TSVs serve as an alternative to the CSV format.
Tab characters are infrequent in running text.
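To make the point about literal commas concrete, here is a minimal Python sketch using the standard csv module. The record and its field values are made up for illustration.

```python
import csv
import io

# Hypothetical record: the second field contains a literal comma.
record = ["1001", "Smith, Jane", "Toronto"]

# Naively joining with commas makes the field boundaries ambiguous.
naive_line = ",".join(record)
naive_fields = naive_line.split(",")  # splits into 4 pieces instead of 3

# Writing the same record as tab-separated values keeps the comma as plain data.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerow(record)
tsv_line = buffer.getvalue()

# Reading it back with the same delimiter recovers the original three fields.
tsv_fields = next(csv.reader(io.StringIO(tsv_line), delimiter="\t"))
```

CSV tools can also cope with embedded commas by wrapping the field in quotes, which csv.writer does by default when the delimiter is a comma.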
Each row, or horizontal line, in the text file has a set of values separated by the
delimiter, and represents a record.
The first row works as a column header, and each column can hold a different type of data.
For example, one column can hold dates, while another can hold strings or integers.
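As a rough sketch of how a header row and per-column types play out in practice, the Python snippet below reads a small delimited sample with csv.DictReader. The column names and values are hypothetical, and the type conversions are an assumption about how a reader might interpret each column.

```python
import csv
import io
from datetime import date

# Hypothetical delimited data: the first row is the column header.
raw = "order_date,customer,quantity\n2023-01-15,Acme,3\n2023-02-01,Globex,12\n"

# DictReader uses the first row as field names for every following record.
rows = list(csv.DictReader(io.StringIO(raw)))

# Delimited files store everything as text, so types are applied on reading.
records = [
    {
        "order_date": date.fromisoformat(row["order_date"]),
        "customer": row["customer"],
        "quantity": int(row["quantity"]),
    }
    for row in rows
]
```

The file itself carries no type information, so the date, string, and integer interpretation lives in the reading code.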
Delimited files allow field values of any length and are considered a standard format
for providing a straightforward information schema.
They can be processed by almost all existing applications.
Delimiters also represent one of various means to specify boundaries in a data
stream.
Microsoft Excel Open XML Spreadsheet, or XLSX, is a Microsoft Excel file format that
falls under the category of spreadsheet file formats.
It is an XML-based file format created by Microsoft.
An .XLSX file, also known as a workbook, can contain multiple worksheets.
Each worksheet is organized into rows and columns, at the intersection of which is a cell.
Each cell contains data.
XLSX uses an open file format, which means it is generally accessible to most other
applications.
It can use and save all functions available in Excel and is also known to be one of the
more secure file formats, as it cannot store macros, a common carrier of malicious code.
Extensible Markup Language, or XML, is a markup language with set rules for
encoding data.
The XML file format is readable by both humans and machines.
It is a self-descriptive language designed for sending information over the
internet.
XML is similar to HTML in some respects, but there are also differences.
For example, XML does not use predefined tags as HTML does.
XML is platform independent and programming language independent and therefore
simplifies
data sharing between various systems.
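To show what self-descriptive, author-defined tags look like, here is a minimal Python sketch that parses a small XML document with the standard xml.etree.ElementTree module. The tag names, attribute, and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are chosen by the author, not predefined.
xml_text = """
<employees>
  <employee id="1">
    <name>Jane Smith</name>
    <department>Finance</department>
  </employee>
  <employee id="2">
    <name>Raj Patel</name>
    <department>Engineering</department>
  </employee>
</employees>
""".strip()

root = ET.fromstring(xml_text)

# Walk the tree and collect each employee's data using the custom tag names.
employees = [
    {
        "id": emp.get("id"),
        "name": emp.findtext("name"),
        "department": emp.findtext("department"),
    }
    for emp in root.findall("employee")
]
```

Because the tags describe their own content, a program in any language with an XML parser could read the same text in much the same way.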
Portable Document Format, or PDF, is a file format developed by Adobe to present
documents
independent of application software, hardware, and operating systems, which means
it can
be viewed the same way on any device.
This format is frequently used in legal and financial documents and can also be used to
fill in data, such as in forms.
JavaScript Object Notation, or JSON, is a text-based open standard designed for
transmitting
structured data over the web.
The file format is a language-independent data format that can be read in any
programming
language.
JSON is easy to use, is compatible with a wide range of browsers, and is considered
one of the best tools for sharing data of any size and type, even audio and video,
although binary content of that kind must first be encoded as text.
That is one reason many APIs and web services return data as JSON.
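As a small illustration of JSON as a text-based exchange format, the Python sketch below serializes a structure to JSON text and parses it back with the standard json module. The payload is a made-up example of what an API response might look like, not taken from any real service.

```python
import json

# Hypothetical API-style payload built from ordinary Python values.
payload = {
    "customer": "Acme",
    "active": True,
    "orders": [{"id": 1, "total": 19.99}, {"id": 2, "total": 5.0}],
}

# Serialize to JSON text, the form that would travel over the web.
text = json.dumps(payload)

# Any JSON-aware language can parse the text back into native structures.
restored = json.loads(text)
order_total = sum(order["total"] for order in restored["orders"])
```

The same text could be produced or consumed by JavaScript, Java, or any other language with a JSON library, which is what makes the format language independent.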
In this video, we looked at some popular file and data formats.
In the next video, we will learn about the different sources of data.