Cola Grafos RA34

HUFFMAN CODING

Among the different types of graphs, there is a special one called a Tree.

• A Tree is a connected graph without cycles, in which there is exactly one path between any pair of nodes
• A Tree T with n nodes has n - 1 edges // Each edge is a bridge (i.e., its removal increases the number of components) // Adding any edge forms a cycle, so T is no longer a tree // T is planar

A very popular tree is the Binary Tree: each node can have at most 2 children

An interesting application of the Binary Tree is data compression using Huffman Coding

Instead of using 8 bits to represent every character of a text, the idea is to create a new code structure in which fewer bits are used to represent more frequent characters and more bits are used to represent less frequent characters

1. Count the frequency of the symbols in the text and sort them by frequency
2. Build a binary tree, called the Huffman Tree, grouping the symbols by frequency
3. Traverse the Huffman Tree using Depth-First Search to build a code dictionary
4. Encode the text using the code dictionary

• To build the tree, we use a priority queue holding each symbol and its frequency // In a loop, we remove from the queue the two symbols with the lowest frequency // We sum the frequencies of both symbols in a parent node, putting the symbol with the lower frequency in the left node and the symbol with the higher frequency in the right node // We add the parent node, with the resulting sum, back into the queue // We stop when only one element is left in the queue

Traverse the tree using Depth-First Search, adding a bit 0 when going to a left node or a bit 1 when going to a right node, until a leaf node is reached

How to encode the text using the code dictionary? We just need to replace each symbol with its code from the dictionary

Compression Ratio = Uncompressed Size / Compressed Size
Space Saving = 1 – Compressed Size / Uncompressed Size

Text = "Huffman gosta de batata e de estudar" (36 characters) x 8 bits = 288 bits // Encoded text = 130 bits
Compression Ratio = 288 / 130 ≈ 2.22 // Space Saving = 1 – 130 / 288 ≈ 0.55 (about 55%)
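The four steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function names (`build_huffman_tree`, `build_codes`, `encode`) and the tuple layout of a node are assumptions made for this example, and the text is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return the root of a Huffman tree; a node is (symbol, left, right)."""
    tie = count()  # unique tie-breaker so the heap never compares node tuples
    heap = [(freq, next(tie), (sym, None, None))
            for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)    # lowest frequency -> left child
        f2, _, high = heapq.heappop(heap)   # next lowest -> right child
        heapq.heappush(heap, (f1 + f2, next(tie), (None, low, high)))
    return heap[0][2]


def build_codes(node, prefix="", codes=None):
    """DFS over the tree: append 0 when going left and 1 when going right."""
    if codes is None:
        codes = {}
    symbol, left, right = node
    if left is None and right is None:      # leaf node: store its code
        codes[symbol] = prefix or "0"       # "0" covers a single-symbol text
    else:
        build_codes(left, prefix + "0", codes)
        build_codes(right, prefix + "1", codes)
    return codes


def encode(text, codes):
    """Replace every symbol by its code from the dictionary."""
    return "".join(codes[ch] for ch in text)


text = "Huffman gosta de batata e de estudar"
codes = build_codes(build_huffman_tree(text))
encoded = encode(text, codes)
print(len(text) * 8, len(encoded))          # 288 130
print(round(len(text) * 8 / len(encoded), 2))  # compression ratio
```

The individual codes depend on how ties between equal frequencies are broken, but the total encoded length does not.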
Topological Sort
• Topological sort is a method of arranging the nodes of a Directed Acyclic Graph (DAG), as a sequence, such that no vertex
appears in the sequence before its predecessor, i.e., if G contains an edge (u, v), then the node u appears before node v in the
sequence

• It is a linear arrangement of the graph in which all the edges point from left to right, i.e., tasks that must be done before others come first

Applications:
• Management of software development projects // Performing some tasks or implementing some routines depends on others being completed
• Process scheduling in operating systems
• Compiling software

Kahn's Algorithm – Based on nodes indegrees


Repeat the following until the graph is empty:
1. Identify a node with indegree = 0
2. Add that node to the list of sorted nodes
3. Remove the node and its edges from the graph
Return the list as the result of the topological sort. If the graph is not empty and no node with indegree = 0 exists, the graph contains a cycle

Kahn's algorithm is an easy way to obtain the topological sort and to detect cycles in graphs. However, what is its main limitation? To detect cycles, it works only on directed graphs, since it depends on indegree information
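Kahn's algorithm as described above can be sketched as follows. The graph representation (a dict mapping each node to a list of successors) and the function name are assumptions of this example. Instead of physically deleting nodes, the sketch decrements indegree counters, which has the same effect.

```python
from collections import deque


def kahn_topological_sort(graph):
    """graph: dict mapping each node to a list of its successors."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1

    queue = deque(u for u in indegree if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):   # "remove" u by lowering successors' indegree
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if len(order) != len(indegree):  # some nodes never reached indegree 0
        raise ValueError("graph contains a cycle")
    return order


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(kahn_topological_sort(dag))    # ['A', 'B', 'C', 'D']
```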

Tarjan's Algorithm – Based on DFS


Based on Depth-First Search (DFS)
1. Visit all nodes of the graph using DFS
2. Whenever a node is "finished" (i.e., all its successors were explored and the search backtracks), immediately push it onto the stack
3. Return the stack as the topological sort, from top to bottom

What happens when we add the edge (H, G)? The algorithm stops, since it finds a node in the "visiting" state that is adjacent to another "visiting" node. We can conclude that the topological sort is not possible because the graph contains a cycle

Besides sorting the nodes in topological order, we can also use Kahn's Algorithm to detect cycles in directed graphs and Tarjan's Algorithm to detect cycles in directed or undirected graphs
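A minimal sketch of the DFS-based topological sort with cycle detection for directed graphs is shown below. The three node states and all names are assumptions of this example; the recursive form is only suitable for small graphs because of Python's recursion limit.

```python
def dfs_topological_sort(graph):
    """graph: dict mapping each node to a list of its successors."""
    NOT_VISITED, VISITING, FINISHED = 0, 1, 2
    state = {u: NOT_VISITED for u in graph}
    stack = []

    def visit(u):
        state[u] = VISITING
        for v in graph.get(u, []):
            s = state.get(v, NOT_VISITED)
            if s == VISITING:            # edge back to a node still on the path
                raise ValueError("graph contains a cycle")
            if s == NOT_VISITED:
                visit(v)
        state[u] = FINISHED
        stack.append(u)                  # node is finished: push it

    for u in list(graph):
        if state[u] == NOT_VISITED:
            visit(u)
    return stack[::-1]                   # read the stack from top to bottom


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs_topological_sort(dag))         # ['A', 'C', 'B', 'D']
```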

Strongly Connected Components (SCCs)


A connected graph is an undirected graph in which there is a path from any node to any other node. A graph that is not connected is said to be disconnected // A connected component (or just component) of an undirected graph is a maximal subgraph in which each pair of nodes is connected by a path

A directed graph is said to be strongly connected if there is a path in each direction between each pair of nodes of the graph
A graph may not be strongly connected and still contain a subgraph that is. Such a subgraph is called a strongly connected component // Formally, a strongly connected component of a digraph G = (V, E) is a maximal subset of vertices C ⊆ V such that for every pair of vertices u, v from C, there is a path from u to v and from v to u.

Kosaraju's Algorithm – uses DFS to find SCCs


1. Run DFS on G, storing the [visiting/finished] times of every node
2. Compute Gt (the transpose graph)
3. Run DFS on Gt, taking the nodes in decreasing order of their finished time in G
4. Return the groups obtained by each run of DFS in the last step
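The four steps can be sketched as follows. Instead of storing numeric finish times, this example keeps the nodes in a list ordered by finish time, which is equivalent for step 3. The graph representation and names are assumptions, and the recursive DFS is meant for small graphs only.

```python
def kosaraju_scc(graph):
    """graph: dict mapping each node to a list of its successors."""
    # Collect every node, including those that appear only as successors.
    nodes = list(dict.fromkeys(
        list(graph) + [v for vs in graph.values() for v in vs]))
    visited = set()

    def dfs(u, adj, out):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v, adj, out)
        out.append(u)                    # u is finished

    # 1. DFS on G, recording nodes in increasing order of finish time
    finish_order = []
    for u in nodes:
        if u not in visited:
            dfs(u, graph, finish_order)

    # 2. Build the transpose graph Gt (every edge reversed)
    transpose = {u: [] for u in nodes}
    for u in graph:
        for v in graph[u]:
            transpose[v].append(u)

    # 3. DFS on Gt in decreasing order of finish time
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            component = []
            dfs(u, transpose, component)
            components.append(component)  # 4. each run yields one SCC
    return components


g = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"]}
print(kosaraju_scc(g))
```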
Spanning Tree
• Every undirected, connected graph has at least one tree as a subgraph that contains all its nodes using a minimum number of edges // A Spanning Tree of a graph G is a connected and acyclic subgraph (i.e., a tree) that contains all nodes of G and a subset of V-1 edges of G, where V is the number of nodes // A graph can have multiple Spanning Trees

For weighted graphs, we are generally interested in two of them:

Minimum Spanning Tree is the Spanning Tree with the lowest cost (the sum of its edge weights) among all possible spanning trees of a weighted graph. It is the one needed in most problems

Maximum Spanning Tree is the Spanning Tree with the highest cost among all possible spanning trees of a weighted graph

Prim's Algorithm – MST (main idea)


1 - Choose a random node of the graph G as the starting node of the Spanning Tree (ST)
2 - While not all nodes of G are included in the ST:
3 -    Find the edges that connect the ST nodes to nodes that have not yet been included in the ST
4 -    Choose the edge with the minimum weight
5 -    Add the edge and its node to the ST
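A minimal sketch of Prim's algorithm is given below, assuming an undirected, connected graph stored as a dict that maps each node to a list of (neighbor, weight) pairs. A heap stands in for "find the minimum-weight edge leaving the ST"; the names and the sample graph are illustrative only.

```python
import heapq


def prim_mst(graph, start):
    """graph: dict node -> list of (neighbor, weight); undirected, connected."""
    in_tree = {start}
    frontier = [(w, start, v) for v, w in graph[start]]  # edges leaving the ST
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)   # minimum-weight candidate edge
        if v in in_tree:                    # both ends already in the ST: skip
            continue
        in_tree.add(v)
        mst.append((u, v, w))
        for nxt, nw in graph[v]:
            if nxt not in in_tree:
                heapq.heappush(frontier, (nw, v, nxt))
    return mst


g = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 5)],
    "C": [("A", 4), ("B", 2), ("D", 3)],
    "D": [("B", 5), ("C", 3)],
}
tree = prim_mst(g, "A")
print(tree, sum(w for _, _, w in tree))     # total weight 6
```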

Kruskal's Algorithm – MST


1. Start with an empty tree containing only the nodes
2. Sort all the edges in increasing order of weight
3. Pick the smallest remaining edge. Check whether it forms a cycle with the tree built so far. If no cycle is formed, include the edge. Otherwise, discard it
4. Repeat step 3 until there are (V-1) edges in the spanning tree, where V is the number of vertices

To find the Maximum Spanning Tree with Kruskal's algorithm, just sort the edges in DECREASING order of weight.
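The steps above can be sketched as follows. The cycle check uses a simple union-find structure, which is a common choice but an assumption of this example, as are the names and the (weight, u, v) edge format. The `maximum` flag switches to decreasing order to obtain a Maximum Spanning Tree.

```python
def kruskal(nodes, edges, maximum=False):
    """edges: list of (weight, u, v) for an undirected graph."""
    parent = {u: u for u in nodes}

    def find(u):
        # Follow parents up to the set representative (with path halving).
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=maximum):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:             # different sets: no cycle is formed
            parent[root_u] = root_v
            tree.append((u, v, w))
            if len(tree) == len(parent) - 1:
                break                    # V-1 edges: spanning tree complete
    return tree


nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (5, "B", "D"), (3, "C", "D")]
print(sum(w for _, _, w in kruskal(nodes, edges)))                # 6
print(sum(w for _, _, w in kruskal(nodes, edges, maximum=True)))  # 12
```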
Mathematical models to build graphs
• These models are called Graph Formation Models or Complex Network Models
• Models are used to simulate a network, or its process of formation, that represents a problem for which we do not have enough data or data access
• We can study and understand a problem or related problems based on the analysis of these simulated networks

Common properties of real-world complex networks


• Communities = Clusters of densely connected nodes that are sparsely connected to other densely connected clusters
• Resilience = Low impact when one or more nodes/edges are removed
• Small-world phenomenon = Six degrees of separation
• Node degree distribution according to a power law

The Erdős number describes the "collaborative distance" between the Hungarian mathematician Paul Erdős (who wrote more than 1,475 papers, including some related to Graph Theory) and another researcher, measured by paper co-authorship

Degree distribution according to a power law


Gaussian distribution (for continuous values) // Poisson distribution (for discrete values)
Many natural phenomena can be described by a normal distribution, in which most values are close to a mean. However, contrary to what was expected, it was noted that many networks representing natural phenomena have a different distribution of node degrees (a power law) // Pareto principle (80-20 rule)

Complex Network Models


Regular model - All the nodes have the same degree (k)

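Random model (Erdős-Rényi) – Sets the probability of connection between nodes

G(N,p) defines a fixed number of nodes (N) and connects each of the N(N-1)/2 possible pairs of nodes with probability p // Expected number of edges = (N(N-1)/2) * p

G(N,M) defines a fixed number of nodes (N) and a fixed number of edges (M) // The M connections are chosen by picking pairs of nodes at random (without loops or parallel edges)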
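Both variants of the random model can be sketched with the standard library alone. The function names and the optional `seed` parameter are assumptions of this example; nodes are simply numbered 0..n-1 and the graph is returned as a list of edges.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """G(N,p): keep each possible pair of nodes with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def gnm_random_graph(n, m, seed=None):
    """G(N,M): pick exactly m distinct pairs of nodes uniformly at random."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))  # no loops, no parallel edges
    return rng.sample(all_pairs, m)


n, p = 100, 0.1
edges = gnp_random_graph(n, p, seed=42)
print(len(edges), n * (n - 1) / 2 * p)  # observed vs. expected number of edges
```

Enumerating all pairs is fine for small graphs; for large N a generator that skips ahead between chosen pairs would be preferable.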

Both G(N,p) and G(N,M) models may be inappropriate for modeling certain real-life phenomena:
• Absence of communities
• Node degree distribution does not follow a power law: in general, it follows a Poisson distribution, so the network is not scale-free
• Low clustering coefficient (we will understand how to compute it in the next class), whereas in real complex networks the neighbors of a node have a high probability of being connected to each other

Small-world model (Watts-Strogatz)


• The graph starts with N disconnected nodes
• Each node is connected to its k nearest nodes (regular graph), forming the initial graph
• With probability p, every edge is rewired to a random destination
Three parameters: N – Number of nodes // k – Node degree in the initial graph // p – Probability of reconnecting one of the ends of an edge
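A minimal sketch of the small-world construction follows. It assumes that k is even and smaller than N, that the initial graph is a ring lattice, and that a rewired edge keeps its first end and never creates a loop or a parallel edge. These details, and the function name, are assumptions of this example.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Return an adjacency dict {node: set(neighbors)}; assumes k even, k < n."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}

    # Initial regular graph: each node linked to k/2 neighbors on each side.
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)

    # Rewire each lattice edge (u, v) with probability p.
    for step in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + step) % n
            if rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


graph = watts_strogatz(20, 4, 0.1, seed=7)
print(sum(len(nbrs) for nbrs in graph.values()) // 2)  # 40 edges: n * k / 2
```

Rewiring moves edges but never adds or removes them, so the number of edges stays at N*k/2 for any p.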

Scale-free model (Barabási-Albert)

• The network starts with a small initial number n0 of nodes, with edges connecting them at random
• At each time step, a new node is added with k edges that connect it to nodes already present in the network, where k ≤ n0
• To connect the new nodes to the old ones, a preferential attachment rule is employed
• The probability of connecting a new node i to an existing node j is proportional to the degree of j
• This process is repeated until the network reaches the desired number of nodes, or after t time steps
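Preferential attachment can be sketched as below. To keep the example short, it assumes the initial network is a complete graph on n0 = k + 1 nodes (rather than random initial connections) and that n > k ≥ 1; the function name and `seed` parameter are also assumptions. Sampling uniformly from a list in which every node appears once per incident edge gives a probability proportional to its degree.

```python
import random


def barabasi_albert(n, k, seed=None):
    """Return a list of edges for a network grown to n nodes."""
    rng = random.Random(seed)
    # Assumption: seed network = complete graph on the first k + 1 nodes.
    edges = [(u, v) for u in range(k + 1) for v in range(u + 1, k + 1)]
    # Each node appears in this list once per edge it belongs to.
    endpoints = [x for edge in edges for x in edge]

    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:              # k distinct existing nodes
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = barabasi_albert(200, 2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), min(degree.values()))  # a few hubs, many small nodes
```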
Analysis of Complex Network
• Path analysis: Examines the relationships between nodes through paths. Mostly used in shortest-distance problems
Measures based on the length of shortest paths: Eccentricity // Diameter // Radius // Average Geodesic Distance

The eccentricity of a node u, denoted by e(u), is the maximum distance from u to any other node of the graph (i.e., its longest shortest path).

The diameter of a graph is the maximum eccentricity observed over all nodes

The radius is similar to the diameter, but takes the minimum eccentricity observed over all nodes

The average geodesic distance is the ratio between the sum of the shortest-path lengths of all possible pairs of nodes and the maximum number of possible edges:

$$\ell = \frac{\sum_{i,j \in V,\ i \neq j} d(i,j)}{N(N-1)/2}$$

where each unordered pair {i, j} is counted once.
• The smaller the average geodesic distance, the more efficient the network is in terms of the number of hops needed from one node to reach the others
• In other words, information propagates faster in the network
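The four distance-based measures can be computed with one breadth-first search per node. The sketch below assumes an unweighted, undirected, connected graph stored as a dict of neighbor lists; the function names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_measures(adj):
    """Return (eccentricities, diameter, radius, average geodesic distance)."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    ecc = {u: max(dist[u].values()) for u in adj}
    n = len(adj)
    total = sum(dist[u][v] for u, v in combinations(adj, 2))  # each pair once
    return ecc, max(ecc.values()), min(ecc.values()), total / (n * (n - 1) / 2)


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
ecc, diameter, radius, avg = path_measures(path)
print(ecc, diameter, radius, round(avg, 3))
# {'A': 3, 'B': 2, 'C': 2, 'D': 3} 3 2 1.667
```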

• Centrality analysis: Estimates how important a node is for the connectivity of the network. It helps to identify the most influential people in a social network or the most frequently accessed web pages
Importance: If an incident occurs at DEN, it propagates worldwide much faster than an incident at CWB // What is the global impact when the Google or PUCPR website goes down due to a cyber-attack? // Just as when a highly connected node on the Internet breaks down, the disruption of the p53 protein has severe consequences

Degree centrality - The most central node is the one with the highest number of connections // CALC: number of connections / (total vertices – 1)
LIMITATIONS
• We can have nodes with high degree (hubs) that are peripheral (peripheral hubs)
• The propagation of information over the entire network by these nodes is limited or slow
• It is a local measure that considers only the node's neighborhood

Coreness centrality (graph k-core)
• To compute the Coreness centrality, we need to obtain the k-core graph
• The k-core is a subgraph in which all nodes have degree at least k
• The k-core is obtained by iteratively removing all nodes whose degree is smaller than k. After removing these nodes, the network is re-analyzed to verify whether there are nodes with fewer than k connections. If such nodes are present, they are also removed
• The process is repeated until the minimum degree in the network is k
LIMITATIONS
• In some cases, many nodes have a similar number of connections, making it difficult to identify the truly most "important" nodes
• Like degree centrality, it is a local measure that does not consider the global position of the node in the network structure
• Discards nodes that serve as bridges between communities
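The iterative pruning described above can be sketched as follows. The graph is assumed to be an undirected adjacency dict of sets, and `coreness` (the largest k for which a node still belongs to the k-core) is computed by simply repeating the pruning for k = 1, 2, ...; faster methods exist, and the names here are illustrative.

```python
def k_core(adj, k):
    """Return the k-core of adj ({node: set(neighbors)}) as a new dict."""
    core = {u: set(vs) for u, vs in adj.items()}
    while True:
        to_remove = [u for u in core if len(core[u]) < k]
        if not to_remove:                # minimum degree is now at least k
            return core
        for u in to_remove:
            for v in core[u]:
                core[v].discard(u)       # drop the edges of the removed node
            del core[u]


def coreness(adj):
    """Largest k such that each node still belongs to the k-core."""
    result, k, core = {}, 0, adj
    while core:
        for u in core:
            result[u] = k
        k += 1
        core = k_core(core, k)
    return result


# Triangle A-B-C with a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(sorted(k_core(g, 2)))   # ['A', 'B', 'C']
print(coreness(g))            # {'A': 2, 'B': 2, 'C': 2, 'D': 1}
```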

Closeness centrality - A node is considered central by this measure if it is close to most of the other nodes of the network
• Nodes with a smaller average distance to all other nodes receive a closeness centrality value close to 1
• These nodes are considered important because the information they transmit reaches the other nodes faster

CALC UNDIRECTED: (N-1) / (sum of the distances to all other nodes) // CALC DIRECTED: (sum of 1/distance to every node it can reach) / (N-1)
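A sketch of both calculations, reading them as distance-based formulas: `closeness` is (N-1) divided by the sum of shortest-path distances and assumes a connected graph, while `harmonic_closeness` sums 1/distance so that unreachable nodes simply contribute nothing. The function names and the adjacency-dict format are assumptions of this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, u):
    """(N - 1) / sum of distances from u to all other nodes."""
    total = sum(bfs_distances(adj, u).values())
    return (len(adj) - 1) / total if total else 0.0


def harmonic_closeness(adj, u):
    """Sum of 1/distance to every reachable node, divided by (N - 1)."""
    dist = bfs_distances(adj, u)
    return sum(1 / d for v, d in dist.items() if v != u) / (len(adj) - 1)


path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness(path, "B"), round(closeness(path, "A"), 3))  # 1.0 0.667
print(harmonic_closeness(path, "A"))                          # 0.75
```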

Betweenness centrality
• A node is considered central if it is part of most of the shortest paths between all possible pairs of nodes, not counting paths in which it is the first or last node
• The removal of a node with high betweenness centrality will significantly impact the information flow in the network, since it is part of most of the shortest paths

CALC: SUM over all pairs (u, v) of [ (number of shortest paths from u to v that pass through i) / (number of shortest paths from u to v) ] // Standardize by dividing by ((N-1)(N-2)/2) = number of pairs of nodes other than i
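One way to evaluate this sum without enumerating every path is Brandes' accumulation technique, sketched below for unweighted, undirected graphs. It is swapped in here as a common approach, not as the method used in these notes; the function name and graph format are assumptions.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """adj: {node: iterable of neighbors}, undirected and unweighted."""
    bc = {u: 0.0 for u in adj}
    for s in adj:
        # BFS from s, counting the shortest paths (sigma) to every node.
        sigma = {u: 0 for u in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {u: [] for u in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {u: 0.0 for u in adj}
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                bc[v] += delta[v]
    n = len(adj)
    for u in bc:
        bc[u] /= 2                           # each pair was seen from both ends
        if normalized and n > 2:
            bc[u] /= (n - 1) * (n - 2) / 2
    return bc


star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(betweenness_centrality(star))
# {'hub': 1.0, 'a': 0.0, 'b': 0.0, 'c': 0.0}
```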

• Community analysis: Distance and density of relationships can be used to find groups of people interacting frequently with each other in a social network, or to understand the patterns of a group over time
Clustering coefficient
The clustering coefficient is a measure that indicates how much the nodes of a graph/network tend to form groups. In general, we call these groups clusters; in social networks, it is usual to call them communities. Much evidence suggests that in most real-world networks, nodes tend to create densely connected groups that are sparsely connected to other groups.

The global clustering coefficient of a graph is based on triplets of nodes. A triplet is three nodes that are connected by two undirected edges (open triplet) or three undirected edges (closed triplet). A triangle therefore includes three closed triplets, one centered on each of its nodes. The global clustering coefficient C is defined as:
C = number of closed triplets / number of all triplets (open and closed)

The local clustering coefficient Ci of a given node i is defined according to the equation:
Ci = 2 Li / (ki (ki - 1))
ki is the degree of node i // Li is the number of links among the neighbors of node i

The average clustering coefficient is the average of the local clustering coefficients: sum all Ci and divide by N
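The three coefficients can be sketched as follows for an undirected graph stored as a dict of neighbor sets. The local coefficient uses the usual undirected form 2*Li / (ki*(ki-1)) and is taken as 0 for nodes with fewer than two neighbors, which is a convention assumed here; the function names are illustrative.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of neighbors of i that are themselves linked."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all N nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def global_clustering(adj):
    """Closed triplets divided by all triplets (open and closed)."""
    closed = triplets = 0
    for i in adj:                        # i is the center of the triplet
        for u, v in combinations(adj[i], 2):
            triplets += 1
            if v in adj[u]:
                closed += 1
    return closed / triplets if triplets else 0.0


# Triangle A-B-C with a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(local_clustering(g, "C"), 3))   # 0.333
print(round(average_clustering(g), 3))      # 0.583
print(global_clustering(g))                 # 0.6
```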

CHATGPT REVIEW
2. Kahn's and Tarjan's Algorithms

Kahn's and Tarjan's algorithms were developed to solve the topological sorting problem in directed acyclic graphs (DAGs).

• Kahn's Algorithm: Works by iteratively removing nodes without predecessors and building the topological order.

• Tarjan's Algorithm: Uses a depth-first search (DFS) to find the topological order.

Both can be used to solve cycle detection problems and to plan tasks with dependencies.

3. Characteristics of Trees

A tree is a type of graph with the following characteristics:

1. Connectivity: Any two nodes are connected by exactly one path.

2. Edges: A tree with n nodes always has n−1 edges.

3. Acyclicity: It has no cycles.

4. Unique Path: There is exactly one path between any pair of nodes.

4. Minimum and Maximum Spanning Trees

a. Definition:

• Spanning Tree: Subgraph that connects all the vertices of the original graph with the smallest possible number of edges.

• Minimum Spanning Tree (MST): Spanning tree with the smallest total weight.

• Maximum Spanning Tree: Spanning tree with the largest total weight.

b. Algorithms for the MST:

Kruskal's Algorithm:

1. Sort all the edges by weight.

2. Add the edges in order, as long as they do not form cycles, until all nodes are connected.

Prim's Algorithm:

1. Start with one node and add the lowest-weight edges that connect nodes outside the tree.

2. Repeat until all nodes are connected.

c. Real-World Problem:

One example is the planning of electrical grids, where the MST minimizes the installation cost of the transmission lines.

6. Dependency Hell

Graph theory, especially topological sorting on DAGs, can help install packages and libraries in the right order.

7. Algorithm for Dependency Hell

Topological Sorting:

1. Identify all nodes without predecessors and add them to a list.

2. Remove a node from the list, append it to the topological order, and decrease the dependency count of its successors; any successor left with no pending dependencies is added to the list.

3. Repeat until all nodes are ordered.

9. Properties of Complex Networks

1. Degree Distribution: In many complex networks, a few nodes have many neighbors, while most have few.

o Example: Social networks, where influencers have many connections.

2. Small World: Most nodes can be reached from any other node in a small number of steps.

o Example: Scientific collaboration networks.

3. High Clustering: Nodes tend to form highly interconnected groups.

o Example: Friendship networks, where the friends of an individual are often friends with each other.

4. Robustness and Fragility: Complex networks are robust against random failures but fragile against targeted attacks.

o Example: Internet infrastructure.

10. Complex Network Formation Model

A complex network formation model describes how real networks can grow and evolve over time.

11. Complex Network Formation Models

1. Erdős–Rényi Model: Nodes are connected at random.

2. Barabási–Albert Model: Preferential attachment, where new nodes tend to connect to the most connected nodes.

3. Watts-Strogatz Model: Introduces randomness into a regular network to create "small-world" properties.

4. Configuration Model: Preserves the degree distribution of the network.

12. Model for Flights between Airports

The Barabási–Albert model is the most suitable, since larger airports tend to have more connections, which represents preferential attachment.
